I've already covered the Caesar cipher (one of my favourite toy problems) here in ruby, take a look if you're curious!

The challenge

Given a message to encode and a shift value (the number of positions to shift each letter by), encode a message.

As an added challenge:

  • lower case and upper case must be preserved
  • special characters must be preserved

The strategies (first thoughts)

  1. Perform the calculation using ascii integer values
  2. Create a lookup dict and use that
  3. ???
  4. Some sort of pythonic iterator?

The integer calculation option #1 is not suitable as it just ends up being very slow compared to other methods, and is more prone to mistakes in your code. We'll be skipping this approach, but you can see my Ruby blog post linked above if you're curious.

Creating lookup dicts

Essentially, what we need is a mapping of original -> encoded char. To do this, we can get two alphabet strings, rotate one, and then map them together.

def create_lookup(shift=1):
    'Creates a mapping of original -> caesar-encoded char'
    lookup = []
    for alpha in (string.ascii_lowercase, string.ascii_uppercase):
        lookup += zip(alpha, alpha[shift:] + alpha[0:shift])
    return dict(lookup)

Let's see how that turns out:

In [3]: create_lookup(5)
Out[3]:
{'A': 'F',
 'B': 'G',
 'C': 'H',
 'D': 'I',
 ...

Interface

Our method interface will be encode(msg, lookup), a method that is passed both a message to encode, and a lookup dict to use.

This means that we discount the time that it takes to create those lookup dicts, and examine the compute time for the encoding only.

The regular approach

With our lookup dict ready, let's create a method to iterate through the message and encode each character.

def encode_regular(msg, lookup):
    result = []
    for char in msg:
        try:
            result.append(lookup[char])
        except KeyError:
            result.append(char)
    return ''.join(result)

Here, KeyError is extremely useful as it allows us to preserve special characters.

Using map

map is a built-in Python function that applies another given function to each element in a collection.

For example:

In [1]: list(map(str, [1, 2]))
Out[1]: ['1', '2']

Each item in the list will be passed to the str method, (eg. str(1)), which produces a list of strings. The map method returns us a map object which we then need to turn into something useable.

def encode_map(msg, lookup):
    return ''.join(list(
        map(lambda char: encode_char(char, lookup), msg)
    ))

Using str.translate()

There's got to be a better way! I cast my mind back to solving this in ruby, where I discovered that tr was, by far, the fastest way to solve the problem.

The real thing that irks me with the previous solutions is all the type conversions: string -> map -> list -> string etc. There must be a way to operate on the string only, string in, string out!

Enter: str.translate()

First, you must create a translation table for the method. We can just use the same string slicing method we used initially.

The maketrans method takes in two strings, one which is the original, the other which is the new characters.

e.g. ('abc', 'xyz') would translate 'a' -> 'x', 'b' -> 'y', etc.

def create_trans(shift=1):
    return str.maketrans(
        string.ascii_lowercase,
        string.ascii_lowercase[shift:] + string.ascii_lowercase[0:shift]
    )

After we've defined our translation, we can now use it in spectacularly terse fashion:

def encode_trans(msg, trans):
    return msg.translate(trans)

Comparisons

Now, lets load everything up (in ipython) and run it!

Loading up the code:

Just paste this into your ipython session:

'Benchmark caesar cipher solutions'

import string

def create_lookup(shift=1):
    'Creates a mapping of original -> caesar-encoded char'
    lookup = []
    for alpha in (string.ascii_lowercase, string.ascii_uppercase):
        lookup += zip(alpha, alpha[shift:] + alpha[0:shift])
    return dict(lookup)

def create_trans(shift=1):
    return str.maketrans(
        string.ascii_lowercase,
        string.ascii_lowercase[shift:] + string.ascii_lowercase[0:shift]
    )

def encode_regular(msg, lookup):
    result = []
    for char in msg:
        try:
            result.append(lookup[char])
        except KeyError:
            result.append(char)
    return ''.join(result)

def encode_map(msg, lookup):
    'Encode a message using map & lambda'
    return ''.join(list(map(lambda char: encode_char(char, lookup), msg)))
    
def encode_char(char, lookup):
    try:
        return lookup[char]
    except KeyError:
        return char
        
def encode_trans(msg, trans):
    return msg.translate(trans)

Comparing lookup creation methods

In [2]: %timeit create_lookup(5)
10.2 µs ± 831 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]: %timeit create_trans(5)
5.65 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Cool, so our create_trans method is around twice as fast, but that doesn't mean much at this stage as this only needs to be run once per encoded string (or once per encoding object, if you were so inclined).

Comparing the encode methods

On to the main event!

In [4]: lookup = create_lookup(5)
   ...: trans = create_trans(5)
   ...: msg = 'hello world! Hello; World; 1234321 1234321'
   ...:

In [5]: %timeit encode_regular(msg, lookup)
16.9 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [6]: %timeit encode_map(msg, lookup)
22.7 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit encode_trans(msg, trans)
1.63 µs ± 157 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Debrief

Holy Moley! Our string translation method is once again the fastest, ~14 times faster than the regular method (and much more consistent).

There are a few possible reasons why the map method is slower:

  • There is additional overhead in calling the encode_char method. The encode_regular method works fastest when the lookup logic is inside the same method.
  • There is additional overhead in creating the lambda function.

Hopefully this inspires some lateral thinking! Until next time.