July 5, 2017

# The Caesar Cipher

The caesar cipher has always delighted me, and it's one of the first problems that I solve whenever I learn a new language.

## "Why the Caesar?"

A fair enough question. I like the caesar because it satisfies these conditions:

• Use of loops & string building
• Character/string operations
• It's not too hard!
• It has a few solutions

I'm going to go through three solutions that I've made for this in Ruby: one that's done the 'regular' way, another using hashes, and a final hail mary using the mighty `tr`

### 1. The regular solution

The regular solution, regardless of the language being used, is usually as follows

``````1. Get the string, and a shift value
2. Iterate through the string
2.a. For a basic solution, normalise the case to upper or lower
3. Use the rotation value along with the modulus operator (and the start/end ascii values for the alphabet) to create the encoded character
4. Put results into string & return
``````

Note: This solution normalises all character cases.

``````module Caesar
module_function

FIRST_CHAR_INT = "a".ord.freeze # "a" has ascii value of 97
ALPHABET_CHARS = [*"a".."z"]
ALPHABET_SIZE = ALPHABET_CHARS.size

def transform(str, num)
str.downcase.each_char.map do |char|
!letter?(char) ? char : shift(char, num)
end.join
end

# I've split the calls in this method up for clarity
def shift(char, num)
# Subtract 97 so the maths is easier
start = char.ord - FIRST_CHAR_INT
# Add the shift value, then wrap around with %
new_int_value = (start + num) % ALPHABET_SIZE
# Then add the 97 back on, and we're done!
encoded_int_value =  new_int_value + FIRST_CHAR_INT
tmp.chr
end

def letter?(char)
ALPHABET_CHARS.include?(char)
end
end
``````

I've illustrated the logic with some comments, this solution doesn't need any more attention as it's the least interesting one.

### 2. The hash solution

After having implemented this cipher a number of times, I began to wonder if it could be made better.

• The logic is a little unweildy.
• What if we needed to preserve case?
• What if we were encoding really long strings?
• What if the shift value remained the same, but we needed to encode a series of messages?
``````class CaesarByMap

# I've done a bit of trickery here to make these short, but you
# could just individually define all the characters in an array.
LOWER_ALPHABET = [*"a".."z"]
UPPER_ALPHABET = [*"A".."Z"]

def initialize(shift_n)
@shift_n = shift_n
@mapping = create_mapping(@shift_n)
end

def create_mapping(shift_n)
l_mapping = LOWER_ALPHABET.zip(LOWER_ALPHABET.rotate(shift_n))
u_mapping = UPPER_ALPHABET.zip(UPPER_ALPHABET.rotate(shift_n))
l_mapping.concat(u_mapping).to_h
end

def encode(message)
# Iterate with each char, return results with map. mwahaha
message.each_char.map do |char|
@mapping.key?(char) ? @mapping[char] : char
end.join
end

end
``````

Hurray for array rotation! A demonstration:

``````2.4.1 :007 > shift_n = 5
=> 5
2.4.1 :008 > LOWER_ALPHABET.rotate(shift_n)
=> ["f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q",
"r", "s", "t", "u", "v", "w", "x", "y", "z", "a", "b", "c", "d",
"e"]
2.4.1 :008 > LOWER_ALPHABET.zip(LOWER_ALPHABET.rotate(shift_n))
=> [["a", "f"], ["b", "g"], ["c", "h"], ["d", "i"], ["e", "j"],
["f", "k"], ...
...
2.4.1 :012 > l_mapping.concat(u_mapping).to_h
=> {"a"=>"f", "b"=>"g", "c"=>"h", "d"=>"i", "e"=>"j", "f"=>"k",
"g"=>"l", "h"=>"m", "i"=>"n", "j"=>"o", "k"=>"p" ...
``````

Now, we have a hash containing the original and encoded characters for the
entire upper and lowercase alphabet. This solution could also work more easily with
non-english character sets.

After the initial work of making our mapping hash, we can perform the actual
encoding operation with relative ease.

### 3. The mighty "tr"

This is my fastest and best solution (and also the shortest!), which I'm quite pleased with. This method uses the same mapping technique I used in the `CaesarMap` solution, but gives it to `tr` instead.

``````class CaesarFast

LOWER_ALPHABET = [*"a".."z"].join # As of recent ruby, tr takes
UPPER_ALPHABET = [*"A".."Z"].join # strings, not arrays.

def initialize(shift)
i = shift % LOWER_ALPHABET.size
@decrypt = LOWER_ALPHABET + UPPER_ALPHABET
@encrypt = LOWER_ALPHABET[i..-1] + LOWER_ALPHABET[0...i] \
+ UPPER_ALPHABET[i..-1] + UPPER_ALPHABET[0...i]
end

def encode(string)
string.tr(@decrypt, @encrypt)
end

end
``````

## Benchmark

I tried to benchmark three scenarios:

• Performance on short string
• Performance on long string
• Performance on strings when already initialised
``````require "benchmark"

n = 50_000
messages = [
"#2 Post Ever!",
<<-STR
;sldkfj;asr/g;le/ra g/ar.gn.
smnf goiwragj ;aergj;aA;LSKJEF A;LKSJF ;AS.NFG,.MDNSFG.,M;LIDSFJG;
LSKDJFG;MARG ;laksdjf;alksjdfa;slkdgj ;alfskgj.mnga.lksjdfalslskk
g/ar.gn.,smnf goiwragj ;aergj;aA;LSKJEF A;LKSJF ;AS.NFG,.MDNSFG.,
M;LIDSFJG;LSKDJFG;MARG ;laksdjf;alksjdfa;slkdgj ;alfskgj.mnga.lks
STR
]

shift_n = 10
messages.each do |message|
puts "\n\ncurrent message: #{message}"
Benchmark.bm do |x|
x.report "orig" do
n.times { Caesar.transform(message, shift_n) }
end
x.report "map" do
n.times { CaesarByMap.new(shift_n).encode(message) }
end
x.report "map2" do
caesar = CaesarByMap.new(shift_n)
n.times { caesar.encode(message) }
end
x.report "tr" do
n.times { CaesarFast.new(shift_n).encode(message) }
end
x.report "tr2" do
caesar = CaesarFast.new(shift_n)
n.times { caesar.encode(message) }
end
end
end
``````
``````=>\$ ruby bench_caesar.rb
# Short message
user     system      total        real
orig  0.630000   0.000000   0.630000 (  0.629501)
map   1.120000   0.000000   1.120000 (  1.121047)
map2  0.290000   0.000000   0.290000 (  0.292893)
tr    0.310000   0.000000   0.310000 (  0.320662)
tr2   0.130000   0.000000   0.130000 (  0.130036)

# Long message
user     system      total        real
orig 15.890000  0.050000  15.940000 ( 16.036038)
map  8.090000   0.060000   8.150000 (  8.390967)
map2 6.690000   0.010000   6.700000 (  6.726338)
tr   0.300000   0.000000   0.300000 (  0.306944)
tr2  0.170000   0.010000   0.180000 (  0.177684)
``````

As shown in the first set of results, the original method performs faster than my hash implementation, but then quickly falls behind as soon as the string becomes long. And `tr` is fast, and even faster when you've already initialised the information it needs.