The caesar cipher has always delighted me, and it's one of the first problems that I solve whenever I learn a new language.


"Why the Caesar?"

A fair enough question. I like the caesar because it satisfies these conditions:

  • Use of loops & string building
  • Character/string operations
  • It's not too hard!
  • It has a few solutions

I'm going to go through three solutions that I've made for this in Ruby: one that's done the 'regular' way, another using hashes, and a final hail mary using the mighty tr


1. The regular solution

The regular solution, regardless of the language being used, is usually as follows

1. Get the string, and a shift value
2. Iterate through the string
    2.a. For a basic solution, normalise the case to upper or lower
3. Use the rotation value along with the modulus operator (and the start/end ascii values for the alphabet) to create the encoded character
4. Put results into string & return

Note: This solution normalises all character cases.

Without further ado:

module Caesar
  module_function

  FIRST_CHAR_INT = "a".ord.freeze # "a" has ascii value of 97
  ALPHABET_CHARS = [*"a".."z"]
  ALPHABET_SIZE = ALPHABET_CHARS.size

  def transform(str, num)
    str.downcase.each_char.map do |char|
      !letter?(char) ? char : shift(char, num)
    end.join
  end

  # I've split the calls in this method up for clarity
  def shift(char, num)
    # Subtract 97 so the maths is easier
    start = char.ord - FIRST_CHAR_INT
    # Add the shift value, then wrap around with %
    new_int_value = (start + num) % ALPHABET_SIZE
    # Then add the 97 back on, and we're done!
    encoded_int_value =  new_int_value + FIRST_CHAR_INT
    tmp.chr
  end

  def letter?(char)
    ALPHABET_CHARS.include?(char)
  end
end

I've illustrated the logic with some comments, this solution doesn't need any more attention as it's the least interesting one.


2. The hash solution

After having implemented this cipher a number of times, I began to wonder if it could be made better.

  • The logic is a little unweildy.
  • What if we needed to preserve case?
  • What if we were encoding really long strings?
  • What if the shift value remained the same, but we needed to encode a series of messages?
class CaesarByMap

  # I've done a bit of trickery here to make these short, but you
  # could just individually define all the characters in an array.
  LOWER_ALPHABET = [*"a".."z"]
  UPPER_ALPHABET = [*"A".."Z"]

  def initialize(shift_n)
    @shift_n = shift_n
    @mapping = create_mapping(@shift_n)
  end

  def create_mapping(shift_n)
    l_mapping = LOWER_ALPHABET.zip(LOWER_ALPHABET.rotate(shift_n))
    u_mapping = UPPER_ALPHABET.zip(UPPER_ALPHABET.rotate(shift_n))
    l_mapping.concat(u_mapping).to_h
  end

  def encode(message)
    # Iterate with each char, return results with map. mwahaha
    message.each_char.map do |char|
      @mapping.key?(char) ? @mapping[char] : char
    end.join
  end

end

Hurray for array rotation! A demonstration:

2.4.1 :007 > shift_n = 5
 => 5
2.4.1 :008 > LOWER_ALPHABET.rotate(shift_n)
  => ["f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q",
  "r", "s", "t", "u", "v", "w", "x", "y", "z", "a", "b", "c", "d",
  "e"]
2.4.1 :008 > LOWER_ALPHABET.zip(LOWER_ALPHABET.rotate(shift_n))
=> [["a", "f"], ["b", "g"], ["c", "h"], ["d", "i"], ["e", "j"],
["f", "k"], ...
...
2.4.1 :012 > l_mapping.concat(u_mapping).to_h
=> {"a"=>"f", "b"=>"g", "c"=>"h", "d"=>"i", "e"=>"j", "f"=>"k",
"g"=>"l", "h"=>"m", "i"=>"n", "j"=>"o", "k"=>"p" ...

Now, we have a hash containing the original and encoded characters for the
entire upper and lowercase alphabet. This solution could also work more easily with
non-english character sets.

After the initial work of making our mapping hash, we can perform the actual
encoding operation with relative ease.


3. The mighty "tr"

This is my fastest and best solution (and also the shortest!), which I'm quite pleased with. This method uses the same mapping technique I used in the CaesarMap solution, but gives it to tr instead.

class CaesarFast

  LOWER_ALPHABET = [*"a".."z"].join # As of recent ruby, tr takes
  UPPER_ALPHABET = [*"A".."Z"].join # strings, not arrays.

  def initialize(shift)
    i = shift % LOWER_ALPHABET.size
    @decrypt = LOWER_ALPHABET + UPPER_ALPHABET
    @encrypt = LOWER_ALPHABET[i..-1] + LOWER_ALPHABET[0...i] \
      + UPPER_ALPHABET[i..-1] + UPPER_ALPHABET[0...i]
  end

  def encode(string)
    string.tr(@decrypt, @encrypt)
  end

end

Benchmark

I tried to benchmark three scenarios:

  • Performance on short string
  • Performance on long string
  • Performance on strings when already initialised
require "benchmark"

n = 50_000
messages = [
  "#2 Post Ever!",
<<-STR
a;lsdkfj;alskdfjlsad as;ldkfa;sdflkja
;sldkfj;asr/g;le/ra g/ar.gn.
smnf goiwragj ;aergj;aA;LSKJEF A;LKSJF ;AS.NFG,.MDNSFG.,M;LIDSFJG;
LSKDJFG;MARG ;laksdjf;alksjdfa;slkdgj ;alfskgj.mnga.lksjdfalslskk
dkddssssa;lsdkfj;alskdfjlsad as;ldkfa;sdflkja ;sldkfj;asr/g;le/ra
g/ar.gn.,smnf goiwragj ;aergj;aA;LSKJEF A;LKSJF ;AS.NFG,.MDNSFG.,
M;LIDSFJG;LSKDJFG;MARG ;laksdjf;alksjdfa;slkdgj ;alfskgj.mnga.lks
STR
]

shift_n = 10
messages.each do |message|
  puts "\n\ncurrent message: #{message}"
  Benchmark.bm do |x|
    x.report "orig" do
      n.times { Caesar.transform(message, shift_n) }
    end
    x.report "map" do
      n.times { CaesarByMap.new(shift_n).encode(message) }
    end
    x.report "map2" do
      caesar = CaesarByMap.new(shift_n)
      n.times { caesar.encode(message) }
    end
    x.report "tr" do
      n.times { CaesarFast.new(shift_n).encode(message) }
    end
    x.report "tr2" do
      caesar = CaesarFast.new(shift_n)
      n.times { caesar.encode(message) }
    end
  end
end
=>$ ruby bench_caesar.rb
# Short message
      user     system      total        real
orig  0.630000   0.000000   0.630000 (  0.629501)
map   1.120000   0.000000   1.120000 (  1.121047)
map2  0.290000   0.000000   0.290000 (  0.292893)
tr    0.310000   0.000000   0.310000 (  0.320662)
tr2   0.130000   0.000000   0.130000 (  0.130036)

# Long message
      user     system      total        real
orig 15.890000  0.050000  15.940000 ( 16.036038)
map  8.090000   0.060000   8.150000 (  8.390967)
map2 6.690000   0.010000   6.700000 (  6.726338)
tr   0.300000   0.000000   0.300000 (  0.306944)
tr2  0.170000   0.010000   0.180000 (  0.177684)

As shown in the first set of results, the original method performs faster than my hash implementation, but then quickly falls behind as soon as the string becomes long. And tr is fast, and even faster when you've already initialised the information it needs.