The Python standard library module 'collections' contains some of Python's best and most performant features, but it is often overlooked in solutions.

I'm going to be examining more collections in the future, but thought that Counter would be a good place to start.

Rather than exhaustively demonstrating every feature of each class, I'm going to apply it to a single problem and leave further exploration up to the reader (docs will always be linked).

The problem:

For each word in a sentence, I need to calculate how many times it appears in that sentence.

The following snippet is from https://www.w3resource.com/python-exercises/string/python-data-type-string-exercise-12.php, which is the third Google result when searching 'python count occurrences in string'.

This function takes a sentence string and returns a dict mapping each word to its number of occurrences.

def word_count(str):
    counts = dict()
    words = str.split()

    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1

    return counts

Then, to run:

print(word_count('the quick brown fox jumps over the lazy dog.'))

--

{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog.': 1}

Fairly straightforward, right? There's a problem with this solution though: it's not a 'pythonic' solution. It follows the generic logical flow of

  • initialising a hash/dict
  • incrementing each key in the dict by 1, or initialising it to 1 if it is not yet present

This solution can be implemented in pretty much any language, and it would look almost exactly the same. Whenever I solve a problem, I look to see if the language I'm using provides a unique or tailored solution.

Python has quite a few powerful features in its standard library, which I have become a huge fan of.

Enter, Counter!

The Counter docs can be found on the collections documentation page, the entirety of which is an essential read.

Counter is incredibly useful for counting all sorts of things. It can be used as an incremental counter, as below:

from collections import Counter

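def word_counter(sentence):
    c = Counter()  # missing keys read as 0, so no if/else is needed
    for word in sentence.split():  # split on any whitespace, as word_count does
        c[word] += 1
    return dict(c)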

Usage and output are identical, it takes roughly half the lines of code, and the workings of the function are perfectly clear! Mission complete.

print(word_counter('the quick brown fox jumps over the lazy dog.'))

--

{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog.': 1}
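The bare c[word] += 1 works without an if/else because a Counter returns 0 for a key it has not seen yet, rather than raising a KeyError as a plain dict would. A minimal sketch of that behaviour (the keys used here are just illustrative):

```python
from collections import Counter

c = Counter()

# Reading a missing key gives 0 instead of raising KeyError...
print(c["fox"])     # 0
# ...and the lookup alone does not insert the key.
print("fox" in c)   # False

# So the increment is effectively c["fox"] = 0 + 1 the first time round.
c["fox"] += 1
print(c["fox"])     # 1
```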

But wait! After all that talk before, this can actually be solved in one very short line.

def word_counter(sentence):
    return dict(Counter(sentence.split()))

Counter can take any iterable, such as a list of words, and count its elements on initialisation, so all that is needed is to:

  • Split the string into a list
  • Count the occurrences of each element in the list by passing it to Counter()
  • Turn the Counter back into a regular dict
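Since Counter is itself a subclass of dict, the last step is probably only needed when a plain dict is specifically required (for example, to get the same printed output as before). A short sketch of using the Counter directly, with the same sentence as above; the variable names are only for illustration:

```python
from collections import Counter

sentence = "the quick brown fox jumps over the lazy dog."
counts = Counter(sentence.split())

# A Counter supports the usual dict operations.
print(isinstance(counts, dict))   # True
print(counts["the"])              # 2

# It also adds helpers of its own, such as most_common().
print(counts.most_common(1))      # [('the', 2)]
```

Whether to keep the dict() conversion is a judgment call: it hides the Counter type from callers, at the cost of losing helpers like most_common().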

Epilogue

Don't trust the first article that you see on the net! Learning to apply a healthy dose of skepticism to every answer that you see is one of the many steps in becoming a good programmer.