TDD Katas / Exercises: Soundex (4 / 5+)

by Jeff Langr

May 15, 2019


The fourth in a series of blog posts in which I describe TDD katas & exercises that I’ve used for training purposes.

Soundex

Soundex is a well-known algorithm for encoding last names into a 4-character string. The goal is to encode similar-sounding names to the same representation, so that searches with slightly misspelled names still find appropriate matches. Langer, Langre, Langr, and Lungrub, for example, all end up encoded in Soundex as L526.

The rules for Soundex encoding come straight from Wikipedia:

  • Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.

  • Replace consonants with digits as follows (after the first letter):

    • b, f, p, v → 1

    • c, g, j, k, q, s, x, z → 2

    • d, t → 3

    • l → 4

    • m, n → 5

    • r → 6

  • If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.

  • If you have too few letters in your word that you can’t assign three numbers, append with zeros until there are three numbers. If you have more than 3 letters, just retain the first 3 numbers.
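The following is a minimal Python sketch of the four rules, not the author’s solution. It reflects one reading of the third rule. The letters h and w are dropped but do not separate same-coded consonants. Vowels are dropped and do separate them. The first letter’s code counts for adjacency. Three details are assumptions made for illustration: treating y like a vowel, ignoring non-ASCII characters, and the function name `soundex`.

```python
import string

# Digit for each consonant, per the second rule.
CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Encode a name as its first letter plus three digits, e.g. "L526"."""
    # Assumption: only ASCII letters count; everything else is discarded.
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's code takes part in the adjacency rule
    # ("This rule also applies to the first letter").
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is not None:
            # Same-coded letters that are adjacent, or separated only by
            # h/w, collapse to a single digit.
            if code != previous:
                digits.append(code)
            previous = code
        elif ch in "hw":
            # h and w are dropped but do NOT separate same-coded letters.
            continue
        else:
            # Vowels (and, by assumption, y) are dropped but DO separate
            # same-coded letters, so the next one is coded again.
            previous = ""
    # Fourth rule: pad with zeros and keep only the first three digits.
    return (first.upper() + "".join(digits) + "000")[:4]
```

Under this reading, each case of the third rule produces a distinct result:

- `soundex("Ashcraft")` yields "A261" because the h does not separate s and c.
- `soundex("Tymczak")` yields "T522" because c and z collapse, but the vowel before k lets k be coded again.
- `soundex("Pfister")` yields "P236" because f shares the first letter’s code.

Langer, Langre, Langr, and Lungrub all come out as L526.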

Three of the rules (the first, second, and fourth) are straightforward and likely represent the first set of rules tackled. A first positive test could be as simple as ensuring that ‘A’ encodes to ‘A000’.
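The sketch below shows what that first pytest-style test, and the simplest code that passes it, might look like in Python. The names `encode` and `test_retains_sole_letter_of_one_letter_word` are hypothetical. The hard-coded zero padding is deliberate: later tests on the list will force real digit encoding. The trailing comments show one possible way to turn the remaining rules into a test list.

```python
# Step one of a hypothetical TDD session: one test, minimal code to pass it.


def encode(name: str) -> str:
    # "Fake it": retain the first letter and pad with zeros. Deliberately
    # incomplete; upcoming tests will force actual digit encoding.
    return name[0].upper() + "000"


def test_retains_sole_letter_of_one_letter_word():
    assert encode("A") == "A000"


# Remaining test list, derived from the rules (add and pass one at a time).
# Expected values assume the finished encoder:
#   - replaces a consonant with its digit:        "Ab" -> "A100"
#   - drops vowels, y, h, and w after the first:  "Baeiouhycdl" -> "B234"
#   - keeps only the first three digits:          "Dcdlb" -> "D234"
#   - collapses adjacent same-coded letters:      "Abfcgdt" -> "A123"
```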

I take no credit for the third rule; I copied that glorious real-world prose verbatim from Wikipedia. Good luck understanding it: I’ve run through the exercise several times, and each time I’m still a little uncertain about just what it’s saying.

Because of its considerable potential for confusion (the implementation can get a little tricky, too), I stopped using Soundex as a day-one TDD exercise. I’d rather students work on problems whose complexity doesn’t distract them from learning TDD.

I do think Soundex remains a nice kata that demands a decent amount of “real” thinking.

The exercise is closed, in that the four rules above represent the complete functionality of the Soundex algorithm. There are no (non-contrived) ways to extend the exercise. However, there are variants of Soundex, including one known as Metaphone. One simple variant, “Reverse Soundex,” prefixes the code with the last letter of the name rather than the first. For a slightly more interesting exercise, then, you could introduce a new requirement: allow the algorithm to change based on some configuration value.
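Below is one way that configuration requirement might be sketched in Python. It assumes Reverse Soundex applies the same rules to the reversed name, so the code starts with the name’s last letter. The names `Variant`, `_encode`, and `soundex`, and the configuration strings, are hypothetical.

```python
import string
from enum import Enum

# Consonant-to-digit table from the second rule.
CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


class Variant(Enum):
    """Hypothetical configuration values that select the algorithm."""
    STANDARD = "standard"
    REVERSE = "reverse"


def _encode(letters: list) -> str:
    """Core Soundex rules applied to an already-ordered list of letters."""
    if not letters:
        return ""
    previous = CODES.get(letters[0], "")
    digits = []
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is not None:
            if code != previous:  # collapse same-coded neighbors
                digits.append(code)
            previous = code
        elif ch not in "hw":
            previous = ""  # vowels (and y) separate same-coded letters
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def soundex(name: str, variant: Variant = Variant.STANDARD) -> str:
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if variant is Variant.REVERSE:
        # Assumption: Reverse Soundex encodes the reversed name, so the
        # retained letter is the name's last letter.
        letters.reverse()
    return _encode(letters)
```

The variant changes only the order in which letters are fed to the shared rules, so the encoding logic stays in one place. A raw configuration string maps onto the enum with `Variant(config_value)`, which raises `ValueError` for an unknown value. For example, `soundex("Langr", Variant("reverse"))` yields "R254".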

Duration: 30-45 minutes**

Core themes:

  • Incremental growth of a solution

  • Translating requirements into a test list

  • Test-driving something “real”

  • Getting green on red: Adding tests for confidence?

I’ve used the Soundex exercise in a number of training sessions. In my book Modern C++ Programming with Test-Driven Development, I used Soundex as the introductory lesson for teaching the fundamentals of test-driven development. Later in the book, I included a chapter that demonstrated applying the transformation priority premise to the derivation of Soundex.

** Duration estimate assumes TDD novices working in pairs. Factors that can affect duration include:

  • Whether it’s their first TDD exercise or a subsequent one

  • General level of programming proficiency

  • Programming language used

  • Exercise learning mode: pairing, mobbing, or solo?


Jeff Langr

About the Author

Jeff Langr has been building software for 40 years and writing about it heavily for 20. You can find out more about Jeff, learn from the many helpful articles and books he’s written, or read any of his 1000+ blog posts (including those at Agile in a Flash) and other public posts.