Coding Names by Sound

by James J. Czuchra

When I began indexing parish records in the early 1980s, I kept running across surnames that were spelled slightly differently from how my family members spelled theirs. Yet when I said the names aloud, they sounded alike. Indeed, research showed that these alternatively spelled surnames belonged to my family. While doing census research at a National Archives branch, I learned of the Soundex code and how it groups together similar-sounding names even when they are spelled differently. Many of my alternatively spelled surnames yielded the same Soundex code, precisely as they should. Working with the Soundex code was a must for using the census indexes, which are based on Soundex. Indexing by sound was a good strategy because the census taker sometimes did not know how to spell a name correctly but could write down how it sounded to him. So even if the name was spelled "incorrectly," the Soundex index is a tool for finding the name based on how it sounds. Soundex is so common that database and programming software often have a built-in function to compute the Soundex code.

In examining the Soundex rules, it became apparent that Soundex was not designed with Polish names in mind. The rules are applied to the spelling of a name in the hope that the resulting code corresponds to the pronunciation. Because Polish letters have different sounds from their English counterparts, the Soundex code for many Polish names does not represent the way they are pronounced.
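To make the point concrete, here is a minimal Python sketch of the standard American Soundex rules. It is an illustration only, not the code behind any calculator mentioned on this page. The function name `soundex` is my own, and the sketch assumes unaccented input: any character outside A-Z, including Polish letters such as ł or ń, is simply dropped.

```python
# Standard American Soundex letter groups.
# Vowels, H, W and Y carry no digit.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
DIGIT = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code for an unaccented name."""
    # Assumption of this sketch: anything outside A-Z is ignored.
    letters = [c for c in name.upper() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    previous = DIGIT.get(letters[0], "")   # its digit still blocks a repeat
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = DIGIT.get(letter, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated digit is coded again
    return (code + "000")[:4]              # pad with zeros, keep four characters


for name in ("Robert", "Rupert", "Nowak", "Novak"):
    print(name, soundex(name))
```

Robert and Rupert both come out as R163, which is the grouping Soundex was built for. Nowak and Novak come out as N200 and N120: the Polish "w" is pronounced like an English "v", but Soundex ignores W and codes V as 1, so two spellings that sound the same land in different groups.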

Back then I dreamed of a new and improved Soundex that would account for the Polish sounds of letters. I put off developing such an improvement until the mid-1990s, when the World Wide Web was just taking off with the public. While doing a web search for linguistic guidance, I discovered that someone had already developed a new coding system that could handle Polish sounds! And it had been developed in 1985! The system is known as the Daitch-Mokotoff system, and it is capable of handling more than just Polish sounds. Instead of the four-character Soundex code, the Daitch-Mokotoff system uses a six-character code, which means it can usually do a better job of distinguishing between names. Because some letters can be pronounced in more than one way, some names can have more than one code.


The Soundex calculator above uses the built-in function of the software to generate the code. The Polish letters are treated as regular letters. The Daitch-Mokotoff calculator used here was custom written. The Polish letters are generally converted to their sound-alike equivalents. While this calculator is believed to be faithful to the Daitch-Mokotoff coding scheme, I make no guarantees. Only one code is provided, using the most common Polish pronunciation. Remember that some letters have more than one sound, which makes it possible for a name to have more than one code.

Additional Sources
Soundex coding explained
The Soundex indexes for the 1880, 1900, and 1910 censuses sometimes ignored one of the coding rules, the one about separators between consonants of similar sounds. If this rule applies to a name, the calculator on the Polish Family Information website (above) will return two Soundex codes.
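The two-code behavior can be sketched in Python as follows. This is my own illustration, not the code used by the calculator mentioned above. It assumes the rule in question is the standard one that consonants with the same Soundex digit are coded once when only an H or W stands between them. The names `soundex`, `hw_separates`, and `census_codes` are hypothetical.

```python
# Standard American Soundex letter groups; vowels, H, W and Y carry no digit.
DIGIT = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        DIGIT[letter] = digit


def soundex(name: str, hw_separates: bool = False) -> str:
    """Soundex code; hw_separates=True ignores the H/W rule (assumed variant)."""
    letters = [c for c in name.upper() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
    if not letters:
        return ""
    code = letters[0]
    previous = DIGIT.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            if hw_separates:
                previous = ""  # variant: H or W breaks up a run, like a vowel
            continue           # standard: H and W are skipped entirely
        digit = DIGIT.get(letter, "")
        if digit and digit != previous:
            code += digit
        previous = digit
    return (code + "000")[:4]


def census_codes(name: str) -> list[str]:
    """Return the standard code, plus the variant code when it differs."""
    standard = soundex(name)
    variant = soundex(name, hw_separates=True)
    return [standard] if standard == variant else [standard, variant]


print(census_codes("Ashcraft"))  # two codes: the S and C are split by an H
print(census_codes("Robert"))    # one code: the rule never comes into play
```

For Ashcraft the sketch returns A261 under the standard rule and A226 when the rule is ignored, so a search of those census indexes would need to check both codes.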
Daitch-Mokotoff coding explained
Daitch-Mokotoff Calculator
The above link provides both a regular Soundex calculator and a Daitch-Mokotoff calculator. What is more, the calculators provide alternative codings. Essentially, this site provides "one-stop shopping" as far as coding names by sound is concerned.