Common Indexing Errors

As someone who has indexed thousands of names and used numerous name indexes, I feel qualified to point out some of the errors one typically finds in indexed records. Experience in correctly* reading handwriting and familiarity with Polish names in general are definite assets. But too often, indexes were prepared by people who have little if any skill in reading Polish records. There are three fundamental types of indexing errors we encounter.

The first of these is a factual error made by the person who created the record. Writing Jan when the person's name was actually Wojciech is a factual error. As a real example, I ran across the name Kucharzchieski in a census index. When I looked at the actual census image, I found the indexer had been faithful to what was written. Since the family was Polish, Kucharzchieski seemed unlikely to me and I thought Kucharzewski was more reasonable. That is indeed the correct spelling. If you are aware of common indexing errors, you can try alternative strategies if your search initially comes up empty. You typically fill in a search form, which I'll refer to as a query. Be cautious and assume the record will have a factual error. To get around this type of error, it's better to be stingy with what you put in the query. Unless the person you are looking for has a very common surname, you might want to search by surname only. Another reason for leaving out a given name is that we can't always anticipate how the given name was recorded: was it Joannes, Jan, John, or Johan? Even if the given name is wrong, at least you can compare other data like a date, spouse, parent, etc. and decide whether the given name was recorded incorrectly.

The rest of this article deals with the other two types of indexing errors commonly encountered. The second fundamental type of error is the sound-alike error. Remember, most of the records we use were not completed by the person the record is about. The recorder spelled the name the way it sounded to them. This is where a sound matching system like Soundex can be valuable. Unfortunately, not all databases provide a sound-alike search. In those instances, you might want to experiment with your own spelling variations of a name.
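
To make the idea of sound matching concrete, here is a minimal sketch of the standard American Soundex code in Python. It is only an illustration, not the matching logic of any particular database. The folding of Polish letters to plain ASCII (ł to l, ń to n, and so on) is my own assumption, since Soundex itself is defined only for the letters A to Z.

```python
# Assumption: Polish diacritics are folded to their nearest ASCII letter first.
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźż", "acelnoszz")


def soundex(name: str) -> str:
    """Return the American Soundex code (a letter plus three digits)."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    folded = name.lower().translate(POLISH_TO_ASCII).upper()
    letters = [ch for ch in folded if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return (result + "000")[:4]


print(soundex("Kucharzewski"), soundex("Kucharzchieski"))  # K262 K262
print(soundex("Pobiegło"), soundex("Pobiegto"))            # P124 P123
```

The two spellings of Kucharzewski share a code, so a Soundex search would find either one. The second pair does not match, because confusing ł with t is a look-alike error rather than a sound-alike one, and Soundex offers no help there.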

The third fundamental type of indexing error is the look-alike error. These errors seem to be the most common. An indexer, particularly an inexperienced one, will misread what is in the record. Let's talk about some of the most common errors I have encountered. There will be a short quiz at the end of the article to see if you were paying attention!

1. Confusion of the letter t and the Polish ł. For example, Pobiegto is really Pobiegło. This is probably the most common error indexers make. Try substituting one letter for the other in the query.
2. Vowels often get substituted for other vowels. This is usually a sound-alike error. Depending on the recorder, a script 'a' can easily be confused with an 'o'. For example, Ćwiok may be written as Ćwiak. There is also interchangeability between ó and u since they sound alike. Sound matching systems generally ignore vowels so this is usually not a major problem. A more serious look-alike problem is when the letter c gets confused with o or e (note their round shape) and vice versa because the sound of the c is not like a vowel. Try using a wild card character for a single letter in place of a vowel in a query.
3. Handwritten Z, S, and L get confused. Because the Z and S sound alike, that confusion is common, but be aware that the L is a good look-alike. An example of this was the name Zielinski, which had been indexed as Lielinski.
4. The Polish z and American r often look alike. For example, Szadorski had been incorrectly indexed as Sradorski in one record.
5. The letters z and g can be confused. For example, Stenzel had been incorrectly indexed as Stengel. The descender of the z made it look like a g in the indexer's mind.
6. The letters y and j get substituted for one another. For example, Wojcik might be written as Woycik. This typically happens in older records.
7. The letter w in some adjectival names might be left out or put in where it shouldn't be. For example, Staniszewski might be written as Staniszeski. I have also seen someone with a bit of Polish familiarity turn Kolaski into Kolawski.
8. Sometimes r and l get mixed up. For example, Rolbiecki and Lorbiecki were found to refer to the same family. Usually only one letter is substituted (not two as in the example).
9. Single and double vowels. For example, the indexer said Baniel when it should have been Banul. See also # 13 below. In describing error # 2 above, I suggested using a single letter wildcard in place of a vowel. That won't work if there's a double vowel. You could use a different wildcard representing one or more letters, but that might give too many irrelevant matches.
10. Single and double consonants. For example, Kiolbassa and Kiolbasa. This is the same problem as error # 9 above.
11. The letters u and n may look alike. For example, what was indexed as Szndzinski was really Szudzinski. While most often a handwriting issue, I have seen 'fuzzy' printed copy where you can't tell the difference.
12. The Polish ń has been seen substituted by the Polish j. This is a sound-alike error. For example, Gońka (which sounds like Gojnka) appears in some records as Gojka.
13. Sometimes letters overlap. An example of this is Belter being incorrectly indexed as Better because the crossing of the t makes the l look like another t. Or sometimes what is a single letter is interpreted as two letters. For example, what had been indexed as Okori was actually Okoń.
14. Sometimes there is confusion of e and l. This is a look-alike issue. For example, what had been incorrectly indexed as Dzillewicz was really Dzielewicz.
15. Though it is very rare, someone may substitute a Qu for a K as a sound-alike. For example, Kwasniewski might be written as Quasniewski.
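
The substitution advice in the list above can be sketched in Python. This is a rough illustration under my own assumptions: the table of look-alike pairs is only a subset of the errors listed, the function names are hypothetical, and `?` is assumed to be the single-letter wildcard (your database may use a different character, so check its help page).

```python
from fnmatch import fnmatchcase

# A subset of the look-alike pairs from the list above (lowercase only).
LOOK_ALIKES = [("t", "ł"), ("z", "r"), ("z", "g"), ("y", "j"),
               ("u", "n"), ("e", "l"), ("r", "l")]
VOWELS = "aąeęioóuy"


def single_swap_variants(name: str) -> list[str]:
    """Spellings that differ from `name` by exactly one look-alike letter."""
    swaps: dict[str, set[str]] = {}
    for a, b in LOOK_ALIKES:
        swaps.setdefault(a, set()).add(b)
        swaps.setdefault(b, set()).add(a)
    lower = name.lower()
    variants = set()
    for i, ch in enumerate(lower):
        for repl in swaps.get(ch, ()):
            variants.add(lower[:i] + repl + lower[i + 1:])
    return sorted(v.capitalize() for v in variants)


def vowel_wildcard_queries(name: str, wildcard: str = "?") -> list[str]:
    """One query per vowel, with that single vowel replaced by the wildcard."""
    return [name[:i] + wildcard + name[i + 1:]
            for i, ch in enumerate(name) if ch.lower() in VOWELS]


print(single_swap_variants("Stenzel"))
print(vowel_wildcard_queries("Ćwiok"))
# fnmatchcase stands in for the database's own wildcard matching.
print(fnmatchcase("Ćwiak", "Ćwi?k"))
```

Swapping only one letter at a time keeps the number of queries manageable. As noted under error # 9, a single-letter wildcard will not catch a double vowel, so these queries are a starting point rather than a complete search.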

Here are a few more problems I've seen that show how badly some indexes have been prepared. One indexer kept adding the letter v into names every time they saw what was really an r. Kvucaynski was actually Kruczynski. The indexer completely misread what was in the record, as the z looked nothing like an a. One indexer indexed the letter g as the letter q. Oqurek was actually Ogurek. Another example of error # 13 above is the combination cz being indexed as g, or the c being ignored and the pair indexed as a z. The cursive m and w or h and k sometimes get mixed up. Someone might argue that the handwriting was terrible so the indexer was guessing. The examples I have given are from decently handwritten records, so why are there so many errors?

I have probably only scratched the surface of the subject of indexing errors. I provided some examples above. Now it's your turn to see if you can figure out what the following names actually are: Szerefran, Crestaw, Szozefrairski, Gtoivczewski, Szymoryk, Czaruccki, Korale. These are but a few of the many nonsensical names that I have corrected recently from a Chicago Polish neighborhood in the 1900 US Census. The enumerator in this instance was Polish and had good handwriting. The problem was that the people who did the indexing didn't know what they were looking at, and presumably the arbitrator wasn't any better. If you find this quiz too difficult, that is completely understandable because I am not showing you the actual record images from which these names came.

OK. Ready for the answers? Szerefran is Szczepan (errors # 2, 4, and 13 above), Crestaw is Czesław (errors # 1 and 4 above), Szozefrairski is Szczepański (errors # 2 and 13 above; the f and p were also confused, much like the possible relatedness of Stefan and Stephen), Gtoivczewski is Główczewski (errors # 1 and 13 above), Szymoryk is Szymczyk (errors # 2 and 4 above), Czaruccki is Czarnecki (errors # 11 and 2), and Korale is Kozub (errors # 4, 2, and 13).

* Someone may have the experience of indexing hundreds of thousands of English names and perhaps they did a good job. I have seen too many examples of terrible indexing of Polish names perhaps by people who are otherwise classified as 'experienced' but who do not have the right kind of experience. I think the LDS sometimes focuses more on quantity than on quality in their indexing endeavors.