Searching Databases Efficiently
Getting the Most Out of Databases
by James J. Czuchra
Have you used online databases in your family history research and thought you should have found what you were looking for in one, but didn’t? This article, originally published in the PGSA's Rodziny and modified for online publication, addresses some strategies you might try for greater success. Your success depends on a lot of factors.
Was the information initially recorded accurately? Most records were not filled out by the party the record pertains to.
Has the record undergone any changes like smeared ink, torn pages, or scratches in the microfilm? If you are using copies, be they photocopies or even microfilm, how faithfully have they been reproduced? Perhaps significant information was cut off in the imaging process.
Has the information from the record been transcribed by the indexer accurately into the database? This is influenced by the quality of the writing and the skill of the indexer, among other factors.
So, there are a lot of places things can go wrong and your search may not turn up the results you had hoped for. Here are some strategies to help you be more successful as you conduct database searches.
The approach I take is to conduct a broad search to improve my chances of getting some useful data, but not an unmanageable amount. This often means you should search on the surname only. If the amount of data returned is not manageable, you can always narrow your search criteria. The surname-only approach has two benefits. You may know a person’s given name but you don’t always know how it has been recorded. Should you search on the Polish, English, Latin, German, Russian or some other version of the name (for example, Wojciech, Albert, or Adalbertus)? From the surname-only search results, you can pick out who you are interested in without having to guess ahead of time how the given name is indexed.
The other reason for a surname-only search is that you may find collateral family lines that provide clues to your direct family line. For example, my grandfather was born in Chicago but his parents were married in Poland before immigrating to the US. Where in Poland did they come from? The answer to that question came from searching marriage records of my grandfather’s older siblings who had been born in Poland. Focusing exclusively on direct line ancestors is what I call genealogical tunnel vision—missing information by not taking in the bigger picture.
Another example of genealogical tunnel vision is a belief that the spelling of a name never changes. In responding to researcher questions, I frequently suggest that they look up spelling variations of a name. Some have dismissed the suggestion because “my family ALWAYS spelled the name THIS way.” I chuckle to myself because they don’t realize that it was rarely their family member who created the written record. It was usually recorded by someone else the way it sounded to them. Furthermore, many of our peasant ancestors were illiterate or nearly so and not in a position to insist on a “correct” spelling. It is helpful to know something about Polish letters and sounds and how differently spelled names can sound essentially the same. Add to this confusion the fact that other languages render the same sounds with different spellings. Consider for example the name Szmelter. Reasonable variations might be Schmelter or Smelter in German. This is where learning some basics of a language can be a real help in selecting spelling variations to search on. Let’s continue with some other suggestions.
Polish is a language that uses a lot of suffixes, which are added onto root words and change their grammatical function. The suffix issue comes up frequently in the difference between a man’s and a woman’s name. Should I search for Maria Kowalski or Maria Kowalska or both? In most cases, the answer is both. Several of the databases I created were developed with the English-speaking novice in mind. The novice might not know to search for both spellings, so the index usually converted the feminine form to the masculine form. So a search on Kowalski is usually all that’s necessary. Additionally, all the “surname” databases on this website have the search options: exact, match first, and wild card. Exact match is for those whose family ALWAYS spelled the name a certain way.
The default search option, though, is “match first”. Use the match-first option for handling different suffixes. So instead of searching on Kowalski and then Kowalska, I would enter only Kowalsk in the search box. All the results would begin with those letters, so I would catch Kowalski and any Kowalskas. If I were searching for Kwiatkowski, I might enter just Kwiat. I’ll get my Kwiatkowskis but also some Kwiateks who may be related. All I did was broaden the search.
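To make the match-first idea concrete, here is a small Python sketch. It is only an illustration and not the code behind this site's search; the function name match_first and the sample surnames are made up for the example.

```python
def match_first(prefix, surnames):
    """Return the surnames that begin with the given letters, ignoring case."""
    wanted = prefix.casefold()
    return [name for name in surnames if name.casefold().startswith(wanted)]


# Hypothetical index entries, for illustration only.
index = ["Kowalski", "Kowalska", "Kowal", "Kwiatkowski", "Kwiatek", "Nowak"]

print(match_first("Kowalsk", index))  # ['Kowalski', 'Kowalska']
print(match_first("Kwiat", index))    # ['Kwiatkowski', 'Kwiatek']
```

Notice that the shorter the letters you enter, the more names come back, which is exactly the trade-off between a broad search and a manageable result list.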
Most of these indexes were prepared from microfilmed manuscripts and the writing quality varied widely. An ‘a’ can look like an ‘o’ or ‘e’ (I call them roundy letters). An ‘o’ can look like a ‘u’ if not fully closed, the Polish ‘y’ sounds like a short ‘i’, and the Polish ‘ó’ sounds like a ‘u’. The point is that vowels are the most difficult to transcribe because some look alike in bad writing or sound alike in any writing. Interestingly, if you pronounce names with different vowels, they still sound remarkably the same (this is the basis of the Soundex system, which drops the vowels after the first letter). Let’s say I’m searching for the family Pietrowski. That “roundy” ‘e’ could be an ‘o’. I could do a separate search on Piotrowski to find them. But here are a couple of database tricks that will work on this site, provided your search option is ‘match first’ or ‘wild card’.
You can substitute any single letter with a ‘_’ (underscore character). So, what I type in the search box is Pi_trowsk. This will return all names beginning with ‘Pi’, ANY third letter, ‘trowsk’, and any ending (because I’m in match first mode). An underscore always substitutes for exactly one character, but your search can contain more than one of them. For example, you could search on Pi_tr_wsk. If that’s not cool enough for you, you can also use the ‘%’ (percent sign) character in your searches. A ‘%’ substitutes for zero or more characters. If I were worried that the name got really mangled and got indexed as Patrowski or Petrowski, rather than searching on them individually, I could search on P%trowsk. This would return those variants if they existed in the database. The % and _ are called wildcard characters. Databases at other websites may have wildcard characters that you can use but they might not be the same ones used here.*
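The underscore and percent sign behave the same way as the wildcards of the SQL LIKE operator, so the searches above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: whether this site's search is built on LIKE is a guess on my part, and the table name, the search function, and the sample surnames are invented for the demonstration.

```python
import sqlite3

# Hypothetical surname index, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surnames (name TEXT)")
names = ["Pietrowski", "Piotrowski", "Piotrowska",
         "Patrowski", "Petrowski", "Pawlowski"]
conn.executemany("INSERT INTO surnames VALUES (?)", [(n,) for n in names])


def search(pattern):
    """Run a LIKE search and return the matching names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM surnames WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]


# The trailing % stands in for match-first mode (any ending).
# _ matches exactly one character; % matches zero or more characters.
print(search("Pi_trowsk%"))  # ['Pietrowski', 'Piotrowska', 'Piotrowski']
print(search("P%trowsk%"))   # also picks up Patrowski and Petrowski
```

The second search returns everything the first one does plus the mangled variants, because ‘%’ is free to match one letter, two letters, or none at all.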
Summary of Strategies:
- Use surname-only searches when practical.
- Research collateral as well as direct family lines.
- Search on spelling variations of the surname.
- Use a match first option (if available) to catch root names with different endings.
- Use wildcard characters to substitute for letters you are not sure of, particularly vowels.
*Match first is implemented by adding % to the end of a name, and wild card is implemented by adding % to the beginning and end of the name. So, provided you are allowed to use wildcard characters, you can implement the search option you want.
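The footnote's rule can be written out as a short Python sketch. The function names build_pattern and search, the option labels, and the sample table are assumptions made for illustration; only the placement of the % characters comes from the description above.

```python
import sqlite3


def build_pattern(name, mode):
    """Turn a typed name into a LIKE pattern for the chosen search option."""
    if mode == "exact":
        return name              # nothing added
    if mode == "match first":
        return name + "%"        # % at the end: any ending
    if mode == "wild card":
        return "%" + name + "%"  # % at both ends: name may appear anywhere
    raise ValueError(f"unknown search option: {mode}")


# Hypothetical surname index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surnames (name TEXT)")
conn.executemany(
    "INSERT INTO surnames VALUES (?)",
    [("Kowalski",), ("Kowalska",), ("Kowal",), ("Nowakowski",)],
)


def search(name, mode):
    """Search the sample index using the pattern for the given option."""
    rows = conn.execute(
        "SELECT name FROM surnames WHERE name LIKE ? ORDER BY name",
        (build_pattern(name, mode),),
    )
    return [row[0] for row in rows]


print(search("Kowal", "exact"))        # ['Kowal']
print(search("Kowal", "match first"))  # ['Kowal', 'Kowalska', 'Kowalski']
print(search("kow", "wild card"))      # adds 'Nowakowski'
print(search("Kowal%", "exact"))       # same as match first
```

The last line shows the point of the footnote: typing the % yourself in exact mode gives the same result as choosing match first.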