It seems as if every day there are more and more historical sources available online. While it’s hard to replicate a day in the archives, online repositories allow for fast, efficient search and discovery of valuable historical sources. I love that this is happening, particularly in databases and collections that are made freely available to the public (such as Voyages: The Trans-Atlantic Slave Trade Database and the Legacies of British Slave-Ownership project) rather than hidden within pay-per-view websites.
Personally, I prefer to see an image of the complete original document rather than a transcription, but when a relevant document isn’t indexed or transcribed, it becomes much harder to find using an online search and just about impossible to know what it contains. My discussion last fall about the frustrations of the online Loyalist records is one example of the important difference between a source that has been digitised and ‘dumped’ online, and a source that has been digitised and optimised for online research, typically through a combination of indexing, transcription, and the addition of metadata.
Indexing allows all the documents or contents of a database to be organised and searched quickly and efficiently. An index records brief items of information contained within or about a document, such as names, dates, and locations. These then tend to become the fields of the search page.
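To make the idea concrete, here is a minimal sketch of an index in Python. The records, field names, and document ids are all invented for illustration; a real database would have many more fields and far more careful handling of spelling and dates.

```python
from collections import defaultdict

# Hypothetical index entries: a few brief fields recorded about each document.
records = [
    {"id": "doc-001", "name": "Mary Smith", "year": 1783, "location": "Halifax"},
    {"id": "doc-002", "name": "John Brown", "year": 1784, "location": "Halifax"},
    {"id": "doc-003", "name": "Mary Smith", "year": 1790, "location": "Kingston"},
]


def build_index(records, field):
    """Map each value of one field to the ids of the documents that carry it."""
    index = defaultdict(list)
    for record in records:
        key = str(record[field]).lower()  # normalise case so searches match
        index[key].append(record["id"])
    return index


def search(index, value):
    """Look a value up directly instead of reading every document."""
    return index.get(str(value).lower(), [])


# Each indexed field could become one box on a search page.
name_index = build_index(records, "name")
location_index = build_index(records, "location")

print(search(name_index, "mary smith"))
print(search(location_index, "Halifax"))
```

The lookup is fast because the search goes straight to the matching entry rather than reading through every document in turn.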
Transcribing is the act of creating a written or typed copy of a source. An index picks out key characteristics of each item so that the items can be compared (including details that describe the document rather than appear within it); a transcription, by contrast, is a copy of what is in the document. This means a lot more information is recorded for each item within the collection or database. Searching can take a lot longer, and there is much more room for error.
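As a rough illustration of why searching transcriptions is slower and more error-prone, the sketch below scans the full text of every document for a phrase. The transcriptions and ids are invented for the example, and the simple substring match is an assumption, not how any particular database works.

```python
# Hypothetical transcriptions, keyed by document id (invented for illustration).
transcriptions = {
    "doc-001": "Mary Smith of Halifax claims for the loss of a house and two cows.",
    "doc-002": "John Brown, late of New York, now residing at Halifax.",
    "doc-003": "Mary Smyth, widow, petitions for land near Kingston.",
}


def full_text_search(transcriptions, phrase):
    """Scan every transcription in turn for an exact, case-insensitive phrase."""
    needle = phrase.lower()
    return [
        doc_id
        for doc_id, text in transcriptions.items()
        if needle in text.lower()
    ]


print(full_text_search(transcriptions, "halifax"))
# The variant spelling 'Smyth' in doc-003 is not matched by 'mary smith'.
print(full_text_search(transcriptions, "mary smith"))
```

Every document has to be read on every search, and a variant spelling or a transcriber’s slip is enough to hide a relevant document from an exact-match search.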
Metadata is information about other information. In many cases, metadata is placed ‘on top’ of a digitised document. There are many different types of metadata and many uses for it. For digitised historical sources, it could be displayed as information bubbles that pop up when you hover over a word with your mouse, or links that have been embedded into the transcription to lead you to more information from another webpage, or explanatory pages and introductions, or the transcriptions themselves.
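One way to picture metadata sitting ‘on top’ of a document is as a separate layer of notes attached to positions in the transcription. The sketch below assumes that model; the class names, the image file name, and the note text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int  # position in the transcription where the note begins
    end: int    # position just past the annotated phrase
    note: str   # text a pop-up bubble might show


@dataclass
class DigitisedDocument:
    image_file: str      # the scanned image of the original
    transcription: str   # the typed copy, itself a kind of metadata
    annotations: list = field(default_factory=list)

    def annotate(self, phrase, note):
        """Attach a note to the first occurrence of a phrase."""
        start = self.transcription.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        self.annotations.append(Annotation(start, start + len(phrase), note))

    def notes_at(self, position):
        """Return the notes covering one character position (a 'hover')."""
        return [a.note for a in self.annotations if a.start <= position < a.end]


doc = DigitisedDocument("page-012.jpg", "Mary Smith of Halifax, widow")
doc.annotate("Halifax", "Place name (hypothetical note)")

hover = doc.transcription.index("Halifax")
print(doc.notes_at(hover))
print(doc.notes_at(0))
```

The notes never alter the transcription itself, which is what lets them be added, corrected, or removed later without touching the copy of the source.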
The historians, archivists, librarians, and genealogists who create the online databases provide much of this information. They decide which documents are important enough to be included in a digital database (or whether the database will include every item in the hardcopy collection), which defining elements will go in the index and be searchable by an internal search function or popular search engine, whether full transcriptions will be created and provided, and what metadata will be added to the digitised sources.
Transcriptions may be carried out by individuals or by software that uses optical character recognition (OCR) and can ‘read’ typed digitised documents. Software doesn’t necessarily improve accuracy: in my experience, the output tends to include misread characters that editors then need to correct, and handwriting still typically needs to be read by a real person.
I wouldn’t be able to do my work without digitised, indexed resources, and I know I’m not the only historian who feels this way. Next week I’ll talk about the emergence of the ‘citizen historian’ and the ways in which volunteers are now creating indexes and transcriptions of tens of thousands of online documents.