Digital Library Center
Smathers Libraries
University of Florida
P.O. Box 117003
Gainesville, FL 32611-7007 USA
P: 352.273.2900
F: 352.846.3702
DLC@uflib.ufl.edu
Going from letters on the printed page to online searchable text involves the following steps:
Once scanning is complete and the digital images have passed quality control for image quality and skew, Prime OCR zones the image if the target data is arranged in columns or tables.
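To illustrate what zoning a multi-column page involves, here is a minimal sketch that looks for blank vertical gutters between columns using a projection profile. It is not how Prime OCR performs zoning, which this page does not describe. The function name `find_column_gutters`, the `min_gutter_width` parameter, and the representation of a page as rows of 0 (white) and 1 (ink) pixels are all assumptions made for the example.

```python
def find_column_gutters(page, min_gutter_width=2):
    """Return (start, end) pixel-column ranges that contain no ink.

    `page` is a list of rows; each row is a list of 0 (white) or 1 (ink).
    Blank runs touching the left or right edge are treated as page
    margins and are not reported.
    """
    if not page:
        return []

    width = len(page[0])
    # Vertical projection profile: ink pixels counted per pixel column.
    ink_per_column = [sum(row[x] for row in page) for x in range(width)]

    gutters = []
    start = None  # first column of the blank run currently being tracked
    for x, ink in enumerate(ink_per_column):
        if ink == 0:
            if start is None:
                start = x
        else:
            # A blank run just ended; keep it if it is interior and wide enough.
            if start is not None and start > 0 and x - start >= min_gutter_width:
                gutters.append((start, x - 1))
            start = None
    # A blank run still open here reaches the right edge, so it is a margin.
    return gutters
```

Each gutter returned marks a place where the page could be split into separate zones before recognition. A real page would first need to be deskewed, since even a small skew fills the gutters with ink in the profile.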
Plain-text files are created from the TIFF image files by means of optical character recognition (OCR). Alternative to OCR: lots of typing.
| Original Image File (TIFF) | Plain Text File (TXT) |
|---|---|
| [page image] | Shingles- Manufacturers of. DIXON NICHOLAS, First av c Miller (for ad. see index) Silver and Silver Plated Ware. AYRES C. L., Franklin c Jackson (for ad. see index) Skating Rinks- Roller. Jackson c Morgan Charles Parcell, prop. |
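The TIFF-to-TXT step can be sketched as a small batch script. Prime OCR's own interface is not described here, so this sketch substitutes the open-source Tesseract command-line tool as a stand-in; it assumes `tesseract` is installed and on the PATH. The function names `ocr_command` and `ocr_directory` are hypothetical.

```python
import subprocess
from pathlib import Path


def ocr_command(tiff_path):
    """Build the Tesseract command that turns one TIFF into a TXT file."""
    tiff_path = Path(tiff_path)
    # Tesseract takes an output *base* name and appends ".txt" itself,
    # so page_0001.tif produces page_0001.txt next to the image.
    output_base = tiff_path.with_suffix("")
    return ["tesseract", str(tiff_path), str(output_base)]


def ocr_directory(directory):
    """Run OCR on every .tif/.tiff file in `directory`, in name order."""
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in {".tif", ".tiff"}:
            # check=True raises CalledProcessError if Tesseract fails.
            subprocess.run(ocr_command(path), check=True)
```

Calling `ocr_directory("scans")` would leave one plain-text file beside each image in a hypothetical `scans` folder, matching the one-to-one pairing of TIFF and TXT files shown in the table above.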
Applying markup to the textual product of OCR comprises three topics, in order of application:
Prime Recognition™'s output is more than 99% accurate, which reduces the time needed for quality control. Still, we currently proofread the tables of contents in the SGML file.
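This page does not say how the 99% figure is measured. One common definition is character accuracy: one minus the number of character edits needed to turn the OCR output into a proofread reference, divided by the length of the reference. The sketch below assumes that definition; `edit_distance` and `character_accuracy` are names chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text, reference_text):
    """Fraction of reference characters recognized correctly (0.0 to 1.0)."""
    if not reference_text:
        raise ValueError("reference_text must not be empty")
    errors = edit_distance(ocr_text, reference_text)
    # Clamp at zero: very noisy output can need more edits than the
    # reference has characters.
    return max(0.0, 1.0 - errors / len(reference_text))
```

Under this definition, a 99% accurate page of 2,000 characters still contains about 20 errors, which is why proofreading high-value text such as tables of contents remains worthwhile.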