Accuracy Comparison of OCR Engines

In this post we track the accuracy of several OCR engines (FineReader, Google Vision, OmniPage, Tesseract and Transkribus) in order to objectively monitor the performance of our own solution, Glyph, relative to them.

The images used in these tests come from our collaborators and consist of old documents written in various languages and fonts. The collaborators also supply the text produced by each OCR engine, along with the ground-truth transcriptions.

Since many OCR engines specialize in particular tasks (e.g., Transkribus in handwritten text, Tesseract in typed text), a direct comparison between any two engines is not always possible.

Input Formats

All images are scanned documents. We assume they have already been layout-analyzed and reduced to a single column of text before being passed to any OCR engine.

Besides the image, the following information is supplied when available:

language
font type

No assumptions are made with regard to the image’s coloring, brightness, skew, resolution, font size, possible artifacts or level of distortion.
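As an illustration of what one test case carries, here is a hypothetical record bundling the inputs listed above: the scanned image, its ground truth, the optional language and font type, and the text each engine produced. `OcrSample` and its field names are our own illustrative choices; the post does not describe how the data is actually stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OcrSample:
    """One <image, text> pair plus per-engine outputs (hypothetical layout)."""
    image_path: str                  # scanned, layout-analyzed, single column of text
    ground_truth: str                # reference transcription from the collaborators
    language: Optional[str] = None   # supplied only when available
    font_type: Optional[str] = None  # e.g. "Antiqua", "Fraktur", "Handwritten"
    # Recognized text keyed by engine name; a fresh dict per instance.
    engine_outputs: Dict[str, str] = field(default_factory=dict)
    # Deliberately no fields for coloring, brightness, skew, resolution, etc.:
    # no assumptions are made about those image properties.
```

Keeping `language` and `font_type` optional mirrors the "when available" wording, so a missing value is represented explicitly as `None` rather than an empty string.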

Metric

We measure quality as character-level accuracy (ACC, in %), computed from the Levenshtein distance with the following formula:

\[\mathrm{ACC}\,(\%) = 100 \cdot \Bigl[1 - \min\Bigl(1, \frac{\sum_{i=1}^{N} \mathrm{Levenshtein}(\hat{text}_i, text_i)}{\sum_{i=1}^{N} |text_i|} \Bigr)\Bigr]\]

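where \(\hat{text}_i\) is the recognized text and \(text_i\) the ground truth for the \(i\)-th of the \(N\) available <image, text> pairs, and \(|text_i|\) is the length of the ground truth in characters. The \(\min(1, \cdot)\) clamp keeps the accuracy from going negative when the total edit distance exceeds the total ground-truth length, for example when an engine outputs a lot of spurious text.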
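A minimal Python sketch of this metric, assuming plain-string inputs and unit cost for insertions, deletions and substitutions (the standard Levenshtein definition; the post does not state the costs). The names `levenshtein` and `character_accuracy` are illustrative and are not Glyph's implementation.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (unit costs)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free if characters match
            ))
        previous = current
    return previous[-1]


def character_accuracy(pairs: Sequence[Tuple[str, str]]) -> float:
    """ACC (%) over N (recognized, ground_truth) pairs."""
    total_distance = sum(levenshtein(hyp, ref) for hyp, ref in pairs)
    total_length = sum(len(ref) for _, ref in pairs)
    if total_length == 0:
        raise ValueError("ground truth must contain at least one character")
    return 100.0 * (1.0 - min(1.0, total_distance / total_length))
```

For instance, `character_accuracy([("Hel1o wor1d", "Hello world")])` counts two substitutions over 11 ground-truth characters, giving about 81.8%. Because distances and lengths are summed over all pairs before dividing, longer pages weigh more than short ones. Note that Python's `len` counts code points, so texts with diacritics (e.g., Romanian or Ukrainian) may need consistent Unicode normalization before comparison; the post does not say whether this is done.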

Datasets

We received images for multiple test cases and present them grouped by language and font type. In each table, the best result for each image (lowest Levenshtein distance) is shown in bold, and the aggregated results are given at the bottom.
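The per-image selection and the aggregate row can be sketched as below. This assumes the aggregate row applies the ACC formula to all images in a table and that every engine tied at the lowest distance is bolded; the post states neither. `summarize`, `engine_a` and `engine_b` are hypothetical names, and the sample strings are made up for illustration.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def summarize(
    ground_truth: List[str], outputs: Dict[str, List[str]]
) -> Tuple[List[List[str]], Dict[str, float]]:
    """Return (engines to bold for each image, aggregate ACC (%) per engine).

    outputs maps an engine name to its recognized texts, in the same
    order as ground_truth (hypothetical layout).
    """
    best_per_image = []
    for i, ref in enumerate(ground_truth):
        distances = {engine: levenshtein(texts[i], ref) for engine, texts in outputs.items()}
        lowest = min(distances.values())
        # Assumption: every engine tied at the lowest distance gets bolded.
        best_per_image.append(sorted(e for e, d in distances.items() if d == lowest))

    total_length = sum(len(ref) for ref in ground_truth)
    if total_length == 0:
        raise ValueError("ground truth must contain at least one character")
    aggregate = {}
    for engine, texts in outputs.items():
        total_distance = sum(levenshtein(hyp, ref) for hyp, ref in zip(texts, ground_truth))
        aggregate[engine] = 100.0 * (1.0 - min(1.0, total_distance / total_length))
    return best_per_image, aggregate


# Made-up example: two images, two hypothetical engines.
gt = ["Hello world", "Good day"]
out = {"engine_a": ["Hel1o world", "Good day"], "engine_b": ["Hello world", "G00d dav"]}
best, agg = summarize(gt, out)  # best == [["engine_b"], ["engine_a"]]
```

Here each engine wins one image, but `engine_a` makes 1 error over 19 ground-truth characters (about 94.7%) against 3 for `engine_b` (about 84.2%), which is why a per-image winner count and the aggregate row can tell different stories.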


English Antiqua

French Antiqua

German Fraktur

German Antiqua

Romanian Antiqua

Ukrainian Handwritten