The big test: what's the best handwriting to text OCR in 2024?
Accurate handwriting to text conversion remains one of the toughest challenges in OCR today. But with the massive steps forward in AI over the last few years, OCR has also improved at even the most difficult challenges. In this article, we're going to look at the leaders in handwriting to text conversion, comparing their performance against a typical handwritten input.
Read on - or head straight to the results!
The Word Error Rate (WER) is a standard accuracy metric for transcription systems, used in OCR, speech recognition and adjacent fields. It is calculated by counting the substitutions, deletions and insertions needed to make a transcription match the ideal "reference" text, and dividing that count by the total number of words in the reference text.
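To make the definition concrete, here is a minimal sketch of a WER calculation in Python. It is not the online calculator we used: the function name is our own, and it assumes words are split on whitespace with case and punctuation left untouched, so other tools that normalise text differently may report slightly different figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "and generations that not only ties us to"
hypothesis = "and generations that not only this us to"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # 0.125
```

In this example, one substituted word in an eight-word reference gives a WER of 1/8, or 12.5%.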
Our plan was to pass a sample image through a number of leading online handwriting to text converters. We would then compare the output from each against the reference, a model transcription of the image (see below). This would allow us to calculate the Word Error Rate (WER) achieved by each service against this sample, giving us an objective means of comparison for all.
We then planned to use an online WER calculator to determine the Word Error Rate delivered by each handwriting to text transcription.
We started by taking a sample image containing a short handwritten passage.
Transcribed manually, the reference text goes like this:
HANDWRITING: that action of emotion,
of thought, and of decision that has
recorded the history of mankind; revealed
the genius of invention, and disclosed
the inmost depths of the soulful heart.
It gives ideas tangible form through
written letters, photographs, symbols, and signs.
Handwriting forms a bond across millennia
and generations that not only ties us to
the thoughts and deeds of our forebears,
But also serves as an irrevocable link to
our humanity. Neither machines nor technology
can replace the contribution or continuing
importance of this inexpensive portable skill.
Necessary in every age, handwriting remains
just as vital to the enduring saga of
civilization as our next breath.
Results
HandwritingOCR.com
We started our tests with our own handwriting to text engine, HandwritingOCR.com. After uploading the test image through the UI, we received the following result a few moments later:
HANDWRITING: that action of emotion,
of thought, and of decision that has
recorded the history of mankind; revealed
the genius of invention, and disclosed
the inmost depths of the soulful heart.
It gives ideas tangible form through
written letters, photographs, symbols, and signs.
Handwriting forms a bond across millennia
and generations that not only this us to
the thoughts and deeds of our forebears,
But also serves as an irrevocable link to
our humanity. Neither machines nor technology
can replace the contribution or continuing
importance of this inexpensive portable skill.
Necessary in every age, handwriting remains
just as vital to the enduring saga of
civilization as our next breath.
We were very pleased with the result. There was only one transcription error compared to the reference text: on line 9 of the passage, HandwritingOCR recorded "this" instead of "ties".
Passing these results to the Word Error Rate (WER) calculator, this gave an overall WER of 0.9%.
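As a sanity check, the arithmetic behind that figure can be reproduced in a few lines of Python. This sketch assumes words are counted by splitting the reference text on whitespace, which may not match the calculator's tokenisation exactly.

```python
REFERENCE = """\
HANDWRITING: that action of emotion,
of thought, and of decision that has
recorded the history of mankind; revealed
the genius of invention, and disclosed
the inmost depths of the soulful heart.
It gives ideas tangible form through
written letters, photographs, symbols, and signs.
Handwriting forms a bond across millennia
and generations that not only ties us to
the thoughts and deeds of our forebears,
But also serves as an irrevocable link to
our humanity. Neither machines nor technology
can replace the contribution or continuing
importance of this inexpensive portable skill.
Necessary in every age, handwriting remains
just as vital to the enduring saga of
civilization as our next breath."""

reference_words = REFERENCE.split()
errors = 1  # the single substitution: "ties" transcribed as "this"
wer_percent = 100 * errors / len(reference_words)
print(len(reference_words), f"{wer_percent:.1f}%")  # 109 0.9%
```

One error across the 109 words of the reference rounds to the 0.9% reported above.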
Google Document AI
Next we tried Google Document AI. Google is a leader in artificial intelligence, and in document automation, so our expectations were high. Although Google has several products capable of understanding text from images and documents, its flagship service for OCR is Document AI, which offers various document processors.
Google Document AI does not provide a user-facing interface for day-to-day use, and is typically used as an API. However, it is possible to perform a test via the Document AI Workbench. Inside the Workbench, we created a Document OCR processor. From there, we were able to upload and process our test file.
The results were surprising.
a bond across
signs.
millennia
HANDWRITING: that action of emotion,
of thought, and of decision that has
recorded the history of markind; revealed
the genius of invention, and disclosed
the inmost depths of the soulful heart.
It
gives ideas tangible form theough
mitter letters, photographs, symbols, and
Handwriting forms
and generations that not only ties us to
the thoughts and deeds of our forebears,
but also serves as an irrevocable link to
humanity. Keither machines not technology
can replace the contribution or continuing
importance of this inexpensive portable skill.
Necessary in
in
every age, handwriting
handwriting
remains
just as vital to the enduring saga of
civilization
our
vas
our next breath.
We found many transcription errors but, more surprisingly, the Google Document AI OCR engine was unable to read the text in the order it was written. Some fragments were lifted out of their lines and placed elsewhere in the output, and a few words were duplicated. This made the result difficult to read, and it would have needed a lot of manual editing to be useful.
We took this result and, comparing it with the reference text in the Word Error Rate (WER) calculator, got a WER of 23.3%. That is, more than one in five words in the transcription was incorrect.
Microsoft Azure Document AI
Along with Google, Azure is another leader in the field of document automation. For printed text, Azure's OCR offering is one of the best available. How would it compare when faced with handwritten text?
Like Google, Azure does not provide a user interface to upload and process documents in bulk. Most customers will access the service via an API. However, Azure provides a basic interface to test out the OCR capabilities of its Azure Document AI.
We uploaded our sample image to Azure Document AI Studio, selecting the flagship General Document OCR model, and started processing.
HANDWRITING: that action of emotion, of thought, and of decision that has recorded the history of mankind, revealed the genius of invention, and disclosed the inmost depths of the soulful heart. It gives ideas tangible form through written letters, photographs, symbols, and signs. Handwriting forms a bond across millennia and generation that not only this us to the thoughts and deeds of our forebears, but also serves as an irrevocable link to our humanity. Neither machines nos technology can replace the contribution or continuing importance of this inexpensive portable skill. Necessary in every age, handwriting remains just as vital to the enduring saga of civilization as our next breath.
Azure's Document AI model performed much better than its similarly named rival from Google. The text flowed in the correct sequence, and transcription errors were few.
Passing this result to a Word Error Rate (WER) calculator, and comparing it against the reference text, we found that 8.67% of words contained errors.
Amazon Web Services (AWS) Textract
Amazon is the third of the "big three", and claims to be a leader in handwriting to text OCR with its AWS Textract service.
We created an account at AWS and accessed the Textract demo interface in the US East 2 region. There, we could upload our test file and use Textract to convert the handwriting to text.
HANSWRITING: that action of emotion,
of thought, and of decision that has
recorded the history of markind, revealed
the genius of invention, and disclosed
the inmost depths of the soulful heart.
It gives ideas targible form through
written letters, photographs, symbols, and signs.
Handwriting forms a bond across millennia
and generation that not only this us to
the thoughts and deeds of our forebears,
but also serves as an irrevocable link to
our humanity. Neither machines nor technology
can replace the contribution or continuing
importance of this inexpensive postable skill.
Necessary in every age, handwriting remains
just as vital to the enduring saga of
civilization as our next breath.
The results were good, though not outstanding. The Word Error Rate (WER) for Textract's transcription was 10.5%.
As with Google and Azure, the user must develop their own interface to access the API for production use.
Transkribus
Transkribus is a European OCR service using AI to transcribe handwritten documents, with a particular focus on historical documents. Like our service, Transkribus specialises in handwriting OCR, and we expected its results to be excellent.
We created an account at Transkribus and began a trial that allowed us to use its premier model, The Text Titan I (Super Model), for which Transkribus claims an error rate of just 2.95%.
We uploaded our file, and selected English as the input language. We got the following result:
HANSONIITING: that action of enotion
of thought, and of decision that has
accorded the history of markird, revealed
the genus of inuertior and disclosed
the irnost depths of the soulful heart.
It gives ideas targible from theough
neitter letters,selogeaphs, symbol, ad sigis.
Handwertig jouns a bond accoss millerria
ard gereration that not only the us to
the thoughts andheeds of one forebess,
but aha secues as an inenocable lirk ta
our hun Gity tochines noe tubrology
car replacé the contribuhor be containing
importance of this oneyersine poitable skill.
Décissary ir enay age, hardewitig romains
fut as vital to the endurirg saga of
Civilization as our next krath.
This was not a usable result. Many, if not most, words were incorrectly transcribed and replaced with non-words.
Entering this result into a Word Error Rate (WER) calculator gave a WER of 47.7% for Transkribus: almost half of the transcript was incorrect.
Tesseract
Next we turned to the leader in open source optical character recognition (OCR), Tesseract. Tesseract was originally developed by Hewlett-Packard in the 1980s, open sourced in 2005, and developed with support from Google from 2006. It can be installed on a local computer and has the advantage of being completely free to use. But is it any good for handwritten text?
We installed the latest version of Tesseract, version 5, on our Mac. This was easy: we ran the command brew install tesseract and, a few moments later, Tesseract was ready to run from the Terminal.
Then we ran the following command:
tesseract -l eng handwriting-sample13.jpg stdout
This instructs Tesseract to perform OCR on the JPEG file called handwriting-sample13.jpg, with English as the source language, printing the result to the Terminal.
Unfortunately this resulted in no output at all. The resolution of our source image was too low for Tesseract.
Since we wanted to have some kind of output to compare against the rest, we upscaled the source image by 4x. This resulted in a much sharper image that we once more passed to Tesseract:
tesseract -l eng handwriting-sample13-upscaled.jpg stdout
And here's the result:
ie |
fed nae = ete Baty
of t Vein ak Gina Ake pacar ) pa "Slicclaee
fe pee
atte Alters ; “hibgiga ) 1 toe, Jp BYE.
Feel FEMA HM Hed ane. millennia yd x Hhale not Hi. us, AB Ake Gorge and wand leeds °D oe forebears. .
Clearly this is not a usable result, and we find only a few words (e.g. millennia, forebears) from the source image to tell us that the two are in fact related!
Tesseract produced a Word Error Rate (WER) of 95.4%, according to the online calculator we used. Almost every word in the transcription was incorrect.
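For readers who want to script the two Tesseract steps described above (upscaling, then OCR), here is a rough sketch in Python. It is an illustration rather than a record of what we ran: the function names are our own, the upscaling uses the third-party Pillow library with Lanczos resampling as an assumed stand-in for whichever upscaler is available, and a plain resample like this may not sharpen the image as much as a dedicated upscaling tool. The Tesseract call simply wraps the same command line shown earlier and needs the tesseract binary on the PATH.

```python
import subprocess


def scaled_size(size: tuple[int, int], factor: int = 4) -> tuple[int, int]:
    """Return (width, height) multiplied by the upscaling factor."""
    width, height = size
    return width * factor, height * factor


def upscale_image(src_path: str, dst_path: str, factor: int = 4) -> None:
    """Write an upscaled copy of the image (assumes Pillow is installed)."""
    # Imported here so the rest of the sketch works without Pillow.
    from PIL import Image

    with Image.open(src_path) as image:
        upscaled = image.resize(scaled_size(image.size, factor), Image.LANCZOS)
        upscaled.save(dst_path)


def tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the command used in this article as an argument list."""
    return ["tesseract", "-l", lang, image_path, "stdout"]


def run_tesseract(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract and return whatever text it prints to stdout."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example usage (file names as in the commands above):
# upscale_image("handwriting-sample13.jpg", "handwriting-sample13-upscaled.jpg")
# print(run_tesseract("handwriting-sample13-upscaled.jpg"))
```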
The results
We were delighted to see that HandwritingOCR produced the best results in this test of handwriting to text conversion services, with a WER of 0.9%. The next best service, Azure, recorded a WER of 8.67%.
Accurate handwriting to text conversion is one of the final challenges in document automation that could unlock efficiency gains throughout business, industry, and our personal lives. AI will help to drive innovation in this space at an accelerating pace, and we are pleased to be leading this progress.
We have documented this test with screenshots, and the verbatim output of each handwriting OCR service. These results can be reproduced using the same test image we used, which is linked to at the beginning of this article.