Wednesday, November 27, 2013

OCR - Optical Character Recognition Technology

A little over a year ago, Mocavo acquired ReadyMicro and, with it, the incredible mind of Matt Garner. One of Matt’s lifelong passions is enabling computers to read historical handwritten documents, which would bring genealogy search to the next level. It’s well known in the genealogy industry that historical handwriting recognition is the Holy Grail: the single largest technological advancement for making more content accessible online (except, perhaps, for the invention of the Web). For the past year, we’ve worked with Matt on this very hard problem, and we have finally made enough progress to begin reporting on it.

Historical handwriting recognition is one of the toughest technical challenges to solve. First, penmanship is unique to the individual. Second, because it’s historical handwriting, it’s in cursive. All the letters run together, adding another layer of complexity. Third, the cursive written in the 1700s is different from the cursive we write now. There are even variations between decades. Our minds have an incredible ability to see through incomplete data (a missing character stroke, poor handwriting, an A that sort of looks like an O, etc.). Our brains do all of this for us and we don’t even notice it. When you think about how to describe this to a computer, you begin to lose your mind! I believe some of the greatest problems mankind can solve are those that no one would ever have started had they known ahead of time how hard the challenge was. Matt fooled himself just enough to start on the problem, and now he’s making real progress from which we are all going to benefit.

Here’s the exciting part: Our recognition technology is starting to work. With limited vocabularies (a restricted set of potential answers), we’re achieving 90-95% accuracy. Sometimes, the technology is able to read things we were convinced were unreadable (only after getting the answer back from the computer do we realize what was actually written). We grow closer to the Holy Grail every day and can’t wait until we can use the technology to bring more content online, free forever.
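
To give a feel for why a limited vocabulary helps so much, here is a small Python sketch. It is not Mocavo’s recognizer, just one common way to use a vocabulary: take a noisy guess and snap it to the closest allowed answer by edit distance. The function names and the surname list are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def closest_in_vocabulary(raw_guess: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry nearest to a noisy guess (ties: first wins)."""
    return min(
        vocabulary,
        key=lambda word: edit_distance(raw_guess.lower(), word.lower()),
    )


# Hypothetical vocabulary: surnames we expect to find on a given page.
SURNAMES = ["Garner", "Gardner", "Turner", "Warner"]

# An "a" that sort of looks like an "o" still lands on the right name.
print(closest_in_vocabulary("Gorner", SURNAMES))
```

The smaller the set of potential answers, the fewer near-misses there are to confuse with one another, which is one reason accuracy on a limited vocabulary can be much higher than on open-ended text.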

