In previous posts we have described how Dropbox's mobile document scanner works. The document scanner makes it possible to use your mobile phone to take photos and "scan" items like receipts and invoices. Our mobile document scanner only outputs an image; any text in the image is just a set of pixels as far as the computer is concerned, and can't be copy-pasted, searched for, or any of the other things you can do with text. Hence the need to apply Optical Character Recognition, or OCR. This process extracts actual text from our doc-scanned image.

When we built the first version of the mobile document scanner, we used a commercial off-the-shelf OCR library in order to do product validation before diving too deep into creating our own machine learning-based OCR system. This meant integrating the commercial system into our scanning pipeline and offering both features above to our business users to see if they found sufficient use for the OCR. Once we confirmed that there was indeed strong user demand for the mobile document scanner and OCR, we decided to build our own in-house OCR system for several reasons.

First, there was a cost consideration: having our own OCR system would save us significant money, as the licensed commercial OCR SDK charged us based on the number of scans. Second, the commercial system was tuned for the traditional OCR world of images from flatbed scanners, whereas our operating scenario was much tougher, because mobile phone photos are far more unconstrained, with crinkled or curved documents, shadows and uneven lighting, blurriness, reflective highlights, etc. Thus, there might be an opportunity for us to improve recognition accuracy. Perhaps the most important reason for building our own system was that it would give us more control over our own destiny and allow us to work on more innovative features in the future.

In fact, a sea change has happened in the world of computer vision that gave us a unique opportunity. Traditionally, OCR systems were heavily pipelined, with hand-built and highly tuned modules taking advantage of all kinds of conditions they could assume to be true for images captured using a flatbed scanner. For example, one module might find lines of text, then the next module would find words and segment letters, then another module might apply different techniques to each piece of a character to figure out what the character is, etc. Most methods rely on binarization of the input image as an early stage, and this can be brittle and discards important cues. The process of building these OCR systems was very specialized and labor-intensive, and the systems could generally only work with fairly constrained imagery from flatbed scanners.

The last few years have seen the successful application of deep learning to numerous problems in computer vision, giving us powerful new tools for tackling OCR without having to replicate the complex processing pipelines of the past, relying instead on large quantities of data to have the system automatically learn many of the previously manually designed steps.

We will take you through each of these steps in turn. Our initial task was to see if we could even build a state-of-the-art OCR system at all. We began by collecting a representative set of donated document images that match what users might upload, such as receipts, invoices, letters, etc. To gather this set, we asked a small percentage of users whether they would donate some of their image files for us to improve our algorithms.
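To make the binarization step mentioned above concrete, here is a minimal sketch of the classic global approach, Otsu's method, in plain NumPy. This is purely illustrative and not Dropbox's actual code; the `otsu_threshold` and `binarize` names are our own. A single global threshold like this works well on evenly lit flatbed-scanner output, but it is exactly the kind of early hard decision that becomes brittle under the shadows and uneven lighting of phone photos.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    total_sum = float(np.dot(np.arange(256), hist))
    best_t, best_var = 0, 0.0
    cum_count = cum_sum = 0.0
    for t in range(256):
        cum_count += hist[t]          # pixels at or below t (class 0)
        cum_sum += t * hist[t]
        if cum_count == 0 or cum_count == total:
            continue
        w0 = cum_count / total        # class weights
        w1 = 1.0 - w0
        mu0 = cum_sum / cum_count     # class mean intensities
        mu1 = (total_sum - cum_sum) / (total - cum_count)
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray: np.ndarray) -> np.ndarray:
    """Collapse a grayscale image to 0 (ink) or 255 (background) with one global threshold."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)

# Toy example: dark "text" strokes on a light page separate cleanly...
page = np.full((8, 8), 200, dtype=np.uint8)
page[2:4, 2:6] = 40
binary = binarize(page)
# ...but a strong shadow over part of a real photo can push background
# pixels below the single global threshold, smearing them into "ink".
```

Every pixel is classified once, up front, and all grayscale shading information is discarded before any later module sees the image, which is why the post calls this stage brittle.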