What is OCR?
OCR is short for Optical Character Recognition. This process recognizes the visual representation of text, for example in an image. Based on this, OCR creates actual text that can then be edited, copied, or changed. It works very well with typed and printed text, but only rarely with handwritten text.
How optical character recognition works
OCR can work in two ways: one character at a time or one word at a time. The former is the most commonly used, since the latter requires the language to separate words with spaces.
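To make the difference between the two approaches more concrete, here is a minimal Python sketch of how a single line of black-and-white text could be split into characters and then grouped into words. It is only an illustration, not how any particular OCR program works: the function names, the tiny example bitmap, and the `word_gap` value are all assumptions made for this example. Characters are found as runs of columns that contain ink, and a word boundary is assumed wherever the blank gap between two characters is wide enough, which is why word-level recognition depends on spaces.

```python
def ink_columns(bitmap):
    """Return one boolean per column: True if any pixel in it is ink (1)."""
    width = len(bitmap[0])
    return [any(row[x] for row in bitmap) for x in range(width)]


def find_runs(has_ink):
    """Return (start, end) ranges of consecutive inked columns (end exclusive)."""
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            runs.append((start, x))        # a character ends
            start = None
    if start is not None:
        runs.append((start, len(has_ink)))
    return runs


def group_into_words(char_runs, word_gap=3):
    """Group character runs into words.

    word_gap is a hypothetical threshold: a blank gap of at least this
    many columns is treated as a space between words.
    """
    words = []
    for run in char_runs:
        if words and run[0] - words[-1][-1][1] < word_gap:
            words[-1].append(run)          # small gap: same word
        else:
            words.append([run])            # wide gap: new word
    return words


# A made-up 3-pixel-high text line: '#' is ink, '.' is background.
line = [
    "##.#.....##.###",
    "#..#.....#..#.#",
    "##.#.....##.###",
]
bitmap = [[1 if c == "#" else 0 for c in row] for row in line]

char_runs = find_runs(ink_columns(bitmap))
words = group_into_words(char_runs)
print(char_runs)
print(words)
```

In a script without spaces between words, the second step has nothing to go on, while the character-level split still works.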
In the beginning, OCR processors were trained to recognize single characters in a specific font. By now, most serif and sans-serif fonts can be recognized by OCR. Even crooked scans and images that are not perfectly straight are interpreted fairly well. This is thanks to the pre-processing many OCR programs do, which includes deskewing and despeckling, converting the scan or image to grayscale, and more.
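Two of the pre-processing steps mentioned here, grayscale conversion and despeckling, can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the pipeline of any specific OCR program: the image is assumed to be a list of rows of `(r, g, b)` tuples, despeckling is done with a 3x3 median filter (one common choice among several), and the fixed threshold of 128 in `binarize` is an arbitrary value picked for the example. Deskewing is left out because it needs a rotation step that would not fit in a short sketch.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 brightness values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def despeckle(gray):
    """Apply a 3x3 median filter; border pixels use the neighbours that exist."""
    height, width = len(gray), len(gray[0])
    cleaned = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(int(median(window)))
        cleaned.append(row)
    return cleaned


def binarize(gray, threshold=128):
    """Mark pixels darker than the (assumed) threshold as ink (1)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A made-up 5x5 white "scan" with a single dark speck in the middle.
white, black = (255, 255, 255), (0, 0, 0)
scan = [[white] * 5 for _ in range(5)]
scan[2][2] = black

gray = to_grayscale(scan)
cleaned = despeckle(gray)
ink = binarize(cleaned)
```

In this example the isolated speck is removed, because every 3x3 window around it contains far more white pixels than dark ones. Real text strokes are wider than a single pixel, so they are more likely to survive the filter, although very thin strokes can be eroded.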
Optical character recognition use cases
Why would you even need or want to use OCR? Here are a few common use cases:
- Create notes from lecture and presentation slides you photographed
- Grab text from documents that were scanned as images
- Digitize your paperwork and make it searchable for invoice numbers or the like
How to use OCR
- Go to the PDF to Word converter of PDF2Go
- Upload your file via drag & drop, or select it from your hard drive, Dropbox, or Google Drive