OCR - Optical Character Recognition

What is OCR

OCR is short for Optical Character Recognition. It is the process of recognizing text from its visual representation, for example in an image, and turning it into actual text that can then be edited, copied, changed, etc. It works very well with typed and printed text, but only rarely with handwritten text.

How optical character recognition works

OCR can work in two ways: one character at a time or one word at a time. The former is more common, since the latter only works for languages that separate words with spaces. Early OCR processors were trained to recognize single characters in one specific font. Today, most serif and sans-serif fonts can be recognized. Even crooked scans and images that are not perfectly straight are interpreted fairly well, thanks to the pre-processing many OCR programs perform: deskewing, despeckling, converting the scan or image to grayscale, and more.
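
To make the pre-processing step more concrete, here is a minimal sketch of two of the operations mentioned above, grayscale conversion and despeckling, written in plain Python. It assumes the image is given as rows of (r, g, b) tuples; the function names `to_grayscale` and `despeckle` are made up for this illustration, and real OCR programs may use different and more sophisticated methods. Deskewing is left out to keep the example short.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of 0-255 brightness values.

    Uses the common luminance weights 0.299 / 0.587 / 0.114.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def despeckle(gray_image):
    """Remove isolated specks with a 3x3 median filter.

    Each pixel is replaced by the median of itself and its neighbours.
    At the image border the neighbourhood is simply smaller.
    """
    height = len(gray_image)
    width = len(gray_image[0])
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            neighbours = [
                gray_image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median() may return a float for an even count; truncate to int
            new_row.append(int(median(neighbours)))
        result.append(new_row)
    return result


# Usage: a white 5x5 "scan" with a single black speck in the middle
white, black = (255, 255, 255), (0, 0, 0)
scan = [[white] * 5 for _ in range(5)]
scan[2][2] = black

gray = to_grayscale(scan)
clean = despeckle(gray)
```

A median filter is used here because it removes single stray pixels without blurring the image as much as simple averaging would. That is one possible choice for illustration, not necessarily what any particular OCR program does.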

Optical character recognition use cases

Why would you even need or want to use OCR? Here are a few common use cases:
  • Create notes from photos you took of lecture and presentation slides
  • Grab text from documents that were scanned as images
  • Digitize your paperwork and make it searchable for invoice numbers or the like
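
Once a document has been turned into actual text, searching it is an ordinary text search. The following sketch shows how invoice numbers might be found in OCR output. The invoice format ("INV-" followed by 4 to 8 digits), the sample text, and the function name `find_invoice_numbers` are assumptions made for this example; your own paperwork will likely use a different numbering scheme.

```python
import re

# Hypothetical invoice format: "INV-" followed by 4 to 8 digits
INVOICE_PATTERN = re.compile(r"\bINV-\d{4,8}\b")


def find_invoice_numbers(ocr_text):
    """Return all invoice numbers found in the recognized text, in order."""
    return INVOICE_PATTERN.findall(ocr_text)


# Usage: text as it might come out of an OCR run (made-up sample)
sample_text = "Invoice INV-20231 issued. Ref INV-0042, see page 3."
found = find_invoice_numbers(sample_text)
```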

How to use OCR

  1. Go to the PDF to Word converter of PDF2Go
  2. Upload your file via drag & drop, or select it from your hard drive, Dropbox or Google Drive
  3. CONTINUE