Passport Optical Character Recognition (OCR) refers to the set of software technologies capable of transforming photos or scans of passports into meaningful machine-encoded data. It allows developers to build automated passport scanning features into software applications.
Meaningful information such as the holder's name, passport number, or date of birth can be found in two areas of the passport: the body and the Machine Readable Zone (MRZ). The body contains all the data, while the MRZ, the two lines of characters at the bottom of the data page, contains only a subset made up of the most important fields. Historically, technologies have relied only on the MRZ to capture information from passport images. Our passport OCR reads both areas in order to maximize extraction performance.
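To make the MRZ structure concrete, here is a minimal Python sketch that parses the two 44-character lines of a passport MRZ and verifies three of its check digits. It follows the publicly documented ICAO Doc 9303 layout for passports (TD3) and is not a description of our own extraction pipeline. The function names `mrz_check_digit` and `parse_td3_mrz` are hypothetical, chosen for this illustration, and the sample values come from the fictional ICAO specimen passport.

```python
def mrz_check_digit(field: str) -> int:
    """Compute an ICAO 9303 check digit: weights 7, 3, 1 repeated, sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":          # filler character counts as 0
            value = 0
        elif "A" <= ch <= "Z":   # A=10, B=11, ..., Z=35
            value = ord(ch) - ord("A") + 10
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def parse_td3_mrz(line1: str, line2: str) -> dict:
    """Split a two-line passport MRZ into fields (hypothetical helper)."""
    if len(line1) != 44 or len(line2) != 44:
        raise ValueError("a passport (TD3) MRZ has two lines of 44 characters")

    # Line 1: document code, issuing state, then SURNAME<<GIVEN<NAMES
    surname, _, given = line1[5:44].partition("<<")

    # Line 2: each checked field is followed by its own check digit
    number, birth, expiry = line2[0:9], line2[13:19], line2[21:27]
    checks_ok = (
        str(mrz_check_digit(number)) == line2[9]
        and str(mrz_check_digit(birth)) == line2[19]
        and str(mrz_check_digit(expiry)) == line2[27]
    )

    return {
        "issuing_state": line1[2:5].replace("<", ""),
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
        "passport_number": number.replace("<", ""),
        "nationality": line2[10:13].replace("<", ""),
        "birth_date": birth,      # YYMMDD
        "sex": line2[20],
        "expiry_date": expiry,    # YYMMDD
        "checks_ok": checks_ok,
    }


# Fictional ICAO specimen data; line 1 is padded with fillers to 44 characters
sample_line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
sample_line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
sample = parse_td3_mrz(sample_line1, sample_line2)
```

For the specimen above, `sample["surname"]` is `"ERIKSSON"`, `sample["passport_number"]` is `"L898902C3"`, and `sample["checks_ok"]` is `True`. A failed check digit is a useful signal that a character was misread, which is one reason to cross-check MRZ values against the same fields read from the body.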
Know Your Customer (KYC) has become an important part of business processes in many industries. Optimizing these processes is one of the key challenges of the decade because the task is so critical. OCR technologies play an important role in solving this challenge, as they can help reduce manual data entry, avoid mistakes, and save time.
Passport scanning refers to the process of creating a digital image or PDF of the physical passport and processing it. Most of the time, passport pictures are taken with a smartphone or scanned with a printer-scanner. In both cases, the processing technology has to deal with image degradation such as motion blur, scanner artifacts, or the low resolution of poor-quality cameras.
The need to read passports automatically didn't emerge with software, but long before. The first machine-readable passports were issued in the 1980s. The font used, OCR-B, had been designed about a decade earlier with the constraint of being easily readable by machines. This monospace typeface enhances the distinctness of each character, making each one easier for machines to recognize.
Like humans, our algorithms don’t need to read all of the document's text in its original language to extract the relevant information.