Invoice OCR stands for Invoice Optical Character Recognition. Invoice OCRs are software technologies that transform unstructured invoices, such as PDFs or images, into structured data. They are mainly used to automate invoice scanning and reduce the need for manual data entry.
OCR refers to technologies capable of detecting and reading text from images or documents in order to transform them into a machine-readable format. More details in our blog.
The acronym OCR commonly refers to the generic problem of text detection and recognition. When associated with a type of document, as in Invoice OCR, the meaning changes slightly: it refers to technologies that perform key information extraction rather than generic text extraction. This naming is common for any document type, for example Receipt OCR or Passport OCR.
Invoice scanning refers to the whole process of collecting and processing invoices using software. Sometimes the processing includes an Invoice OCR that extracts data from the unstructured invoice, and that data may or may not then be validated by a human.
Automated invoice parsing is becoming essential in many industries, particularly financial services. Invoice OCRs tend to reduce errors and shorten the processing time of invoices in accounting, accounts payable or receivable, procurement and, more generally, any workflow involving the payment, validation, or analysis of invoices.
Like humans, our algorithms don't need to read all the document text in its language to extract the relevant information.
Transform any scanned, photographed, or native PDF invoice into usable data in your software
Customer and Supplier information
Three data points are extracted for both the customer and the supplier:
Amounts
Each amount is returned in the currency of the invoice.
Invoice identifiers
Geography Information
Dates
Each date is returned in ISO format YYYY-MM-DD
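Because dates come back as plain YYYY-MM-DD strings, they can be parsed with standard tooling. Here is a minimal Python sketch, assuming the values have already been read from the JSON output as strings; the function name and the sample dates are made up for illustration and do not come from a real response.

```python
from datetime import date


def parse_invoice_date(value: str) -> date:
    """Parse a YYYY-MM-DD string into a date object."""
    return date.fromisoformat(value)


# Hypothetical values, not taken from a real API response.
invoice_date = parse_invoice_date("2021-03-15")
due_date = parse_invoice_date("2021-04-14")

# Once parsed, dates support ordinary arithmetic, e.g. the payment term in days.
days_to_pay = (due_date - invoice_date).days
print(days_to_pay)
```

Parsing into a date object early makes malformed values fail loudly (a ValueError) instead of propagating as strings through the rest of the workflow.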
Payment details
Supplier payment details are extracted as a list of objects that includes necessary information for payments:
Choosing the right Invoice OCR technology for your application can be a demanding task. In most use cases, criteria such as extraction accuracy, precision, response time, integration time, pricing, and scalability should be taken into account in order to maximize the added value in your software. Feel free to contact us if you don't find the answers to your questions below.
The invoice OCR API is available to any user with an account on our platform and includes a free plan.
To test our APIs, you only have to create a free account using this link, and you'll be able to upload invoices in our user interface to see invoice OCR in action, as well as the JSON output. A demo page is also available here.
A free plan is available to everyone and allows you to process 250 pages per month for free. No credit card is required.
Above 250 pages per month, the price per invoice page processed starts at $0.10 and can decrease to $0.01 per page depending on the monthly volume. See the pricing page for more information.
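As a rough illustration of these figures, the Python sketch below computes the lowest and highest possible monthly bill for a given page count. It rests on two assumptions that are not stated above: that only pages beyond the free 250 are billed, and that a single per-page rate applies to all billable pages. The actual volume tiers are on the pricing page and are not modeled here.

```python
from decimal import Decimal
from typing import Tuple

FREE_PAGES = 250            # free monthly allowance
MAX_RATE = Decimal("0.10")  # starting price per page, in USD
MIN_RATE = Decimal("0.01")  # lowest price per page at high volume, in USD


def monthly_cost_bounds(pages: int) -> Tuple[Decimal, Decimal]:
    """Return (lowest, highest) possible monthly cost in USD.

    Assumes only pages above the free allowance are billed and that
    one flat rate applies to every billable page.
    """
    billable = max(0, pages - FREE_PAGES)
    return billable * MIN_RATE, billable * MAX_RATE


low, high = monthly_cost_bounds(1000)
print(f"1000 pages/month: between ${low} and ${high}")
```

Decimal is used instead of float so that per-page prices in cents add up without rounding drift.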
Our invoice recognition API is based on our computer vision technology that doesn't rely on text to extract the invoice data, but only on the image. This removes language limitations.
The OCR was trained with invoices from over 50 countries, ensuring that you can extract data from your invoices regardless of where they were created.
Mindee's API follows HTTP standards in order to allow any developer to integrate the invoice OCR API into their applications easily.
We also offer a set of client libraries in all the main back-end languages, and an open-source UI toolkit that helps create front-end features. You can check out our open-source repository or our API documentation for more details.
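Since the API is plain HTTP, a request can be prepared with nothing more than a standard library. The Python sketch below builds (but does not send) a multipart file upload. The endpoint URL, the `document` field name, and the `Authorization: Token ...` header format are placeholders assumed for illustration; the API documentation has the real values.

```python
import uuid
import urllib.request

# Placeholder values: assumptions for illustration, not the documented API.
ENDPOINT = "https://api.example.com/invoices/predict"
FILE_FIELD = "document"


def build_invoice_request(file_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one invoice file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FILE_FIELD}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Authorization": f"Token {api_key}",  # assumed header format
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        ENDPOINT, data=head + file_bytes + tail, headers=headers, method="POST"
    )


# Dummy bytes stand in for a real invoice file.
request = build_invoice_request(b"%PDF-1.4 ...", "invoice.pdf", "my-api-key")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then parse the JSON response body.
```

In practice the client libraries mentioned above would hide these details; the sketch only shows that nothing beyond standard HTTP is involved.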
Our invoice OCR's accuracy is above 90%, with precision above 95% for most of the fields. These figures are computed on a data set of invoices from more than 50 countries.
Testing our OCR API is free; all you need is an account. Feel free to drag and drop invoices in the live interface to see the OCR performance on your data.
The processing time is around 1.3 seconds per page for PDFs and 0.9 seconds for an invoice image.
We regularly improve this processing time, and our target is below 500 ms. Our goal is to make sure you can create real-time user experiences in your application.
Yes, we trained our Invoice OCR to process invoices from a large number of different layouts, even the ones with the most complex formatting.
We also use data augmentation to ensure that blur or ink stains don't prevent our system from reading the data, as long as the data remains legible.
We have a Slack community where you can ask your questions and chat with our team.
We don't do the integration in your infrastructure ourselves, but we can set up a custom level of support on a per-case basis if needed.