Multi-page PDFs

 

This page answers three common questions about handling multi-page PDFs:

 

  1. How to handle invoice PDFs with a high number of pages
  2. How to get a single invoice result for a multi-page PDF invoice
  3. How to get an invoice result for each page

 

 

How to handle invoice PDFs with a high number of pages

 

For performance reasons, Mindee APIs do not accept PDF files that contain too many pages. The parse_invoice method includes a helper that cuts the PDF, so the request can be performed whatever the number of pages in the file.

 

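from mindee import Client

# Create a client configured with your invoice API token
mindee_client = Client(invoice_token="your_invoices_api_token_here")

# cut_pdf=True lets the helper reduce the PDF before the request is sent
invoice_data = mindee_client.parse_invoice("./sample.pdf", cut_pdf=True)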

 

 

cut_pdf: (bool, default True) If True and the input file is a PDF with more than 3 pages, the PDF is automatically rebuilt as a 3-page PDF containing the first page and the last two pages. If False and the input PDF contains too many pages, an exception is raised.
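
To make the page selection concrete, here is a minimal sketch of the rule described above. It is not the library's own code: the function name select_page_indices and the constant MAX_PAGES are hypothetical and exist only for illustration. The sketch only computes which zero-based page indices would be kept, and it assumes that PDFs of 3 pages or fewer are sent unchanged.

```python
MAX_PAGES = 3  # assumed limit, taken from the cut_pdf description


def select_page_indices(page_count):
    """Return the zero-based indices of the pages kept when a PDF is cut.

    Hypothetical helper: keeps the first page and the last two pages
    when the document has more than MAX_PAGES pages.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count <= MAX_PAGES:
        # Short documents are kept as they are.
        return list(range(page_count))
    # First page, then the last two pages.
    return [0, page_count - 2, page_count - 1]


print(select_page_indices(2))
print(select_page_indices(10))
```

For a 10-page file, the kept pages are the first, the ninth and the tenth, so anything that only appears on the pages in between is not sent to the API.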

 

 

How to get a single invoice result for a multi-page PDF invoice

 

Each response from the Mindee parsing methods includes a single document reconstructed from all the pages. To build it, the mindee.Response class loops over each page's result and, for each field, keeps the prediction with the highest confidence score. It then constructs the Invoice object, which you can access this way:

 

from mindee import Client

mindee_client = Client(invoice_token="your_invoices_api_token_here")

invoice_data = mindee_client.parse_invoice("./3pages.pdf", cut_pdf=True)

# Document-level invoice, merged from the results of all pages
single_invoice_for_multipages = invoice_data.invoice
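
The highest-confidence merge described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the actual mindee.Response implementation: the function merge_pages, the field names and the (value, confidence) tuple layout are all hypothetical.

```python
def merge_pages(pages):
    """Merge per-page predictions, keeping the most confident one per field.

    `pages` is a list of dicts mapping a field name to a
    (value, confidence) tuple. This layout is an assumption made for
    the example, not the library's internal format.
    """
    merged = {}
    for page in pages:
        for field, (value, confidence) in page.items():
            # Keep the candidate only if it beats the best one seen so far.
            if field not in merged or confidence > merged[field][1]:
                merged[field] = (value, confidence)
    return merged


# Hypothetical predictions for a two-page invoice.
pages = [
    {"total_incl": (120.0, 0.35), "invoice_number": ("INV-001", 0.98)},
    {"total_incl": (1200.0, 0.91)},
]
print(merge_pages(pages))
```

In this example the total comes from the second page, where its confidence is higher, while the invoice number comes from the first page, the only one that contains it.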

 

 

 

How to get an invoice result for each page

 

If you know that your multi-page PDF contains one invoice per page, you can still access a mindee.Invoice object for each page:

 

from mindee import Client

mindee_client = Client(invoice_token="your_invoices_api_token_here")

invoice_data = mindee_client.parse_invoice("./3pages.pdf", cut_pdf=True)

# invoice_data.invoices holds one Invoice object per page
for invoice in invoice_data.invoices:
    print(invoice)