Getting started
Get started with receipt parsing in Python.
1. Install the Mindee Python helper library
Install from PyPI using pip, a package manager for Python.
pip install mindee
Don't have pip installed? Install it by running this from the command line:
$ curl https://bootstrap.pypa.io/get-pip.py | python
Getting started with the Mindee API couldn't be easier. Create a Client and you're ready to go.
2. Instantiate your Client
The mindee.Client needs your API credentials. You can either pass these directly to the constructor (see the code below) or via environment variables.
Depending on the type of document you want to parse, you need to add a specific auth token for each endpoint.
from mindee import Client

mindee_client = Client(
    expense_receipt_token="your_expense_receipts_api_token_here",
    raise_on_error=True
)
- expense_receipt_token: (string) API key for the expense_receipts endpoint
- raise_on_error: (boolean, default True) Specifies whether to raise an Exception when an HTTP error occurs. If set to False, the Response.receipt object returned by the parse_receipt method will be None and the Response.receipts list will be empty.
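With raise_on_error=False, a failed call has to be detected by inspecting the response. The following is a minimal sketch that relies only on the behaviour described above. The helper name parsing_failed is hypothetical and not part of the library, and the two SimpleNamespace objects are stand-ins for real responses so the example runs without an API call.

```python
from types import SimpleNamespace


def parsing_failed(receipt_data):
    """Return True when the response carries no parsed receipt.

    Assumes the documented behaviour for raise_on_error=False:
    on an HTTP error, `receipt` is None and `receipts` is empty.
    """
    return receipt_data.receipt is None and not receipt_data.receipts


# Stand-in objects that mimic the two documented attributes only.
failed = SimpleNamespace(receipt=None, receipts=[])
succeeded = SimpleNamespace(receipt="<Receipt>", receipts=["<Receipt>"])

if parsing_failed(failed):
    print("Parsing failed; inspect the HTTP response for details.")
```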
We suggest storing your credentials as environment variables. Why? You'll never have to worry about committing your credentials and accidentally posting them somewhere public.
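As one way to keep the token out of source code, the sketch below reads it from the environment and passes it to the constructor explicitly. The variable name MINDEE_EXPENSE_RECEIPT_TOKEN and both helper functions are assumptions made for this example; the names the library itself reads from the environment are not listed here.

```python
import os

# Hypothetical variable name chosen for this example.
TOKEN_ENV_VAR = "MINDEE_EXPENSE_RECEIPT_TOKEN"


def load_receipt_token(env_var=TOKEN_ENV_VAR):
    """Read the API token from the environment, failing early if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token


def build_client():
    """Create a Client using the token found in the environment."""
    # Imported here so load_receipt_token can be used without mindee installed.
    from mindee import Client

    return Client(
        expense_receipt_token=load_receipt_token(),
        raise_on_error=True,
    )
```

Failing early with a clear message avoids sending a request with an empty token.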
3. Parse a receipt
Using the parse_receipt method of your client object, you can pass any image or PDF file to get the receipt data.
from mindee import Client

mindee_client = Client(
    expense_receipt_token="your_expense_receipts_api_token_here",
    raise_on_error=True
)

receipt_data = mindee_client.parse_receipt("/path/to/file")
Input types
You can pass your input file in three ways:
From file path
receipt_data = mindee_client.parse_receipt('/path/to/file', input_type="path")
From a file object
with open('/path/to/file', 'rb') as fp:
    receipt_data = mindee_client.parse_receipt(fp, input_type="file")
From a base64 string
receipt_data = mindee_client.parse_receipt(base64_string, input_type="base64")
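The base64_string above has to come from somewhere. A minimal sketch of producing one from a local file with the standard library follows. The helper name file_to_base64 is hypothetical, and it assumes the API accepts plain base64 text without a data-URI prefix, which is not stated here.

```python
import base64


def file_to_base64(path):
    """Read a file in binary mode and return its contents as base64 text."""
    with open(path, "rb") as fp:
        # b64encode returns bytes; decode to get a regular str.
        return base64.b64encode(fp.read()).decode("ascii")
```

The result can then be passed as the first argument of parse_receipt with input_type="base64".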
receipt_data structure
The receipt_data object returned by the parse_receipt method contains the following elements:
receipt_data.receipt
The receipt attribute is the Receipt object built by merging all pages into a single document. The merge relies on each field's confidence score: we iterate over the pages and, for each field, keep the value with the highest probability.
receipt_data.receipt # a single Receipt object
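The merge rule described above can be illustrated as follows. This is a sketch of the idea only, not the library's implementation: the function name merge_pages and the dict layout (field name mapped to a value and probability pair) are hypothetical, and the sample values are made up for the example.

```python
def merge_pages(pages):
    """Keep, for each field, the value with the highest probability.

    `pages` is a list of dicts mapping field name -> (value, probability).
    This layout is hypothetical and only illustrates the rule.
    """
    merged = {}
    for page in pages:
        for name, (value, probability) in page.items():
            # Replace the stored value only when this page is more confident.
            if name not in merged or probability > merged[name][1]:
                merged[name] = (value, probability)
    return merged


# Illustrative two-page input.
pages = [
    {"total": ("12.50", 0.62), "date": ("2020-03-01", 0.97)},
    {"total": ("15.20", 0.91), "date": ("2020-03-07", 0.40)},
]
merged = merge_pages(pages)
# merged["total"] is ("15.20", 0.91); merged["date"] is ("2020-03-01", 0.97)
```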
receipt_data.receipts
The receipts attribute contains one Receipt object per page. For a single-page PDF or an image, the list contains a single item; for a multi-page PDF, it contains one item per page.
receipt_data.receipts # [Receipt, Receipt ...]
receipt_data.http_response
Contains the full Mindee API HTTP response object in JSON format.
receipt_data.http_response # full HTTP response object
4. Display the results
You only have to print your receipt object to display the different extracted fields:
from mindee import Client

mindee_client = Client(
    expense_receipt_token="your_expense_receipts_api_token_here",
    raise_on_error=True
)

receipt_data = mindee_client.parse_receipt("/path/to/receipt")
print(receipt_data.receipt)