How to use the Predict endpoint with Postman

 

Objective

In the previous article, we reviewed how to set up Postman to use a Mindee API Builder API. Each custom API you build and deploy with Mindee has 2 endpoints: /predict and /feedbacks. In this article, we’ll take a deep dive into the /predict endpoint of your API.

 

Prerequisites

See previous article

 

Endpoint overview

 

The /predict endpoint takes a document as an input and returns predictions (made by the backing machine learning model) of all fields as defined in your API data model.

 

Calling the Predict endpoint

 

In Postman, select the Predict endpoint and select the Body tab of the request.

 

Change the type of the file key from Text to File:

 

 

In the VALUE column, press the Select Files button and pick an image from the sample training set we used to train the Burger Stores and Menus model (try to use an image that wasn’t used to train the model):

 

Postman - File type in Body

 

Before calling the endpoint, we must perform one last update.

By default, all the Mindee custom APIs are configured to call version 0 of the /predict endpoint. Version 0 is the version that’s deployed right after you press the start training button, which means it has no backing machine learning model to make predictions of the most likely candidates for your configured fields. As a result, the /predict endpoint only returns candidates for each field, i.e. the list of potential values for that field, based on its type and optional constraints (such as “it’s an integer only”).

 

Since we’ve already generated a first model, we can use a more recent version of our Burger Stores and Menus API. Where can you find the most recent version of the API you can use? Head over to the Training section of your API and make a note of the active version:

 

Mindee API Builder - Active version

 

You can also find all the currently available versions in the Params section of your Predict request in Postman:

 

Mindee API Builder API currently available versions in Postman

In the screen above, update the VALUE column of the version key to v1 (instead of v0):

 

Mindee API Builder v1 version configured in Postman
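
Outside Postman, the same version switch amounts to setting a query parameter on the request URL. The sketch below is only an illustration under stated assumptions: the base URL is a placeholder (not your API's real address), and treating version and feedback as query-string parameters is inferred from the fact that they appear in Postman's Params section. The helper name build_predict_url is made up for this example.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the Predict URL shown in your Postman collection.
PREDICT_URL = "https://example.invalid/burger_stores/predict"


def build_predict_url(base_url: str, version: str = "v1", feedback: bool = False) -> str:
    """Append the Params-section keys to the Predict URL as a query string."""
    params = {"version": version}
    if feedback:
        # Assumed spelling, taken from the feedback=true parameter named in this article.
        params["feedback"] = "true"
    return f"{base_url}?{urlencode(params)}"


print(build_predict_url(PREDICT_URL))
print(build_predict_url(PREDICT_URL, version="v1", feedback=True))
```

Keeping the version as an argument makes it easy to compare the v0 (candidates only) and v1 responses for the same document.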

 

Press the Send button and observe the response returned by the /burger_stores API (screenshot below with folded pages.candidates and predictions nodes):

 

Mindee API Builder - Predict response

Note: this is the response structure of the beta release, which will likely evolve in the RTM release.

 

Structure of the /predict endpoint response

 

The pages.candidates node is an array in which each element corresponds to one page of the document. For instance, pages.candidates[0] holds the candidates found on the first page, pages.candidates[1] those found on the second page, and so on (the beta thus outputs per-page candidates).
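
To make the per-page layout concrete, here is a minimal Python sketch that walks pages.candidates and counts the candidates of each field on each page. The response fragment is hypothetical: its field names and values are invented, and it assumes that pages is an object holding a candidates array, as the dotted notation suggests.

```python
# Hypothetical response fragment, shaped after the description above.
response = {
    "pages": {
        "candidates": [
            {"name": [{"content": "Burger"}]},                # first page (index 0)
            {"name": [], "address": [{"content": "Main"}]},   # second page (index 1)
        ]
    }
}


def candidate_counts_per_page(resp: dict) -> list:
    """Return, for each page, how many candidates each field has."""
    return [
        {field: len(items) for field, items in page.items()}
        for page in resp["pages"]["candidates"]
    ]


for index, counts in enumerate(candidate_counts_per_page(response)):
    print(f"page {index + 1}: {counts}")
```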

 

Candidate node structure

 

The structure of each candidate node is similar to the following one:

 

Mindee API Builder - Predict candidates in response

In our case, there is only one such node (since there’s only one page in the document we sent to the API), but you can see that all the fields defined in the data model have an entry in each candidate node.

This makes sense since the ML model can identify candidates for a single field on multiple pages, which is a desirable behavior.

 

Candidate field node structure

 

Let’s drill down further into a field, for instance the name field. Here is an example of a typical structure:

 

API Builder - Predict response - Name candidate field

Each field is an array since there can be multiple candidates for that field on the same page.

Each field candidate has a content attribute, the value of which is the OCR’ed representation of the parsed text. 

The key attribute is what the /feedbacks endpoint expects when you want to send feedback to Mindee (typically when the ML model made an incorrect prediction), so that the model can learn from its mistakes and (hopefully) won’t repeat them.

Last, the segmentation node represents the X,Y coordinates of the blue box (called the bounding_box in the API response).
 

For instance, the bounding box coordinates below:

 

Mindee Bounding Box

 

represent the following blue box (read as “III” when doing a 90-degree rotation):

 

Bounding box example on document
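
As a rough illustration of how these attributes can be read in code, the Python sketch below flattens one field candidate into its content, its key, and the horizontal and vertical extent of its box. Everything in the sample is hypothetical: the key value is made up, and the assumption that bounding_box is a list of [x, y] pairs nested under segmentation is inferred from the description above, not confirmed by it.

```python
# Hypothetical candidate field, modeled on the attributes described above.
name_candidates = [
    {
        "content": "III",
        "key": "a1b2",  # made-up identifier
        "segmentation": {
            # Assumed layout: one [x, y] pair per corner of the box.
            "bounding_box": [[0.10, 0.20], [0.30, 0.20], [0.30, 0.25], [0.10, 0.25]]
        },
    }
]


def summarize(candidate: dict) -> dict:
    """Flatten one candidate into content, key and the extents of its box."""
    points = candidate["segmentation"]["bounding_box"]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "content": candidate["content"],
        "key": candidate["key"],
        "x_range": (min(xs), max(xs)),
        "y_range": (min(ys), max(ys)),
    }


for candidate in name_candidates:
    print(summarize(candidate))
```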

 

Predictions node structure

Let’s switch to the predictions node, probably the most interesting one for you, since it holds the extracted values the ML model deems the most probable for the document you are trying to parse.

 

Here is a typical structure:

 

Mindee API Builder - Predict Response - Predictions node

As expected, the API response key of each field (as defined earlier in the Burger Stores and Menus API tutorial) shows up as an array in the predictions node. Note that the predictions node itself is a singleton, not an array, because in the beta we return one set of predictions per document.

 

Note that the average_rating field is empty. This means the model wasn’t able to find any likely value for that field. Since it obviously exists in the document we submitted, we will see in the next article how we can use the /feedbacks endpoint to submit the correct value for that field.
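
If you process responses in code, empty fields like this one are easy to spot. The Python sketch below lists every field whose prediction array is empty; the predictions dictionary is a hypothetical stand-in for the real node, with invented values, and the helper name empty_fields is made up for this example.

```python
# Hypothetical predictions node: one array per field, as described above.
predictions = {
    "name": [{"content": "Burger"}],
    "average_rating": [],  # the model found no likely value
}


def empty_fields(preds: dict) -> list:
    """Return the names of the fields for which the model predicted nothing."""
    return sorted(field for field, elements in preds.items() if not elements)


print(empty_fields(predictions))
```

Fields reported this way are natural candidates for a correction through the /feedbacks endpoint.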

 

Prediction field node structure

 

A typical structure for a multi-valued field (i.e. a field composed of multiple, consecutive boxes) is the following:

 

Mindee API Builder - Predict Response - Address field

 

In this case, the first 3 boxes of the address field (as detected by the ML model) are “American”, “.” and “Fast”, as shown in the content keys above.
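
To rebuild a readable value from such a multi-valued field, you can join the content of its boxes in order. The Python sketch below does that for the three boxes quoted above; joining with a single space is a simplifying assumption, and the helper name field_text is made up for this example.

```python
# The first three boxes of the address field, reduced to their content keys.
address = [{"content": "American"}, {"content": "."}, {"content": "Fast"}]


def field_text(elements: list, separator: str = " ") -> str:
    """Concatenate the content of consecutive boxes into one string."""
    return separator.join(element["content"] for element in elements)


print(field_text(address))  # American . Fast
```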

 

Each element in the prediction field array also contains:

  • a page_id attribute (referencing the 0-based page number where the element was found), 
  • a relative_vertices array of (X,Y) coordinates (similar to the bounding_box element discussed above),
  • a key attribute that uniquely identifies the element (and matches its corresponding candidate element in the candidates array discussed above)
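
Because the key is shared between the two nodes, a predicted element can be traced back to its candidate. The Python sketch below shows one way to do this on a hypothetical response: the key values are invented, and the assumption that pages and predictions sit side by side at the top level of the response is a guess based on the folded screenshot described earlier.

```python
# Hypothetical response; keys and contents are invented for illustration.
response = {
    "pages": {
        "candidates": [
            {
                "address": [
                    {"key": "k1", "content": "American"},
                    {"key": "k2", "content": "."},
                ]
            }
        ]
    },
    "predictions": {
        "address": [{"key": "k1", "content": "American", "page_id": 0}]
    },
}


def find_candidate(resp: dict, field: str, element: dict):
    """Return the candidate matching a predicted element, or None if absent."""
    # page_id is 0-based, so it indexes pages.candidates directly.
    page = resp["pages"]["candidates"][element["page_id"]]
    for candidate in page.get(field, []):
        if candidate["key"] == element["key"]:
            return candidate
    return None


predicted = response["predictions"]["address"][0]
print(find_candidate(response, "address", predicted))
```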

 

Last but not least, the input_uuid attribute contains an ID if you call the /predict endpoint with the feedback=true parameter:

 

Input UUID screenshot

The value of the input_uuid attribute should be used if you want to send feedback to your API.

That's it for this long but hopefully useful article. We’ll explore in the next tutorial how we can use our knowledge of the /predict endpoint to call the /feedbacks endpoint.