How to use the Predict endpoint with Postman
Objective
In the previous article, we reviewed how to set up Postman to use a Mindee API Builder API. Each custom API you build and deploy with Mindee has two endpoints: /predict and /feedbacks. In this article, we’ll take a deep dive into the /predict endpoint of your API.
Prerequisites
See the previous article.
Endpoint overview
The /predict endpoint takes a document as input and returns predictions (made by the backing machine learning model) for all the fields defined in your API data model.
Calling the Predict endpoint
In Postman, select the Predict endpoint and select the Body tab of the request.
Change the type of the file key from Text to File:
In the VALUE column, press the Select Files button and pick an image from the sample set we used for the Burger Stores and Menus model, ideally one that wasn’t used to train it:
Before calling the endpoint, we must perform one last update.
By default, all Mindee custom APIs are configured to call version 0 of the /predict endpoint. Version 0 is the version deployed right after you press the start training button, which means it has no backing machine learning model to predict the most likely candidates for your configured fields. As a result, the /predict endpoint only returns candidates for each field, i.e. the list of potential values for that field based on its type and optional constraints (such as “it’s an integer only”).
Since we’ve already generated a first model, we can use a more recent version of our Burger Stores and Menus API. Where can you find the most recent version of the API you can use? Head over to the Training section of your API and make a note of the active version:
You can also find all the currently available versions in the Params section of your Predict request in Postman:
In the screen above, update the VALUE column of the version key to v1 (instead of v0):
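For readers who want to reproduce this call outside Postman, here is a minimal sketch of the same request using only Python’s standard library. It builds, but does not send, a multipart/form-data POST with a file key and a version parameter. The URL is a hypothetical placeholder (copy the real one from your Postman collection), the authentication header configured in the previous article is left out, and we assume version travels as a query parameter; if your collection defines it as a path variable, put it in the URL path instead.

```python
import urllib.parse
import urllib.request
import uuid

# Hypothetical placeholder: copy the real /predict URL from your Postman collection.
PREDICT_URL = "https://example.invalid/burger_stores/predict"


def build_predict_request(image_bytes, filename, version="v1"):
    """Build (without sending) a POST mirroring the Postman setup:
    a form-data key named "file" holding the document, plus a "version" parameter.
    Authentication headers from the previous article are intentionally omitted."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # Assumption: "version" is sent as a query parameter.
    url = PREDICT_URL + "?" + urllib.parse.urlencode({"version": version})
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


request = build_predict_request(b"fake image bytes", "burger_menu.jpg")
print(request.full_url)  # https://example.invalid/burger_stores/predict?version=v1
```

Actually sending it with urllib.request.urlopen(request) would also require the authorization header from the previous article.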
Press the Send button and observe the response returned by the /burger_stores API (screenshot below with the pages.candidates and predictions nodes folded):
Note: this is the response structure of the beta release, which will likely evolve in the RTM release.
Structure of the /predict endpoint response
The pages.candidates node is an array in which each element corresponds to one page of the document. For instance, pages.candidates[0] holds the candidates found on the first page, pages.candidates[1] those on the second page, and so on (the beta thus outputs per-page candidates).
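A minimal sketch of walking that array, assuming the parsed JSON exposes the pages.candidates path as response["pages"]["candidates"] (the exact nesting is an assumption, and the field contents are trimmed to empty lists):

```python
# Hypothetical, trimmed response for a two-page document.
response = {
    "pages": {
        "candidates": [
            {"name": [], "address": []},  # candidates found on page 1
            {"name": [], "address": []},  # candidates found on page 2
        ]
    }
}


def candidates_for_page(response, page_number):
    """Return the candidate node of a 1-based page number (array index = page - 1)."""
    return response["pages"]["candidates"][page_number - 1]


for index, page_candidates in enumerate(response["pages"]["candidates"]):
    print(f"page {index + 1}: {sorted(page_candidates)}")
# page 1: ['address', 'name']
# page 2: ['address', 'name']
```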
Candidate node structure
The structure of each candidate node is similar to the following one:
In our case, there is only one such node (since there’s only one page in the document we sent to the API), but you can see that every field defined in the data model has an entry in each candidate node.
This makes sense, since the ML model can find candidates for the same field on multiple pages, which is a desirable behavior.
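Because a field can have candidates on several pages, collecting every candidate of one field means looping over all the per-page nodes. A small sketch with hypothetical data, assuming each per-page node maps field names to lists of candidates:

```python
def field_candidates(candidate_pages, field):
    """Collect (0-based page index, candidate) pairs for `field` across all pages."""
    found = []
    for page_index, page in enumerate(candidate_pages):
        for candidate in page.get(field, []):
            found.append((page_index, candidate))
    return found


# Hypothetical per-page candidate nodes (contents made up for illustration).
candidate_pages = [
    {"name": [{"content": "Burger"}], "address": []},
    {"name": [{"content": "Palace"}], "address": [{"content": "Main"}]},
]
print(field_candidates(candidate_pages, "name"))
# [(0, {'content': 'Burger'}), (1, {'content': 'Palace'})]
```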
Candidate field node structure
Let’s drill down further into a field, for instance the name field. Here is an example of a typical structure:
Each field is an array since there can be multiple candidates for that field on the same page.
Each field candidate has a content attribute whose value is the OCR’ed representation of the parsed text.
The key attribute is the identifier expected by the /feedbacks endpoint when you want to send feedback to Mindee (typically when the ML model made an incorrect prediction), so that the model learns from its mistakes and (hopefully) won’t repeat them.
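When you later send feedback, you will need the key of the candidate that holds the right value. Here is a hedged sketch that looks it up by its OCR’ed content; the candidate list and the key strings are hypothetical placeholders, not real Mindee identifiers.

```python
from typing import Optional


def find_candidate_key(candidates, text) -> Optional[str]:
    """Return the key of the first candidate whose OCR'ed content equals `text`."""
    for candidate in candidates:
        if candidate["content"] == text:
            return candidate["key"]
    return None  # no candidate on this page carries that text


# Hypothetical candidates of the "name" field on one page.
name_candidates = [
    {"content": "American", "key": "hypothetical-key-1"},
    {"content": "Fast", "key": "hypothetical-key-2"},
]
print(find_candidate_key(name_candidates, "Fast"))  # hypothetical-key-2
```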
Finally, the segmentation node holds the X,Y coordinates of the blue box (called bounding_box in the API response).
For instance, the bounding box coordinates below:
represent the following blue box (read as “III” after a 90-degree rotation):
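To see why such a box reads sideways, it helps to convert its vertices to pixels. A minimal sketch, assuming the coordinates are relative (fractions of the page width and height, between 0 and 1), and using made-up vertices and a made-up 1000 x 1000 page size:

```python
def to_pixels(vertices, page_width, page_height):
    """Scale relative (x, y) vertices to pixel coordinates (assumes 0..1 values)."""
    return [(round(x * page_width), round(y * page_height)) for x, y in vertices]


def box_size(pixel_vertices):
    """Width and height of the axis-aligned box enclosing the vertices."""
    xs = [x for x, _ in pixel_vertices]
    ys = [y for _, y in pixel_vertices]
    return max(xs) - min(xs), max(ys) - min(ys)


# Made-up bounding box: narrow and tall, like text printed sideways.
bounding_box = [[0.10, 0.40], [0.12, 0.40], [0.12, 0.46], [0.10, 0.46]]
width, height = box_size(to_pixels(bounding_box, 1000, 1000))
print(width, height)  # 20 60
print("vertical text?", height > width)  # vertical text? True
```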
Predictions node structure
Let’s switch to the predictions node, probably the most interesting one for you, since it holds the extracted values the ML model deems the most probable for the document you are trying to parse.
Here is a typical structure:
As expected, the API response key of each field (as defined earlier in the Burger Stores and Menus API tutorial) shows up as an array in the predictions node. Note that the predictions node itself is a singleton, not an array, because in the beta we return one set of predictions per document.
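Since predictions is a single object whose values are arrays, a quick sanity check is to confirm that shape and count how many elements the model returned for each field. A sketch with hypothetical, trimmed data (element attributes other than content are omitted):

```python
def summarize_predictions(predictions):
    """Map each field to the number of elements the model predicted for it."""
    if not isinstance(predictions, dict):
        raise TypeError("expected one predictions object per document, not an array")
    return {field: len(elements) for field, elements in predictions.items()}


# Hypothetical, trimmed predictions node.
predictions = {
    "name": [{"content": "Burger"}],
    "address": [{"content": "American"}, {"content": "."}, {"content": "Fast"}],
    "average_rating": [],
}
print(summarize_predictions(predictions))
# {'name': 1, 'address': 3, 'average_rating': 0}
```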
Note that the average_rating field is empty. This means the model wasn’t able to find any likely value for that field. Since the field obviously exists in the document we submitted, we will see in the next article how to use the /feedbacks endpoint to submit its correct value.
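Spotting such gaps programmatically is straightforward: a field whose array is empty has no prediction, which makes it a natural candidate for feedback. A sketch on hypothetical data:

```python
def empty_fields(predictions):
    """Sorted names of the fields for which the model predicted nothing."""
    return sorted(field for field, elements in predictions.items() if not elements)


# Hypothetical predictions node mirroring the situation described above.
predictions = {
    "name": [{"content": "Burger"}],
    "average_rating": [],
}
print(empty_fields(predictions))  # ['average_rating']
```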
Prediction field node structure
A typical structure for a multi-valued field (i.e. a field composed of multiple, consecutive boxes) is the following:
In this case, the first 3 boxes of the address field (as detected by the ML model) are “American”, “.”, and “Fast”, as shown in the content keys above.
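To rebuild the full value of a multi-valued field, concatenate the content of its boxes in the order returned. A naive sketch that joins with single spaces, so punctuation such as “.” ends up surrounded by spaces; adjust the separator to your data:

```python
def field_text(elements, separator=" "):
    """Join the OCR'ed content of consecutive boxes, keeping the API order."""
    return separator.join(element["content"] for element in elements)


# The first three boxes of the address field listed in the text
# (other attributes omitted).
address = [{"content": "American"}, {"content": "."}, {"content": "Fast"}]
print(field_text(address))  # American . Fast
```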
Each element in the prediction field array also contains:
- a page_id attribute (referencing the 0-based page number where the element was found),
- a relative_vertices array of (X,Y) coordinates (similar to the bounding_box element discussed above),
- a key attribute that uniquely identifies the element (and matches its corresponding candidate element in the candidates array discussed above).
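Because each prediction element shares its key with a candidate, you can go from a prediction back to the candidate it came from, for instance to double-check page_id. A sketch assuming the per-page candidate layout described earlier (each page maps field names to candidate lists); all keys and values are hypothetical:

```python
def index_candidates(candidate_pages):
    """Map each candidate key to (0-based page, field name, candidate)."""
    index = {}
    for page, fields in enumerate(candidate_pages):
        for field, candidates in fields.items():
            for candidate in candidates:
                index[candidate["key"]] = (page, field, candidate)
    return index


def matching_candidates(elements, index):
    """Look up the candidate behind each prediction element (None if absent)."""
    return [index.get(element["key"]) for element in elements]


# Hypothetical data: one page, one address candidate, one prediction element.
candidate_pages = [{"address": [{"content": "American", "key": "k1"}]}]
address_prediction = [
    {"content": "American", "page_id": 0, "key": "k1",
     "relative_vertices": [[0.1, 0.1], [0.3, 0.1], [0.3, 0.15], [0.1, 0.15]]},
]

index = index_candidates(candidate_pages)
for element, match in zip(address_prediction, matching_candidates(address_prediction, index)):
    page, field, candidate = match
    print(field, page, page == element["page_id"])  # address 0 True
```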
Last but not least, the input_uuid attribute contains an id if you call the /predict endpoint with the feedback=true parameter:
The value of the input_uuid attribute should be used if you want to send feedback to your API.
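A sketch of both halves: requesting an input_uuid by adding feedback=true to the call, and reading it from the response. The base URL, the use of query parameters, and the top-level placement of input_uuid in the JSON are assumptions; check them against your own response in Postman.

```python
import urllib.parse

# Hypothetical placeholder URL.
PREDICT_URL = "https://example.invalid/burger_stores/predict"


def predict_url(version="v1", feedback=False):
    """Build the /predict URL, adding feedback=true when an input_uuid is needed."""
    params = {"version": version}
    if feedback:
        params["feedback"] = "true"
    return PREDICT_URL + "?" + urllib.parse.urlencode(params)


def get_input_uuid(response):
    """Return the input_uuid to reuse with /feedbacks (assumed to be a top-level key)."""
    value = response.get("input_uuid")
    if not value:
        raise ValueError("no input_uuid: was /predict called with feedback=true?")
    return value


print(predict_url(feedback=True))
# https://example.invalid/burger_stores/predict?version=v1&feedback=true
print(get_input_uuid({"input_uuid": "hypothetical-uuid"}))  # hypothetical-uuid
```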
That’s it for this long but hopefully useful article. In the next tutorial, we’ll explore how to use our knowledge of the /predict endpoint to call the /feedbacks endpoint.