How to use the Predict endpoint with Postman
In the previous article, we reviewed how to set up Postman to use a Mindee API Builder API. Each custom API you build and deploy with Mindee has 2 endpoints: /predict and /feedbacks. In this article, we’ll take a deep dive into the /predict endpoint of your API.
The /predict endpoint takes a document as input and returns predictions (made by the backing machine learning model) for all the fields defined in your API data model.
Calling the Predict endpoint
In Postman, select the Predict endpoint and select the Body tab of the request.
Change the type of the file key from Text to File:
In the VALUE column, press the Select Files button and pick an image from the sample training set we used to train the Burger Stores and Menus model (try to use an image that wasn’t used to train the model):
Before calling the endpoint, we must perform one last update.
By default, all Mindee custom APIs are configured to call version 0 of the /predict endpoint. Version 0 is the version that’s deployed right after you press the start training button, which means it has no backing machine learning model to predict the most likely candidates for your configured fields. As a result, the /predict endpoint only returns candidates for each field, i.e. the list of potential values for that field, based on its type and optional constraints (such as “it’s an integer only”).
Since we’ve already generated a first model, we can use a more recent version of our Burger Stores and Menus API. Where can you find the most recent version you can use? Head over to the Training section of your API and make a note of the active version:
You can also find all the currently available versions in the Params section of your Predict request in Postman:
In the screen above, update the VALUE column of the version key to v1 (instead of v0):
Press the Send button and observe the response returned by the /burger_stores API (screenshot below with folded pages.candidates and predictions nodes):
Note: this is the response structure of the beta release, which will likely evolve in the RTM release.
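If you want to reproduce the Postman call from a script, the following is a minimal Python sketch using only the standard library. It is not an official Mindee client: the helper names build_multipart and call_predict are ours, and the url and headers arguments are deliberately left to you, to be copied from your own Postman request (they carry the API version and your credentials). The only detail taken from this article is that the document is sent under the file key.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def call_predict(url, headers, image_path):
    """POST an image under the `file` key, as configured in the Postman Body tab.

    `url` and `headers` are assumptions of this sketch: copy them from
    your own Postman request rather than from this article.
    """
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart(
            "file", os.path.basename(image_path), fh.read()
        )
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", content_type)
    for name, value in headers.items():
        request.add_header(name, value)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the multipart encoding in its own function makes it possible to check the request body without touching the network.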
Structure of the /predict endpoint response
The pages.candidates node is an array where each element corresponds to one page of the document. For instance, pages[0].candidates holds the candidates found on the first page, pages[1].candidates those found on the second page, and so on (the beta thus outputs per-page candidates).
Candidate node structure
The structure of each candidate node is similar to the following one:
In our case, there is only one such node (since there’s only one page in the document we sent to the API), but you can see that all the fields defined in the data model have an entry in each such node. This makes sense since the ML model can identify candidates for a single field on multiple pages, which is a desirable behavior.
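To make the per-page layout concrete, here is a small Python sketch that counts the candidates of every field on every page. It assumes the parsed response is a dict with a top-level pages list whose elements each carry a candidates dict; that nesting is our reading of the beta response, and the sample data is made up for illustration.

```python
def fields_per_page(response):
    """Return one dict per page, mapping field name -> number of candidates."""
    summary = []
    for page in response.get("pages", []):
        candidates = page.get("candidates", {})
        summary.append({name: len(values) for name, values in candidates.items()})
    return summary


# Hypothetical, heavily trimmed response for a one-page document.
sample_response = {
    "pages": [
        {
            "candidates": {
                "name": [{"content": "Burger"}, {"content": "Menu"}],
                "average_rating": [{"content": "4"}],
            }
        }
    ]
}

for page_number, counts in enumerate(fields_per_page(sample_response)):
    print(page_number, counts)
```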
Candidate field node structure
Let’s drill down further into a field, for instance the name field. Here is an example of a typical structure:
Each field is an array since there can be multiple candidates for that field on the same page.
Each field candidate has a content attribute, the value of which is the OCR’ed representation of the parsed text.
The key attribute is the attribute expected by the /feedbacks endpoint when you want to pass feedback back to Mindee (typically when the ML model made an incorrect prediction) so that it will learn from its mistakes and (hopefully) won’t make them again.
Last, the segmentation node represents the X,Y coordinates of the blue box (called the bounding_box in the API response).
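As a rough illustration of the three attributes just described, the Python sketch below walks the candidates of one field on one page and reports each candidate's content, key, and the extent of its box. It assumes that segmentation.bounding_box is a list of [x, y] pairs; the helper name and all sample values (including the keys) are hypothetical.

```python
def describe_candidates(page_candidates, field_name):
    """Yield (content, key, extent) for every candidate of one field on one page.

    `extent` is (min_x, min_y, max_x, max_y), computed from the box corners.
    """
    for candidate in page_candidates.get(field_name, []):
        points = candidate["segmentation"]["bounding_box"]
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        yield candidate["content"], candidate["key"], (min(xs), min(ys), max(xs), max(ys))


# Hypothetical candidates node for a single page.
sample_page_candidates = {
    "name": [
        {
            "content": "Burger",
            "key": "example-key-1",
            "segmentation": {
                "bounding_box": [[0.1, 0.2], [0.3, 0.2], [0.3, 0.25], [0.1, 0.25]]
            },
        }
    ]
}

for content, key, extent in describe_candidates(sample_page_candidates, "name"):
    print(content, key, extent)
```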
For instance, the bounding box coordinates below:
represent the following blue box (read as “III” after a 90-degree rotation):
Predictions node structure
Let’s switch to the predictions node, probably the most interesting one for you, since it holds the extracted values the ML model deems the most probable for the document you are trying to parse.
Here is a typical structure:
As expected, the API response key of each field (as defined earlier in the Burger Stores and Menus API tutorial) shows up as an array in the predictions node. Note that the predictions node itself is a singleton, not an array, because in the beta we return one set of predictions per document.
Note that the average_rating field is empty. This means the model wasn’t able to find any likely value for that field. Since it obviously exists in the document we submitted, we will see in the next article how we can use the /feedbacks endpoint to submit the correct value for that field.
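Spotting empty fields by eye gets tedious on larger data models, so here is a short Python sketch that lists them. It takes the predictions node as a dict whose field entries are arrays, as described above; the helper name and the sample values are ours.

```python
def missing_fields(predictions):
    """Return the names of the fields for which the model predicted nothing."""
    return sorted(
        name
        for name, values in predictions.items()
        if isinstance(values, list) and not values
    )


# Hypothetical predictions node, trimmed to two fields.
sample_predictions = {
    "name": [{"content": "Burger"}],
    "average_rating": [],
}

print(missing_fields(sample_predictions))
```

Only list-valued entries are inspected, so any scalar attribute stored alongside the fields is ignored rather than reported as missing.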
Prediction field node structure
A typical structure for a multi-valued field (i.e. a field composed of multiple, consecutive boxes) is the following:
In this case, the first 3 boxes of the address field (as detected by the ML model) are “American”, “.” and “Fast”, as shown in the content keys above.
Each element in the prediction field array also contains:
a page_id attribute (referencing the 0-based page number where the element was found),
a relative_vertices array of (X,Y) coordinates (similar to the bounding_box element discussed above),
a key attribute that uniquely identifies the element (and matches its corresponding candidate element in the candidates array discussed above).
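Since a multi-valued field comes back as a sequence of boxes, you will usually want to reassemble it. The Python sketch below joins the content of consecutive elements with a single space and collects the pages they were found on. Joining with spaces is our simplification, not something the API prescribes, and the key and relative_vertices values in the sample are made up.

```python
def field_value(elements):
    """Join the content of consecutive boxes into one string."""
    return " ".join(element["content"] for element in elements)


def pages_of(elements):
    """Return the sorted 0-based page numbers the elements were found on."""
    return sorted({element["page_id"] for element in elements})


# Hypothetical prediction elements for the address field.
sample_address = [
    {"content": "American", "page_id": 0, "key": "example-key-a",
     "relative_vertices": [[0.1, 0.1], [0.2, 0.1], [0.2, 0.12], [0.1, 0.12]]},
    {"content": ".", "page_id": 0, "key": "example-key-b",
     "relative_vertices": [[0.21, 0.1], [0.22, 0.1], [0.22, 0.12], [0.21, 0.12]]},
    {"content": "Fast", "page_id": 0, "key": "example-key-c",
     "relative_vertices": [[0.23, 0.1], [0.3, 0.1], [0.3, 0.12], [0.23, 0.12]]},
]

print(field_value(sample_address))
print(pages_of(sample_address))
```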
Last but not least, the input_uuid attribute contains an id if you call the /predict endpoint with the
The value of the input_uuid attribute should be used if you want to send feedback to your API.
That's it for this long but hopefully useful article. We’ll explore in the next tutorial how we can use our knowledge of the /predict endpoint to call the