Turn any type of document into structured data

Doug Sillars

Last updated on Apr 21, 2021


Mindee’s machine learning APIs are a very convenient and powerful way to extract key information from documents. Our APIs allow you to quickly and accurately extract information from invoices, receipts, driver licenses, and more! If you’re looking for resources on how to extract data from common files, our APIs are straightforward and easy to implement.

Standardized documents are great for standard processes. But how many of the files we use every day are standard? Every company and every file is slightly different.

As we spoke to more and more customers about our data extraction APIs, it turned out that most of the applications that would benefit from them have unique data extraction needs. Fully incorporating our automation tools into your toolchain would require a unique data model for each of your documents. Sounds nearly impossible, but...

We’re very happy to announce our Document Builder API. You can use it to train a model for any document and, in just a few hours, begin extracting data with that model!

Spoiler: It’s Magic, but it’s not TOTALLY magic

Since the release of the Document Builder API, we’ve had a lot of exciting conversations about creating a bespoke API for each process. But when the rubber met the road, there was a realization: this isn’t a magical tool. There is work that has to be done first.

It is magic... but it is not totally magic. If you are planning to run a marathon, you can’t just lace up your shoes and run 42 km. There is a lot of training before you can complete a marathon. In fact, there are guides and plans to help you train. We cannot (yet) just give the algorithm one file and some criteria, and have it all work. Just like marathon training, we have to put in some effort on the front end to see the benefits from the API.

Think of this post as your guide to training your Mindee Document Builder model. I promise that training your Mindee model will be a lot easier than training for a marathon, will take a lot less time, and won’t cause any aches in your knees.

The training process

Let’s start planning the API Builder model we’d like to build. For this example, we’ll train a model to read the W-9 tax form from the USA. We’ll build the API to extract the name, address (street address, city, state, and zip code), and Social Security Number from each form.

To make this fun, I’ve generated 22 W-9 forms for characters from the Harry Potter series.

The Mindee API Builder requires you to train it on 20 images before it can make any predictions. At that point there is an initial model training, and you can begin to use the API to get results. Think of this as your first marathon attempt: you’ll finish, but you’ll learn from it, and use that knowledge to improve. The API results will be good, but they won’t yet be perfect. As you continue to train the model, it will get more and more accurate. The model retrains itself after every 20 additional images (at 40, 60, 80, and so on), and after each training you’ll see a marked improvement in how the model works. Let’s see it in action!
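The retraining schedule is simple arithmetic. As a minimal sketch, assuming a training run is triggered at every multiple of 20 trained documents (the function names here are illustrative and not part of any Mindee SDK):

```python
RETRAIN_EVERY = 20  # per the article: training runs at 20, 40, 60, 80, ...


def completed_trainings(trained_docs: int) -> int:
    """Number of training runs triggered so far."""
    return trained_docs // RETRAIN_EVERY


def docs_until_next_training(trained_docs: int) -> int:
    """How many more documents must be trained to trigger the next run."""
    return RETRAIN_EVERY - (trained_docs % RETRAIN_EVERY)


if __name__ == "__main__":
    for count in (5, 20, 22, 45):
        print(count, completed_trainings(count), docs_until_next_training(count))
```

With the 22 generated W-9 forms, this gives one completed training run and 18 more documents to go before the second.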

Preparing for training

You can follow these steps, and also follow along in this video:

We have our list of documents, so let’s begin building our model:

Step 1: Create an account

Step 2: Create a new API

Step 3: Name your API, and give it a description and an image:

Step 4: Now we are getting to the fun part, defining the model.

The W-9 form is used to identify each person for whom you will be withholding tax. You’ll want to extract:

Name
Street address
City
State
Zip code
Social Security Number

Identifying and naming the fields

Now we will build the model for the training. You can use many types of fields for each entry:

For the W-9 forms, we’ll define them all as text fields.

Each text field has a name (and a key used to identify it in the API), and you can define whether or not it can contain numbers and letters:

For example:

Name: never contains numerics
Street address: can contain both alpha and numeric values
City: never contains numerics
State: never contains numerics
Zip code: never contains alpha characters
SSN: never contains alpha characters
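To make these constraints concrete, here is a minimal local sketch of the same data model. The `TextField` class, the `api_key` values, and the `accepts` check are all hypothetical and only mirror the settings chosen in the interface; they are not how Mindee stores or enforces the model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextField:
    name: str            # human-readable label shown in the interface
    api_key: str         # hypothetical key identifying the field in API results
    allow_alpha: bool    # may the value contain letters?
    allow_numeric: bool  # may the value contain digits?

    def accepts(self, value: str) -> bool:
        """Check a candidate value against the alpha/numeric constraints."""
        if not self.allow_alpha and re.search(r"[A-Za-z]", value):
            return False
        if not self.allow_numeric and re.search(r"\d", value):
            return False
        return True


# The six W-9 fields with the constraints listed above.
W9_FIELDS = {
    field.api_key: field
    for field in (
        TextField("Name", "name", allow_alpha=True, allow_numeric=False),
        TextField("Street address", "street_address", True, True),
        TextField("City", "city", True, False),
        TextField("State", "state", True, False),
        TextField("Zip code", "zip_code", False, True),
        TextField("SSN", "ssn", False, True),
    )
}
```

For instance, `W9_FIELDS["name"].accepts("Tom Riddle")` is true, while `W9_FIELDS["ssn"].accepts("12A")` is false because the SSN field never contains alpha characters.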

Once these are entered, the data model looks like this:

And now we are ready to train the model.

Training the Document Builder Model

If you’d like to watch a video of the training:

A wise runner once said, just keep putting one foot in front of the other, and you’ll make it to the finish line. Some runs are a slog and not fun. This is the ‘not fun’ part. We’ve got to do the training so that we (and our model) are ready for production.

You can upload images (JPG, PNG, WebP) or PDF files. If you have a number of files ready, you can upload them together as a ZIP file.
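If your training documents sit in a single folder, you can bundle them before uploading. This is a minimal sketch using only the Python standard library; the function name and the extension list (taken from the formats mentioned above) are assumptions, and the upload itself still happens in the Mindee interface.

```python
import zipfile
from pathlib import Path

# File types mentioned in the article; adjust if your documents differ.
SUPPORTED = {".jpg", ".png", ".webp", ".pdf"}


def bundle_documents(folder: str, archive: str) -> list[str]:
    """Zip every supported file in `folder`; return the names that were added."""
    added = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file() and path.suffix.lower() in SUPPORTED:
                # Store files flat, without the folder prefix.
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Calling `bundle_documents("w9_forms", "w9_forms.zip")` (both names are placeholders) would produce one archive containing only the supported files.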

As the files load, you can begin your training. Here is the W-9 for Tom Riddle:

Each word that fits the parameters for the name field (highlighted on the right) is marked by a blue box. Zoom with your trackpad or mouse, and you can click on the boxes that contain the name. If you accidentally click the incorrect box, you’ll see an “x” that removes it from the field on the right.

Tip: the training does not care in what order you click on the words. “Tom Riddle” and “Riddle Tom” are treated exactly the same way.
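One way to picture this tip is to treat the words selected for a field as an unordered collection. The sketch below is only an illustration of that idea, not Mindee's implementation, and it models click order only:

```python
from collections import Counter


def same_selection(words_a: list[str], words_b: list[str]) -> bool:
    """True when both selections contain the same words, ignoring click order."""
    # Counter compares word counts, so repeated words are still respected.
    return Counter(words_a) == Counter(words_b)


if __name__ == "__main__":
    print(same_selection(["Tom", "Riddle"], ["Riddle", "Tom"]))
```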

Continue for each field in the document, and when you are done, click the “Validate” button.

Once you have trained the model with 20 documents, the API goes into training mode. You’ll get an email when it completes its training. (This happens again at 40, 60, and 80 trained documents, and so on.)

So, we train the model with 20 images, and Mindee tells us that model training is underway. Let’s see how the model does!

A little bit later…

Watch the video showing the results of the API training:

After a few minutes, you’ll get an email that the model has been trained on the first 20 images. We’ve trained and trained, and now we can try to see if we’re ready for our marathon.

The 21st image you upload will be tested against the model, and we can see how well it is doing. In this case, we’re looking at the W-9 of Cornelius Fudge (senior; we all know that junior is rotting in Azkaban).

After training on just 20 documents, our W-9 API is extracting all of the fields with a high degree of accuracy!
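Once the API returns predictions, you will typically want to flatten them into one value per field. The sketch below assumes a response in which each field key maps to a list of predicted words; that shape, the field keys, and the sample values are all assumptions made for illustration, so check the actual response of your own API before relying on it.

```python
def extract_fields(response: dict) -> dict[str, str]:
    """Join the words predicted for each field into a single string."""
    # Assumed shape: response["document"]["inference"]["prediction"][field_key]
    prediction = response["document"]["inference"]["prediction"]
    return {
        key: " ".join(word["content"] for word in field.get("values", []))
        for key, field in prediction.items()
    }


# Invented sample data in the assumed shape (not real API output).
sample_response = {
    "document": {
        "inference": {
            "prediction": {
                "name": {"values": [{"content": "Cornelius"}, {"content": "Fudge"}]},
                "city": {"values": [{"content": "London"}]},
                "ssn": {"values": [{"content": "000-00-0000"}]},
            }
        }
    }
}

if __name__ == "__main__":
    print(extract_fields(sample_response))
```

Comparing the flattened values against the known contents of a few held-out forms is a simple way to check how accurate the model is after each training run.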

Conclusion

You can build your own Document Builder API for your unique form or document. With just a little bit of training, you will have created an API that can be used in production with a high degree of accuracy!

Give it a try. It is free to try out, and we’d love to hear what you think!
