Searchable OCR with Mindee x Algolia


Using Mindee and Algolia together can make any large collection of physical documents searchable and easy to consume. There are many contexts in which this is useful at work (consider indexing your contracts, invoices, or research papers), but for now, let’s talk about Pokémon cards.

Suppose you collect Pokémon cards: you buy, sell, and actively compete in tournaments with them. Building a competitive deck takes a combination of strategy and knowledge of which cards you have available. If you have a large collection, say 10,000 cards, it’s impossible to keep your inventory in your head. You might want to search for answers to questions like “How many of my Pokémon know the Fire Spin attack?” or “If I have to sell one of my cards, which has the highest rarity index?”

Answering these types of questions requires an understanding of the game and a way to extract and index the right information from the cards. This is where Mindee shines.

1. Create and train your model with Mindee’s Document Builder API

Let’s assume that as a careful collector of Pokémon cards, you’ve taken photographs of all the cards in your collection. With those in hand, you’re ready to get started!

In your Mindee account, click “Create new API” and give it a name, e.g. “Pokémon Classifier.”

Next, you’ll need to identify and name the fields you’d like Mindee to look for when reading your cards. Mindee can recognize specific types of data, like emails, phone numbers, and URLs, as well as generic text fields, numbers, and dates. For our purposes, we might want to know the following about our Pokémon:

  • Name - text field
  • Hit points - number
  • Attack name - text field 
  • Attack effect - number
  • Rarity index - number

Using the Mindee API builder UI, you define each of these fields and its type.
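The fields themselves are defined in the UI, but it can help to mirror the same list in your own code so that extracted values arrive in Algolia with consistent types. Here is a minimal Python sketch. The snake_case field names are hypothetical (only `attack_effect` appears in Mindee’s sample response later in this post), and treating the rarity index as a decimal number is an assumption.

```python
# Hypothetical API names for the fields listed above; use whatever names you
# give the fields in the Mindee builder ("attack_effect" matches the sample response).
# rarity_index is assumed to be a decimal number.
CARD_FIELDS = {
    "name": str,
    "hit_points": int,
    "attack_name": str,
    "attack_effect": int,
    "rarity_index": float,
}


def coerce_card(raw: dict) -> dict:
    """Convert raw extracted values to the types declared in CARD_FIELDS."""
    card = {}
    for field, field_type in CARD_FIELDS.items():
        value = raw.get(field)
        if value is None:
            # Keep the key with a null value so missing fields stay visible
            card[field] = None
        elif field_type is str:
            card[field] = str(value).strip()
        else:
            # Numbers may come back as strings ("40") or floats (40.0);
            # unparseable text raises ValueError instead of being stored silently
            card[field] = field_type(float(value))
    return card
```

For the card described in the training step below, `coerce_card({"hit_points": "40", "attack_effect": 20.0})` yields `hit_points` 40 and `attack_effect` 20, with the remaining fields set to `None`.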

Next, we’ll click the button to start training the model with real Pokémon cards. You’ll need to use at least 20 images to begin training the API to recognize the fields on the cards.

The UI will prompt you to click through fields that are potential matches. For example, while looking for the “attack effect” field, Mindee may find several candidates: 40 (our HP), 20 (our attack effect), and other integers on the card such as the copyright date. You can point the model in the right direction with a few clicks. Once you’ve annotated 20 cards, training goes into effect and you can start using your API’s /predict endpoint.

2. Use your API to extract the relevant text from your images

Every custom API you build and deploy with Mindee has one endpoint: /predict. Your endpoint might look something like this:

https://api.mindee.net/v1/products/username/pokemon_cards/v1/predict
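As a rough sketch of calling it from Python’s standard library: the URL is built from your account name, API name, and version, and the image is posted as multipart form data. The `document` form-field name and the `Token` authorization scheme reflect Mindee’s v1 REST documentation as we understand it, so treat them as assumptions to check against the docs for your API; `username`, `pokemon_cards`, and the helper names are placeholders.

```python
import urllib.request
import uuid


def predict_url(account: str, api_name: str, version: int = 1) -> str:
    """Build the /predict URL for a custom Mindee API (placeholder names)."""
    return f"https://api.mindee.net/v1/products/{account}/{api_name}/v{version}/predict"


def build_predict_request(url: str, api_key: str, image: bytes,
                          filename: str = "card.jpg") -> urllib.request.Request:
    """Wrap an image in a multipart/form-data POST request without sending it."""
    boundary = uuid.uuid4().hex
    # Assumed field name "document"; verify against Mindee's API reference
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + image + tail,
        headers={
            # Assumed auth scheme: "Token <your API key>"
            "Authorization": f"Token {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

`predict_url("username", "pokemon_cards")` reproduces the example URL above. To actually send a card, read the image bytes, pass the resulting request to `urllib.request.urlopen`, and decode the body with `json.load`.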
 

This endpoint accepts form data containing the card as a file, a URL, or a base64-encoded image. When you pass in a Pokémon card, the API response includes something like this:

{
  ...
  "document": {
    "annotations": {
      "labels": []
    },
    "id": "cdbe3064-4d9e-4857-ab0f-bfd21db78632",
    "inference": {
      "finished_at": "2021-07-28T02:31:54+00:00",
      "pages": [
        {
          "id": 0,
          "prediction": {
            "attack_effect": {
              "confidence": 0.99,
              "values": [
                {
                  "confidence": 0.99,
                  "content": 75,
                  "polygon": [
                   ...
                  ]
                }
              ]
            },
            ...
        ...
}

Mindee returns useful detail here about how your model is working: where on the card it predicts each value is, and how confident it is in that prediction. But we need to pare the response down before Algolia can ingest it.

As our model improves, we can more confidently drop the information about how Mindee made each prediction and extract just the key-value pairs from the response (e.g. "attack_effect": 75), writing them out as a simple JSON file.
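A minimal Python sketch of that paring-down step, following the structure of the sample response above (`document` → `inference` → `pages` → `prediction`). It reads only the first page, which fits one card per image. Joining several entries in `values` with spaces assumes that a multi-word text field such as an attack name comes back split into one entry per word; the `min_confidence` cutoff is our own addition, not a Mindee setting.

```python
def flatten_prediction(response: dict, min_confidence: float = 0.0) -> dict:
    """Reduce a Mindee custom-API response to simple {field: value} pairs."""
    # One card per image, so the first page holds the whole prediction
    prediction = response["document"]["inference"]["pages"][0]["prediction"]
    record = {}
    for field, result in prediction.items():
        # Discard individual values the model is not confident about
        values = [v for v in result.get("values", [])
                  if v.get("confidence", 0.0) >= min_confidence]
        if not values:
            continue  # leave the field out rather than guess
        if len(values) == 1:
            record[field] = values[0]["content"]
        else:
            # Assumption: multi-word text arrives as one value per word
            record[field] = " ".join(str(v["content"]) for v in values)
    return record
```

Applied to the response shown above, this produces a record containing `"attack_effect": 75`. Raising `min_confidence` as a guard while the model is still learning keeps doubtful values out of the index.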

3. Send your data to Algolia

There are several ways to send data to Algolia: manually with their GUI uploader, programmatically with one of their API clients, or with their crawler. Since the card collection isn’t published anywhere on the web where Algolia’s crawler could fetch it (we’re very secretive about what cards we have), we’ll use one of the first two options.

Once you’ve signed up for an Algolia account and created an application, you can create and configure an index from your dashboard.
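If you’d rather keep the index configuration in code than in the dashboard, the same choices fit in a settings dictionary. This is a sketch: `searchableAttributes`, `attributesForFaceting`, and `customRanking` are standard Algolia index settings, but the attribute names are the hypothetical card fields used in this post, and `apply_settings` is our own helper that assumes a client index object with a `set_settings` method (as in the v2/v3 Python client).

```python
# Hypothetical attribute names; the setting keys are standard Algolia settings.
CARD_INDEX_SETTINGS = {
    # Attributes matched against the text a user types
    "searchableAttributes": ["name", "attack_name"],
    # Attributes usable in filters, e.g. attack_name:"Fire Spin"
    "attributesForFaceting": ["attack_name"],
    # Tie-breakers after textual relevance: rarest, then sturdiest, cards first
    "customRanking": ["desc(rarity_index)", "desc(hit_points)"],
}


def apply_settings(index) -> None:
    """Push the settings through any index object exposing set_settings()."""
    index.set_settings(CARD_INDEX_SETTINGS)
```

Listing `attack_name` in `attributesForFaceting` is what later allows filtering by attack, and the custom ranking lets an empty query surface the rarest card first.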

If you’ve saved your Mindee output as a JSON file or set of JSON files, you can upload them in the dashboard; alternatively, you can use the saveObjects method in one of their client libraries to add records to your index. Each card should be uploaded as its own record.
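Whichever route you take, each record needs a stable `objectID` so that re-uploading a card replaces its record instead of creating a duplicate. A small Python sketch, assuming the image file name is a unique, stable identifier for each physical card (any other unique ID works); the helper names are ours.

```python
import json
from pathlib import Path


def to_record(card: dict, image_path: str) -> dict:
    """Attach an objectID derived from the image file name (an assumption)."""
    return {"objectID": Path(image_path).stem, **card}


def write_records(records: list, out_path: str) -> None:
    """Save records as one JSON array of objects for a dashboard upload."""
    Path(out_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

For the programmatic route with the v2/v3 `algoliasearch` Python client, `SearchClient.create(app_id, api_key).init_index("pokemon_cards").save_objects(records)` sends the same list; the index name is a placeholder, and newer major versions of the client changed this interface, so check the client docs for your version.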

After you’ve run an initial batch of cards through this process, additions are straightforward: 

  1. Send an image of a new card to your Mindee /predict endpoint
  2. Clean up the JSON response to match the record format of your Algolia index
  3. Pass the cleaned record to Algolia using the saveObjects method

New additions will appear in search as you add them.
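Those three steps can be chained into one function. In this Python sketch, `predict`, `clean`, and `index` are injected rather than hard-coded: `predict` stands for whatever sends image bytes to your /predict endpoint and returns the parsed JSON, `clean` for your paring-down step, and `index` for any object with a `save_objects` method, such as an Algolia client index. All of these names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable


def add_new_cards(image_paths: Iterable[str],
                  predict: Callable[[bytes], dict],
                  clean: Callable[[dict], dict],
                  index) -> list:
    """Run predict -> clean -> save for each new card image."""
    records = []
    for path in image_paths:
        response = predict(Path(path).read_bytes())      # step 1: Mindee /predict
        # Step 2: flatten; the file-name objectID makes re-runs update, not duplicate
        record = {**clean(response), "objectID": Path(path).stem}
        records.append(record)
    if records:
        index.save_objects(records)                       # step 3: Algolia
    return records
```

Injecting the three pieces keeps the pipeline easy to test with fakes and lets you swap the HTTP client or cleaning rules without touching the loop.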

4. Profit

To make this really useful, we want to be able to browse the search results. Algolia provides pre-built UI widgets (its InstantSearch libraries) that you can add to web and mobile apps. With those in place, our Pokémon card collector can easily search for cards to sell, trade, or use in decks based on what they need.
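The two questions from the introduction map onto ordinary Algolia search parameters. This Python sketch assumes an index object whose `search(query, params)` returns Algolia’s response dictionary (the call shape of the v2/v3 Python client), that `attack_name` is listed in `attributesForFaceting`, and that `customRanking` is `desc(rarity_index)`. The function and attribute names are hypothetical.

```python
def count_cards_with_attack(index, attack_name: str) -> int:
    """'How many of my Pokémon know this attack?' answered from nbHits."""
    # Quoting the value lets a multi-word name like "Fire Spin" match as one facet value;
    # only the total matters, so request a single hit
    result = index.search("", {"filters": f'attack_name:"{attack_name}"', "hitsPerPage": 1})
    return result["nbHits"]


def rarest_card(index):
    """'Which card has the highest rarity index?' as the top hit of an empty query."""
    # With an empty query every record ties on textual relevance,
    # so the custom ranking on rarity_index decides the order
    hits = index.search("", {"hitsPerPage": 1})["hits"]
    return hits[0] if hits else None
```

The same `filters` string can be handed to an InstantSearch widget configuration, so the browsing UI and these one-off questions share a single index setup.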

We’ve talked about Pokémon cards because they’re fun, but you can turn any type of document into structured data using Mindee and pass the results to Algolia for search. This method works just as well for letting your finance team search over expense report receipts, or for making a library of contracts or research papers searchable.
 


Photo by Thimo Pedersen on Unsplash