Set up a string field

 

 

Do you want to extract string fields (full names, addresses, places, and so on) from your documents?

 

You can do that with the API Builder by creating a string field in the Data Model configuration step. See the Data Model configuration tutorial.

 

This section will walk you through the implications of setting up a string field and the different parameters you can specify.

 

 

Prerequisites

 

  1. You’ll need a free beta account. Sign up and confirm your email to log in.

 

 

 Let’s get started! 

 

After giving a name, description and cover image to your new API, you should land on the following page:

 

 

 

 

Build a string field by filling in the right-side form and selecting “String” in the Field Type drop-down menu.

 

 

Additional parameters

 

If you have additional context about this specific field and how the information is usually displayed in your documents, you can specify it during the field setup.

 

 

Feed parameters through UI

 

Through the UI, you can check two checkboxes:

  • It never contains any alpha characters. When this box is checked, the engine will never predict a word or a list of words containing alphabetic characters as the value for this field.
  • It never contains any numeric characters. When this box is checked, the engine will never predict a word or a list of words containing numeric characters as the value for this field.


 

Feed parameters through config.json file

 

You may want to create a String field by uploading a config.json file. See the Data Model tutorial for more information.

 

This gives you access to more parameters for the String field than the two checkboxes in the UI. To use them, fill in the field's cfg object in your config.json file with the following parameters:

 

alpha : Does it contain alpha characters.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

numeric : Does it contain numeric characters.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

lowercase : Does it contain lowercase characters.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

uppercase : Does it contain uppercase characters.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

capitalized : Is the first letter capitalized.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

specialchars : Does it contain special characters.

possible values: -1 means sometimes;  0 means never; 1 means always.

default value: -1

 

min_length : What is the minimum length of the word.

possible values: any INT.

default value: 1

 

max_length : What is the maximum length of the word.

possible values: any INT.

default value: 999

 

A valid entry for a String field in a config.json should look like this:

{
  "cfg": {
    "alpha": -1,
    "numeric": 0,
    "lowercase": -1,
    "uppercase": -1,
    "max_length": 999,
    "min_length": 1,
    "capitalized": -1,
    "specialchars": -1
  },
  "name": "name",
  "semantics": "word",
  "public_name": "name"
}
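A misspelled key or an out-of-range value is easy to miss in a hand-written file, so it can help to generate the entry programmatically. The following is a minimal Python sketch, not part of the API Builder itself: the helper name make_string_field and its validation rules are assumptions derived from the parameter list above, and the "semantics": "word" value is simply copied from the example entry.

```python
import json

# Default values, taken from the parameter list in this section.
DEFAULT_CFG = {
    "alpha": -1,
    "numeric": -1,
    "lowercase": -1,
    "uppercase": -1,
    "capitalized": -1,
    "specialchars": -1,
    "min_length": 1,
    "max_length": 999,
}
# Parameters that accept -1 (sometimes), 0 (never) or 1 (always).
TRI_STATE_KEYS = ("alpha", "numeric", "lowercase", "uppercase",
                  "capitalized", "specialchars")


def make_string_field(name, public_name=None, **overrides):
    """Hypothetical helper: build one String field entry for config.json."""
    unknown = set(overrides) - set(DEFAULT_CFG)
    if unknown:
        # Catches misspelled keys before the file is uploaded.
        raise ValueError(f"unknown cfg parameter(s): {sorted(unknown)}")

    cfg = {**DEFAULT_CFG, **overrides}
    for key in TRI_STATE_KEYS:
        if cfg[key] not in (-1, 0, 1):
            raise ValueError(f"{key} must be -1, 0 or 1, got {cfg[key]!r}")
    for key in ("min_length", "max_length"):
        if not isinstance(cfg[key], int) or isinstance(cfg[key], bool):
            raise ValueError(f"{key} must be an integer, got {cfg[key]!r}")
    if cfg["min_length"] > cfg["max_length"]:
        raise ValueError("min_length cannot exceed max_length")

    return {
        "cfg": cfg,
        "name": name,
        "semantics": "word",  # copied from the example entry above
        "public_name": public_name or name,
    }


# Same settings as the example entry: defaults, except numeric = 0 (never).
entry = make_string_field("name", numeric=0)
print(json.dumps(entry, indent=2))
```

The printed entry carries the same keys and values as the example above, with the keys in a different order, which does not matter in JSON.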

 

 

Impact for training and making predictions

 

By default, a String field treats every piece of text read on the document as an eligible candidate.

 

Specifying these parameters affects both the training phase and the predictions you make later on.

 

The parameters give the engine a list of criteria that a word or a list of words must match to be considered a valid candidate, that is, an eligible correct answer.

 

Warning: For this specific field, the engine will never extract any information from your document that does not match these criteria.
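To make the filtering effect concrete, here is a small Python sketch of how such criteria could rule candidates in or out. It is an illustration only: the engine's actual implementation is not documented here, and the function names, the reading of "special characters" as ASCII punctuation, and the choice to count length in characters (spaces included) are all assumptions.

```python
import string


def _check(flag, present):
    """Tri-state check: -1 = no constraint, 0 = must be absent, 1 = must be present."""
    if flag == -1:
        return True
    return present == bool(flag)


def is_eligible(text, cfg):
    """Hypothetical filter: does `text` satisfy every criterion in `cfg`?"""
    checks = [
        _check(cfg.get("alpha", -1), any(c.isalpha() for c in text)),
        _check(cfg.get("numeric", -1), any(c.isdigit() for c in text)),
        _check(cfg.get("lowercase", -1), any(c.islower() for c in text)),
        _check(cfg.get("uppercase", -1), any(c.isupper() for c in text)),
        _check(cfg.get("capitalized", -1), text[:1].isupper()),
        # Assumption: "special characters" means ASCII punctuation.
        _check(cfg.get("specialchars", -1),
               any(c in string.punctuation for c in text)),
        # Assumption: length is counted in characters, spaces included.
        cfg.get("min_length", 1) <= len(text) <= cfg.get("max_length", 999),
    ]
    return all(checks)


# A field configured to never contain numeric characters.
cfg = {"numeric": 0, "max_length": 20}
candidates = ["Jane Doe", "Room 42", "INV-2021", "Paris"]
eligible = [c for c in candidates if is_eligible(c, cfg)]
print(eligible)  # ['Jane Doe', 'Paris']
```

With numeric set to 0, "Room 42" and "INV-2021" can never be returned for this field, even when one of them is the correct value on a given document. That is the trade-off to weigh before tightening any parameter.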

 

The String data type is the least restrictive one, and its default parameters allow everything.

 

This field type should be your go-to data type if you do not have sufficient context about your documents or if the information you are looking for is displayed inconsistently from one document to another.