Define your custom Data Model

 

 

To train your custom parsing API, you first need to define a Data Model, i.e. the list of fields you want to extract from your documents, each with its corresponding data type.

 

This tutorial will walk you through the steps of defining such a Data Model.

 

For further information on specific data types, see the navigation tree on the left.

 

 

Prerequisites

  1. You’ll need a free beta account. Sign up and confirm your email to log in.

 

 

Let’s get started! 

 

 

After giving your new API a name, a description and a cover image, you should land on the following page:

 

 

From there you have two options:

 

1- Manually add fields one by one by filling in the form on the right side of the screen

2- Upload a Data Model config file

 

Let’s start with the manual option.

 

 

Manually add a field to your Data Model

 

You can add a new field by filling in the right-side form with the following information:

 

Field Name: The human-readable name that will appear later on in the annotation interface. Use a name that is meaningful to you when you read it.

 

API response key: The name of the key used for this field in the API response schema.

 

Field type: Specifies the kind of information to look for in the document and defines the data type that will be returned in the API response.

 

You choose the Field type from a drop-down list of pre-built data types.

 

 

Update or Delete a field

 

Now that you’ve created your first field, you should see something like this:

 

 

From there you can: 

 

  1. Add a field : Manually add a new field to your Data Model following the same process. You can repeat this step as many times as you want; there is no limit on the number of fields you can extract from a document.
  2. Edit a field : Edit a specific field. You can change its Field name, API response key and Field type.
  3. Delete a field : Delete a specific field from your Data Model.
  4. Start training : When your Data Model is ready, click here to automatically deploy your API and start training it.

 

 

Upload a Data Model config file

 

Alternatively, you can create a whole set of fields at once by uploading a JSON config file in the left-side section.

 

 

 

For instance, for our starter example, the JSON config file looks like this:

 

 

[
  {
    "cfg": {
      "alpha": -1,
      "numeric": 0,
      "lowercase": -1,
      "uppercase": -1,
      "max_length": 999,
      "min_length": 1,
      "capitalized": -1,
      "specialchars": -1
    },
    "name": "name",
    "semantics": "word",
    "public_name": "name"
  },
  {
    "cfg": {
      "is_integer": -1
    },
    "name": "rating",
    "semantics": "amount",
    "public_name": "rating"
  },
  {
    "cfg": {
      "alpha": -1,
      "numeric": 0,
      "lowercase": -1,
      "uppercase": -1,
      "max_length": 999,
      "min_length": 1,
      "capitalized": -1,
      "specialchars": -1
    },
    "name": "first_burger",
    "semantics": "word",
    "public_name": "first burger"
  },
  {
    "cfg": {
      "is_integer": -1
    },
    "name": "first_burger_price",
    "semantics": "amount",
    "public_name": "first burger price"
  },
  {
    "cfg": {
      "is_integer": 1
    },
    "name": "number_of_vote",
    "semantics": "amount",
    "public_name": "number of votes"
  },
  {
    "cfg": {
      "alpha": -1,
      "numeric": -1,
      "lowercase": -1,
      "uppercase": -1,
      "max_length": 999,
      "min_length": 1,
      "capitalized": -1,
      "specialchars": -1
    },
    "name": "address",
    "semantics": "word",
    "public_name": "address"
  }
]

 

Each field is introduced by a cfg key and has the following attributes:

 

name : the API response key. Mandatory.

semantics : the Field Type. Mandatory.

public_name : the Field Name. Mandatory.

cfg : An object of additional data constraints depending on the Field Type. Mandatory.
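
Since all four attributes are mandatory, it can be useful to check a config file locally before uploading it. The following is a minimal Python sketch, not part of the product: the function names `missing_attributes` and `check_config` are hypothetical helpers introduced here for illustration, and the check only covers the presence of the four keys listed above.

```python
import json

# The four attributes the tutorial lists as mandatory for every field.
REQUIRED_KEYS = ("name", "semantics", "public_name", "cfg")


def missing_attributes(field: dict) -> list[str]:
    """Return the mandatory attributes absent from one field entry."""
    return [key for key in REQUIRED_KEYS if key not in field]


def check_config(text: str) -> dict[int, list[str]]:
    """Map the index of each incomplete field to its missing attributes."""
    fields = json.loads(text)  # the config file is a JSON array of fields
    problems = {}
    for index, field in enumerate(fields):
        missing = missing_attributes(field)
        if missing:
            problems[index] = missing
    return problems


# Hypothetical sample: the second field lacks two mandatory attributes.
sample = """
[
  {"cfg": {"is_integer": -1}, "name": "rating",
   "semantics": "amount", "public_name": "rating"},
  {"cfg": {"is_integer": 1}, "name": "number_of_vote"}
]
"""
print(check_config(sample))  # {1: ['semantics', 'public_name']}
```

An empty result means every field carries the four mandatory attributes; it says nothing about whether the cfg contents themselves are valid.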

 

 

The parameters available for the cfg object depend on the Field Type. All of them are mandatory:

 

semantics = word (String):

alpha : Does it contain alphabetic characters? -1 = sometimes, 0 = never, 1 = always. Default value = -1

numeric : Does it contain numeric characters? -1 = sometimes, 0 = never, 1 = always. Default value = -1

lowercase : Does it contain lowercase characters? -1 = sometimes, 0 = never, 1 = always. Default value = -1

uppercase : Does it contain uppercase characters? -1 = sometimes, 0 = never, 1 = always. Default value = -1

capitalized : Is the first letter capitalized? -1 = sometimes, 0 = never, 1 = always. Default value = -1

specialchars : Does it contain special characters? -1 = sometimes, 0 = never, 1 = always. Default value = -1

min_length : What is the minimum length of the word? Accepts INT. Default value = 1

max_length : What is the maximum length of the word? Accepts INT. Default value = 999
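
To make the -1 / 0 / 1 convention concrete, here is a small Python sketch of how a word cfg could be read as a set of constraints on a string. It is an illustration only and does not describe how the API evaluates values internally. The helper names `flag_ok` and `matches_word_cfg` are hypothetical, and the sketch assumes that length is counted in characters and that a "special character" is anything that is neither alphanumeric nor whitespace.

```python
def flag_ok(flag: int, present: bool) -> bool:
    """Tri-state check: 1 = always, 0 = never, -1 = sometimes (no constraint)."""
    if flag == 1:
        return present
    if flag == 0:
        return not present
    return True


def matches_word_cfg(value: str, cfg: dict) -> bool:
    """Check a string against a word cfg, under the assumptions stated above."""
    observed = {
        "alpha": any(c.isalpha() for c in value),
        "numeric": any(c.isdigit() for c in value),
        "lowercase": any(c.islower() for c in value),
        "uppercase": any(c.isupper() for c in value),
        "capitalized": value[:1].isupper(),
        # Assumption: "special" means neither alphanumeric nor whitespace.
        "specialchars": any(not c.isalnum() and not c.isspace() for c in value),
    }
    # Assumption: length is measured in characters.
    if not cfg["min_length"] <= len(value) <= cfg["max_length"]:
        return False
    return all(flag_ok(cfg[key], seen) for key, seen in observed.items())


# cfg of the "name" field from the example config: numeric = 0 (never).
name_cfg = {
    "alpha": -1, "numeric": 0, "lowercase": -1, "uppercase": -1,
    "max_length": 999, "min_length": 1, "capitalized": -1, "specialchars": -1,
}
print(matches_word_cfg("Burger Palace", name_cfg))   # True
print(matches_word_cfg("Route 66 Diner", name_cfg))  # False: contains digits
```

With every flag left at -1 and the default length bounds, any string of 1 to 999 characters passes, which is why -1 is a safe default when you are unsure.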

 

semantics = amount :

is_integer : Is the value an integer? -1 = sometimes, 1 = always. Default value = -1
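
Because every cfg parameter is mandatory, writing a config file by hand means repeating the default values for each field. A possible way around this is to generate the file with a short script. The Python sketch below is only an example: `make_field` is a hypothetical helper, the default values are the ones listed above, and only the two Field Types described in this tutorial (word and amount) are handled.

```python
import json

# Default cfg values as listed above for each Field Type.
WORD_DEFAULTS = {
    "alpha": -1, "numeric": -1, "lowercase": -1, "uppercase": -1,
    "capitalized": -1, "specialchars": -1, "min_length": 1, "max_length": 999,
}
AMOUNT_DEFAULTS = {"is_integer": -1}


def make_field(name: str, public_name: str, semantics: str, **overrides) -> dict:
    """Build one field entry, filling unspecified cfg parameters with defaults."""
    # Only "word" and "amount" are covered; other types raise KeyError.
    defaults = {"word": WORD_DEFAULTS, "amount": AMOUNT_DEFAULTS}[semantics]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown cfg parameters: {sorted(unknown)}")
    return {
        "cfg": {**defaults, **overrides},
        "name": name,
        "semantics": semantics,
        "public_name": public_name,
    }


fields = [
    make_field("name", "name", "word", numeric=0),
    make_field("number_of_vote", "number of votes", "amount", is_integer=1),
]
config_text = json.dumps(fields, indent=2)
print(config_text)  # save this text to a .json file and upload it
```

Rejecting unknown cfg parameters catches misspelled keys early, before the file is uploaded.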