French Payslip OCR
Bulletins de salaire

 

 

This article explains how to build an OCR API that automatically extracts data from French payslips (bulletins de salaire).

 

 

Prerequisites

  1. You’ll need a free beta account. Sign up and confirm your email to log in.
  2. You’ll need at least 20 French payslip images or PDFs to train your OCR.

 

 

 

Define your French payslips use case

 

First, we need to specify the fields we want to extract from our payslips.

 

 


 

 

For our example, we are going to extract the following list of fields from our French payslips:

 

  • Employee full name: First and last names of the employee
  • Employee SSN: Employee social security number
  • Employer SIRET: Employer SIRET number 
  • Payslip period:  Payslip month and year 
  • Net paid: Total net paid 
  • Gross salary: Total gross salary before taxes

 

Feel free to add any data you'd like the OCR to extract.
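To make the field list concrete, here is a minimal Python sketch of how the extracted values could be held once the API returns them. The class name `PayslipFields` and its attribute names are our own assumptions for illustration and are not part of the platform.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class PayslipFields:
    """Hypothetical container for the six fields listed above."""

    employee_full_name: Optional[str] = None  # first and last names
    employee_ssn: Optional[str] = None        # kept as text: may contain 'A' or 'B'
    employer_siret: Optional[str] = None      # kept as text to preserve leading zeros
    payslip_period: Optional[date] = None     # month and year (day fixed to 1)
    net_paid: Optional[Decimal] = None        # total net paid
    gross_salary: Optional[Decimal] = None    # total gross salary before taxes

    def missing(self) -> List[str]:
        """Return the names of the fields that have no value yet."""
        return [name for name, value in vars(self).items() if value is None]


# Usage: a payslip where only two fields were found.
partial = PayslipFields(employee_full_name="Jean Dupont", net_paid=Decimal("1850.32"))
print(partial.missing())
```

Keeping the identifiers as strings and the amounts as `Decimal` avoids losing leading zeros or introducing floating-point rounding.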

 

 

Deploy your API

 

Once you have defined the fields you want to extract, head over to the platform and press the ‘build a new endpoint’ button.

 

You now land on the setup page. Here is the image you can use to set up the API. My setup looks like this:

 

 

[Screenshot: French payslip OCR setup]

 

 

We're ready! Press the “next step” button. We are going to build our data model in the next section.


 

At this point, you can manually add each field as described below, or you can download this JSON config and upload it in the left section of the screen.

 

 

Employee full name: type String with no numeric characters

 

[Screenshot: employee full name field settings]

 

 

Employee SSN: type String. Note that we haven't checked the "It never contains alpha characters" option, because French social security numbers can contain the letter 'A' or 'B' for people born in Corsica (department codes 2A and 2B).

 

[Screenshot: employee SSN field settings]
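To illustrate why the SSN has to stay a string that allows letters, here is a small Python sketch that checks an extracted number after the fact. It assumes the standard layout of 13 characters followed by a 2-digit key, where the key equals 97 minus the number modulo 97, and where 2A and 2B are replaced by 19 and 18 before the computation. The function name `is_valid_nir` is ours, and the check is a post-processing idea, not something the platform does for you.

```python
import re

# 5 digits (sex, year, month), department (2 digits or 2A/2B),
# 6 digits (commune and order number), then the 2-digit key.
NIR_PATTERN = re.compile(r"^[0-9]{5}(?:[0-9]{2}|2[AB])[0-9]{6}[0-9]{2}$")


def is_valid_nir(raw: str) -> bool:
    """Return True if `raw` looks like a French SSN with a correct key."""
    nir = re.sub(r"[\s.\-]", "", raw).upper()  # drop spaces, dots, dashes
    if not NIR_PATTERN.match(nir):
        return False
    body, key = nir[:13], int(nir[13:])
    department = body[5:7]
    if department == "2A":      # Corsica: substitute a numeric code
        body = body[:5] + "19" + body[7:]
    elif department == "2B":
        body = body[:5] + "18" + body[7:]
    return 97 - (int(body) % 97) == key


print(is_valid_nir("2 69 05 49 588 157 80"))  # made-up number with a matching key
print(is_valid_nir("185032A00412324"))        # made-up Corsican number
```

A failed check does not tell you which character the OCR misread, but it is a cheap way to flag a payslip for manual review.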

 

 

Employer SIRET: type String that never contains alpha characters.

 

[Screenshot: employer SIRET field settings]
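Because a SIRET is 14 digits protected by a Luhn checksum, an extracted value can be sanity-checked in a few lines. The sketch below is one possible post-processing step under that assumption. The function name `is_valid_siret` is ours, and the sketch ignores any special-case numbering rules that may apply to particular organisations.

```python
import re


def is_valid_siret(raw: str) -> bool:
    """Return True if `raw` is 14 digits that pass the Luhn checksum."""
    digits = re.sub(r"\s", "", raw)  # payslips often print the number in groups
    if not re.fullmatch(r"[0-9]{14}", digits):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as adding the two digits of the product
        total += value
    return total % 10 == 0


print(is_valid_siret("732 829 320 00074"))  # checksum-valid example number
print(is_valid_siret("732 829 320 00075"))  # last digit altered
```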

 

 

Payslip period: type Date 

 

[Screenshot: payslip period field settings]
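If you ever need to normalise a period yourself, for example when comparing the API output with values from your payroll system, a sketch like the following can help. It assumes the period is written either as `MM/YYYY` or as a French month name followed by the year. Both formats and the name `parse_period` are assumptions for illustration, and real payslips may use other layouts.

```python
import re
from datetime import date
from typing import Optional

FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}


def parse_period(text: str) -> Optional[date]:
    """Parse 'MM/YYYY' or '<French month> YYYY' into the first day of that month."""
    cleaned = text.strip().lower()
    numeric = re.fullmatch(r"([0-9]{1,2})\s*[/.\-]\s*([0-9]{4})", cleaned)
    if numeric:
        month, year = int(numeric.group(1)), int(numeric.group(2))
    else:
        named = re.fullmatch(r"([^\W\d_]+)\s+([0-9]{4})", cleaned)
        if not named or named.group(1) not in FRENCH_MONTHS:
            return None
        month, year = FRENCH_MONTHS[named.group(1)], int(named.group(2))
    try:
        return date(year, month, 1)
    except ValueError:  # month outside 1..12 or an impossible year
        return None


print(parse_period("03/2021"))
print(parse_period("Août 2020"))
```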

 

 

Net paid: type Amount

 

[Screenshot: net paid field settings]

 

 

Gross salary: type Amount

 

[Screenshot: gross salary field settings]
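French payslips usually print amounts with a comma as the decimal separator and spaces between thousands. If you receive an amount as raw text, the sketch below shows one way to turn it into a `Decimal` and to run a simple plausibility check. The function names are ours, and the rule that net paid should not exceed gross salary is only a heuristic for flagging documents to review, not a guarantee.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_french_amount(text: str) -> Optional[Decimal]:
    """Convert text such as '2 345,67 €' into Decimal('2345.67')."""
    # \s also matches non-breaking spaces, which are common in printed amounts.
    cleaned = re.sub(r"[\s€]", "", text)
    if "," in cleaned:
        # With a decimal comma present, treat dots as thousands separators.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    if not re.fullmatch(r"-?[0-9]+(\.[0-9]+)?", cleaned):
        return None
    return Decimal(cleaned)


def amounts_look_consistent(gross: Decimal, net: Decimal) -> bool:
    """Heuristic: net paid is normally between zero and the gross salary."""
    return Decimal(0) <= net <= gross


gross = parse_french_amount("2 345,67 €")
net = parse_french_amount("1\u00a0850,32")
print(gross, net, amounts_look_consistent(gross, net))
```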

 

 

Train your Payslip OCR

 

 

You’re all set! 

 

Now it's time to train your custom payslip deep learning model. To get more information about the training phase, please refer to the Getting Started tutorial. And if you have any questions regarding your use case, feel free to reach out to us on our Mindee Community on Slack!