French Payslip OCR
Bulletins de salaire (French payslips)
This article explains how to build an OCR API that automatically extracts data from French payslips (bulletins de salaire).
Prerequisites
- You’ll need a free beta account. Sign up and confirm your email address to log in.
- You’ll need at least 20 French payslip images or PDFs to train your OCR.
Define your French payslip use case
First, we need to specify the fields we want to extract from our payslips.
For our example, we are going to extract the following list of fields from our French payslips:
- Employee full name: First and last names of the employee
- Employee SSN: Employee social security number
- Employer SIRET: Employer SIRET number
- Payslip period: Payslip month and year
- Net paid: Total net paid
- Gross salary: Total gross salary, before social contributions and taxes are deducted
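If you plan to consume the extracted data in your own code, it can help to give these fields concrete types. The sketch below is a minimal Python example under our own assumptions: the `Payslip` class, its attribute names, and the helpers `parse_french_amount` and `month_period` are hypothetical and not part of any Mindee SDK. It converts amounts written in French number formatting (comma as decimal separator, ordinary or non-breaking space as thousands separator) into `Decimal`, and represents the payslip period by the first day of its month.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Payslip:
    """Hypothetical typed container for the six fields listed above."""

    employee_full_name: str
    employee_ssn: str      # kept as text: Corsican numbers contain letters
    employer_siret: str    # kept as text, matching the String type used later
    period: date           # only month and year matter; day is always 1
    net_paid: Decimal
    gross_salary: Decimal


def parse_french_amount(text: str) -> Decimal:
    """Turn an amount such as '2 345,67 €' into Decimal('2345.67')."""
    cleaned = text.replace("€", "")
    # Drop thousands separators: normal, non-breaking and narrow non-breaking spaces.
    for space in (" ", "\u00a0", "\u202f"):
        cleaned = cleaned.replace(space, "")
    # French uses a comma as the decimal separator.
    return Decimal(cleaned.replace(",", "."))


def month_period(year: int, month: int) -> date:
    """Represent a payslip period by the first day of its month."""
    return date(year, month, 1)
```

For example, `parse_french_amount("2 345,67 €")` returns `Decimal('2345.67')`. Using `Decimal` rather than `float` avoids binary rounding surprises when comparing or summing amounts.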
Feel free to add any data you'd like the OCR to extract.
Deploy your API
Once you have defined the fields you want to extract, head over to the platform and press the ‘build a new endpoint’ button.
You now land on the setup page, where you configure your API and can upload an image for it.
We're ready! Press the “next step” button. We are going to build our data model in the next section.
At this point, you can either add each field manually as described below, or download this JSON config and upload it in the left-hand section of the screen.
- Employee full name: type String with no numeric characters
- Employee SSN: type String. Note that we haven't checked the "It never contains alpha characters" option, because the social security numbers of people born in Corsica contain 'A' or 'B' (department codes 2A and 2B).
- Employer SIRET: type String that never contains alpha characters
- Payslip period: type Date
- Net paid: type Amount
- Gross salary: type Amount
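The SSN and SIRET types above only restrict which characters may appear, but both identifiers also carry a checksum, so you may want to double-check the values the OCR returns. The hypothetical helpers below are a sketch based on the publicly documented rules: a SIRET is 14 digits that pass the Luhn check (a few establishments, notably La Poste, are known exceptions), and a French social security number (NIR) is a 13-character body plus a 2-digit key equal to 97 minus the body modulo 97, where the Corsican department codes 2A and 2B are replaced by 19 and 18 before computing the key. The function names are our own.

```python
import re

# 13-character body (department may be 2A/2B for Corsica) followed by a 2-digit key.
_NIR_RE = re.compile(r"[0-9]{5}(?:[0-9]{2}|2[AB])[0-9]{8}")
_SIRET_RE = re.compile(r"[0-9]{14}")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit, starting from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def is_valid_siret(raw: str) -> bool:
    """14 digits passing the Luhn check (ignores known exceptions such as La Poste)."""
    siret = raw.replace(" ", "")
    return bool(_SIRET_RE.fullmatch(siret)) and luhn_valid(siret)


def is_valid_nir(raw: str) -> bool:
    """Check the format and the 2-digit key of a French social security number."""
    nir = raw.replace(" ", "").upper()
    if not _NIR_RE.fullmatch(nir):
        return False
    body, key = nir[:13], int(nir[13:])
    # Corsican department codes are replaced by numbers before computing the key.
    department = body[5:7]
    if department == "2A":
        body = body[:5] + "19" + body[7:]
    elif department == "2B":
        body = body[:5] + "18" + body[7:]
    return key == 97 - int(body) % 97
```

For instance, the made-up number `is_valid_nir("1 85 05 78 006 084 91")` returns `True`, while changing its last digit makes it return `False`. A failed check is a good signal to send that payslip for manual review.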
Train your Payslip OCR
You’re all set!
Now it's time to train your custom payslip deep learning model. For more information about the training phase, please refer to the Getting Started tutorial. And if you have any questions about your use case, feel free to reach out to us on the Mindee Community Slack!
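Once the model is trained, you can send new payslips to your endpoint over HTTP. The sketch below only builds the request with Python's standard library. It assumes the endpoint URL pattern `https://api.mindee.net/v1/products/{account}/{endpoint}/v{version}/predict`, a `Token` authorization header, and a multipart field named `document`; these are assumptions on our part, so confirm them against the API reference shown for your endpoint before relying on them. `build_predict_request` and its parameters are hypothetical.

```python
import urllib.request
import uuid

# ASSUMPTION: URL pattern for a custom endpoint; verify it in your dashboard.
API_URL = "https://api.mindee.net/v1/products/{account}/{endpoint}/v{version}/predict"


def build_predict_request(
    file_bytes: bytes,
    filename: str,
    api_key: str,
    account: str,
    endpoint: str,
    version: str = "1",
) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST carrying one payslip file."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to appear in the file
    head = (
        f"--{boundary}\r\n"
        # ASSUMPTION: the API expects the file in a form field named "document".
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL.format(account=account, endpoint=endpoint, version=version),
        data=head + file_bytes + tail,
        headers={
            # ASSUMPTION: token-based auth header carrying your API key.
            "Authorization": f"Token {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# Example (not executed here; needs a real key, account and file):
# import json
# from pathlib import Path
# req = build_predict_request(Path("payslip.pdf").read_bytes(), "payslip.pdf",
#                             "YOUR_API_KEY", "your-account", "payslips")
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or test the request offline; the structure of the JSON you get back depends on the fields you defined.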