US Pay Stubs OCR
This article lays out the recommended process for building an OCR API that extracts data from US pay stubs using Mindee's deep learning engine.
Prerequisites
- You’ll need a free beta account. Sign up and confirm your email to log in.
- You’ll need at least 20 US pay stubs (images or PDFs) to train your OCR.
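Before starting, it can help to check that your sample folder really holds enough usable files. The sketch below is a minimal local check, not part of the Mindee platform: the function names and the list of accepted extensions are assumptions made for illustration, so adjust them to the formats you actually have.

```python
from pathlib import Path

# Assumed set of image and PDF extensions; adjust to the files you actually have.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MIN_SAMPLES = 20  # minimum number of pay stubs stated in the prerequisites


def list_pay_stub_samples(folder):
    """Return the sorted image and PDF files found directly inside `folder`."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def has_enough_samples(folder, minimum=MIN_SAMPLES):
    """Tell whether `folder` holds at least `minimum` usable samples."""
    return len(list_pay_stub_samples(folder)) >= minimum
```

Calling `has_enough_samples("path/to/your/pay_stubs")` returns `True` once the folder contains 20 or more matching files. Subfolders are deliberately not scanned, to keep the check simple.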
Define your Pay Stub use case
You might need to automatically extract data from pay stubs to improve your user experience in payroll or loan eligibility workflows. This article guides you through the few steps required to deploy your pay stub data extraction API.
First, we’re going to define the fields we want to extract from your pay stubs.
Here is the list of fields we are going to extract using our OCR API:
- Employer: The full name of the employer issuing the pay stub
- Net pay: Total net paid to the employee
- Pay date: Date of wage payment
- Period beginning: Start date of the pay period
- Period ending: End date of the pay period
- Gross pay: Total gross pay before taxes and deductions
- Total tax: Total tax deducted
You can add as many relevant fields as you need to better fit your requirements.
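The fields listed above are related to each other, which makes simple sanity checks possible once values have been extracted. The following sketch is one possible way to flag suspicious results. The snake_case keys, the function name, and the rules themselves are assumptions made for illustration, and some legitimate pay stubs (for example, ones that include reimbursements) may break the net-versus-gross rule.

```python
from datetime import date
from decimal import Decimal


def find_inconsistencies(fields):
    """Return a list of human-readable problems found in one extracted pay stub.

    `fields` is a hypothetical dict keyed by illustrative snake_case names;
    missing or None values are skipped rather than reported.
    """
    problems = []
    start, end = fields.get("period_beginning"), fields.get("period_ending")
    gross, net = fields.get("gross_pay"), fields.get("net_pay")
    tax = fields.get("total_tax")

    if start is not None and end is not None and start > end:
        problems.append("period beginning is after period ending")
    if gross is not None and net is not None and net > gross:
        problems.append("net pay exceeds gross pay")
    if gross is not None and tax is not None and tax > gross:
        problems.append("total tax exceeds gross pay")
    return problems


# Example values invented for this illustration.
stub = {
    "employer": "Acme Corp",
    "net_pay": Decimal("1840.25"),
    "pay_date": date(2022, 1, 31),
    "period_beginning": date(2022, 1, 16),
    "period_ending": date(2022, 1, 31),
    "gross_pay": Decimal("2500.00"),
    "total_tax": Decimal("520.75"),
}
print(find_inconsistencies(stub))
```

An empty list means none of the assumed rules were violated. It does not prove that the extracted values are correct.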
Deploy your API
Once you have defined which fields you want to extract, head over to the platform and press the ‘build a new endpoint’ button.
You now land on the setup page. Here is an image you can use to set up the API. For instance, my setup is as follows:
Once you’re ready, click on the “next step” button. We are going to specify the data types for each of the fields we want our API to extract.
To move forward, you can download this JSON config to set up your data model, or you can do it manually:
- Employer: type String
- Net pay: type Amount
- Pay date: type Date
- Period beginning: type Date
- Period ending: type Date
- Gross pay: type Amount
- Total tax: type Amount
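To make the data model concrete on the client side, here is a minimal sketch that mirrors the field types in plain Python and converts raw strings into typed values. It is not the format of the downloadable JSON config and it does not call the Mindee API. The `DATA_MODEL` dict, the `coerce` helper, and the default US date format are all assumptions, and Total tax is assumed to be an Amount like the other monetary fields.

```python
from datetime import datetime
from decimal import Decimal

# Illustrative mirror of the data model; not the platform's config format.
DATA_MODEL = {
    "Employer": "String",
    "Net pay": "Amount",
    "Pay date": "Date",
    "Period beginning": "Date",
    "Period ending": "Date",
    "Gross pay": "Amount",
    "Total tax": "Amount",  # assumed to be an Amount, like the other monetary fields
}


def coerce(field_name, raw_value, date_format="%m/%d/%Y"):
    """Convert a raw string into a Python value matching the field's type."""
    field_type = DATA_MODEL[field_name]
    text = raw_value.strip()
    if field_type == "Amount":
        # Drop the dollar sign and thousands separators before parsing.
        return Decimal(text.replace("$", "").replace(",", ""))
    if field_type == "Date":
        return datetime.strptime(text, date_format).date()
    return text  # String fields are returned with surrounding whitespace removed
```

For example, `coerce("Net pay", "$1,234.56")` yields `Decimal("1234.56")`. Using `Decimal` rather than `float` avoids rounding surprises when amounts are later compared or summed.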
Train your Pay Stub OCR
You’re all set!
Now is the time to train your US Pay Stub deep learning model in the Training section of your API.
To get more information about the training phase, please refer to the Getting Started tutorial.
If you have any questions regarding your use case, feel free to reach out on the Mindee Community on Slack!