docTR

Open-source Python document understanding library for developers and data scientists

Trainable deep learning OCR enabling the most advanced document understanding use cases

State-of-the-art

Benefit from the latest computer vision breakthroughs to solve the most complex document processing use cases.

Open source

Build a tailor-made OCR capability that can be hosted in your environment to comply with your data privacy policy.

Trainable

Train docTR’s text detection, recognition, or classification models on your own data to get the best performance on your documents.

A fully packaged document understanding library for developers and data scientists


Pretrained OCR

Plug-and-play Python OCR trained on millions of Latin-alphabet documents


See docs

End-to-end Pipeline

Two-stage OCR pipeline: text detection followed by text recognition


See more
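As a rough illustration of the two-stage pipeline, the sketch below wraps docTR's `ocr_predictor` in a small helper. The helper name `run_ocr` is hypothetical, and the architecture names `db_resnet50` and `crnn_vgg16_bn` are only example choices; check the docTR documentation for the models available in your installed version.

```python
def run_ocr(pdf_path: str) -> str:
    """Run text detection, then recognition, on a PDF and return its text.

    Hypothetical helper for illustration; not part of docTR itself.
    """
    # Imported lazily so this sketch can be loaded without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # One predictor bundles both stages: a detection model that localizes
    # words and a recognition model that reads each cropped word.
    # The architecture names below are example choices (assumption).
    predictor = ocr_predictor(
        det_arch="db_resnet50",
        reco_arch="crnn_vgg16_bn",
        pretrained=True,
    )

    pages = DocumentFile.from_pdf(pdf_path)  # one image array per page
    result = predictor(pages)

    # render() flattens the page/block/line/word hierarchy into plain text.
    return result.render()
```

Calling `run_ocr("path/to/your/doc.pdf")` downloads the pretrained weights on first use, so the first call is expected to be slower than later ones.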

Training

Text detection and recognition training scripts for PyTorch and TensorFlow


See references

Public Datasets

Built-in support for public datasets from the best-known OCR challenges


See datasets

Artefact Detection

Detection algorithms for QR codes, barcodes, signatures, faces, and more


See docs

Benchmark

Recall, precision, and FPS benchmarks across models


See benchmarks
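For readers unfamiliar with how recall and precision are typically computed for text detection, here is a minimal sketch. It is not docTR's own evaluation code: the greedy matching strategy, the 0.5 IoU threshold, and the names `iou` and `precision_recall` are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], gts: List[Box], iou_thresh: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in preds:
        best_idx, best_iou = -1, 0.0
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            score = iou(pred, gt)
            if score > best_iou:
                best_idx, best_iou = idx, score
        if best_idx >= 0 and best_iou >= iou_thresh:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall
```

Precision is the share of predicted boxes that match a ground-truth box; recall is the share of ground-truth boxes that were found. FPS is measured separately, by timing inference over a fixed set of pages.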

TensorFlow.js

OCR inference in the web browser, powered by TensorFlow.js (TFJS)


See demo app

Local Demo App

Local demo UI generator powered by Streamlit


See docs

Model Compression

Half-precision and quantization support for model optimization


See docs


Our experts are here for you!

Let our team of deep learning researchers, MLOps engineers, and software engineers help you build advanced document understanding technology


Developers


Community

Our growing community of data scientists and developers can help you



Enterprise


Support

Need help training or implementing state-of-the-art OCR models? We have your back!

Hosting

Run your models in production on our highly scalable, optimized infrastructure