

Open-source Python document understanding library for developers and data scientists


Trainable deep learning OCR enabling the most advanced document understanding use cases



Benefit from the latest computer vision breakthroughs to solve the most complex document processing use cases.

Open source

Build a tailor-made OCR capability that can be hosted in your environment to comply with your data privacy policy.


Achieve high extraction performance at scale on receipts from the US, Europe, or any region that uses the Latin alphabet, across industries and sectors.

A fully packaged document understanding library for developers and data scientists

Pretrained OCR

Plug-and-play Python OCR trained on millions of Latin-alphabet documents

See docs
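
A minimal sketch of what plug-and-play usage could look like, assuming the `python-doctr` package is installed with one of its backends and that pretrained weights can be downloaded on first use. The helper names `run_ocr` and `collect_words` are hypothetical, and `collect_words` assumes the exported result is nested as pages, blocks, lines, and words.

```python
def run_ocr(image_paths):
    """Run a pretrained docTR predictor on image files and return the exported dict."""
    # Imported lazily so this module still loads where docTR is not installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches pretrained weights on first use
    doc = DocumentFile.from_images(image_paths)
    return model(doc).export()


def collect_words(export):
    """Flatten an exported result (pages > blocks > lines > words) into a list of strings."""
    words = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words.extend(word["value"] for word in line.get("words", []))
    return words
```

With a local image, the two helpers would be chained as `collect_words(run_ocr(["receipt.jpg"]))`, where the file name is a placeholder.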


Two-stage OCR pipeline using text detection and recognition algorithms

See more
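
To illustrate how the two stages fit together, here is a toy sketch in plain Python. It is not docTR's implementation: the "page" is a list of text rows standing in for an image, `detect` finds runs of non-space characters in place of a neural text detector, and `recognize` reads each region in place of a recognition model. All names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int]  # (row, start column, end column), end exclusive


def detect(page: List[str]) -> List[Box]:
    """Stage 1: locate text regions, here runs of non-space characters."""
    boxes = []
    for r, row in enumerate(page):
        start = None
        for c, ch in enumerate(row + " "):  # trailing space closes the last run
            if ch != " " and start is None:
                start = c
            elif ch == " " and start is not None:
                boxes.append((r, start, c))
                start = None
    return boxes


def recognize(page: List[str], box: Box) -> str:
    """Stage 2: read the content of one detected region."""
    r, c0, c1 = box
    return page[r][c0:c1]


def ocr(page: List[str]) -> List[Tuple[Box, str]]:
    """Run detection first, then recognition on every detected region."""
    return [(box, recognize(page, box)) for box in detect(page)]
```

The point of the split is that each stage can be swapped or retrained independently: the recognizer only ever sees the regions the detector hands it.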


Text detection and recognition training scripts for PyTorch and TensorFlow

See references

Public Datasets

Built-in support for public datasets from the best-known OCR challenges

See datasets

Artefact Detection

Detection algorithms for QR codes, barcodes, signatures, faces, and more

See docs


Recall, precision, and FPS benchmarks across different models

See benchmarks
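
As a rough sketch of how such numbers can be computed for text detection, the snippet below matches predicted boxes to ground-truth boxes by intersection over union (IoU) and measures throughput with a wall-clock timer. It is an illustration under stated assumptions, not the library's own metric code: the 0.5 IoU threshold, the greedy matching strategy, and all function names are choices made for this example.

```python
import time


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_precision(gt_boxes, pred_boxes, iou_thresh=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    unmatched = list(gt_boxes)
    matches = 0
    for pred in pred_boxes:
        best = max(unmatched, key=lambda g: iou(g, pred), default=None)
        if best is not None and iou(best, pred) >= iou_thresh:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            matches += 1
    recall = matches / len(gt_boxes) if gt_boxes else 0.0
    precision = matches / len(pred_boxes) if pred_boxes else 0.0
    return recall, precision


def frames_per_second(fn, inputs):
    """Call fn on every input and return inputs processed per second."""
    start = time.perf_counter()
    for item in inputs:
        fn(item)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")
```

Recall is the share of ground-truth boxes that were found, precision is the share of predictions that were correct, and FPS depends heavily on hardware and batch size, so comparisons are only meaningful on the same machine.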

TensorFlow JS

OCR inference in the web browser, powered by TFJS

See demo app

Local Demo App

Local demo UI generator powered by Streamlit

See docs

Model Compression

Half-precision and quantization support for model optimization

See docs
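
To make the two ideas concrete, here is a small conceptual sketch using only the Python standard library. It does not use or reproduce the library's compression tooling: `to_half` shows the rounding introduced by storing a value in 16-bit floating point, and `quantize_int8` shows symmetric 8-bit quantization with a single scale factor. The function names and the symmetric scheme are assumptions made for this illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_int8(values):
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from quantized integers."""
    return [q * scale for q in quantized]
```

Both techniques trade a small, bounded loss of numerical accuracy for smaller weights and potentially faster inference: half precision halves storage relative to 32-bit floats, and int8 quantization quarters it.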