When you want to extract digits from images—think ZIP codes, meter readings, or totals on invoices—there are two main paths in Python: the low-level route (using OpenCV) and the high-level route (using prebuilt OCR tools like Mindee).
This post is for developers who want to understand both: we'll walk through building a basic digit recognizer using OpenCV, and then compare it with modern OCR solutions using Mindee’s Python SDK and the docTR library. We'll also cover practical image preprocessing, contour sorting, and actual use cases.
Why Digit Recognition?
Digit recognition is a useful subtask in OCR. It shows up in:
- Bank checks
- Utility meter readings
- Forms and invoices
- Parking tickets
- Product barcodes
- Sudoku solvers (yes, really)
The challenge? Digits are often distorted, handwritten, scanned, or photographed in less-than-ideal conditions. That’s why robust recognition is essential.
Part 1 – Building a Digit Recognizer from Scratch with OpenCV
Let’s begin with the classic computer vision route using OpenCV and k-Nearest Neighbors.
Step 1 – Training on OpenCV’s digits.png
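OpenCV's sample data includes a pre-labeled image called digits.png: a 2000x1000-pixel sheet laid out as 50 rows by 100 columns of 20x20 cells, for 5,000 handwritten digits in total. Each digit from 0 to 9 fills five consecutive rows, so there are 500 samples per digit. The file lives in the samples/data folder of the OpenCV source tree; a pip install may not include it, in which case download it and make sure cv.samples.findFile can locate it (or pass the full path to cv.imread).
We'll split this into training and testing sets: the left 50 columns for training and the right 50 for testing.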
import numpy as np
import cv2 as cv
img = cv.imread(cv.samples.findFile('digits.png'))
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# Split into individual digit cells (20x20 pixels)
cells = [np.hsplit(row, 100) for row in np.vsplit(gray, 50)]
x = np.array(cells)
train = x[:, :50].reshape(-1, 400).astype(np.float32)
test = x[:, 50:100].reshape(-1, 400).astype(np.float32)
# Labels: 250 of each digit 0–9
labels = np.repeat(np.arange(10), 250)[:, np.newaxis]
train_labels = labels.copy()
test_labels = labels.copy()
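To see why a plain np.repeat lines up with the reshaped cells, here is a minimal sketch on a toy grid. The sizes (4 rows, 6 columns, 2x2 cells, two classes) are assumptions chosen only to keep the example small; the slicing mirrors the code above.

```python
import numpy as np

# Toy stand-in for digits.png: 4 rows x 6 columns of 2x2 cells,
# two classes (0 and 1) with two grid rows each. Sizes are illustrative.
CELL, ROWS, COLS, CLASSES = 2, 4, 6, 2
rows_per_class = ROWS // CLASSES

gray = np.zeros((ROWS * CELL, COLS * CELL), dtype=np.uint8)
for r in range(ROWS):
    # Paint every cell in grid row r with its class id.
    gray[r * CELL:(r + 1) * CELL, :] = r // rows_per_class

cells = [np.hsplit(row, COLS) for row in np.vsplit(gray, ROWS)]
x = np.array(cells)  # shape: (grid rows, grid columns, cell height, cell width)

half = COLS // 2
train = x[:, :half].reshape(-1, CELL * CELL)
labels = np.repeat(np.arange(CLASSES), rows_per_class * half)

# Each flattened sample holds its own class id, so this checks the alignment.
aligned = bool((train[:, 0] == labels).all())
print(x.shape, train.shape, aligned)
```

Because reshape walks the grid row by row, all samples of one class stay contiguous, which is what lets a repeated label vector match them.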
Train and test a basic kNN model:
knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
ret, result, neighbours, dist = knn.findNearest(test, k=5)
accuracy = (result == test_labels).mean() * 100
print(f"Test accuracy: {accuracy:.2f}%")
On this split the accuracy typically lands around 91 to 92% (OpenCV's own tutorial reports 91.76%), which is pretty good given the simplicity of the model.
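To make the k=5 vote concrete, here is a minimal pure-Python sketch of what a k-nearest-neighbours classifier does. It is an illustration of the idea, not OpenCV's implementation: the function name knn_predict and the toy 2-D points are assumptions, and tie-breaking here may differ from cv.ml.KNearest.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest samples and let them vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data; real samples would be 400-value pixel vectors.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, (0.5, 0.5), k=3))  # 0
print(knn_predict(points, labels, (5.5, 5.5), k=3))  # 1
```

There is no training step beyond storing the samples, which is why knn.train returns almost immediately and the cost shows up at prediction time.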
Step 2 – Recognizing Digits in a New Image
Now let’s simulate a real use case: you have a photo with a sequence of digits and want to recognize them.
A) Preprocessing and Contour Detection
We’ll convert the image to grayscale, threshold it, and extract individual digits using contours.
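def extract_digits_from_image(img_path):
    img = cv.imread(img_path)
    if img is None:
        # cv.imread returns None instead of raising when the file is unreadable
        raise FileNotFoundError(f"Could not read image: {img_path}")
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    # Blur to suppress noise, then let Otsu pick the threshold.
    # THRESH_BINARY_INV assumes dark digits on a light background and
    # produces white digits on black, matching digits.png.
    blur = cv.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)
    contours, _ = cv.findContours(thresh, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    digit_regions = []
    bounding_boxes = []
    for cnt in contours:
        x, y, w, h = cv.boundingRect(cnt)
        # Skip specks that are too small to be a digit
        if h > 10 and w > 5:
            roi = thresh[y:y+h, x:x+w]
            resized = cv.resize(roi, (20, 20))
            digit_regions.append(resized)
            bounding_boxes.append((x, y, w, h))
    # Sort digits left to right by the x coordinate of each bounding box
    pairs = sorted(zip(bounding_boxes, digit_regions), key=lambda b: b[0][0])
    sorted_digits = [region for _, region in pairs]
    return sorted_digits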
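Sorting by x alone works for a single row of digits. If the photo contains several rows, the digits need to be grouped into rows first. The sketch below shows one way to do that on plain (x, y, w, h) tuples; the function name sort_reading_order and the row_tolerance parameter are hypothetical and would need tuning for real images.

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Order (x, y, w, h) boxes top to bottom by row, left to right within a row.

    row_tolerance is a hypothetical tuning knob: a box joins an existing row
    when its vertical centre lies within row_tolerance * height of that
    row's first box.
    """
    rows = []
    # Visit boxes from top to bottom so rows are created in reading order.
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        _, y, _, h = box
        centre = y + h / 2
        for row in rows:
            _, ref_y, _, ref_h = row[0]
            if abs(centre - (ref_y + ref_h / 2)) <= row_tolerance * ref_h:
                row.append(box)
                break
        else:
            # No existing row is close enough, so start a new one.
            rows.append([box])
    # Within each row, order by the left edge.
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]

boxes = [(60, 12, 10, 20), (10, 10, 10, 20), (35, 48, 10, 20), (5, 50, 10, 20)]
print(sort_reading_order(boxes))
```

To use this with the extraction function, sort the (bounding box, digit image) pairs by the position of each box in the returned order rather than by x alone.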
B) Recognizing the Digits
Once the digits are extracted and sorted, we can classify them:
def recognize_digits(digit_images, knn_model):
    results = []
    for digit_img in digit_images:
        # Flatten the 20x20 image into one row of 400 float32 features
        sample = digit_img.reshape((1, 400)).astype(np.float32)
        ret, result, _, _ = knn_model.findNearest(sample, k=5)
        results.append(int(result[0][0]))
    return results
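The function above discards the neighbour labels that findNearest also returns. Those labels give a rough confidence signal: if only three of five neighbours agree, the prediction deserves less trust. A minimal sketch, assuming you pass in one row of the neighbours array (the helper name vote_share is hypothetical):

```python
from collections import Counter

def vote_share(neighbour_labels):
    """Return (winning_label, fraction of neighbours that voted for it)."""
    counts = Counter(int(label) for label in neighbour_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbour_labels)

# Hypothetical neighbour rows, shaped like one row of the third value
# returned by findNearest with k=5.
print(vote_share([3.0, 3.0, 3.0, 8.0, 3.0]))  # (3, 0.8)
print(vote_share([1.0, 7.0, 1.0, 7.0, 7.0]))  # (7, 0.6)
```

A caller could flag any digit whose share falls below a chosen cutoff for manual review instead of accepting it silently.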
Usage:
digits = extract_digits_from_image("sample_digits.png")
predictions = recognize_digits(digits, knn)
print("Detected digits:", predictions)
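One weakness of the extraction step is that cv.resize(roi, (20, 20)) stretches every crop to a square, so a narrow digit such as "1" ends up looking nothing like the training samples. A common remedy is to centre the crop on a square canvas before resizing. The sketch below shows only the padding step with NumPy; the function name pad_to_square and the default margin are assumptions, and the result would still be passed to cv.resize afterwards.

```python
import numpy as np

def pad_to_square(roi, margin=2):
    """Centre a binary crop on a black square canvas, leaving a small margin."""
    h, w = roi.shape
    side = max(h, w) + 2 * margin
    top = (side - h) // 2
    left = (side - w) // 2
    canvas = np.zeros((side, side), dtype=roi.dtype)
    canvas[top:top + h, left:left + w] = roi
    return canvas

# A tall, thin stroke such as a "1": 12 rows by 3 columns of white pixels.
thin = np.full((12, 3), 255, dtype=np.uint8)
square = pad_to_square(thin)
print(square.shape, int(square.sum()) == int(thin.sum()))
```

The black canvas matches the white-on-black polarity produced by THRESH_BINARY_INV, so the padded crop keeps the same convention as digits.png.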
Part 2 – The Limits of Classic OCR
Building from scratch gives you control and a better understanding of the pipeline, but:
- The model only recognizes digits
- You must manually segment the characters
- It’s sensitive to noise, skew, and font changes
- Accuracy is decent but not production-grade
If you want to recognize real-world text in different fonts, layouts, or languages—you need better tools.
Part 3 – Using Mindee’s OCR Tools
Mindee provides modern OCR APIs and open-source tools like docTR that work out of the box for printed and scanned documents, receipts, invoices, and more.
You don’t need to build your own classifier or segmentation logic—just feed in an image.
Option 1 – Using the Mindee Python SDK
Install the SDK:
pip install mindee
Recognize structured fields like totals, dates, and tax amounts:
import mindee
client = mindee.Client(api_key="your_api_key")
doc = client.source_from_path("path/to/receipt.jpg")
result = client.parse(mindee.product.ReceiptV4, doc)
print(result.document)
✅ Example Output:
total_amount: 42.80
date: 2025-04-08
tax: 4.28
This is great for developers who want structured, field-level extraction with no ML work.
Option 2 – Using docTR for General Text & Digit OCR
Install with PyTorch backend:
pip install "python-doctr[torch]"
Run OCR on an image such as a photo or scan (PDFs are loaded with DocumentFile.from_pdf instead of from_images):
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images(["path/to/image.jpg"])
result = model(doc)
# View predictions
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            print(" ".join([word.value for word in line.words]))
✅ Output:
ORDER NUMBER: 123456
TOTAL: $89.99
docTR handles layout detection, line grouping, and character recognition—all in a few lines of code.
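Since this post is about digits, you will usually want to pull just the numbers out of the recognized text. docTR results can be converted to a nested dictionary with result.export(); the sketch below walks a dictionary of that shape and keeps digit sequences from sufficiently confident words. The function name digit_runs, the min_confidence cutoff, and the hand-built sample are assumptions for illustration, and the key layout should be checked against your docTR version.

```python
import re

def digit_runs(export, min_confidence=0.5):
    """Collect digit sequences from a docTR-style export dictionary.

    Assumes the nested pages -> blocks -> lines -> words layout, with each
    word carrying "value" and "confidence" keys.
    """
    found = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    if word.get("confidence", 0.0) < min_confidence:
                        continue
                    # Match integers and simple decimals such as 89.99.
                    found.extend(re.findall(r"\d+(?:\.\d+)?", word["value"]))
    return found

# Hand-built sample mirroring the example output above; not real model output.
sample = {"pages": [{"blocks": [{"lines": [
    {"words": [{"value": "ORDER", "confidence": 0.99},
               {"value": "NUMBER:", "confidence": 0.98},
               {"value": "123456", "confidence": 0.97}]},
    {"words": [{"value": "TOTAL:", "confidence": 0.99},
               {"value": "$89.99", "confidence": 0.95}]},
]}]}]}
print(digit_runs(sample))  # ['123456', '89.99']
```

With a real prediction you would call digit_runs(result.export()) in place of the sample dictionary.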
Conclusion
OpenCV is a great educational tool and works well for tightly controlled use cases, like digit recognition from scanned forms. But if you're building anything customer-facing—or just want to skip the hassle of preprocessing and model tuning—Mindee's SDK and docTR are ready-to-use solutions.
You keep control over the surrounding pipeline as a developer, while pretrained models take over segmentation and recognition, typically with better accuracy and fewer headaches.