Digit Recognition in Python: From OpenCV Fundamentals to Mindee’s OCR Tools

Reading time: 5 min
Published on: Apr 8, 2025

When you want to extract digits from images—think ZIP codes, meter readings, or totals on invoices—there are two main paths in Python: the low-level route (using OpenCV) and the high-level route (using prebuilt OCR tools like Mindee).

This post is for developers who want to understand both: we'll walk through building a basic digit recognizer using OpenCV, and then compare it with modern OCR solutions using Mindee’s Python SDK and the docTR library. We'll also cover practical image preprocessing, contour sorting, and actual use cases.

Why Digit Recognition?

Digit recognition is a useful subtask in OCR. It shows up in:

  • Bank checks
  • Utility meter readings
  • Forms and invoices
  • Parking tickets
  • Product barcodes (the printed digits beneath the bars)
  • Sudoku solvers (yes, really)

The challenge? Digits are often distorted, handwritten, scanned, or photographed in less-than-ideal conditions, so a recognizer has to tolerate noise, skew, and variation in writing style.

Part 1 – Building a Digit Recognizer from Scratch with OpenCV

Let’s begin with the classic computer vision route using OpenCV and k-Nearest Neighbors.

Step 1 – Training on OpenCV’s digits.png

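OpenCV's sample data includes a pre-labeled image called digits.png (found under samples/data in the OpenCV source tree; a pip install may not bundle it). The image is a grid of 50 rows by 100 columns of 20x20 cells, 5,000 handwritten digits in total.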

We’ll split this into training and testing sets.

import numpy as np
import cv2 as cv

img = cv.imread(cv.samples.findFile('digits.png'))
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Split into individual digit cells (20x20 pixels)
cells = [np.hsplit(row, 100) for row in np.vsplit(gray, 50)]
x = np.array(cells)

# Left 50 columns of every row for training, right 50 for testing (2,500 samples each)
train = x[:, :50].reshape(-1, 400).astype(np.float32)
test = x[:, 50:100].reshape(-1, 400).astype(np.float32)

# Labels: 250 of each digit 0–9
labels = np.repeat(np.arange(10), 250)[:, np.newaxis]
train_labels = labels.copy()
test_labels = labels.copy()
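
The pairing of samples and labels depends on how NumPy flattens the grid, which is easy to get wrong. The sketch below checks it on a synthetic stand-in for digits.png (an assumption made so the example runs without the sample file): every cell is filled with the digit its row band represents, so each flattened row should match its label.

```python
import numpy as np

# Synthetic stand-in for the grayscale digits.png (1000 x 2000 pixels):
# 50 rows x 100 columns of 20x20 cells, five rows of cells per digit.
# Each cell is filled with the digit value of its row band.
cell_digit = np.repeat(np.arange(10), 5)[:, np.newaxis]      # (50, 1)
cell_grid = np.tile(cell_digit, (1, 100))                    # (50, 100)
fake_gray = np.kron(cell_grid, np.ones((20, 20))).astype(np.uint8)

# Same split as in the walkthrough
cells = [np.hsplit(row, 100) for row in np.vsplit(fake_gray, 50)]
x = np.array(cells)                                          # (50, 100, 20, 20)
train = x[:, :50].reshape(-1, 400).astype(np.float32)        # (2500, 400)
test = x[:, 50:100].reshape(-1, 400).astype(np.float32)      # (2500, 400)
labels = np.repeat(np.arange(10), 250)[:, np.newaxis]        # (2500, 1)

# Every pixel in a row equals that row's label if the ordering is right
aligned = bool((train == labels).all() and (test == labels).all())
print(x.shape, train.shape, labels.shape, aligned)
```

If aligned came out False, the labels would no longer line up with the flattened cells, and the kNN model would be trained on wrong answers.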

Train and test a basic kNN model:

knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)

ret, result, neighbours, dist = knn.findNearest(test, k=5)

accuracy = (result == test_labels).mean() * 100
print(f"Test accuracy: {accuracy:.2f}%")
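
A single accuracy figure hides which digits get confused with each other. As a minimal sketch, the NumPy-only helper below (confusion_matrix is a hypothetical name, not an OpenCV function) tabulates expected against predicted labels. The small arrays stand in for test_labels and the result array returned by findNearest.

```python
import numpy as np

def confusion_matrix(expected, predicted, n_classes=10):
    """Rows are true digits, columns are predicted digits."""
    expected = np.asarray(expected).ravel().astype(int)
    predicted = np.asarray(predicted).ravel().astype(int)
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    # add.at accumulates correctly when the same (row, col) pair repeats
    np.add.at(matrix, (expected, predicted), 1)
    return matrix

# Toy stand-ins for test_labels and the float32 result from findNearest
demo_labels = np.array([[0], [0], [1], [1], [7], [7]])
demo_result = np.array([[0], [0], [1], [7], [7], [1]], dtype=np.float32)

cm = confusion_matrix(demo_labels, demo_result)
overall = float(np.trace(cm) / cm.sum())
per_digit = {d: float(cm[d, d] / cm[d].sum()) for d in range(10) if cm[d].sum() > 0}
print(f"overall: {overall:.2f}", per_digit)
```

Off-diagonal cells show the confusions: in the toy data, one 1 was read as a 7 and one 7 as a 1.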

Output: Test accuracy: ~91.76%

Pretty good—especially given the simplicity of the model.

Step 2 – Recognizing Digits in a New Image

Now let’s simulate a real use case: you have a photo with a sequence of digits and want to recognize them.

A) Preprocessing and Contour Detection

We’ll convert the image to grayscale, threshold it, and extract individual digits using contours.

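def extract_digits_from_image(img_path):
    img = cv.imread(img_path)
    if img is None:
        # cv.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(f"Could not read image: {img_path}")
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    blur = cv.GaussianBlur(gray, (5, 5), 0)
    # Assumes dark digits on a light background; inverting makes them
    # white on black, like the digits.png training cells
    _, thresh = cv.threshold(blur, 0, 255, cv.THRESH_BINARY_INV + cv.THRESH_OTSU)

    contours, _ = cv.findContours(thresh, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)

    digit_regions = []
    bounding_boxes = []

    for cnt in contours:
        x, y, w, h = cv.boundingRect(cnt)
        # Skip tiny contours, which are most likely noise
        if h > 10 and w > 5:
            roi = thresh[y:y+h, x:x+w]
            resized = cv.resize(roi, (20, 20))
            digit_regions.append(resized)
            bounding_boxes.append((x, y, w, h))

    # Sort digits left to right by the x coordinate of each bounding box
    pairs = sorted(zip(bounding_boxes, digit_regions), key=lambda pair: pair[0][0])
    return [region for _, region in pairs]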
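
The cv.THRESH_OTSU flag used above asks OpenCV to choose the threshold automatically. To illustrate the idea, here is a NumPy-only sketch of Otsu's method (otsu_threshold is a hypothetical helper, not OpenCV's implementation, and may differ from it in tie-breaking). It picks the gray level that maximizes the variance between the dark and bright pixel groups.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t that best separates pixels <= t from pixels > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    weight_low = np.cumsum(prob)              # share of pixels <= t
    mean_low = np.cumsum(prob * levels)       # cumulative (unnormalized) mean
    mean_total = mean_low[-1]
    numerator = (mean_total * weight_low - mean_low) ** 2
    denominator = weight_low * (1.0 - weight_low)
    between_var = np.zeros(256)
    valid = denominator > 1e-12               # skip splits with an empty class
    between_var[valid] = numerator[valid] / denominator[valid]
    return int(np.argmax(between_var))

# Toy image: dark strokes (40) on a light background (200)
page = np.full((20, 20), 200, dtype=np.uint8)
page[5:15, 8:12] = 40
t = otsu_threshold(page)
# Equivalent of THRESH_BINARY_INV: dark strokes become white (255)
binary_inv = np.where(page > t, 0, 255).astype(np.uint8)
print(t, int((binary_inv == 255).sum()))
```

On a real photo the histogram is rarely this clean, which is why the pipeline blurs the image before thresholding.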

B) Recognizing the Digits

Once the digits are extracted and sorted, we can classify them:

def recognize_digits(digit_images, knn_model):
    results = []
    for digit_img in digit_images:
        sample = digit_img.reshape((1, 400)).astype(np.float32)
        ret, result, _, _ = knn_model.findNearest(sample, k=5)
        results.append(int(result[0][0]))
    return results
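
Because the classifier compares raw 400-pixel vectors, each crop should look like a digits.png cell. Resizing a bounding box straight to 20x20 stretches narrow digits such as 1 across the whole square. One possible refinement, sketched here with a hypothetical pad_to_square helper and an arbitrary 2-pixel margin, is to center the crop on a black square canvas before resizing.

```python
import numpy as np

def pad_to_square(roi, margin=2):
    """Center a binary crop on a black square canvas, preserving aspect ratio."""
    h, w = roi.shape
    side = max(h, w) + 2 * margin
    canvas = np.zeros((side, side), dtype=roi.dtype)
    top = (side - h) // 2
    left = (side - w) // 2
    canvas[top:top + h, left:left + w] = roi
    return canvas

# A tall, narrow crop such as the digit "1": 30 pixels high, 8 wide
narrow = np.full((30, 8), 255, dtype=np.uint8)
square = pad_to_square(narrow)
print(square.shape)
```

In extract_digits_from_image, the crop would then be resized with cv.resize(pad_to_square(roi), (20, 20)). Whether this helps depends on how closely the padded crops match the training cells, so it is worth measuring on your own images.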

Usage:

digits = extract_digits_from_image("sample_digits.png")
predictions = recognize_digits(digits, knn)
print("Detected digits:", predictions)

Example Output: Detected digits: [3, 1, 4, 1, 5]
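
The left-to-right sort in extract_digits_from_image assumes a single row of digits. For images with several rows, a rough sketch (sort_reading_order and its row_tolerance parameter are assumptions for illustration) groups bounding boxes into rows by vertical center first, then sorts each row by x.

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Return indices of (x, y, w, h) boxes in top-to-bottom, left-to-right order."""
    # Visit boxes from top to bottom by vertical center
    by_center = sorted(range(len(boxes)), key=lambda i: boxes[i][1] + boxes[i][3] / 2)
    rows = []
    for i in by_center:
        x, y, w, h = boxes[i]
        center_y = y + h / 2
        # Same row if the center is within a fraction of the box height
        if rows and abs(center_y - rows[-1]["center_y"]) <= row_tolerance * h:
            rows[-1]["items"].append(i)
        else:
            rows.append({"center_y": center_y, "items": [i]})
    ordered = []
    for row in rows:
        ordered.extend(sorted(row["items"], key=lambda i: boxes[i][0]))
    return ordered

# Two rows of two digits each, listed out of order
demo_boxes = [(50, 10, 10, 20), (10, 12, 10, 20), (30, 60, 10, 20), (10, 58, 10, 20)]
order = sort_reading_order(demo_boxes)
print(order)
```

The returned indices can be used to reorder both the bounding boxes and the matching digit crops. Strongly skewed or curved text would need a more careful grouping than this.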

Part 2 – The Limits of Classic OCR

Building from scratch gives you control and a better understanding of the pipeline, but:

  • The model only recognizes digits
  • You must manually segment the characters
  • It’s sensitive to noise, skew, and font changes
  • Accuracy is decent (about 92% on the clean digits.png test split, so roughly one digit in twelve is wrong) but not production-grade
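
One partial mitigation for the accuracy limit is to flag uncertain predictions instead of trusting all of them. findNearest also returns the labels of the k nearest neighbours, and the share of neighbours that agree with the winning label can serve as a crude confidence score. The sketch below uses toy arrays in place of real findNearest output; vote_share and the 0.8 cutoff are assumptions for illustration.

```python
import numpy as np

def vote_share(neighbours, predicted):
    """Fraction of the k neighbours that agree with each predicted label."""
    neighbours = np.asarray(neighbours)
    predicted = np.asarray(predicted).reshape(-1, 1)
    return (neighbours == predicted).mean(axis=1)

# Toy stand-ins for the `neighbours` and `result` arrays (k=5, three samples)
demo_neighbours = np.array([[3, 3, 3, 3, 3],
                            [1, 7, 1, 7, 1],
                            [4, 9, 9, 4, 9]], dtype=np.float32)
demo_predicted = np.array([[3], [1], [9]], dtype=np.float32)

shares = vote_share(demo_neighbours, demo_predicted)
needs_review = shares < 0.8   # arbitrary cutoff for this illustration
print(shares, needs_review)
```

This does not make the model more accurate; it only separates confident votes from split ones so the doubtful digits can be checked another way.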

If you want to recognize real-world text in different fonts, layouts, or languages—you need better tools.

Part 3 – Using Mindee’s OCR Tools

Mindee provides modern OCR APIs and open-source tools like docTR that work out of the box for printed and scanned documents, receipts, invoices, and more.

You don’t need to build your own classifier or segmentation logic—just feed in an image.

Option 1 – Using the Mindee Python SDK

Install the SDK:

pip install mindee

Recognize structured fields like totals, dates, and tax amounts:

import mindee

client = mindee.Client(api_key="your_api_key")
doc = client.source_from_path("path/to/receipt.jpg")
result = client.parse(mindee.product.ReceiptV4, doc)

print(result.document)

✅ Example output (abridged):

total_amount: 42.80
date: 2025-04-08
tax: 4.28

This is great for developers who want structured, field-level extraction with no ML work.

Option 2 – Using docTR for General Text & Digit OCR

Install with the PyTorch backend:

pip install "python-doctr[torch]"

Run OCR on an image such as a photo or scan (for PDFs, load the file with DocumentFile.from_pdf instead):

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images(["path/to/image.jpg"])
result = model(doc)

# View predictions
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            print(" ".join([word.value for word in line.words]))

✅ Output:

ORDER NUMBER: 123456
TOTAL: $89.99

docTR handles layout detection, line grouping, and character recognition—all in a few lines of code.
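
For digit-focused tasks you usually want only the numeric tokens, along with how much to trust them. The sketch below assumes the nested dictionary layout produced by result.export() (pages, blocks, lines, words, with value and confidence keys on each word). It runs on a hand-written dictionary of that shape, so the numbers are made up for illustration; numeric_tokens and the 0.5 cutoff are hypothetical.

```python
import re

# A run of digits with an optional decimal part, e.g. "123456" or "89.99"
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def numeric_tokens(export, min_confidence=0.5):
    """Split numeric words into accepted and rejected lists by confidence."""
    accepted, rejected = [], []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    match = NUMBER.search(word["value"])
                    if match is None:
                        continue
                    bucket = accepted if word["confidence"] >= min_confidence else rejected
                    bucket.append((match.group(), word["confidence"]))
    return accepted, rejected

# Hand-written stand-in for result.export(); the values are illustrative only
demo_export = {"pages": [{"blocks": [{"lines": [
    {"words": [{"value": "ORDER", "confidence": 0.99},
               {"value": "NUMBER:", "confidence": 0.98},
               {"value": "123456", "confidence": 0.97}]},
    {"words": [{"value": "TOTAL:", "confidence": 0.99},
               {"value": "$89.99", "confidence": 0.41}]},
]}]}]}

accepted, rejected = numeric_tokens(demo_export)
print(accepted, rejected)
```

Rejected tokens are good candidates for a manual check or a second pass on a cleaner crop of the image.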

Comparison Table: OpenCV kNN vs. Mindee SDK / docTR

| Feature | OpenCV kNN | Mindee SDK / docTR |
| --- | --- | --- |
| Setup | Manual | One-line install |
| Preprocessing required | Yes | Optional |
| Layout handling | No | Yes |
| Recognition targets | Only digits | Any text |
| Language support | None | English, French, etc. |
| Confidence scores | No | Yes |
| Real-world accuracy | Moderate | High |

Conclusion

OpenCV is a great educational tool and works well for tightly controlled use cases, like digit recognition from scanned forms. But if you're building anything customer-facing—or just want to skip the hassle of preprocessing and model tuning—Mindee's SDK and docTR are ready-to-use solutions.

You still get full control as a developer, just with better accuracy and fewer headaches.

FAQ

Can I build a digit recognition system using only OpenCV?

Yes, you can build a digit recognizer using OpenCV with techniques like contour detection and k-Nearest Neighbors. However, accuracy and flexibility are limited for real-world applications.

What’s the advantage of using Mindee over OpenCV for digit recognition?

Mindee’s OCR tools handle layout analysis, text detection, and recognition out of the box—no manual preprocessing or training required. It’s faster to implement and more accurate on real documents.

Can Mindee’s OCR work with handwritten digits?

Mindee’s docTR is primarily designed for printed text. While it can sometimes recognize clean handwriting, it’s best suited for typed or scanned text in documents like invoices, receipts, and forms.