Karamelmar 5bf9e065e4 Add Mistral AI OCR script with test data and documentation

- ocr.php: two-step pipeline (mistral-ocr-latest + mistral-small-latest)
  extracts Serial Number, Model Number, and Date from part label photos
- input/: 5 test images of industrial part labels
- output/: corresponding YAML results
- README.md: full usage, setup, and troubleshooting docs
- .gitignore: excludes .env only
- .env.example: API key template

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-04 18:29:07 +01:00


ckOCR

PHP CLI tool that reads photos of part identification labels and extracts structured data using Mistral AI OCR.

Reads images from input/, calls the Mistral API, and writes YAML files to output/ containing the Serial Number, Model Number, and Date.

Requirements

PHP 8.1 – 8.5 with the curl extension enabled (no Composer required)
A Mistral AI account with API access

Arch Linux / CachyOS — enable the curl extension after installing PHP:

sudo pacman -S php
# uncomment "extension=curl" in /etc/php/php.ini
php -m | grep curl   # verify

Installation

git clone <repo-url> ckOCR
cd ckOCR
cp .env.example .env

Edit .env and insert your Mistral API key:

MISTRAL_API_KEY=your_api_key_here

Alternatively, export it as an environment variable:

export MISTRAL_API_KEY=your_api_key_here
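
As a rough sketch of how the key could be resolved from either source, the snippet below parses `.env`-style text and falls back to it when the variable is not exported. The helper names `parseEnvText` and `resolveApiKey` are hypothetical, and the precedence (exported variable first, then `.env`) is an assumption; `ocr.php` may behave differently.

```php
<?php
// Hypothetical helper: parse KEY=VALUE lines from .env-style text.
function parseEnvText(string $text): array
{
    $vars = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines, comments, and lines without "=".
        if ($line === '' || $line[0] === '#' || !str_contains($line, '=')) {
            continue;
        }
        [$key, $value] = explode('=', $line, 2);
        // Strip whitespace and optional surrounding quotes from the value.
        $vars[trim($key)] = trim(trim($value), "\"'");
    }
    return $vars;
}

// Hypothetical helper: assumed precedence is exported variable, then .env file.
function resolveApiKey(string $envFile): ?string
{
    $fromEnv = getenv('MISTRAL_API_KEY');
    if (is_string($fromEnv) && $fromEnv !== '') {
        return $fromEnv;
    }
    if (!is_readable($envFile)) {
        return null;
    }
    $vars = parseEnvText((string) file_get_contents($envFile));
    $key = $vars['MISTRAL_API_KEY'] ?? '';
    return $key !== '' ? $key : null;
}
```

A `null` result corresponds to the "MISTRAL_API_KEY not set" case described under Troubleshooting.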

Usage

Place one or more label photos in the input/ folder, then run:

php ocr.php

Results are written to output/ as YAML files — one per image, same filename stem.
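
The "same filename stem" mapping can be pictured with the short sketch below. The function name `outputPathFor` and the default directory argument are illustrative assumptions, not taken from `ocr.php`.

```php
<?php
// Hypothetical helper: map an input image path to its YAML output path.
function outputPathFor(string $imagePath, string $outputDir = 'output'): string
{
    // PATHINFO_FILENAME drops the directory and the final extension only.
    $stem = pathinfo($imagePath, PATHINFO_FILENAME);
    return rtrim($outputDir, '/') . '/' . $stem . '.yaml';
}
```

Under this mapping, `input/part-label-01.jpg` would correspond to `output/part-label-01.yaml`.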

Options

| Flag | Description |
|------|-------------|
| `--force` | Re-process images that already have an output file |
| `--verbose` | Print the raw OCR text and API request details |
| `--help` | Show usage information |

Examples

# Process all new images
php ocr.php

# Re-run everything, show full detail
php ocr.php --force --verbose

# Just see options
php ocr.php --help
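
For illustration, the three flags could be recognised with a small parser like the one below. This is a sketch under assumptions: `parseFlags` is a hypothetical name, and how `ocr.php` actually reads its arguments (or treats unknown ones) is not documented here.

```php
<?php
// Hypothetical helper: recognise --force, --verbose, and --help in $argv.
function parseFlags(array $argv): array
{
    $flags = ['force' => false, 'verbose' => false, 'help' => false];
    $unknown = [];
    // Skip $argv[0], which is the script name.
    foreach (array_slice($argv, 1) as $arg) {
        $name = ltrim($arg, '-');
        if (str_starts_with($arg, '--') && array_key_exists($name, $flags)) {
            $flags[$name] = true;
        } else {
            // Collected so the caller can decide whether to warn or abort.
            $unknown[] = $arg;
        }
    }
    return ['flags' => $flags, 'unknown' => $unknown];
}
```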

Input

Supported image formats: JPG, JPEG, PNG, WebP, GIF

Maximum file size: 5 MB per image (Mistral API limit)

input/
├── part-label-01.jpg
├── motor-sn.png
└── board-sticker.jpg
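
A pre-flight check for the format and size limits above might look like the following sketch. The names `validateImage`, `SUPPORTED_EXTENSIONS`, and `MAX_BYTES` are hypothetical, and reading "5 MB" as 5 × 1024 × 1024 bytes is an assumption.

```php
<?php
// Extensions taken from the supported formats listed above.
const SUPPORTED_EXTENSIONS = ['jpg', 'jpeg', 'png', 'webp', 'gif'];
// Assumption: "5 MB" is interpreted as 5 MiB.
const MAX_BYTES = 5 * 1024 * 1024;

// Hypothetical helper: returns an error message, or null if the file is acceptable.
function validateImage(string $filename, int $sizeBytes): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!in_array($ext, SUPPORTED_EXTENSIONS, true)) {
        return "Unsupported format: {$filename}";
    }
    if ($sizeBytes > MAX_BYTES) {
        return "File too large: {$filename}";
    }
    return null;
}
```

In practice the size would come from `filesize()` on each file found in input/.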

Output

Each processed image produces a YAML file in output/:

output/
├── part-label-01.yaml
├── motor-sn.yaml
└── board-sticker.yaml

YAML structure

---
serial_number: SN-20241234
model_number: "XYZ-4K/B"
date: 2024-01
source_file: part-label-01.jpg
processed_at: 2026-03-04 15:30:00
raw_ocr: |
  Full text extracted from the label by the OCR model,
  preserved exactly as returned.

| Field | Description |
|-------|-------------|
| `serial_number` | Serial Number (labelled S/N, SN, Serial No., etc.) |
| `model_number` | Model or Part Number (labelled Model, M/N, P/N, MPN, etc.) |
| `date` | Any date on the label (MFG date, DOM, expiry, etc.) |
| `source_file` | Original image filename |
| `processed_at` | Timestamp of processing |
| `raw_ocr` | Full OCR text returned by Mistral before extraction |

Fields not found on the label are written as null.
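
One way to produce a file of this shape without a YAML library is sketched below. The helpers `yamlScalar` and `buildYaml` are hypothetical. As a simplification, the sketch double-quotes every string (relying on JSON strings being valid YAML double-quoted scalars), so its quoting differs slightly from the sample above, and it does not handle OCR text whose first line starts with a space.

```php
<?php
// Hypothetical helper: render a nullable string as a YAML scalar.
function yamlScalar(?string $value): string
{
    if ($value === null) {
        return 'null';
    }
    // A JSON string literal is also a valid YAML double-quoted scalar.
    return json_encode(
        $value,
        JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_INVALID_UTF8_SUBSTITUTE
    );
}

// Hypothetical helper: assemble the YAML document described above.
function buildYaml(array $fields, string $sourceFile, string $processedAt, string $rawOcr): string
{
    $lines = ['---'];
    foreach (['serial_number', 'model_number', 'date'] as $key) {
        // Missing fields are written as null.
        $lines[] = $key . ': ' . yamlScalar($fields[$key] ?? null);
    }
    $lines[] = 'source_file: ' . yamlScalar($sourceFile);
    $lines[] = 'processed_at: ' . yamlScalar($processedAt);
    // Literal block scalar: each OCR line is indented by two spaces.
    $lines[] = 'raw_ocr: |';
    foreach (preg_split('/\R/', rtrim($rawOcr)) as $ocrLine) {
        $lines[] = rtrim('  ' . $ocrLine);
    }
    return implode("\n", $lines) . "\n";
}
```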

How it works

Processing runs in two API calls per image:

Image file
    │
    ▼
[1] POST /ocr  (mistral-ocr-latest)
    │  base64-encoded image → markdown text
    │
    ▼
[2] POST /chat/completions  (mistral-small-latest)
    │  OCR text + extraction prompt → JSON with the three fields
    │
    ▼
YAML file written to output/

OCR step — the image is base64-encoded and sent to mistral-ocr-latest, which returns the full label text as markdown.
Extraction step — the OCR text is passed to mistral-small-latest with a structured prompt. The model returns a JSON object (response_format: json_object) containing serial_number, model_number, and date.
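
The two request bodies and the handling of the extraction reply can be sketched as follows. The function names and the prompt wording are hypothetical. The OCR payload shape (a `document` object of type `image_url` carrying a data URI) reflects our reading of Mistral's public API documentation and should be checked against the current reference. No HTTP call is made here.

```php
<?php
// Hypothetical helper: body for POST /ocr (payload shape assumed from Mistral's docs).
function buildOcrPayload(string $imageBytes, string $mimeType): array
{
    return [
        'model' => 'mistral-ocr-latest',
        'document' => [
            'type' => 'image_url',
            'image_url' => 'data:' . $mimeType . ';base64,' . base64_encode($imageBytes),
        ],
    ];
}

// Hypothetical helper: body for POST /chat/completions. The prompt text is illustrative only.
function buildExtractionPayload(string $ocrText): array
{
    $prompt = "Extract serial_number, model_number, and date from the label text below. "
        . "Reply with a JSON object containing exactly those three keys; "
        . "use null for anything not present.\n\n" . $ocrText;
    return [
        'model' => 'mistral-small-latest',
        'messages' => [['role' => 'user', 'content' => $prompt]],
        'response_format' => ['type' => 'json_object'],
    ];
}

// Hypothetical helper: normalise the model's JSON reply to the three fields.
function parseExtraction(string $jsonContent): array
{
    $decoded = json_decode($jsonContent, true);
    $result = [];
    foreach (['serial_number', 'model_number', 'date'] as $key) {
        $value = is_array($decoded) ? ($decoded[$key] ?? null) : null;
        // Empty strings, non-scalars, and invalid JSON all collapse to null.
        $text = is_scalar($value) ? trim((string) $value) : '';
        $result[$key] = $text !== '' ? $text : null;
    }
    return $result;
}
```

Each payload would be JSON-encoded and sent with an `Authorization: Bearer` header containing the API key.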

Already-processed images are skipped automatically unless --force is used.
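
The skip rule amounts to a single condition, sketched here with a hypothetical `shouldProcess` helper. It assumes that "already processed" means the output YAML file exists.

```php
<?php
// Hypothetical helper: process when forced, or when no output file exists yet.
function shouldProcess(string $outputPath, bool $force): bool
{
    return $force || !is_file($outputPath);
}
```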

Project structure

ckOCR/
├── ocr.php          # Main script
├── README.md        # Usage, setup, and troubleshooting docs
├── .env             # API key (not committed, see .env.example)
├── .env.example     # Template
├── .gitignore
├── input/           # Label photos (test data included)
└── output/          # YAML results (test data included)

Troubleshooting

MISTRAL_API_KEY not set: Set the key in .env or export it as an environment variable.

Mistral API 401: Your API key is invalid or expired. Check it at console.mistral.ai.

File too large: Resize the image below 5 MB before placing it in input/.

No text found: The label may be blurry, low contrast, or too small. Try a clearer photo. The output YAML is still written with null fields so the file won't be re-processed accidentally. Use --force --verbose to retry and inspect the raw OCR output.

Fields are null but text was extracted: Run with --verbose to see the raw OCR text and check whether the label uses non-standard abbreviations. The extraction prompt covers the most common label formats.
