Add Mistral AI OCR script with test data and documentation

- ocr.php: two-step pipeline (mistral-ocr-latest + mistral-small-latest)
  extracts Serial Number, Model Number, and Date from part label photos
- input/: 5 test images of industrial part labels
- output/: corresponding YAML results
- README.md: full usage, setup, and troubleshooting docs
- .gitignore: excludes .env only
- .env.example: API key template

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Elmar Sönser 2026-03-04 18:29:07 +01:00
commit 5bf9e065e4
14 changed files with 682 additions and 0 deletions

README.md

@ -1,2 +1,187 @@
# ckOCR
PHP CLI tool that reads photos of part identification labels and extracts structured data using **Mistral AI OCR**.

It reads images from `input/`, calls the Mistral API, and writes one YAML file per image to `output/` containing the **Serial Number**, **Model Number**, and **Date**.

---
## Requirements
- PHP **8.1 to 8.5** with the `curl` extension enabled (no Composer required)
- A [Mistral AI](https://console.mistral.ai/) account with API access

**Arch Linux / CachyOS:** enable the curl extension after installing PHP:
```bash
sudo pacman -S php
# uncomment "extension=curl" in /etc/php/php.ini
php -m | grep curl # verify
```
---
## Installation
```bash
git clone <repo-url> ckOCR
cd ckOCR
cp .env.example .env
```
Edit `.env` and insert your Mistral API key:
```env
MISTRAL_API_KEY=your_api_key_here
```
Alternatively, export it as an environment variable:
```bash
export MISTRAL_API_KEY=your_api_key_here
```
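
The key lookup described above could work roughly as follows. This is a minimal sketch, not the code in `ocr.php`: the function name `loadApiKey`, the simple `KEY=value` parsing, and the assumption that the environment variable takes precedence over `.env` are all illustrative.

```php
<?php
// Hypothetical helper: resolve MISTRAL_API_KEY from the environment first,
// then fall back to a simple KEY=value .env file. The precedence is an
// assumption; the README does not state which source wins.
function loadApiKey(string $envFile): ?string
{
    $key = getenv('MISTRAL_API_KEY');
    if (is_string($key) && $key !== '') {
        return $key;
    }
    if (!is_readable($envFile)) {
        return null;
    }
    foreach (file($envFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $line = trim($line);
        // Skip blank lines, comments, and anything that is not KEY=value.
        if ($line === '' || $line[0] === '#' || !str_contains($line, '=')) {
            continue;
        }
        [$name, $value] = explode('=', $line, 2);
        if (trim($name) === 'MISTRAL_API_KEY') {
            // Strip optional surrounding quotes.
            $value = trim(trim($value), "\"'");
            return $value !== '' ? $value : null;
        }
    }
    return null;
}
```

A `null` return would correspond to the `MISTRAL_API_KEY not set` error listed under Troubleshooting.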
---
## Usage
Place one or more label photos in the `input/` folder, then run:
```bash
php ocr.php
```
Results are written to `output/` as YAML files — one per image, same filename stem.
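
The "same filename stem" rule can be illustrated with a short sketch. The helper name `outputPathFor` is hypothetical and not taken from `ocr.php`; it only shows the mapping from an input image to its output file.

```php
<?php
// Hypothetical helper: input/part-label-01.jpg -> output/part-label-01.yaml
function outputPathFor(string $imagePath, string $outputDir): string
{
    // PATHINFO_FILENAME yields the name without directory or extension.
    $stem = pathinfo($imagePath, PATHINFO_FILENAME);
    return rtrim($outputDir, '/') . '/' . $stem . '.yaml';
}
```

Under this mapping, two images that differ only in extension (for example `label.jpg` and `label.png`) would share one output file, so distinct stems are the safer choice.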
### Options
| Flag | Description |
|---|---|
| `--force` | Re-process images that already have an output file |
| `--verbose` | Print the raw OCR text and API request details |
| `--help` | Show usage information |
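
For readers curious how such flags can be handled, here is a minimal sketch using PHP's built-in `getopt()`. The function names `parseFlags` and `shouldProcess` are hypothetical, and the actual script may parse its arguments differently.

```php
<?php
// Hypothetical flag handling; not taken from ocr.php.
// getopt() maps value-less long options to false, so test for key presence.
function parseFlags(array $opts): array
{
    return [
        'force'   => array_key_exists('force', $opts),
        'verbose' => array_key_exists('verbose', $opts),
        'help'    => array_key_exists('help', $opts),
    ];
}

// An image is skipped when its output file already exists, unless --force is given.
function shouldProcess(string $outputPath, bool $force): bool
{
    return $force || !file_exists($outputPath);
}

// getopt() returns false on failure, hence the fallback to an empty array.
$flags = parseFlags(getopt('', ['force', 'verbose', 'help']) ?: []);
```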
### Examples
```bash
# Process all new images
php ocr.php
# Re-run everything, show full detail
php ocr.php --force --verbose
# Just see options
php ocr.php --help
```
---
## Input
Supported image formats: **JPG, JPEG, PNG, WebP, GIF**
Maximum file size: **5 MB** per image (Mistral API limit)
```
input/
├── part-label-01.jpg
├── motor-sn.png
└── board-sticker.jpg
```
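
The format and size limits above could be checked before any API call is made. The following is a sketch under stated assumptions: the function name `validateImage` and the error strings other than `File too large` are illustrative, and "5 MB" is interpreted as 5 MiB.

```php
<?php
// Limits taken from the README; names are hypothetical.
const SUPPORTED_EXTENSIONS = ['jpg', 'jpeg', 'png', 'webp', 'gif'];
const MAX_FILE_BYTES = 5 * 1024 * 1024; // assumes "5 MB" means 5 MiB

// Returns null when the file is acceptable, otherwise a short reason.
function validateImage(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($ext, SUPPORTED_EXTENSIONS, true)) {
        return 'Unsupported format';
    }
    $size = @filesize($path); // false when the file is missing or unreadable
    if ($size === false) {
        return 'File not readable';
    }
    if ($size > MAX_FILE_BYTES) {
        return 'File too large';
    }
    return null;
}
```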
---
## Output
Each processed image produces a YAML file in `output/`:
```
output/
├── part-label-01.yaml
├── motor-sn.yaml
└── board-sticker.yaml
```
### YAML structure
```yaml
---
serial_number: SN-20241234
model_number: "XYZ-4K/B"
date: 2024-01
source_file: part-label-01.jpg
processed_at: 2026-03-04 15:30:00
raw_ocr: |
  Full text extracted from the label by the OCR model,
  preserved exactly as returned.
```
| Field | Description |
|---|---|
| `serial_number` | Serial Number — labelled S/N, SN, Serial No., etc. |
| `model_number` | Model or Part Number — labelled Model, M/N, P/N, MPN, etc. |
| `date` | Any date on the label — MFG date, DOM, expiry, etc. |
| `source_file` | Original image filename |
| `processed_at` | Timestamp of processing |
| `raw_ocr` | Full OCR text returned by Mistral before extraction |

Fields not found on the label are written as `null`.

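
A file in this shape can be produced without a YAML extension. The sketch below is one possible approach, not the writer used by `ocr.php`: the names `yamlScalar` and `renderYaml` are hypothetical, and it always double-quotes string values (relying on YAML 1.2 accepting JSON-style quoted strings), whereas the sample above quotes only where needed.

```php
<?php
// Hypothetical writer. Missing values become null; strings are emitted as
// JSON-style double-quoted scalars, which YAML 1.2 parsers accept.
function yamlScalar(?string $value): string
{
    if ($value === null) {
        return 'null';
    }
    return json_encode($value, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}

function renderYaml(array $fields, string $rawOcr): string
{
    $lines = ['---'];
    foreach ($fields as $name => $value) {
        $lines[] = $name . ': ' . yamlScalar($value);
    }
    // Surrounding whitespace is trimmed so the block scalar's indentation
    // can be detected unambiguously from its first line.
    $rawOcr = trim($rawOcr);
    if ($rawOcr === '') {
        $lines[] = 'raw_ocr: null';
    } else {
        $lines[] = 'raw_ocr: |';
        foreach (preg_split('/\R/', $rawOcr) as $ocrLine) {
            // Indent every line by two spaces; keep blank lines empty.
            $lines[] = rtrim('  ' . $ocrLine);
        }
    }
    return implode("\n", $lines) . "\n";
}
```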
---
## How it works
Processing runs in two API calls per image:
```
Image file
    │
    ▼
[1] POST /ocr  (mistral-ocr-latest)
    │   base64-encoded image → markdown text
    ▼
[2] POST /chat/completions  (mistral-small-latest)
    │   OCR text + extraction prompt → JSON with the three fields
    ▼
YAML file written to output/
```
1. **OCR step** — the image is base64-encoded and sent to `mistral-ocr-latest`, which returns the full label text as markdown.
2. **Extraction step** — the OCR text is passed to `mistral-small-latest` with a structured prompt. The model returns a JSON object (`response_format: json_object`) containing `serial_number`, `model_number`, and `date`.
Already-processed images are skipped automatically unless `--force` is used.
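
The two API calls can be sketched as follows. Endpoint paths and model names come from this README; the base URL, payload layout, prompt wording, and all function names are assumptions based on Mistral's public API and should be checked against its documentation. This is not the code in `ocr.php`.

```php
<?php
// Sketch only; see the assumptions listed in the paragraph above.
const API_BASE = 'https://api.mistral.ai/v1';

// Step 1 payload: the image travels as a base64 data URI.
function buildOcrPayload(string $imageBytes, string $mimeType): array
{
    return [
        'model'    => 'mistral-ocr-latest',
        'document' => [
            'type'      => 'image_url',
            'image_url' => 'data:' . $mimeType . ';base64,' . base64_encode($imageBytes),
        ],
    ];
}

// Step 2 payload: OCR text plus an (illustrative) extraction prompt.
function buildExtractionPayload(string $ocrText): array
{
    $prompt = "Extract serial_number, model_number and date from this label text. "
            . "Reply with a JSON object using exactly those keys; "
            . "use null for missing values.\n\n" . $ocrText;
    return [
        'model'           => 'mistral-small-latest',
        'messages'        => [['role' => 'user', 'content' => $prompt]],
        'response_format' => ['type' => 'json_object'],
    ];
}

// POST a JSON body with bearer authentication and decode the JSON reply.
function postJson(string $path, string $apiKey, array $payload): array
{
    $ch = curl_init(API_BASE . $path);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS     => json_encode($payload),
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($body === false || $status !== 200) {
        throw new RuntimeException('Mistral API ' . $status);
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}

// Normalise the model's JSON reply to the three expected keys.
function parseExtraction(string $json): array
{
    $data = json_decode($json, true);
    $out  = [];
    foreach (['serial_number', 'model_number', 'date'] as $key) {
        $out[$key] = isset($data[$key]) && is_scalar($data[$key])
            ? (string) $data[$key]
            : null;
    }
    return $out;
}

// Possible wiring (response field names are assumptions):
// $ocr    = postJson('/ocr', $key, buildOcrPayload(file_get_contents($path), 'image/jpeg'));
// $text   = implode("\n", array_column($ocr['pages'], 'markdown'));
// $chat   = postJson('/chat/completions', $key, buildExtractionPayload($text));
// $fields = parseExtraction($chat['choices'][0]['message']['content']);
```

Keeping payload construction and response parsing separate from the HTTP call makes both halves testable without network access or an API key.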
---
## Project structure
```
ckOCR/
├── ocr.php # Main script
├── README.md # Usage, setup, and troubleshooting docs
├── .env # API key (not committed, see .env.example)
├── .env.example # Template
├── .gitignore
├── input/ # Label photos (test data included)
└── output/ # YAML results (test data included)
```
---
## Troubleshooting
**`MISTRAL_API_KEY not set`**

Set the key in `.env` or export it as an environment variable.

**`Mistral API 401`**

Your API key is invalid or expired. Check it at [console.mistral.ai](https://console.mistral.ai/).

**`File too large`**

Resize the image below 5 MB before placing it in `input/`.

**`No text found`**

The label may be blurry, low contrast, or too small. Try a clearer photo. The output YAML is still written with `null` fields so the file won't be re-processed accidentally. Use `--force --verbose` to retry and inspect the raw OCR output.

**Fields are `null` but text was extracted**

Run with `--verbose` to see the raw OCR text and check whether the label uses non-standard abbreviations. The extraction prompt covers the most common label formats.