ckOCR/README.md
Karamelmar 5bf9e065e4 Add Mistral AI OCR script with test data and documentation
- ocr.php: two-step pipeline (mistral-ocr-latest + mistral-small-latest)
  extracts Serial Number, Model Number, and Date from part label photos
- input/: 5 test images of industrial part labels
- output/: corresponding YAML results
- README.md: full usage, setup, and troubleshooting docs
- .gitignore: excludes .env only
- .env.example: API key template

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 18:29:07 +01:00

# ckOCR
PHP CLI tool that reads photos of part identification labels and extracts structured data using **Mistral AI OCR**.

It reads images from `input/`, calls the Mistral API, and writes one YAML file per image to `output/` containing the **Serial Number**, **Model Number**, and **Date**.

---
## Requirements
- PHP **8.1 to 8.5** with the `curl` extension enabled (no Composer required)
- A [Mistral AI](https://console.mistral.ai/) account with API access

On **Arch Linux / CachyOS**, enable the curl extension after installing PHP:
```bash
sudo pacman -S php
# uncomment "extension=curl" in /etc/php/php.ini
php -m | grep curl # verify
```
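The same requirements can also be checked from PHP itself. The following is a minimal sketch, assuming a hypothetical helper named `preflightProblems()` that is not part of `ocr.php`; it only tests the lower version bound and the `curl` extension.

```php
<?php
// Hypothetical preflight helper (an illustration, not taken from ocr.php).
// Returns a list of human-readable problems; an empty list means the checks passed.
function preflightProblems(int $phpVersionId, bool $curlLoaded): array
{
    $problems = [];
    if ($phpVersionId < 80100) { // PHP_VERSION_ID for 8.1.0
        $problems[] = 'PHP 8.1 or newer is required';
    }
    if (!$curlLoaded) {
        $problems[] = 'the curl extension is not enabled';
    }
    return $problems;
}

// Check the running interpreter and report anything missing.
foreach (preflightProblems(PHP_VERSION_ID, extension_loaded('curl')) as $problem) {
    fwrite(STDERR, "Preflight: $problem\n");
}
```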
---
## Installation
```bash
git clone <repo-url> ckOCR
cd ckOCR
cp .env.example .env
```
Edit `.env` and insert your Mistral API key:
```env
MISTRAL_API_KEY=your_api_key_here
```
Alternatively, export it as an environment variable:
```bash
export MISTRAL_API_KEY=your_api_key_here
```
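How the script resolves the key is not documented here. As a rough sketch, a loader could look like the following, assuming that an exported variable takes priority over `.env` and that `.env` holds simple `NAME=value` lines; the function name `loadApiKey()` is hypothetical.

```php
<?php
// Hypothetical loader; the real logic in ocr.php is not shown in this README.
function loadApiKey(string $envFile): ?string
{
    // Assumption: an exported environment variable wins over the .env file.
    $fromEnv = getenv('MISTRAL_API_KEY');
    if (is_string($fromEnv) && $fromEnv !== '') {
        return $fromEnv;
    }
    if (!is_readable($envFile)) {
        return null;
    }
    foreach (file($envFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $line = trim($line);
        // Skip blank lines, comments, and anything that is not NAME=value.
        if ($line === '' || $line[0] === '#' || !str_contains($line, '=')) {
            continue;
        }
        [$name, $value] = explode('=', $line, 2);
        if (trim($name) === 'MISTRAL_API_KEY') {
            // Strip surrounding whitespace and optional quotes.
            $value = trim(trim($value), "\"'");
            return $value !== '' ? $value : null;
        }
    }
    return null;
}
```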
---
## Usage
Place one or more label photos in the `input/` folder, then run:
```bash
php ocr.php
```
Results are written to `output/` as YAML files — one per image, same filename stem.
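The "same filename stem" rule, together with the skip behavior controlled by `--force`, can be sketched as follows. The helper names `outputPathFor()` and `shouldProcess()` are hypothetical and only illustrate the mapping; they are not taken from `ocr.php`.

```php
<?php
// Hypothetical helpers illustrating the input -> output naming rule.
function outputPathFor(string $imagePath, string $outputDir = 'output'): string
{
    // PATHINFO_FILENAME yields the file name without directory or extension.
    $stem = pathinfo($imagePath, PATHINFO_FILENAME);
    return rtrim($outputDir, '/') . '/' . $stem . '.yaml';
}

function shouldProcess(string $imagePath, bool $force, string $outputDir = 'output'): bool
{
    // An image is skipped when its YAML file already exists, unless forced.
    return $force || !file_exists(outputPathFor($imagePath, $outputDir));
}

echo outputPathFor('input/part-label-01.jpg'), "\n"; // output/part-label-01.yaml
```

One consequence of this scheme is that two inputs sharing a stem, such as `label.jpg` and `label.png`, would map to the same output file.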
### Options
| Flag | Description |
|---|---|
| `--force` | Re-process images that already have an output file |
| `--verbose` | Print the raw OCR text and API request details |
| `--help` | Show usage information |
### Examples
```bash
# Process all new images
php ocr.php
# Re-run everything, show full detail
php ocr.php --force --verbose
# Just see options
php ocr.php --help
```
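For readers curious how three boolean flags like these can be handled without a library, here is a minimal sketch. It assumes a hand-rolled parser named `parseFlags()` (hypothetical); `ocr.php` may use `getopt()` or something else entirely.

```php
<?php
// Hypothetical flag parser for --force, --verbose and --help.
function parseFlags(array $argv): array
{
    $flags = ['force' => false, 'verbose' => false, 'help' => false];
    $unknown = [];
    foreach (array_slice($argv, 1) as $arg) { // index 0 is the script name
        $name = str_starts_with($arg, '--') ? substr($arg, 2) : '';
        if (array_key_exists($name, $flags)) {
            $flags[$name] = true;
        } else {
            $unknown[] = $arg; // collected so the caller can print a usage hint
        }
    }
    return ['flags' => $flags, 'unknown' => $unknown];
}

// Example: php ocr.php --force --verbose
$parsed = parseFlags(['ocr.php', '--force', '--verbose']);
var_dump($parsed['flags']['force']); // bool(true)
```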
---
## Input
Supported image formats: **JPG, JPEG, PNG, WebP, GIF**

Maximum file size: **5 MB** per image (Mistral API limit)
```
input/
├── part-label-01.jpg
├── motor-sn.png
└── board-sticker.jpg
```
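The two input constraints can be checked before any API call is made. This is a sketch only: the function and constant names are hypothetical, the extension check is case-insensitive by assumption, and 5 MB is interpreted here as 5 × 1024 × 1024 bytes.

```php
<?php
// Hypothetical input validation; names and messages are illustrative.
const CKOCR_ALLOWED_EXTENSIONS = ['jpg', 'jpeg', 'png', 'webp', 'gif'];
const CKOCR_MAX_BYTES = 5 * 1024 * 1024; // assumption: "5 MB" means 5 MiB

// Returns null when the file is acceptable, otherwise a short reason.
function inputProblem(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($ext, CKOCR_ALLOWED_EXTENSIONS, true)) {
        return 'Unsupported format';
    }
    if (!is_file($path)) {
        return 'Not a file';
    }
    if (filesize($path) > CKOCR_MAX_BYTES) {
        return 'File too large';
    }
    return null;
}
```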
---
## Output
Each processed image produces a YAML file in `output/`:
```
output/
├── part-label-01.yaml
├── motor-sn.yaml
└── board-sticker.yaml
```
### YAML structure
```yaml
---
serial_number: SN-20241234
model_number: "XYZ-4K/B"
date: 2024-01
source_file: part-label-01.jpg
processed_at: 2026-03-04 15:30:00
raw_ocr: |
  Full text extracted from the label by the OCR model,
  preserved exactly as returned.
```
| Field | Description |
|---|---|
| `serial_number` | Serial Number — labelled S/N, SN, Serial No., etc. |
| `model_number` | Model or Part Number — labelled Model, M/N, P/N, MPN, etc. |
| `date` | Any date on the label — MFG date, DOM, expiry, etc. |
| `source_file` | Original image filename |
| `processed_at` | Timestamp of processing |
| `raw_ocr` | Full OCR text returned by Mistral before extraction |

Fields not found on the label are written as `null`.

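A file with this flat structure can be produced without a YAML library. The sketch below is an assumption about how that might be done, not the code in `ocr.php`: the function names are hypothetical, and it double-quotes every string (more conservative than the sample above) so that values such as `2024-01` or the literal text `null` are read back as strings.

```php
<?php
// Hypothetical emitter for the flat structure above; not a general YAML writer.
function yamlScalar(?string $value): string
{
    if ($value === null) {
        return 'null'; // missing fields become YAML null
    }
    // A JSON string literal is also a valid YAML double-quoted scalar.
    return json_encode($value, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}

function buildYaml(array $fields, string $sourceFile, string $processedAt, string $rawOcr): string
{
    $lines = ['---'];
    foreach (['serial_number', 'model_number', 'date'] as $key) {
        $lines[] = $key . ': ' . yamlScalar($fields[$key] ?? null);
    }
    $lines[] = 'source_file: ' . yamlScalar($sourceFile);
    $lines[] = 'processed_at: ' . yamlScalar($processedAt);
    // Literal block scalar: each content line is indented under the key.
    $lines[] = 'raw_ocr: |';
    foreach (preg_split('/\R/', rtrim($rawOcr)) as $ocrLine) {
        $lines[] = '  ' . $ocrLine;
    }
    return implode("\n", $lines) . "\n";
}
```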
---
## How it works
Processing runs in two API calls per image:
```
Image file
    │
    ▼
[1] POST /ocr  (mistral-ocr-latest)
    │   base64-encoded image → markdown text
    ▼
[2] POST /chat/completions  (mistral-small-latest)
    │   OCR text + extraction prompt → JSON with the three fields
    ▼
YAML file written to output/
```
1. **OCR step** — the image is base64-encoded and sent to `mistral-ocr-latest`, which returns the full label text as markdown.
2. **Extraction step** — the OCR text is passed to `mistral-small-latest` with a structured prompt. The model returns a JSON object (`response_format: json_object`) containing `serial_number`, `model_number`, and `date`.

Already-processed images are skipped automatically unless `--force` is used.

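The two request bodies can be sketched as below. This is an illustration under stated assumptions, not the code in `ocr.php`: the helper names are hypothetical, the prompt is a placeholder (the real extraction prompt is not shown in this README), and the payload shapes and the `https://api.mistral.ai/v1` base URL reflect Mistral's public API documentation and should be checked against it.

```php
<?php
// Hypothetical helpers for the two calls; verify shapes against Mistral's API docs.
function buildOcrPayload(string $imageBytes, string $mimeType): array
{
    return [
        'model' => 'mistral-ocr-latest',
        'document' => [
            'type' => 'image_url',
            // The image travels inline as a base64 data URI.
            'image_url' => 'data:' . $mimeType . ';base64,' . base64_encode($imageBytes),
        ],
    ];
}

function buildExtractionPayload(string $ocrText): array
{
    // Placeholder prompt, written for this example only.
    $prompt = 'Extract serial_number, model_number and date from the label text. '
        . 'Reply with a JSON object using exactly those keys; use null when a field is absent.';
    return [
        'model' => 'mistral-small-latest',
        'messages' => [
            ['role' => 'system', 'content' => $prompt],
            ['role' => 'user', 'content' => $ocrText],
        ],
        'response_format' => ['type' => 'json_object'],
    ];
}

function postJson(string $url, array $payload, string $apiKey): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => json_encode($payload),
        CURLOPT_HTTPHEADER => ['Content-Type: application/json', 'Authorization: Bearer ' . $apiKey],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($body === false || $status !== 200) {
        throw new RuntimeException("Mistral API $status");
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}

// Intended use (performs real network calls, so it is left commented out):
// $ocr  = postJson('https://api.mistral.ai/v1/ocr', buildOcrPayload($bytes, 'image/jpeg'), $key);
// $text = $ocr['pages'][0]['markdown'] ?? '';
// $chat = postJson('https://api.mistral.ai/v1/chat/completions', buildExtractionPayload($text), $key);
```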
---
## Project structure
```
ckOCR/
├── ocr.php # Main script
├── .env # API key (not committed, see .env.example)
├── .env.example # Template
├── .gitignore
├── input/ # Label photos (test data included)
└── output/ # YAML results (test data included)
```
---
## Troubleshooting
**`MISTRAL_API_KEY not set`**
Set the key in `.env` or export it as an environment variable.

**`Mistral API 401`**
Your API key is invalid or expired. Check it at [console.mistral.ai](https://console.mistral.ai/).

**`File too large`**
Resize the image below 5 MB before placing it in `input/`.

**`No text found`**
The label may be blurry, low contrast, or too small. Try a clearer photo. The output YAML is still written with `null` fields so the file won't be re-processed accidentally; use `--force --verbose` to retry and inspect the raw OCR output.

**Fields are `null` but text was extracted**
Run with `--verbose` to see the raw OCR text and check whether the label uses non-standard abbreviations. The extraction prompt covers the most common label formats.
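
When debugging `null` fields, it helps to know how a model reply could end up as `null`. The sketch below shows one plausible normalization step, assuming a hypothetical `normalizeExtraction()` helper: missing keys, empty strings, non-scalar values, and unparseable replies all collapse to `null`. Whether `ocr.php` behaves exactly this way is not stated in this README.

```php
<?php
// Hypothetical normalization of the extraction model's JSON reply.
function normalizeExtraction(string $json): array
{
    $empty = ['serial_number' => null, 'model_number' => null, 'date' => null];
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        return $empty; // reply was not a JSON object
    }
    $result = $empty;
    foreach (array_keys($empty) as $key) {
        $value = $decoded[$key] ?? null;
        // Accept strings and numbers only; everything else stays null.
        if (is_string($value) || is_int($value) || is_float($value)) {
            $text = trim((string) $value);
            if ($text !== '') {
                $result[$key] = $text;
            }
        }
    }
    return $result; // extra keys in the reply are ignored
}

var_dump(normalizeExtraction('{"serial_number": "SN-20241234", "date": ""}'));
```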