# ckOCR

PHP CLI tool that extracts structured data from photos of part identification labels using **Mistral AI OCR**.

Reads images from `input/`, calls the Mistral API, and writes YAML files to `output/` containing the **Serial Number**, **Model Number**, and **Date**.

---

## Requirements

- PHP **8.1 – 8.5** with the `curl` extension enabled (no Composer required)
- A [Mistral AI](https://console.mistral.ai/) account with API access

**Arch Linux / CachyOS:** after installing PHP, enable the curl extension:

```bash
sudo pacman -S php
# uncomment "extension=curl" in /etc/php/php.ini
php -m | grep curl   # verify
```
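
To check these requirements from PHP itself, a minimal sketch follows. `ckocr_check_requirements()` is a hypothetical helper, not a function in `ocr.php`. It assumes the supported range means 8.1.0 up to, but not including, 8.6.0, and it only tests whether the `curl` extension is loaded.

```php
<?php
// Hypothetical preflight check (not part of ocr.php).
// Returns a list of human-readable problems; an empty array means both checks passed.
function ckocr_check_requirements(int $versionId = PHP_VERSION_ID, ?bool $hasCurl = null): array
{
    $hasCurl ??= extension_loaded('curl');
    $problems = [];

    // PHP_VERSION_ID is major * 10000 + minor * 100 + release, e.g. 80100 for 8.1.0.
    if ($versionId < 80100 || $versionId >= 80600) {
        $problems[] = "PHP 8.1-8.5 required, found version id $versionId";
    }
    if (!$hasCurl) {
        $problems[] = 'curl extension not loaded (uncomment extension=curl in php.ini)';
    }
    return $problems;
}
```

Calling `ckocr_check_requirements()` with no arguments inspects the running interpreter. The optional parameters exist only so the logic can be exercised without switching PHP versions.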

---

## Installation

```bash
git clone <repo-url> ckOCR
cd ckOCR
cp .env.example .env
```

Edit `.env` and insert your Mistral API key:

```env
MISTRAL_API_KEY=your_api_key_here
```

Alternatively, export it as an environment variable:

```bash
export MISTRAL_API_KEY=your_api_key_here
```
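
The README does not say in which order `ocr.php` consults these two sources. The sketch below assumes one plausible order, with an exported `MISTRAL_API_KEY` taking priority over `.env`, and assumes `.env` contains only simple `KEY=value` lines. `parse_env_file()` and `resolve_api_key()` are hypothetical names.

```php
<?php
// Hypothetical .env parser: simple KEY=value lines only.
// Blank lines and '#' comments are ignored; matching outer quotes are stripped.
function parse_env_file(string $contents): array
{
    $vars = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || str_starts_with($line, '#') || !str_contains($line, '=')) {
            continue;
        }
        [$key, $value] = explode('=', $line, 2);
        $key = trim($key);
        $value = trim($value);
        if (strlen($value) >= 2 && ($value[0] === '"' || $value[0] === "'") && $value[-1] === $value[0]) {
            $value = substr($value, 1, -1);
        }
        $vars[$key] = $value;
    }
    return $vars;
}

// Assumed lookup order: exported variable first, then the .env file.
// Returns null when neither source provides a non-empty key.
function resolve_api_key(string $envPath): ?string
{
    $fromEnv = getenv('MISTRAL_API_KEY');
    if ($fromEnv !== false && $fromEnv !== '') {
        return $fromEnv;
    }
    if (is_readable($envPath)) {
        $key = parse_env_file((string) file_get_contents($envPath))['MISTRAL_API_KEY'] ?? '';
        return $key !== '' ? $key : null;
    }
    return null;
}
```

A script following this approach would call `resolve_api_key(__DIR__ . '/.env')` and stop with a `MISTRAL_API_KEY not set` error when it returns `null`.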

---

## Usage

Place one or more label photos in the `input/` folder, then run:

```bash
php ocr.php
```

Results are written to `output/` as YAML files, one per image, using the same filename stem as the image.

### Options

| Flag | Description |
|---|---|
| `--force` | Re-process images that already have an output file |
| `--verbose` | Print the raw OCR text and API request details |
| `--help` | Show usage information |
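
The three flags are plain booleans, so parsing them needs only a loop over `$argv`. This is a hypothetical sketch, not the parser used by `ocr.php`. PHP's built-in `getopt('', ['force', 'verbose', 'help'])` would also work, but it silently ignores unrecognized options; the manual loop below collects them so a typo such as `--forse` can be reported.

```php
<?php
// Hypothetical parser for the documented flags (not the one in ocr.php).
// Unknown arguments are collected so the caller can print usage and stop.
function parse_cli_options(array $argv): array
{
    $options = ['force' => false, 'verbose' => false, 'help' => false, 'unknown' => []];

    // $argv[0] is the script name, so start at index 1.
    foreach (array_slice($argv, 1) as $arg) {
        switch ($arg) {
            case '--force':
                $options['force'] = true;
                break;
            case '--verbose':
                $options['verbose'] = true;
                break;
            case '--help':
                $options['help'] = true;
                break;
            default:
                $options['unknown'][] = $arg;
        }
    }
    return $options;
}
```

In a script this would be called as `parse_cli_options($argv)`. Printing usage whenever `help` is set or `unknown` is non-empty is one reasonable policy.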

### Examples

```bash
# Process all new images
php ocr.php

# Re-run everything, show full detail
php ocr.php --force --verbose

# Show usage information
php ocr.php --help
```

---

## Input

Supported image formats: **JPG, JPEG, PNG, WebP, GIF**

Maximum file size: **5 MB** per image (Mistral API limit)

```
input/
├── part-label-01.jpg
├── motor-sn.png
└── board-sticker.jpg
```
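
A local check for these two rules, plus the base64 data URI used by the OCR step, might look like the sketch below. The constant and function names are hypothetical; the 5 MB limit is read as 5 MiB (5 * 1024 * 1024 bytes), which is an assumption; and the format check looks only at the file extension, not at the file contents.

```php
<?php
// Hypothetical input validation (not part of ocr.php).
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // assumption: "5 MB" interpreted as 5 MiB

// Documented extensions mapped to the MIME type used in the data URI.
const IMAGE_MIME_TYPES = [
    'jpg' => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'png' => 'image/png',
    'webp' => 'image/webp',
    'gif' => 'image/gif',
];

// Returns the MIME type for a supported file, or throws with a readable reason.
function validate_image(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset(IMAGE_MIME_TYPES[$ext])) {
        throw new InvalidArgumentException("Unsupported format: $path");
    }
    $size = filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot read: $path");
    }
    if ($size > MAX_IMAGE_BYTES) {
        throw new RuntimeException("File too large: $path");
    }
    return IMAGE_MIME_TYPES[$ext];
}

// Builds the base64 data URI that is sent to the OCR model.
function image_data_uri(string $path): string
{
    return 'data:' . validate_image($path) . ';base64,' . base64_encode((string) file_get_contents($path));
}
```

Rejecting files locally avoids spending an API call on an upload that would fail anyway.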

---

## Output

Each processed image produces a YAML file in `output/`:

```
output/
├── part-label-01.yaml
├── motor-sn.yaml
└── board-sticker.yaml
```

### YAML structure

```yaml
---
serial_number: SN-20241234
model_number: "XYZ-4K/B"
date: 2024-01
source_file: part-label-01.jpg
processed_at: 2026-03-04 15:30:00
raw_ocr: |
  Full text extracted from the label by the OCR model,
  preserved exactly as returned.
```

| Field | Description |
|---|---|
| `serial_number` | Serial Number, labelled S/N, SN, Serial No., etc. |
| `model_number` | Model or Part Number, labelled Model, M/N, P/N, MPN, etc. |
| `date` | Any date on the label: MFG date, DOM, expiry, etc. |
| `source_file` | Original image filename |
| `processed_at` | Timestamp of processing |
| `raw_ocr` | Full OCR text returned by Mistral before extraction |

Fields not found on the label are written as `null`.
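
Because the tool needs no Composer packages, the YAML has to be produced without a library. One way to do that is sketched below; `yaml_scalar()` and `to_yaml()` are hypothetical names. The sketch double-quotes every string with `json_encode` (a JSON string is also a valid YAML double-quoted scalar), writes missing fields as `null`, and emits `raw_ocr` as a `|` literal block. Its quoting therefore differs from the sample above, which leaves simple values unquoted.

```php
<?php
// Hypothetical YAML writer for one result record (no Composer, no yaml extension).
function yaml_scalar(?string $value): string
{
    if ($value === null) {
        return 'null';
    }
    // JSON string syntax is valid YAML double-quoted syntax.
    return json_encode($value, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

function to_yaml(array $record): string
{
    $out = "---\n";
    foreach (['serial_number', 'model_number', 'date', 'source_file', 'processed_at'] as $key) {
        $out .= $key . ': ' . yaml_scalar($record[$key] ?? null) . "\n";
    }

    $raw = $record['raw_ocr'] ?? null;
    if ($raw === null || $raw === '') {
        return $out . 'raw_ocr: ' . yaml_scalar($raw) . "\n";
    }

    // Literal block: each line indented two spaces; "|" keeps one trailing newline.
    $out .= "raw_ocr: |\n";
    foreach (preg_split('/\R/', rtrim($raw, "\r\n")) as $line) {
        $out .= ($line === '' ? '' : '  ' . $line) . "\n";
    }
    return $out;
}
```

Quoting every string means `processed_at` reads back as a plain string rather than a YAML timestamp. If the OCR text can begin with leading spaces, the block header would need an explicit indentation indicator such as `|2`.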

---

## How it works

Each image is processed with two API calls:

```
Image file
    │
    ▼
[1] POST /ocr  (mistral-ocr-latest)
    │  base64-encoded image → markdown text
    │
    ▼
[2] POST /chat/completions  (mistral-small-latest)
    │  OCR text + extraction prompt → JSON with the three fields
    │
    ▼
YAML file written to output/
```

1. **OCR step:** the image is base64-encoded and sent to `mistral-ocr-latest`, which returns the full label text as markdown.
2. **Extraction step:** the OCR text is passed to `mistral-small-latest` with a structured prompt. The model returns a JSON object (JSON mode, `response_format: {"type": "json_object"}`) containing `serial_number`, `model_number`, and `date`.
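
The two calls can be sketched with the curl extension as follows. This is an illustration, not the code in `ocr.php`. The base URL `https://api.mistral.ai/v1`, the request bodies, and the response fields (`pages[].markdown` for OCR, `choices[0].message.content` for chat) are assumed to follow Mistral's public REST API; the prompt wording and all function names are hypothetical. Check the current API reference before relying on them.

```php
<?php
// Sketch of the two-step pipeline; endpoint paths and payload shapes are assumptions.
const MISTRAL_BASE_URL = 'https://api.mistral.ai/v1';

// Step 1 request: the image as a base64 data URI for the OCR model.
function ocr_payload(string $dataUri): array
{
    return [
        'model' => 'mistral-ocr-latest',
        'document' => ['type' => 'image_url', 'image_url' => $dataUri],
    ];
}

// Step 1 response: join the markdown of every returned page.
function ocr_text(array $response): string
{
    return trim(implode("\n\n", array_column($response['pages'] ?? [], 'markdown')));
}

// Step 2 request: hypothetical extraction prompt plus the OCR text, JSON mode on.
function extraction_payload(string $ocrText): array
{
    $prompt = 'From this label text, return a JSON object with the keys '
        . 'serial_number, model_number and date. Use null for anything not present.';
    return [
        'model' => 'mistral-small-latest',
        'response_format' => ['type' => 'json_object'],
        'messages' => [
            ['role' => 'system', 'content' => $prompt],
            ['role' => 'user', 'content' => $ocrText],
        ],
    ];
}

// Step 2 response: decode the JSON answer and keep only the three fields.
function extracted_fields(array $response): array
{
    $data = json_decode($response['choices'][0]['message']['content'] ?? '{}', true);
    $fields = [];
    foreach (['serial_number', 'model_number', 'date'] as $key) {
        $value = is_array($data) ? ($data[$key] ?? null) : null;
        $fields[$key] = (is_scalar($value) && (string) $value !== '') ? (string) $value : null;
    }
    return $fields;
}

// POST a JSON body; any non-2xx status becomes an exception such as "Mistral API 401".
function mistral_post(string $path, array $payload, string $apiKey): array
{
    $ch = curl_init(MISTRAL_BASE_URL . $path);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json', 'Authorization: Bearer ' . $apiKey],
        CURLOPT_POSTFIELDS => json_encode($payload, JSON_THROW_ON_ERROR),
        CURLOPT_TIMEOUT => 120,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // No curl_close(): since PHP 8.0 the handle is freed when $ch goes out of scope.
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    if ($status < 200 || $status >= 300) {
        throw new RuntimeException("Mistral API $status: $body");
    }
    $decoded = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    return is_array($decoded) ? $decoded : [];
}
```

Chained together, one image would go through `ocr_text(mistral_post('/ocr', ocr_payload($uri), $key))`, and that text through `extracted_fields(mistral_post('/chat/completions', extraction_payload($text), $key))`. Keeping payload building and response parsing separate from the HTTP call lets those parts be tested without network access.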

Already-processed images are skipped automatically unless `--force` is used.
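
The skip rule follows from the one-YAML-per-stem naming. A hypothetical sketch of it is below; `output_path_for()` and `needs_processing()` are illustrative names, not functions from `ocr.php`. Under this rule, `label.jpg` and `label.png` would share `output/label.yaml`, so whichever is seen second is skipped, or overwrites the first when `--force` is given.

```php
<?php
// Hypothetical helpers: output path from the image's filename stem,
// and the decision whether the image still needs processing.
function output_path_for(string $imagePath, string $outputDir): string
{
    return rtrim($outputDir, '/') . '/' . pathinfo($imagePath, PATHINFO_FILENAME) . '.yaml';
}

function needs_processing(string $imagePath, string $outputDir, bool $force): bool
{
    // --force always re-processes; otherwise only images without a YAML file.
    return $force || !file_exists(output_path_for($imagePath, $outputDir));
}
```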

---

## Project structure

```
ckOCR/
├── ocr.php          # Main script
├── .env             # API key (not committed, see .env.example)
├── .env.example     # Template
├── .gitignore
├── input/           # Label photos (test data included)
└── output/          # YAML results (test data included)
```

---

## Troubleshooting

**`MISTRAL_API_KEY not set`**
Set the key in `.env` or export it as an environment variable.

**`Mistral API 401`**
Your API key is invalid or expired. Check it at [console.mistral.ai](https://console.mistral.ai/).

**`File too large`**
Resize or recompress the image to under 5 MB before placing it in `input/`.

**`No text found`**
The label may be blurry, low contrast, or too small; try a clearer photo. The output YAML is still written with `null` fields, so the image is skipped on later runs instead of being sent to the API again. Use `--force --verbose` to retry and inspect the raw OCR output.

**Fields are `null` but text was extracted**
Run with `--verbose` to see the raw OCR text and check whether the label uses non-standard abbreviations. The extraction prompt covers the most common label formats.