As discussed in a previous post, existing OCR benchmarks are not especially useful for discriminating between models on the kinds of documents that social scientists actually work with. Most benchmarks, like OmniDocBench v1.5, over-index on modern printed text, clean scans, and well-resourced languages. Handwritten census records, historical logbooks, degraded administrative forms, and other "messy" real-world data are not well represented.

socOCRbench is a small, private benchmark designed with this gap in mind. It evaluates OCR models on hundreds of samples across two broad task types: handwriting recognition and table extraction. Within each, results are split into (A) and (B) sub-categories that roughly track how well the material is covered by existing training data and benchmarks: (A) is more conventional, (B) is more challenging or underrepresented. The metric is Normalized Edit Similarity (NES), where 1.0 represents a perfect transcription. Details of what data constitutes each category are intentionally withheld to keep the evaluation uncontaminated.
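To make the metric concrete, here is a minimal sketch of how a normalized edit similarity can be computed. socOCRbench's exact normalization is not published; this sketch assumes the common definition of 1 minus the Levenshtein distance divided by the length of the longer string, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def nes(prediction: str, reference: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 is a perfect transcription.

    Assumes NES = 1 - edit_distance / max(len) -- one common convention,
    not necessarily the one socOCRbench uses.
    """
    if not prediction and not reference:
        return 1.0
    dist = levenshtein(prediction, reference)
    return 1.0 - dist / max(len(prediction), len(reference))

print(nes("1887 Census of Boston", "1887 Census of Boston"))  # 1.0
print(nes("l887 Censvs of Boston", "1887 Census of Boston"))  # ~0.90 (two substitutions)
```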

Future iterations of the benchmark will include more diverse document types and larger sample sizes, and will also report model efficiency and performance after fine-tuning.

| Model | Type | Overall | Handwriting [A] | Handwriting [B] | Tables [A] | Tables [B] |
|---|---|---|---|---|---|---|
| Gemini 3 Pro (low) | VLM | 0.5472 | 0.6445 | 0.5289 | 0.8936 | 0.2667 |
| Gemini 3 Flash (high) | VLM | 0.4808 | 0.6049 | 0.4750 | 0.7551 | 0.1862 |
| Gemini 3 Flash (low) | VLM | 0.4596 | 0.5390 | 0.3689 | 0.8734 | 0.2541 |
| Qwen3-VL-235B | VLM | 0.4428 | 0.6174 | 0.1969 | 0.9398 | 0.2520 |
| Qwen3-VL-30B | VLM | 0.4022 | 0.5716 | 0.1699 | 0.9104 | 0.1966 |
| GPT-5.2 (low) | VLM | 0.3759 | 0.5140 | 0.1693 | 0.8445 | 0.2014 |
| Gemini 3 Pro (high) | VLM | 0.3700 | 0.5336 | 0.3506 | 0.4097 | 0.1564 |
| dots.ocr | VLM | 0.3328 | 0.4193 | 0.0751 | 0.9150 | 0.2299 |
| GLM-OCR | VLM | 0.3247 | 0.4369 | 0.0693 | 0.8200 | 0.2283 |
| GPT-5.2 (high) | VLM | 0.2841 | 0.4190 | 0.0564 | 0.8312 | 0.0995 |
| Sarvam Vision | VLM | 0.2735 | 0.2851 | 0.1899 | 0.7194 | 0.1335 |
| PaddleOCR-VL-1.5 | VLM | 0.2490 | 0.5019 | 0.1279 | 0.3614 | 0.0000 |
| PaddleOCR-VL-0.9B | VLM | 0.2221 | 0.2673 | 0.0446 | 0.6904 | 0.1365 |
| Nemotron-Nano-12B | VLM | 0.2172 | 0.2568 | 0.0127 | 0.6318 | 0.1975 |
| OlmOCR-2 | VLM | 0.2087 | 0.3405 | 0.0696 | 0.2791 | 0.1624 |
| LightOnOCR-2-1B | VLM | 0.2024 | 0.3138 | 0.1005 | 0.4828 | 0.0345 |
| DeepSeek-OCR | VLM | 0.1997 | 0.2534 | 0.0336 | 0.5004 | 0.1732 |
| Tesseract | Classical OCR | 0.1091 | 0.1037 | 0.0285 | 0.1696 | 0.1808 |
| Layout Parser | Classical OCR | 0.0263 | 0.0590 | 0.0231 | 0.0000 | 0.0000 |
| EfficientOCR | Classical OCR | 0.0200 | 0.0509 | 0.0108 | 0.0000 | 0.0000 |

All scores are NES (higher is better); rows are sorted by Overall.

Last updated: February 13, 2026