As discussed in a previous post, existing OCR benchmarks are not especially useful for discriminating between models on the kinds of documents that social scientists actually work with. Most benchmarks, such as OmniDocBench v1.5, over-index on modern printed text, clean scans, and well-resourced languages. Handwritten census records, historical logbooks, degraded administrative forms, and other "messy" real-world documents are poorly represented.

socOCRbench is a small, private benchmark designed with this gap in mind. It evaluates OCR models on handwriting recognition, table extraction, and printed text recognition. The overall score is the mean of three metrics: NES (Normalized Edit Similarity) for both text and tables, chrF (character n-gram F-score) for text, and TEDS (Tree-Edit-Distance-based Similarity) for tables. Each ranges from 0 to 1, where 1.0 is a perfect match.
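The exact chrF configuration used by socOCRbench is not stated. The sketch below follows the original chrF definition (precision and recall averaged over character n-gram orders, combined into an F-score), assuming common defaults rather than confirmed benchmark settings: n-grams up to order 6, β = 2 so that recall weighs more than precision, and whitespace removed. Skipping n-gram orders longer than either string is this sketch's own choice; library implementations such as sacrebleu's CHRF handle short strings in their own way.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams, with whitespace removed first (assumed default)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(pred: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]; a sketch, not the benchmark's own implementation."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, gold = char_ngrams(pred, n), char_ngrams(ref, n)
        if not hyp or not gold:
            continue  # skip n-gram orders longer than one of the strings
        matches = sum((hyp & gold).values())  # clipped (multiset) n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(gold.values()))
    if not precisions:
        # No usable order (e.g. an empty string): exact match scores 1, anything else 0.
        return 1.0 if "".join(pred.split()) == "".join(ref.split()) else 0.0
    chr_p = sum(precisions) / len(precisions)
    chr_r = sum(recalls) / len(recalls)
    if chr_p + chr_r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * chr_p * chr_r / (b2 * chr_p + chr_r)


print(chrf("the cat", "the hat"))
```

Because chrF rewards partial character overlap, a transcription with one wrong letter still scores well above zero, which makes it more forgiving than exact-match metrics on degraded handwriting.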

You can read more about socOCRbench and the motivation behind it in the corresponding working paper.

v3 leaderboard
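Columns: model, type, license (missing for a few models), socOCRbench score; NES averaged over regions, then NES for W. Europe, E. Europe, E. Asia, S. Asia, and MENA; NES averaged over formats, then NES for HW Text, Print Text, and HW Table; chrF averaged over regions, then chrF for the same five regions; chrF for HW Text and Print Text; TEDS; input and output price ($ per million tokens).
Model Type License socOCRbench NES Region W. Europe E. Europe E. Asia S. Asia MENA NES Format HW Text Print Text HW Table chrF W. Europe E. Europe E. Asia S. Asia MENA HW Text Print Text TEDS $/M In $/M Out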
Gemini 3.1 Pro (low) VLM Proprietary 0.6357 0.6577 0.6891 0.6605 0.5695 0.7315 0.6377 0.6450 0.6486 0.7022 0.5843 0.6054 0.6843 0.5302 0.5475 0.7428 0.5221 0.5752 0.6835 0.6502 2.00 12.00
Gemini 3 Pro (low) VLM Proprietary 0.6249 0.6888 0.7627 0.7210 0.5704 0.7355 0.6545 0.6350 0.6798 0.8011 0.4241 0.6479 0.8108 0.5612 0.5720 0.7662 0.5291 0.6013 0.7964 0.5650 2.00 12.00
Gemini 3.1 Flash Lite (minimal) VLM Proprietary 0.6214 0.6502 0.7428 0.6490 0.5266 0.7341 0.5987 0.6424 0.6391 0.7487 0.5395 0.5822 0.7505 0.4898 0.5095 0.7216 0.4393 0.5364 0.7312 0.6356 0.25 1.50
Gemini 3.5 Flash (low) VLM Proprietary 0.6096 0.6566 0.7163 0.6861 0.5427 0.6990 0.6388 0.6129 0.6493 0.7468 0.4424 0.6122 0.7564 0.5389 0.5149 0.7403 0.5104 0.5754 0.7358 0.5819 1.50 9.00
Gemini 3.5 Flash (minimal) VLM Proprietary 0.6022 0.6441 0.7081 0.6258 0.5594 0.7083 0.6190 0.6032 0.6268 0.7507 0.4321 0.6001 0.7490 0.4872 0.5316 0.7244 0.5084 0.5627 0.7282 0.5828 1.50 9.00
Gemini 3 Flash (low) VLM Proprietary 0.5995 0.6453 0.7063 0.6143 0.5446 0.7156 0.6456 0.6076 0.6392 0.7300 0.4537 0.6068 0.7707 0.4820 0.5211 0.7340 0.5261 0.5731 0.7378 0.5652 0.50 3.00
Claude Sonnet 4.6 VLM Proprietary 0.5980 0.5628 0.6938 0.6056 0.3389 0.6704 0.5052 0.5764 0.5358 0.7020 0.4914 0.5039 0.7070 0.4403 0.3760 0.6470 0.3492 0.4607 0.6674 0.7205 3.00 15.00
Gemini 3 Flash (minimal) VLM Proprietary 0.5920 0.6390 0.7287 0.6080 0.5227 0.7209 0.6145 0.6145 0.6334 0.7387 0.4712 0.5903 0.7766 0.4620 0.4889 0.7377 0.4863 0.5613 0.7258 0.5590 0.50 3.00
Qwen3.7 Plus VLM Proprietary 0.5830 0.6140 0.7118 0.5977 0.5656 0.7012 0.4936 0.5919 0.5906 0.7453 0.4399 0.5402 0.7220 0.4260 0.5565 0.6634 0.3330 0.4893 0.7052 0.6059 0.32 1.28
Gemini 3.1 Flash Lite (low) VLM Proprietary 0.5819 0.6359 0.7307 0.6373 0.5140 0.7203 0.5772 0.6137 0.6238 0.7475 0.4698 0.5767 0.7587 0.4865 0.5118 0.7054 0.4210 0.5295 0.7357 0.5443 0.25 1.50
Qwen3.5 122B VLM Open Source 0.5753 0.6078 0.7133 0.6014 0.5500 0.6860 0.4884 0.5929 0.5894 0.7342 0.4551 0.5398 0.7174 0.4232 0.5483 0.6462 0.3640 0.5013 0.6864 0.5858 0.40 3.20
Seed 2.0 Pro VLM Proprietary 0.5631 0.6010 0.6531 0.6272 0.5554 0.6993 0.4697 0.5643 0.5536 0.7354 0.4039 0.5513 0.6813 0.4707 0.5712 0.6911 0.3422 0.4894 0.7019 0.5554 0.47 2.37
Qwen3.6 Plus VLM 0.5623 0.5964 0.6784 0.5566 0.5368 0.7000 0.5102 0.5727 0.5821 0.7003 0.4355 0.5335 0.6827 0.4015 0.5301 0.6781 0.3753 0.4960 0.6608 0.5689 0.33 1.95
Qwen3.5 397B VLM Open Source 0.5616 0.6353 0.7399 0.6068 0.5781 0.7412 0.5105 0.6152 0.6140 0.7671 0.4644 0.5716 0.7468 0.4496 0.5705 0.7075 0.3838 0.5282 0.7219 0.4879 0.60 3.60
Qwen3.5 Plus VLM Proprietary 0.5576 0.6279 0.7299 0.5812 0.5693 0.7298 0.5293 0.6069 0.6101 0.7533 0.4574 0.5663 0.7388 0.4418 0.5699 0.7017 0.3789 0.5204 0.7186 0.4891 0.80 2.00
Qwen3.5 397B (thinking) VLM Open Source 0.5504 0.5935 0.6935 0.5623 0.5707 0.6698 0.4710 0.5809 0.5630 0.7321 0.4477 0.5436 0.7206 0.4135 0.5652 0.6700 0.3487 0.4892 0.7112 0.5204 0.60 3.60
Qwen3 VL 235B VLM Open Source 0.5478 0.5967 0.7115 0.5943 0.5540 0.6905 0.4330 0.6021 0.5627 0.7396 0.5040 0.5236 0.6893 0.4302 0.5447 0.6678 0.2859 0.4637 0.6908 0.5204 0.20 0.88
Gemini 2.5 Flash VLM Proprietary 0.5446 0.5833 0.6581 0.5489 0.4987 0.6978 0.5131 0.5720 0.5352 0.7219 0.4587 0.5471 0.7012 0.4054 0.5000 0.7128 0.4163 0.4785 0.7198 0.5091 0.30 2.50
Qwen3.5 27B VLM Open Source 0.5417 0.5926 0.6984 0.5574 0.5615 0.6686 0.4770 0.5786 0.5831 0.7062 0.4465 0.5155 0.6953 0.3946 0.5460 0.5998 0.3418 0.4841 0.6548 0.5242 0.30 2.40
Claude Opus 4.6 VLM Proprietary 0.5415 0.5568 0.6827 0.6021 0.3496 0.6569 0.4927 0.5537 0.5296 0.7038 0.4279 0.5057 0.7109 0.4527 0.3853 0.6499 0.3295 0.4568 0.6781 0.5637 5.00 25.00
Qwen3.5 Plus (2026-04-20) VLM Proprietary 0.5392 0.5802 0.6988 0.5149 0.5561 0.6294 0.5019 0.5756 0.5700 0.7028 0.4539 0.5246 0.7179 0.3772 0.5347 0.6216 0.3718 0.4897 0.6758 0.5150 0.30 1.80
Gemini 2.0 Flash VLM Proprietary 0.5295 0.5777 0.6356 0.5577 0.4815 0.6977 0.5162 0.5562 0.5218 0.7192 0.4276 0.5342 0.6656 0.4104 0.4772 0.7100 0.4080 0.4516 0.7132 0.4872 0.10 0.40
Datalab (accurate) VLM Proprietary 0.5213 0.5018 0.5394 0.5543 0.4674 0.6355 0.3122 0.4413 0.4155 0.6943 0.2141 0.4961 0.6127 0.4307 0.5082 0.6941 0.2347 0.3972 0.6874 0.5962 8.65 8.65
Datalab (balanced) VLM Proprietary 0.5167 0.5146 0.5440 0.5703 0.4813 0.6413 0.3362 0.4491 0.4421 0.6831 0.2221 0.4971 0.5986 0.4207 0.5221 0.6990 0.2451 0.4018 0.6750 0.5713 5.91 5.91
Seed 2.0 Lite VLM Proprietary 0.5160 0.5747 0.6447 0.5948 0.5307 0.6022 0.5013 0.5347 0.5592 0.6854 0.3594 0.5417 0.6835 0.4465 0.5361 0.6561 0.3865 0.4977 0.6755 0.4516 0.09 0.53
Datalab (fast) VLM Proprietary 0.5124 0.5151 0.5456 0.5856 0.4780 0.6374 0.3288 0.4492 0.4419 0.6858 0.2199 0.4952 0.5940 0.4289 0.5166 0.6964 0.2400 0.3988 0.6731 0.5598 5.87 5.87
Mistral OCR 4 VLM Proprietary 0.5055 0.5608 0.6790 0.5485 0.4889 0.6334 0.4545 0.5701 0.5203 0.7159 0.4740 0.5002 0.6670 0.3831 0.4430 0.6456 0.3622 0.4332 0.6783 0.4509 3.70 3.70
Seed 2.0 Mini VLM Proprietary 0.4974 0.5448 0.5831 0.5323 0.5283 0.6261 0.4542 0.5041 0.5082 0.6557 0.3483 0.5244 0.6198 0.3937 0.5614 0.6872 0.3600 0.4641 0.6532 0.4433 0.10 0.40
Qwen3.6 Flash VLM Open Source 0.4882 0.5347 0.6561 0.4680 0.5234 0.6008 0.4251 0.5392 0.5191 0.6603 0.4381 0.4669 0.6484 0.3220 0.5083 0.5515 0.3043 0.4333 0.6105 0.4609 0.19 1.12
dots.ocr 1.5 VLM Open Source 0.4778 0.5249 0.6408 0.5801 0.4369 0.6913 0.2754 0.4387 0.4705 0.7549 0.0907 0.4956 0.7172 0.4244 0.4757 0.6861 0.1746 0.4067 0.7308 0.4560 0.03 0.03
Qwen3.5 Flash VLM Open Source 0.4778 0.5893 0.6830 0.5540 0.5463 0.6840 0.4790 0.5489 0.5758 0.7145 0.3563 0.5289 0.7096 0.3979 0.5624 0.6294 0.3453 0.4890 0.6797 0.3353 0.10 0.40
Kimi K2.5 VLM Proprietary 0.4775 0.5080 0.6847 0.5107 0.4367 0.5512 0.3569 0.5275 0.4814 0.6899 0.4113 0.4456 0.6989 0.3877 0.4128 0.4852 0.2437 0.4143 0.6266 0.4689 0.40 1.90
Qwen3 VL 8B VLM Open Source 0.4725 0.5233 0.6322 0.5469 0.4940 0.6270 0.3165 0.4794 0.4803 0.7068 0.2512 0.4533 0.6434 0.3780 0.4921 0.5515 0.2014 0.3996 0.6277 0.4628 0.08 0.50
Qwen3.5 35B VLM Open Source 0.4720 0.5898 0.6859 0.5496 0.5479 0.6885 0.4772 0.5472 0.5703 0.7275 0.3440 0.5288 0.7187 0.3877 0.5587 0.6348 0.3442 0.4868 0.6874 0.3186 0.25 2.00
Step 3.7 Flash (medium) VLM Proprietary 0.4668 0.4705 0.6492 0.5121 0.4277 0.4794 0.2842 0.5016 0.4441 0.6477 0.4129 0.4237 0.6580 0.3858 0.4550 0.4002 0.2197 0.4002 0.5871 0.4906 0.20 1.15
GPT-5.5 (auto res., med. reason.) VLM Proprietary 0.4609 0.5469 0.6725 0.5804 0.4864 0.5772 0.4178 0.5302 0.5187 0.7102 0.3617 0.5126 0.7124 0.4534 0.4856 0.5838 0.3276 0.4596 0.6927 0.3315 5.00 30.00
MiniMax M3 VLM Open Source 0.4605 0.4482 0.5544 0.4943 0.3544 0.5128 0.3250 0.4537 0.4187 0.5778 0.3647 0.4021 0.6256 0.3558 0.3385 0.4601 0.2308 0.3660 0.5722 0.5284 0.30 1.20
Infinity-Parser2 Pro VLM Open Source 0.4592 0.5191 0.6586 0.5125 0.4711 0.5942 0.3589 0.4559 0.4963 0.7157 0.1558 0.4815 0.7421 0.3756 0.4867 0.5444 0.2586 0.4331 0.6898 0.4086 / /
dots.ocr VLM Open Source 0.4588 0.5159 0.5773 0.5230 0.4955 0.6040 0.3799 0.4143 0.4726 0.6981 0.0723 0.4718 0.6320 0.3856 0.5012 0.5914 0.2489 0.4006 0.6535 0.4393 0.03 0.03
Llama 4 Maverick VLM Open Source 0.4501 0.4585 0.5922 0.4831 0.1957 0.6497 0.3717 0.4980 0.3890 0.6400 0.4651 0.4200 0.5613 0.3448 0.2668 0.6231 0.3041 0.3347 0.6055 0.4521 0.15 0.60
Surya OCR 2 (quantized) VLM Open Source 0.4476 0.5801 0.6147 0.5800 0.5105 0.6653 0.5299 0.4704 0.5543 0.7180 0.1389 0.5281 0.6563 0.4121 0.5171 0.6846 0.3707 0.4607 0.6851 0.2895 / /
Mistral OCR VLM Proprietary 0.4467 0.4819 0.6288 0.4404 0.2473 0.6114 0.4815 0.5104 0.4275 0.6666 0.4372 0.3843 0.5985 0.3076 0.1363 0.5739 0.3052 0.3070 0.5993 0.4597 3.07 3.07
GPT-5 Mini (min. reason.) VLM Proprietary 0.4450 0.5065 0.6369 0.5220 0.4056 0.5430 0.4252 0.5147 0.4872 0.6437 0.4133 0.4321 0.6422 0.3592 0.4027 0.4755 0.2807 0.4057 0.5825 0.3922 0.25 2.00
ERNIE 4.5 VL 424B VLM Proprietary 0.4448 0.4575 0.5868 0.4170 0.4901 0.4971 0.2964 0.4932 0.4373 0.5771 0.4653 0.3830 0.5466 0.2874 0.4864 0.3887 0.2057 0.3781 0.4806 0.4762 0.42 1.25
Qwen3.5 9B VLM Open Source 0.4269 0.4492 0.6242 0.3933 0.4587 0.5438 0.2260 0.4731 0.4052 0.6498 0.3644 0.3920 0.6337 0.2803 0.4355 0.4660 0.1446 0.3435 0.5896 0.4276 0.04 0.15
Qwen3.5 4B VLM Open Source 0.4264 0.4840 0.6218 0.5122 0.4395 0.5380 0.3086 0.4962 0.4522 0.6421 0.3943 0.3991 0.6045 0.3426 0.4386 0.4260 0.1840 0.3603 0.5654 0.3901 / /
Qwen3 VL 30B VLM Open Source 0.4261 0.5129 0.6461 0.5245 0.4915 0.5930 0.3097 0.5053 0.4620 0.7074 0.3466 0.4438 0.6439 0.3739 0.4785 0.5372 0.1852 0.3788 0.6398 0.3253 0.13 0.52
Qwen3.6 27B VLM Open Source 0.3949 0.4414 0.5113 0.3387 0.5042 0.5395 0.3131 0.4383 0.3949 0.5705 0.3496 0.3781 0.5066 0.2405 0.4758 0.4636 0.2040 0.3260 0.5186 0.3669 0.32 3.20
PaddleOCR-VL-1.6 VLM Open Source 0.3944 0.4332 0.5801 0.4138 0.5291 0.4139 0.2293 0.3728 0.3998 0.6584 0.0602 0.4007 0.6257 0.2892 0.5870 0.3662 0.1355 0.3326 0.6237 0.3796 0.03 0.03
GPT-5.5 (high res., low reason.) VLM Proprietary 0.3924 0.5328 0.6309 0.5062 0.5294 0.5974 0.4001 0.5034 0.5045 0.6790 0.3268 0.4841 0.6515 0.3918 0.5043 0.6090 0.2640 0.4316 0.6431 0.1749 5.00 30.00
Qwen3.6 35B VLM Open Source 0.3896 0.4427 0.5841 0.3174 0.4503 0.5362 0.3256 0.4377 0.4414 0.5704 0.3015 0.3950 0.6052 0.2210 0.4353 0.4878 0.2254 0.3756 0.5342 0.3335 0.15 1.00
OlmOCR-2 VLM Open Source 0.3875 0.4850 0.6267 0.4742 0.4277 0.5938 0.3026 0.4469 0.4433 0.6923 0.2052 0.4296 0.6731 0.3418 0.4430 0.5240 0.1660 0.3733 0.6373 0.2668 0.09 0.19
Infinity-Parser2 Flash VLM Open Source 0.3867 0.4740 0.6354 0.4820 0.4209 0.5450 0.2870 0.4264 0.4473 0.6858 0.1460 0.4060 0.6851 0.3375 0.4062 0.4472 0.1540 0.3603 0.6215 0.3040 / /
Gemma 3 27B VLM Open Source 0.3844 0.3640 0.4775 0.4097 0.2436 0.4269 0.2626 0.4109 0.3176 0.4976 0.4176 0.3039 0.4661 0.3088 0.1579 0.3811 0.2054 0.2679 0.4380 0.4618 / /
Qwen3.5 2B VLM Open Source 0.3827 0.4618 0.5982 0.4505 0.4283 0.5314 0.3008 0.4622 0.4313 0.6245 0.3309 0.3818 0.5819 0.3022 0.4029 0.4073 0.2146 0.3544 0.5295 0.3044 / /
FireRed-OCR VLM Open Source 0.3782 0.3912 0.5607 0.3649 0.4135 0.4645 0.1524 0.3665 0.3440 0.6257 0.1297 0.3445 0.5876 0.2543 0.4105 0.3992 0.0708 0.2807 0.5652 0.4114 / /
PaddleOCR-VL-1.5 VLM Proprietary 0.3733 0.4172 0.5503 0.3822 0.5033 0.4134 0.2366 0.3622 0.3647 0.6539 0.0681 0.3801 0.5975 0.2779 0.5358 0.3596 0.1297 0.3020 0.6120 0.3502 0.03 0.03
GLM-OCR VLM Open Source 0.3679 0.3776 0.6155 0.4788 0.4891 0.2522 0.0522 0.3584 0.3792 0.6095 0.0867 0.3272 0.6625 0.3028 0.4739 0.1768 0.0200 0.3133 0.5401 0.4085 0.03 0.03
Gemma 4 31B VLM 0.3643 0.4447 0.4969 0.4005 0.3331 0.5558 0.4371 0.4174 0.4143 0.5479 0.2899 0.3582 0.4864 0.2695 0.2704 0.4873 0.2776 0.3153 0.4819 0.3036 0.12 0.37
Gemma 4 26B VLM 0.3560 0.3702 0.4524 0.3352 0.2158 0.4989 0.3489 0.3883 0.3191 0.4989 0.3469 0.2992 0.4331 0.2191 0.1580 0.4464 0.2391 0.2441 0.4419 0.3896 0.06 0.33
GPT-5.4 (low res., med. reason.) VLM Proprietary 0.3486 0.5151 0.6240 0.5321 0.4656 0.5788 0.3750 0.4788 0.5037 0.6492 0.2834 0.4539 0.6489 0.3775 0.4493 0.5428 0.2509 0.4249 0.5969 0.0948 2.50 15.00
GPT-5.4 (high res., med. reason.) VLM Proprietary 0.3406 0.5342 0.6309 0.5476 0.5024 0.5986 0.3915 0.4922 0.5196 0.6662 0.2909 0.4641 0.6572 0.3926 0.4588 0.5633 0.2484 0.4275 0.6159 0.0444 2.50 15.00
Qwen-VL-OCR VLM Proprietary 0.3366 0.4702 0.6062 0.5184 0.3844 0.5906 0.2513 0.4419 0.4207 0.6751 0.2298 0.4019 0.6093 0.3491 0.3988 0.5117 0.1406 0.3315 0.6081 0.1519 0.07 0.16
GPT-5.4 (auto res., med. reason.) VLM Proprietary 0.3352 0.5213 0.6391 0.5543 0.4520 0.5870 0.3739 0.4861 0.5127 0.6579 0.2877 0.4560 0.6570 0.3953 0.4438 0.5551 0.2286 0.4246 0.6047 0.0458 2.50 15.00
ERNIE 4.5 VL 28B VLM Proprietary 0.3321 0.4077 0.6119 0.2804 0.4788 0.4005 0.2667 0.4453 0.4300 0.5353 0.3706 0.3200 0.5664 0.1318 0.4754 0.2346 0.1916 0.3532 0.4168 0.2498 0.14 0.56
GPT-5.4 (high res., low reason.) VLM Proprietary 0.3261 0.5137 0.6220 0.5546 0.4673 0.5507 0.3738 0.4762 0.5063 0.6437 0.2787 0.4526 0.6389 0.4056 0.4464 0.5419 0.2302 0.4221 0.5925 0.0307 2.50 15.00
GPT-5.4 (orig. res., med. reason.) VLM Proprietary 0.3196 0.4963 0.6403 0.5473 0.3635 0.5677 0.3626 0.4697 0.4887 0.6484 0.2719 0.4236 0.6540 0.3845 0.3304 0.5223 0.2269 0.3921 0.5882 0.0522 2.50 15.00
Mistral Small 2603 VLM 0.3187 0.2729 0.4825 0.3551 0.0575 0.3246 0.1449 0.3550 0.2248 0.4712 0.3689 0.2312 0.4776 0.2728 0.0378 0.2868 0.0808 0.1772 0.4374 0.4110 0.15 0.60
GPT-5.2 (low res., med. reason.) VLM Proprietary 0.3132 0.4488 0.5790 0.4787 0.3031 0.5686 0.3147 0.4349 0.4148 0.6160 0.2738 0.4020 0.5920 0.3492 0.3295 0.5153 0.2238 0.3577 0.5618 0.0959 1.75 14.00
GPT-5.2 (high res., med. reason.) VLM Proprietary 0.3097 0.4407 0.5701 0.4467 0.3318 0.5460 0.3090 0.4350 0.4039 0.6075 0.2935 0.3968 0.5759 0.3301 0.3698 0.5122 0.1958 0.3461 0.5597 0.0943 1.75 14.00
Qwen3.5 0.8B VLM Open Source 0.3043 0.3851 0.5220 0.3983 0.3692 0.4353 0.2008 0.3864 0.3421 0.5674 0.2496 0.3080 0.5137 0.2680 0.3166 0.3090 0.1329 0.2764 0.4656 0.2190 / /
GPT-5.2 (auto res., med. reason.) VLM Proprietary 0.3035 0.4509 0.5827 0.4853 0.3225 0.5388 0.3250 0.4371 0.4329 0.5977 0.2807 0.4113 0.5887 0.3470 0.3669 0.5111 0.2428 0.3752 0.5537 0.0553 1.75 14.00
OCRVerse VLM Open Source 0.3020 0.2820 0.5360 0.2086 0.3801 0.2721 0.0133 0.2906 0.2940 0.4886 0.0892 0.2625 0.5627 0.1124 0.4043 0.2315 0.0015 0.2612 0.4330 0.3572 / /
Nemotron-3-Nano-Omni VLM 0.2973 0.3138 0.5214 0.3297 0.2999 0.2937 0.1244 0.3439 0.2979 0.5047 0.2292 0.2638 0.5165 0.2125 0.3247 0.1892 0.0759 0.2564 0.4180 0.2992 ? ?
GPT-5.4 Mini (auto res., no reason.) VLM 0.2764 0.4385 0.5998 0.4999 0.2594 0.5196 0.3139 0.4416 0.4087 0.6182 0.2979 0.3737 0.6007 0.3460 0.2888 0.4364 0.1965 0.3371 0.5456 0.0155 0.75 4.50
GPT-5 Nano (min. reason.) VLM Proprietary 0.2742 0.2799 0.4491 0.3653 0.1649 0.2459 0.1740 0.3291 0.2567 0.4337 0.2970 0.2238 0.4327 0.2418 0.1196 0.1930 0.1318 0.2027 0.3692 0.2944 0.05 0.40
Qianfan-OCR Fast Classical OCR Open Source 0.2683 0.4244 0.4748 0.4420 0.5338 0.4336 0.2379 0.3505 0.3897 0.5711 0.0908 0.3405 0.4789 0.2884 0.4951 0.3120 0.1280 0.3051 0.4687 0.0768 / /
Gemma 3 4B VLM Open Source 0.2596 0.2390 0.4000 0.2782 0.0522 0.2726 0.1918 0.3215 0.2233 0.3472 0.3939 0.1920 0.3746 0.2064 0.0303 0.2063 0.1425 0.1837 0.3023 0.3066 / /
LightOnOCR-2 VLM Open Source 0.2587 0.2708 0.3978 0.3282 0.1161 0.3201 0.1915 0.2733 0.2062 0.4748 0.1389 0.2430 0.4263 0.2227 0.1127 0.3318 0.1215 0.1715 0.4383 0.2610 / /
GPT-5 Mini (med. reason.) VLM Proprietary 0.2550 0.3845 0.5529 0.3737 0.2781 0.5596 0.1583 0.3886 0.3744 0.5344 0.2571 0.3347 0.5708 0.2406 0.2728 0.4968 0.0925 0.3135 0.4847 0.0438 0.25 2.00
GPT-5.4 Nano (auto res., no reason.) VLM 0.1939 0.2693 0.4787 0.3908 0.0923 0.2004 0.1841 0.3271 0.2597 0.4349 0.2865 0.2187 0.4607 0.2729 0.0709 0.1623 0.1266 0.1985 0.3820 0.0650 0.20 1.25
Grok 4.2 Fast VLM Proprietary 0.1937 0.2452 0.2988 0.2651 0.2303 0.3076 0.1240 0.2407 0.2048 0.3514 0.1657 0.2065 0.3018 0.1609 0.2227 0.2550 0.0918 0.1793 0.2945 0.1317 2.00 10.00
DeepSeek-OCR2 VLM Open Source 0.1759 0.2034 0.3637 0.3022 0.0420 0.1277 0.1815 0.2675 0.1711 0.3559 0.2755 0.1550 0.3519 0.1852 0.0252 0.0971 0.1157 0.1273 0.3047 0.1373 0.03 0.03
MinerU2.5-Pro VLM Open Source 0.1647 0.2222 0.3868 0.2127 0.2240 0.2686 0.0186 0.2473 0.1880 0.4005 0.1533 0.2066 0.3952 0.1258 0.2622 0.2453 0.0044 0.1650 0.3671 0.0529 / /
Nemotron-Nano-12B VLM Open Source 0.1296 0.1464 0.3528 0.1813 0.0265 0.1078 0.0637 0.2225 0.1333 0.3017 0.2324 0.1209 0.3592 0.1159 0.0279 0.0760 0.0254 0.1109 0.2681 0.0835 0.20 0.20
PP-OCRv5 Classical OCR Proprietary 0.1170 0.1623 0.4115 0.1980 0.1372 0.0448 0.0203 0.2138 0.1932 0.3132 0.1349 0.1630 0.4269 0.0427 0.3367 0.0086 0.0002 0.1853 0.2880 0.0000 0.03 0.03
PaddleOCR-VL-0.9B VLM Open Source 0.1083 0.1422 0.3165 0.1529 0.0917 0.0686 0.0815 0.1986 0.1343 0.2778 0.1838 0.1007 0.3003 0.0738 0.0839 0.0269 0.0187 0.0857 0.2362 0.0537 0.13 0.13
Tesseract v5 Classical OCR Open Source 0.0980 0.1461 0.4004 0.0868 0.0434 0.1023 0.0979 0.2039 0.1349 0.3568 0.1199 0.0813 0.3941 0.0035 0.0005 0.0080 0.0004 0.0549 0.2956 0.0376 / /
DeepSeek-OCR VLM Open Source 0.0855 0.1033 0.2182 0.1080 0.0295 0.0771 0.0835 0.1601 0.0947 0.1824 0.2032 0.0738 0.2172 0.0577 0.0212 0.0443 0.0288 0.0663 0.1650 0.0510 0.03 0.03
Molmo 2 8B VLM Open Source 0.0741 0.0939 0.2623 0.0739 0.0069 0.0721 0.0542 0.1625 0.0941 0.2001 0.1934 0.0544 0.2313 0.0167 0.0009 0.0205 0.0025 0.0488 0.1619 0.0396 / /
GPT-5 Nano (med. reason.) VLM Proprietary 0.0288 0.0451 0.1253 0.0321 0.0049 0.0373 0.0259 0.0640 0.0513 0.0958 0.0450 0.0240 0.1153 0.0017 0.0002 0.0028 0.0001 0.0224 0.0779 0.0077 0.05 0.40
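The v3 headline is described as the mean of NES, chrF, and TEDS, but the v3 table reports NES twice (averaged over regions and over formats). One reading that reproduces the published headline scores to within rounding of the displayed values, for example for Gemini 3.1 Pro (low), Gemini 3 Pro (low), and Claude Sonnet 4.6, is that NES enters as the mean of its region and format averages, and chrF as its region average. This is an inference from the numbers, not a documented formula, and `v3_headline` is a hypothetical helper:

```python
from statistics import mean


def v3_headline(nes_region, nes_format, chrf_region, teds):
    """Inferred v3 headline: mean of NES, chrF and TEDS, where NES is assumed
    to be the average of the region-level and format-level NES means."""
    nes = mean([mean(nes_region), mean(nes_format)])
    chrf = mean(chrf_region)
    return mean([nes, chrf, teds])


# Per-column values copied from two rows of the v3 leaderboard.
rows = {
    "Gemini 3 Pro (low)": dict(
        nes_region=[0.7627, 0.7210, 0.5704, 0.7355, 0.6545],
        nes_format=[0.6798, 0.8011, 0.4241],
        chrf_region=[0.8108, 0.5612, 0.5720, 0.7662, 0.5291],
        teds=0.5650, published=0.6249),
    "Claude Sonnet 4.6": dict(
        nes_region=[0.6938, 0.6056, 0.3389, 0.6704, 0.5052],
        nes_format=[0.5358, 0.7020, 0.4914],
        chrf_region=[0.7070, 0.4403, 0.3760, 0.6470, 0.3492],
        teds=0.7205, published=0.5980),
}

for name, r in rows.items():
    score = v3_headline(r["nes_region"], r["nes_format"], r["chrf_region"], r["teds"])
    print(f"{name}: recomputed {score:.4f}, published {r['published']:.4f}")
```

Because the per-column inputs are themselves rounded to four decimals, a recomputed value can differ from the published one in the last digit (Gemini 3.1 Pro (low) recomputes to about 0.6356 against a published 0.6357).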
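v2 leaderboard
Columns: model (some rows also give a parameter count), type, license, socOCRbench score; Region score (the region macro-average, identical to the headline), then Europe, E. Asia, S. Asia, SE Asia, MENA, and E. Africa; Period score, then Pre-modern, Historical, and Contemporary; Format score, then HW Text, Print Text, Print Tables, and HW Tables; input and output price ($ per million tokens).
Model Type License socOCRbench Region Europe E. Asia S. Asia SE Asia MENA E. Africa Period Pre-mod. Historical Contemp. Format HW Text Print Text Print Tbl HW Tbl $/M In $/M Out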
Gemini 3.1 Pro (low) VLM Proprietary 0.7009 0.7009 0.7070 0.6447 0.7189 0.7136 0.7204 0.6654 0.6952 0.5641 0.6762 0.8271 0.7024 0.6703 0.9415 0.8295 0.3561 2.00 12.00
Gemini 3 Pro (low) VLM Proprietary 0.7134 0.7134 0.6721 0.6908 0.7544 0.7463 0.7033 0.5976 0.6781 0.5882 0.6307 0.8107 0.6706 0.6551 0.9440 0.7914 0.2862 2.00 12.00
Gemini 3 Flash (low) VLM Proprietary 0.6967 0.6967 0.6754 0.6587 0.7460 0.6632 0.7400 0.6643 0.6738 0.5635 0.6400 0.8165 0.6736 0.6572 0.9218 0.8232 0.2916 0.50 3.00
Gemini 3 Pro (high) VLM Proprietary 0.6894 0.6894 0.6783 0.7297 0.6642 0.7088 0.6662 0.6650 0.6757 0.5682 0.6352 0.8189 0.6746 0.6586 0.9177 0.8472 0.2691 2.00 12.00
Gemini 3 Flash (minimal) VLM Proprietary 0.6903 0.6903 0.6700 0.6606 0.7571 0.6743 0.6893 0.6575 0.6705 0.5788 0.6325 0.7945 0.6619 0.6487 0.9253 0.7802 0.2946 0.50 3.00
Gemini 3 Flash (high) VLM Proprietary 0.6686 0.6686 0.6574 0.6451 0.7455 0.6079 0.6871 0.7298 0.6571 0.5689 0.6120 0.7782 0.6217 0.6450 0.9169 0.7360 0.1943 0.50 3.00
Gemini 3.1 Flash Lite VLM Proprietary 0.6606 0.6606 0.6551 0.6735 0.6492 0.7143 0.6109 0.6146 0.6425 0.5206 0.6018 0.8201 0.6737 0.6216 0.8842 0.8879 0.3088 0.25 1.50
Nano Banana 2 VLM Proprietary 0.5635 0.5635 0.6108 0.5993 0.5712 0.6090 0.4271 0.5348 0.5748 0.4408 0.5524 0.7446 0.6291 0.5483 0.8402 0.8738 0.2525 0.50 3.00
Gemini 2.0 Flash VLM Proprietary 0.6309 0.6309 0.6117 0.6283 0.7427 0.5819 0.5898 0.2326 0.5827 0.4916 0.5693 0.7055 0.6372 0.5339 0.8704 0.8543 0.2711 0.10 0.40
Claude Sonnet 4.6 VLM Proprietary 0.5791 0.5791 0.6708 0.6649 0.3461 0.6341 0.5795 0.3809 0.6064 0.4635 0.6182 0.7324 0.6568 0.5738 0.8076 0.8646 0.3806 3.00 15.00
Seed 2.0 Mini VLM Proprietary 0.5221 0.5221 0.5663 0.5928 0.5013 0.5326 0.4173 0.5695 0.5220 0.4067 0.5074 0.6967 0.5719 0.4882 0.8648 0.7324 0.2083 0.10 0.40
Claude Opus 4.6 VLM Proprietary 0.5590 0.5590 0.6571 0.6759 0.2823 0.5700 0.6099 0.2579 0.5905 0.4512 0.6110 0.6884 0.6242 0.5578 0.7852 0.8620 0.2860 5.00 25.00
Seed 2.0 Pro VLM Proprietary 0.5300 0.5300 0.5834 0.6921 0.4448 0.4795 0.4504 0.4309 0.5160 0.3857 0.5285 0.7038 0.6104 0.4760 0.8650 0.8314 0.2703 0.47 2.37
Qwen3.5-397B (thinking) 397B VLM Open Source 0.5452 0.5452 0.6111 0.6739 0.4912 0.4548 0.4951 0.1950 0.5444 0.4375 0.5711 0.6387 0.6089 0.4968 0.8072 0.8283 0.2826 0.60 3.60
Gemini 2.5 Flash VLM Proprietary 0.5260 0.5260 0.6019 0.6280 0.3612 0.4876 0.5514 0.1352 0.5238 0.3388 0.5678 0.6632 0.5933 0.4946 0.7361 0.8466 0.2777 0.30 2.50
Qwen3.5-122B 122B VLM Open Source 0.5386 0.5386 0.6181 0.6829 0.4321 0.4451 0.5150 0.1827 0.5502 0.4639 0.5653 0.6326 0.6179 0.4880 0.8292 0.8529 0.2792 0.40 3.20
Qwen3.5-397B 397B VLM Open Source 0.5261 0.5261 0.5899 0.6673 0.4787 0.4065 0.4881 0.1957 0.5216 0.4189 0.5453 0.6307 0.5964 0.4743 0.7949 0.8310 0.2698 0.60 3.60
Seed 2.0 Lite VLM Proprietary 0.4798 0.4798 0.5496 0.6361 0.3388 0.4753 0.3991 0.4287 0.4915 0.3606 0.4997 0.6577 0.5718 0.4521 0.8026 0.8220 0.2091 0.09 0.53
Qwen3.5-35B 35B VLM Open Source 0.5367 0.5367 0.5955 0.6737 0.4841 0.3780 0.5521 0.0690 0.5409 0.4638 0.5492 0.5927 0.5826 0.4823 0.7792 0.8047 0.2307 0.25 2.00
Qwen3.5-Plus VLM Proprietary 0.5205 0.5205 0.6303 0.6582 0.3997 0.4647 0.4494 0.1498 0.5491 0.4429 0.5736 0.6296 0.6062 0.4952 0.7976 0.8418 0.2684 0.80 2.00
Qwen3.5-Flash VLM Open Source 0.5342 0.5342 0.5930 0.6731 0.4865 0.3770 0.5416 0.0690 0.5395 0.4635 0.5461 0.5924 0.5821 0.4808 0.7750 0.8142 0.2251 0.10 0.40
Qwen3-VL-235B 235B VLM Open Source 0.5285 0.5285 0.6368 0.7218 0.3922 0.4066 0.4853 0.0976 0.5555 0.4760 0.5799 0.6138 0.6237 0.4934 0.7958 0.8817 0.2978 0.20 0.88
Qwen3.5-9B (no reasoning) 9B VLM Open Source 0.5149 0.5149 0.5972 0.7052 0.4288 0.3622 0.4812 0.0870 0.5273 0.4436 0.5470 0.5959 0.5930 0.4658 0.7838 0.8463 0.2506 / /
ERNIE 4.5 VL 424B 424B VLM Proprietary 0.5016 0.5016 0.6435 0.6928 0.4044 0.2192 0.5481 0.0576 0.5556 0.4744 0.5866 0.5613 0.5802 0.5032 0.7246 0.8242 0.2327 0.42 1.25
GPT-5.4 VLM Proprietary 0.5026 0.5026 0.6277 0.7608 0.2120 0.4003 0.5124 0.0969 0.5165 0.3357 0.5844 0.6319 0.5746 0.5000 0.7649 0.8167 0.2331 2.50 15.00
Qwen3.5-27B 27B VLM Open Source 0.5162 0.5162 0.6080 0.6913 0.4533 0.3577 0.4705 0.0131 0.5299 0.4402 0.5574 0.5863 0.6082 0.4553 0.7902 0.8621 0.2903 0.30 2.40
Qwen3-VL-8B 8B VLM Open Source 0.5049 0.5049 0.5396 0.8490 0.3524 0.3390 0.4445 0.0856 0.4422 0.3718 0.4771 0.5851 0.5766 0.3941 0.8176 0.8530 0.2513 0.08 0.50
Qwen3-VL-30B 30B VLM Open Source 0.4848 0.4848 0.5953 0.7066 0.4107 0.2800 0.4316 0.1088 0.5111 0.4380 0.5441 0.5650 0.5843 0.4507 0.7529 0.8528 0.2570 0.13 0.52
Qwen3.5-4B (no reasoning) 4B VLM Open Source 0.4708 0.4708 0.5846 0.6837 0.3981 0.2815 0.4059 0.0857 0.4980 0.4048 0.5271 0.5698 0.5732 0.4372 0.7472 0.8279 0.2538 / /
GPT-5.2 (high) VLM Proprietary 0.4320 0.4320 0.5834 0.6594 0.0823 0.3493 0.4858 0.0839 0.4770 0.2951 0.5450 0.5833 0.5486 0.4467 0.6697 0.8264 0.2195 1.75 14.00
ERNIE 4.5 VL 28B 28B VLM Proprietary 0.4697 0.4697 0.5166 0.7791 0.3545 0.2767 0.4216 0.0619 0.4138 0.3437 0.4518 0.5560 0.5444 0.3677 0.8017 0.7952 0.2251 0.14 0.56
Qwen3.5-2B (no reasoning) 2B VLM Open Source 0.4502 0.4502 0.6182 0.6202 0.3731 0.2448 0.3948 0.0484 0.4766 0.3799 0.5760 0.5161 0.5951 0.4157 0.7189 0.8440 0.3788 / /
Sarvam Vision VLM Proprietary 0.4533 0.4533 0.5746 0.4961 0.5608 0.1437 0.4912 0.0438 0.4785 0.4116 0.5095 0.5379 0.5367 0.4363 0.6899 0.7996 0.2181 / /
Llama 4 Maverick 17Bx128E VLM Open Source 0.3827 0.3827 0.5484 0.4513 0.1262 0.4344 0.3533 0.3066 0.4350 0.2615 0.5102 0.5740 0.5383 0.3998 0.6849 0.8031 0.2665 0.15 0.60
dots.ocr VLM Open Source 0.4444 0.4444 0.5866 0.6893 0.4162 0.2657 0.2643 0.0438 0.4974 0.4834 0.5066 0.5329 0.5556 0.4328 0.7678 0.8259 0.2145 / /
OlmOCR-2 7B VLM Open Source 0.4083 0.4083 0.6170 0.6604 0.1519 0.2531 0.3589 0.0632 0.4868 0.3437 0.5558 0.5297 0.5282 0.4488 0.6704 0.7390 0.2195 / /
GPT-5.2 (low) VLM Proprietary 0.4000 0.4000 0.5640 0.6379 0.0816 0.2801 0.4366 0.0187 0.4495 0.2591 0.5166 0.5553 0.5223 0.4189 0.6228 0.8011 0.2083 1.75 14.00
Qwen-VL-OCR VLM Proprietary 0.4103 0.4103 0.5620 0.6426 0.3564 0.1609 0.3294 0.0085 0.4562 0.3698 0.5010 0.4999 0.5186 0.3972 0.7102 0.7663 0.1660 0.07 0.16
Qwen3.5-0.8B (no reasoning) 0.9B VLM Open Source 0.3787 0.3787 0.5607 0.6072 0.2922 0.1575 0.2759 0.0388 0.4369 0.3458 0.5061 0.4617 0.4836 0.3887 0.6794 0.6650 0.1857 / /
GPT-5.3 Codex VLM Proprietary 0.3459 0.3459 0.5164 0.6209 0.0751 0.2527 0.2643 0.0000 0.3956 0.1736 0.4717 0.5333 0.4995 0.3662 0.5521 0.8357 0.2000 1.75 14.00
PaddleOCR-VL-1.5 VLM Open Source 0.3097 0.3097 0.4845 0.4084 0.1493 0.1717 0.3347 0.0248 0.3688 0.2394 0.4318 0.3724 0.3284 0.3491 0.5913 0.2006 0.1280 / /
GLM-OCR 0.9B VLM Open Source 0.2683 0.2683 0.5421 0.4205 0.0412 0.1746 0.1629 0.0504 0.3848 0.2346 0.4754 0.3641 0.3402 0.3662 0.5810 0.2202 0.1492 0.03 0.03
FireRed-OCR 2B VLM Open Source 0.2839 0.2839 0.5020 0.5785 0.0070 0.1078 0.2243 0.0082 0.3483 0.2041 0.4581 0.3563 0.3464 0.3266 0.5826 0.3131 0.1292 / /
Mistral OCR VLM Proprietary 0.2816 0.2816 0.4644 0.4102 0.0356 0.1263 0.3713 0.0615 0.3205 0.1429 0.4526 0.4474 0.4668 0.2887 0.5483 0.8719 0.1809 1.00 1.00
PaddleOCR-VL VLM Open Source 0.2420 0.2420 0.3731 0.3815 0.0114 0.1036 0.3402 0.0448 0.2768 0.1299 0.3078 0.3657 0.2685 0.2842 0.4066 0.2336 0.1294 / /
Gemma 3 27B 27B VLM Open Source 0.2761 0.2761 0.4593 0.3059 0.0943 0.2070 0.3139 0.0790 0.3229 0.1933 0.4050 0.4395 0.4440 0.2931 0.5009 0.7934 0.2201 0.20 0.40
PP-OCRv5 Classical OCR Open Source 0.2383 0.2383 0.4748 0.6559 0.0016 0.0348 0.0243 0.0296 0.3197 0.1596 0.4529 0.3930 0.4563 0.2669 0.5575 0.7961 0.1905 / /
Nemotron-Nano-12B 12B VLM Open Source 0.2357 0.2357 0.4568 0.4845 0.0370 0.1029 0.0974 0.0160 0.2950 0.1036 0.4339 0.4087 0.4401 0.2542 0.5510 0.7066 0.2291 0.20 0.20
Gemma 3 4B 4B VLM Open Source 0.2428 0.2428 0.4442 0.2563 0.0643 0.1439 0.3055 0.0608 0.2962 0.1766 0.3905 0.3896 0.3987 0.2735 0.4705 0.6860 0.1974 0.07 0.14
PaddleOCR-VL-0.9B 0.9B VLM Open Source 0.1824 0.1824 0.3663 0.3914 0.0201 0.0415 0.0925 0.0338 0.2353 0.0814 0.3588 0.3019 0.3162 0.1987 0.5310 0.3901 0.1187 / /
DeepSeek-OCR 1.3B VLM Open Source 0.1883 0.1883 0.2574 0.4116 0.1256 0.0670 0.0797 0.0254 0.1970 0.0991 0.2278 0.2725 0.2484 0.1768 0.3228 0.3155 0.1640 0.03 0.03
DeepSeek-OCR2 VLM Open Source 0.2412 0.2412 0.3680 0.5368 0.0739 0.0625 0.1649 0.0370 0.1823 0.0980 0.2878 0.3466 0.3319 0.1670 0.5477 0.5290 0.1293 0.03 0.03
Kimi K2.5 VLM Proprietary 0.1231 0.1231 0.2094 0.3004 0.0323 0.0497 0.0238 0.0000 0.1365 0.0347 0.1414 0.2715 0.1821 0.1448 0.2229 0.3720 0.0168 0.40 1.90
LayoutParser Classical OCR Open Source 0.1029 0.1029 0.3033 0.0510 0.0220 0.0536 0.0848 0.0732 0.1677 0.0892 0.2705 0.1650 0.2027 0.1372 0.3988 0.1187 0.1462 / /
Tesseract v5 Classical OCR Open Source 0.1030 0.1030 0.3036 0.0510 0.0220 0.0536 0.0848 0.0732 0.1673 0.0892 0.2709 0.1627 0.2004 0.1372 0.3988 0.1081 0.1479 / /
Molmo 2 8B 8B VLM Open Source 0.1162 0.1162 0.3275 0.1036 0.0188 0.0632 0.0681 0.0179 0.1863 0.0980 0.2757 0.2292 0.2544 0.1668 0.3405 0.3728 0.1495 / /
EfficientOCR Classical OCR Open Source 0.0400 0.0400 0.0979 0.0186 0.0168 0.0292 0.0374 0.0213 0.0568 0.0294 0.1057 0.0474 0.0806 0.0438 0.1236 0.0628 0.0944 / /
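v1 leaderboard
Columns: model (some rows also give a parameter count), type, license, overall score, the four category scores, and input and output price ($ per million tokens).
Model Type License Overall HW [A] HW [B] Tables [A] Tables [B] $/M In $/M Out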
Gemini 3.1 Pro (low) VLM Proprietary 0.5965 0.6959 0.5260 0.9636 0.3640 2.00 12.00
Gemini 3 Flash (low) VLM Proprietary 0.5522 0.6369 0.5153 0.9717 0.2737 0.50 3.00
Gemini 3 Pro (low) VLM Proprietary 0.5472 0.6445 0.5289 0.8936 0.2667 2.00 12.00
Gemini 3 Pro (high) VLM Proprietary 0.5406 0.6589 0.5106 0.8989 0.2400 2.00 12.00
Gemini 2.0 Flash VLM Proprietary 0.5100 0.5849 0.4312 0.9471 0.2850 0.10 0.40
Gemini 3 Flash (high) VLM Proprietary 0.4808 0.6049 0.4750 0.7551 0.1862 0.50 3.00
Gemini 2.5 Flash VLM Proprietary 0.4670 0.5390 0.3689 0.9107 0.2650 0.30 2.50
Qwen3.5-397B (thinking) 397B VLM Open Source 0.4598 0.6128 0.2734 0.9016 0.2552 0.60 3.60
Qwen3-VL-235B 235B VLM Open Source 0.4581 0.6174 0.1969 0.9398 0.3132 0.20 0.88
Qwen3.5-397B 397B VLM Open Source 0.4484 0.6055 0.2409 0.9301 0.2432 0.60 3.60
Qwen3.5-27B 27B VLM Open Source 0.4340 0.5733 0.1976 0.9368 0.2760 0.30 2.40
Qwen3.5-35B 35B VLM Open Source 0.4296 0.5784 0.2273 0.9275 0.2210 0.25 2.00
Qwen3.5-Flash VLM Proprietary 0.4280 0.5763 0.2247 0.9247 0.2220 0.10 0.40
Qwen3.5-122B 122B VLM Open Source 0.4264 0.5637 0.1954 0.9195 0.2692 0.40 3.20
Qwen3-VL-30B 30B VLM Open Source 0.4194 0.5716 0.1699 0.9104 0.2653 0.13 0.52
Qwen3-VL-8B 8B VLM Open Source 0.4105 0.5705 0.1637 0.8456 0.2709 0.08 0.50
ERNIE 4.5 VL 424B 424B VLM Proprietary 0.4057 0.5468 0.1896 0.9062 0.2222 0.55 2.20
Seed 2.0 Mini VLM Proprietary 0.4882 0.5695 0.2083 0.8648 0.4023 0.10 0.40
Seed 2.0 Pro VLM Proprietary 0.4058 0.4582 0.2601 0.9513 0.2350 0.47 2.37
GPT-5.2 (high) VLM Proprietary 0.3932 0.5190 0.1959 0.8951 0.2074 1.75 14.00
Llama 4 Maverick 400B VLM Open Source 0.3911 0.4563 0.1884 0.9358 0.2704 0.15 0.60
ERNIE 4.5 VL 28B 28B VLM Proprietary 0.3837 0.5256 0.1537 0.8735 0.2212 0.55 2.20
GPT-5.2 (low) VLM Proprietary 0.3759 0.5140 0.1693 0.8445 0.2014 1.75 14.00
Seed 2.0 Lite VLM Proprietary 0.3724 0.4653 0.1918 0.9013 0.1970 0.09 0.53
dots.ocr-1.5 3B VLM Open Source 0.3720 0.5670 0.1423 0.7768 0.1810 / /
Qwen-VL-OCR VLM Proprietary 0.3525 0.5082 0.1105 0.8696 0.1720 0.07 0.16
Datalab (accurate) VLM Proprietary 0.3477 0.3575 0.1854 0.9260 0.2363 ? ?
dots.ocr 3B VLM Open Source 0.3328 0.4193 0.0751 0.9150 0.2299 / /
GLM-OCR 0.8B VLM Open Source 0.3247 0.4369 0.0693 0.8200 0.2283 0.03 0.03
PaddleOCR-VL-1.5 0.9B VLM Open Source 0.3131 0.4066 0.1155 0.7949 0.1805 / /
Mistral OCR VLM Proprietary 0.3006 0.3103 0.1063 0.9760 0.1784 $2/1K pages
Datalab (fast) Classical OCR Proprietary 0.2975 0.3031 0.0837 0.9169 0.2317 ? ?
PaddleOCR-VL-0.9B 0.9B VLM Open Source 0.2864 0.3231 0.0856 0.8047 0.2145 / /
Sarvam Vision VLM Proprietary 0.2735 0.2851 0.1899 0.7194 0.1335 / /
FireRed-OCR 2B VLM Open Source 0.2585 0.3266 0.1292 0.5826 0.3131 / /
PP-OCRv5 Classical OCR Open Source 0.2501 0.3400 0.0069 0.7319 0.1759 / /
Nemotron-Nano-12B 12B VLM Open Source 0.2177 0.2568 0.0127 0.6273 0.2020 0.20 0.20
OlmOCR-2 7B VLM Open Source 0.2087 0.3405 0.0696 0.2791 0.1624 / /
LightOnOCR-2-1B 1B VLM Open Source 0.2024 0.3138 0.1005 0.4828 0.0345 / /
DeepSeek-OCR 3B VLM Open Source 0.1997 0.2534 0.0336 0.5004 0.1732 0.03 0.03
Kimi K2.5 VLM Proprietary 0.1336 0.2081 0.0415 0.3944 0.0129 0.40 1.90
DeepSeek-OCR2 3B VLM Open Source 0.1181 0.0995 0.0303 0.3715 0.1191 0.03 0.03
Tesseract Classical OCR Open Source 0.1091 0.1037 0.0285 0.1696 0.1808 / /
Layout Parser Classical OCR Open Source 0.1082 0.1037 0.0285 0.1609 0.1815 / /
EfficientOCR Classical OCR Open Source 0.0635 0.0509 0.0108 0.1104 0.1187 / /

API Cost Note: Prices are per million tokens ($/M In = input cost, $/M Out = output cost). "/" means no per-token price is listed (typically free or open-source models). "?" means pricing is not published (contact the vendor). Mistral OCR in v1 is priced per page ($2 per 1,000 pages) rather than per token. Costs are approximate and may vary by provider or region. Data collected February 2026.

Cost vs Performance

[Figure: cost vs performance, shown separately for v3, v2, and v1. Average token cost = (input + output price per 1M tokens) / 2. Only models with published API pricing are included. The x-axis is log-scaled. Point colors: Google, Anthropic, OpenAI, Qwen, Mistral, Other, Traditional OCR.]
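The chart's cost axis can be reproduced from the leaderboard's price columns. A minimal sketch, assuming cells are parsed as plain numbers and anything else ("/", "?", or per-page pricing such as "$2/1K pages") is excluded, matching the rule that only models with published API pricing appear; `parse_price` and `average_token_cost` are hypothetical helper names:

```python
import math
from typing import Optional


def parse_price(cell: str) -> Optional[float]:
    """Per-1M-token price from a leaderboard cell; None for "/", "?" or per-page pricing."""
    try:
        return float(cell)
    except ValueError:
        return None


def average_token_cost(price_in: str, price_out: str) -> Optional[float]:
    """(input + output price per 1M tokens) / 2, or None if either price is missing."""
    p_in, p_out = parse_price(price_in), parse_price(price_out)
    if p_in is None or p_out is None:
        return None  # models without published per-token pricing are left off the chart
    return (p_in + p_out) / 2


# ($/M In, $/M Out) cells copied from the v3 leaderboard.
cells = {
    "Gemini 3.1 Pro (low)": ("2.00", "12.00"),
    "Qwen3 VL 235B": ("0.20", "0.88"),
    "Surya OCR 2 (quantized)": ("/", "/"),
}
for model, (c_in, c_out) in cells.items():
    avg = average_token_cost(c_in, c_out)
    if avg is None:
        print(f"{model}: excluded (no published pricing)")
    else:
        # On a log-scaled axis, horizontal position is proportional to log(avg).
        print(f"{model}: ${avg:.2f} per 1M tokens, log10 = {math.log10(avg):.2f}")
```

Averaging input and output prices equally is a simplification: OCR requests are typically image-heavy on input and text-heavy on output, so the real cost per page depends on each model's token counts.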

Methodology & Reproduction

Unless noted otherwise, all VLMs receive the same prompts. For handwriting samples: "Transcribe all the text in this image exactly as written. Output ONLY the transcribed text, nothing else." For table samples: "OCR this document image into a markdown table. Transcribe all text exactly as written. Output ONLY the markdown table, nothing else."

The exceptions are as follows. Image-only models (dots.ocr, OlmOCR-2) receive no text prompt. dots.ocr-1.5 uses its native prompt_ocr prompt ("Extract the text content from this image."). DeepSeek-OCR2 uses its native prompts ("Free OCR." for handwriting, "Convert the document to markdown." for tables).

The edit-based metric is Normalized Edit Similarity (NES): NES = 1 - edit_distance(pred, gt) / max(len(pred), len(gt)), where pred is the model output and gt is the ground-truth transcription.
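The NES formula can be sketched directly. This assumes character-level Levenshtein distance (unit cost for insertions, deletions, and substitutions) and no text normalization; the benchmark's actual preprocessing, such as whitespace or Unicode handling, is not stated. Scoring two empty strings as 1.0 is also an assumption, since the formula is undefined there:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via dynamic programming (two rows of the DP table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def nes(pred: str, gt: str) -> float:
    """NES = 1 - edit_distance(pred, gt) / max(len(pred), len(gt))."""
    longest = max(len(pred), len(gt))
    if longest == 0:
        return 1.0  # both strings empty: treated as a perfect match (an assumption)
    return 1 - levenshtein(pred, gt) / longest


print(nes("kitten", "sitting"))  # 3 edits over 7 characters
```

Dividing by the longer string keeps NES in [0, 1] and penalizes both hallucinated extra text and omissions.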

v2 Scoring

The headline socOCRbench score is a region macro-average: the mean of per-region dataset means, so that each region counts equally regardless of sample count. Regions: Europe, East Asia, South Asia, Southeast Asia, MENA, East Africa. Period and format breakdowns are shown as supplementary columns. Periods: Pre-modern (<1700), Historical (1700–1950), Contemporary (post-1950). Formats: Handwritten text, Printed text, Printed tables, Handwritten tables.
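The difference between a region macro-average and a pooled mean is easiest to see with numbers. A minimal sketch, assuming each input is already a dataset-level mean score tagged with its region; the scores below are hypothetical and `region_macro_average` is an illustrative helper name:

```python
from collections import defaultdict
from statistics import mean


def region_macro_average(dataset_scores):
    """Mean over regions of the per-region mean dataset score.

    dataset_scores: iterable of (region, dataset_mean) pairs, where dataset_mean
    is already the average score over that dataset's samples.
    """
    by_region = defaultdict(list)
    for region, score in dataset_scores:
        by_region[region].append(score)
    return mean(mean(scores) for scores in by_region.values())


# Hypothetical dataset means, for illustration only.
scores = [("Europe", 0.80), ("Europe", 0.60), ("East Asia", 0.40)]
print(region_macro_average(scores))  # Europe (0.70) and East Asia (0.40) count once each
print(mean(s for _, s in scores))    # pooled mean over datasets, for contrast
```

Here the macro-average is 0.55 while the pooled mean is 0.60: a region with more datasets (or samples) no longer dominates the headline.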

v1 Scoring

The headline score is a simpler weighted average across the HW [A], HW [B], Tables [A], and Tables [B] categories.
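The v1 category weights are not given. For reference, an unweighted mean of Gemini 3.1 Pro (low)'s four v1 category scores would be about 0.637, whereas its listed Overall is 0.5965, so the categories are evidently not weighted equally. A sketch of the computation with the weights left as an input (`weighted_average` is a hypothetical helper; equal weights are used only for comparison):

```python
def weighted_average(scores: dict, weights: dict) -> float:
    """Weighted mean of category scores; weights need not sum to 1."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


# Category scores for Gemini 3.1 Pro (low), copied from the v1 leaderboard.
gemini_31_pro = {"HW [A]": 0.6959, "HW [B]": 0.5260, "Tables [A]": 0.9636, "Tables [B]": 0.3640}

# Equal weights, for comparison only: the actual v1 weights are not stated.
equal = {k: 1 for k in gemini_31_pro}
print(weighted_average(gemini_31_pro, equal))  # unweighted mean, not the listed Overall
```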

| Model | Provider | Model ID / Notes |
|---|---|---|
| Gemini 3.1 Pro (low) | OpenRouter | google/gemini-3.1-pro-preview, reasoning_effort=low |
| Gemini 3 Pro (low) | OpenRouter | google/gemini-3-pro-preview, reasoning_effort=low |
| Gemini 3 Pro (high) | OpenRouter | google/gemini-3-pro-preview, reasoning_effort=high |
| Gemini 2.5 Flash | Google GenAI SDK | gemini-2.5-flash via genai.Client |
| Gemini 2.0 Flash | OpenRouter | google/gemini-2.0-flash-001 |
| Gemini 3 Flash (low) | OpenRouter | google/gemini-3-flash-preview, reasoning_effort=low |
| Gemini 3 Flash (minimal) | OpenRouter | google/gemini-3-flash-preview, reasoning_effort=minimal |
| Gemini 3 Flash (high) | OpenRouter | google/gemini-3-flash-preview, reasoning_effort=high |
| GPT-5.3 Codex | OpenRouter | openai/gpt-5.3-codex |
| GPT-5.2 (low) | OpenRouter | openai/gpt-5.2, reasoning_effort=low |
| GPT-5.2 (high) | OpenRouter | openai/gpt-5.2, reasoning_effort=high |
| Claude Sonnet 4.6 | OpenRouter | anthropic/claude-sonnet-4.6 |
| Claude Opus 4.6 | OpenRouter | anthropic/claude-opus-4.6 |
| Qwen3-VL-235B | OpenRouter | qwen/qwen3-vl-235b-a22b-instruct |
| Qwen3-VL-30B | DeepInfra | Qwen/Qwen3-VL-30B-A3B-Instruct |
| Seed 2.0 Pro | ZenMux | volcengine/doubao-seed-2.0-pro |
| Seed 2.0 Lite | ZenMux | volcengine/doubao-seed-2.0-lite |
| Seed 2.0 Mini | OpenRouter | bytedance/seed-2.0-mini |
| dots.ocr-1.5 | Local (vLLM) | rednote-hilab/dots.ocr-1.5, native prompt_ocr |
| dots.ocr | Replicate | sljeff/dots.ocr, image-only input |
| GLM-OCR | Zhipu AI | glm-ocr via layout parsing API |
| PaddleOCR-VL-1.5 | PaddlePaddle Cloud | Layout parsing API (aistudio-app.com) |
| PaddleOCR-VL-0.9B | DeepInfra | PaddlePaddle/PaddleOCR-VL-0.9B |
| Nemotron-Nano-12B | DeepInfra | nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL |
| OlmOCR-2 | DeepInfra | allenai/olmOCR-2-7B-1025, image-only input |
| LightOnOCR-2-1B | Local (transformers) | lightonai/LightOnOCR-2-1B, bfloat16, CUDA |
| LightOnOCR-2-1B | Local (Ollama) | maternion/LightOnOCR-2, Q4 GGUF via Ollama |
| Qwen3.5-2B (no reasoning) | Local (Ollama) | qwen3.5:2b, Q4 GGUF via Ollama, thinking disabled |
| FireRed-OCR | HF Spaces (T4) | FireRedTeam/FireRed-OCR, Gradio API |
| DeepSeek-OCR | DeepInfra | deepseek-ai/DeepSeek-OCR |
| DeepSeek-OCR2 | Novita AI | deepseek/deepseek-ocr-2, native prompts |
| Qwen3-VL-8B | Novita AI | qwen/qwen3-vl-8b-instruct |
| Llama 4 Maverick | Novita AI | meta-llama/llama-4-maverick-17b-128e-instruct-fp8 |
| ERNIE 4.5 VL 424B | Novita AI | baidu/ernie-4.5-vl-424b-a47b |
| ERNIE 4.5 VL 28B | Novita AI | baidu/ernie-4.5-vl-28b-a3b |
| Sarvam Vision | Sarvam AI | Document Intelligence API, output_format=md |
| PP-OCRv5 | PaddlePaddle Cloud | OCR API (aistudio-app.com) |
| Tesseract | Local | Tesseract 5, eng+fra+deu+nor, psm 6 |
| Layout Parser | Local | Detectron2 (PubLayNet Faster R-CNN) + Tesseract |
| EfficientOCR | Local | Tiled word-level FAISS KNN recognition |
| Qwen3.5-397B | Novita AI / OpenRouter | qwen/qwen3.5-397b-a17b, thinking disabled |
| Qwen3.5-397B (thinking) | Novita AI / OpenRouter | qwen/qwen3.5-397b-a17b, thinking enabled, max_tokens=32768 |
| Qwen3.5-27B | Alibaba Cloud Model Studio | qwen3.5-27b, thinking disabled |
| Qwen3.5-35B | Alibaba Cloud Model Studio | qwen3.5-35b-a3b, thinking disabled |
| Qwen3.5-Flash | Alibaba Cloud Model Studio | qwen3.5-flash, thinking disabled |
| Qwen3.5-122B | Alibaba Cloud Model Studio | qwen3.5-122b-a10b, thinking disabled |
| Qwen-VL-OCR | Alibaba Cloud Model Studio | qwen-vl-ocr, min_pixels=3072, max_pixels=8388608 |
| Kimi K2.5 | OpenRouter | moonshotai/kimi-k2.5 |
| Datalab (fast) | Datalab API | datalab-python-sdk, mode=fast |
| Datalab (balanced) | Datalab API | datalab-python-sdk, mode=balanced |
| Datalab (accurate) | Datalab API | datalab-python-sdk, mode=accurate |
| Infinity-Parser2 Flash / Pro | HF Space (infly) | infly/Infinity-Parser2-Demo via gradio_client, doc2json task |
| MinerU2.5-Pro | Local (A100) | opendatalab/MinerU2.5-Pro-2605-1.2B via vLLM |
| Mistral OCR | Mistral AI | mistral-ocr-latest via mistralai SDK, image-only input |
| Mistral OCR 4 | Mistral AI | mistral-ocr-4-0 via mistralai SDK, image-only input |
| Mistral Small 2603 | OpenRouter | mistralai/mistral-small-2603 |
| GPT-5.4 Mini (auto res., no reason.) | OpenAI | gpt-5.4-mini, detail=auto, reasoning_effort=none |
| GPT-5.4 Nano (auto res., no reason.) | OpenAI | gpt-5.4-nano, detail=auto, reasoning_effort=none |
| MiniMax M3 | OpenRouter | minimax/minimax-m3 |
| Qwen3.7 Plus | OpenRouter | qwen/qwen3.7-plus |
| Step 3.7 Flash (medium) | OpenRouter | stepfun/step-3.7-flash, reasoning mandatory (default medium), max_tokens up to 65536 |

Changelog

June 24, 2026: Added MinerU2.5-Pro (0.1647), a 1.2B local parser that tops OmniDocBench v1.6 but handles only Latin and CJK print, returning almost nothing on the non-Latin historical scripts that fill much of this benchmark. Corrected Kimi K2.5 (0.0969 to 0.4775): its old score was an artifact of provider timeouts that blanked 229 of 280 pages; re-running with throughput routing and reasoning off filled every page. The same routing fix raised Qwen3.6 27B (0.2810 to 0.3949), though it still returns blank on many non-Latin pages. Added an eval step that strips structural markup tokens (e.g. MinerU's <|txt_start|>) before scoring; no ground-truth text contains them.
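A markup-stripping step like the one described can be as small as one regular expression. This sketch assumes structural tokens always take the `<|name|>` form, as MinerU's `<|txt_start|>` does; the pattern and function name are illustrative, not the benchmark's code.

```python
import re

# Assumption: structural tokens look like <|name|>, e.g. MinerU's <|txt_start|>.
SPECIAL_TOKEN = re.compile(r"<\|[A-Za-z0-9_]+\|>")


def strip_markup_tokens(text: str) -> str:
    """Remove <|...|>-style structural tokens before scoring."""
    return SPECIAL_TOKEN.sub("", text)


print(strip_markup_tokens("<|txt_start|>Anno 1847"))  # Anno 1847
```

Because no ground-truth text contains this pattern, the substitution cannot delete genuine transcription; ordinary pipes and HTML tags such as `<table>` pass through untouched.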

June 23, 2026: Added Mistral OCR 4 (0.5055) via the Mistral API at $4/1000 pages, a clear gain over the previous Mistral OCR (0.4467), with much stronger tables (TEDS 0.45 vs 0.13). Strong on printed text (0.7159) and the best Mistral result on handwriting to date.

June 18, 2026: Added MiniMax M3 (0.4605) via OpenRouter at $0.30/$1.20 per million tokens. Also hardened the OCR prompt and added an eval-time preamble stripper: some vision models prefix their output with meta-commentary (“This is a Hebrew manuscript page… Here’s my best attempt:”) before the actual transcription, which inflated edit distance. The stripper removes a leading commentary block only when a substantial transcription follows (≥200 chars), so genuine refusals still score near zero. No benchmark ground-truth text begins with a trigger phrase, so it never removes real OCR. Net leaderboard effect was negligible (preambles cluster on the hardest manuscripts, where even cleaned output scores low), but it makes the scoring fair to models that did the work.
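The preamble rule can be sketched as follows. Only the 200-character threshold comes from the entry above; the trigger phrases and the choice of the first blank line as the end of the commentary block are assumptions for illustration.

```python
import re

# Hypothetical trigger phrases; the benchmark's actual list is not given here.
TRIGGERS = re.compile(
    r"^(this is an? |here is |here[’']s |the image (shows|contains) )",
    re.IGNORECASE,
)
MIN_REMAINDER = 200  # strip only when a substantial transcription follows


def strip_preamble(output: str) -> str:
    """Drop a leading commentary block if real transcription follows it."""
    text = output.lstrip()
    if not TRIGGERS.match(text):
        return output  # no trigger phrase at the start: leave untouched
    # Assumption: the commentary block ends at the first blank line.
    _head, sep, rest = text.partition("\n\n")
    if not sep:
        return output  # a single block may be a refusal: keep it as-is
    rest = rest.strip()
    if len(rest) < MIN_REMAINDER:
        return output  # too little follows: keep it so refusals still score low
    return rest
```

Returning the original output whenever the remainder is short is what keeps a bare refusal scoring near zero instead of being turned into an empty string.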

June 11, 2026: Added Qwen3.7 Plus (0.5830) via OpenRouter at $0.32/$1.28 per million tokens, ninth overall and the best Qwen result to date. Added Step 3.7 Flash (0.4668), StepFun's 196B-A11B MoE at $0.20/$1.15, scored at its default (medium) reasoning effort; reasoning cannot be disabled, and half the samples needed a 32K completion budget (~30 needed 64K) to emit any output.

June 8, 2026: Corrected the Infinity-Parser2 scores. Both models had been run through the infly Gradio Space in doc2md mode, which linearizes every document to plain prose with no table structure, so their near-zero table (TEDS) scores were a data-collection artifact, not a model limitation. Re-ran both in doc2json mode, where table blocks carry <table> HTML that the scorer reads: Infinity-Parser2 Pro 0.3541 → 0.4592 (TEDS 0.04 → 0.41) and Flash 0.3040 → 0.3867 (TEDS 0.06 → 0.30). A handful of non-Latin handwriting scans where the layout detector finds only the stamped folio number still score near zero; those are genuine model misses.

May 29, 2026: Fixed a TEDS scoring bug that systematically penalized OCR models that emit tables as HTML <table> rather than markdown pipes: the parser previously scored their tables near zero even when they were read correctly. The fix raised 11 open-source OCR specialists on the table category: dots.ocr 1.5 (+0.145, to 0.478), FireRed-OCR (+0.137), dots.ocr (+0.134), GLM-OCR (+0.121), PaddleOCR-VL-1.6 (+0.111), OCRVerse, PaddleOCR-VL-1.5, OlmOCR-2, LightOnOCR-2, Nemotron-3-Nano-Omni, and Qianfan-OCR. General VLMs were unaffected because they were prompted to output markdown tables. Added Surya OCR 2 (0.4476), run locally on an RTX 4090 via llama.cpp with quantized GGUF weights: strong on printed text (0.7180), weak on handwritten tables and right-to-left scripts. Added PaddleOCR-VL-1.6 (0.3944) via the Baidu AIStudio job API; it handles printed text (0.6584) far better than non-Latin scripts. Added Infinity-Parser2 Pro (0.3541) and Flash (0.3040), run through the public infly Gradio Space; both read printed text well (Pro 0.7321) but linearize handwritten tables in markdown mode, so their table (TEDS) scores are near zero. Re-ran Datalab on a new subscription key and added the balanced mode: fast 0.5124, balanced 0.5167, accurate 0.5213, with fast and accurate both marginally below their scores under the prior key (0.5374 and 0.5391).
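The fix amounts to putting both table formats into one representation before computing TEDS. A minimal sketch, assuming the scorer consumes HTML and that markdown tables use leading pipes; the function names are hypothetical, header cells are emitted as `<td>` rather than `<th>`, and escaped pipes and merged cells are not handled.

```python
import html
import re

# Markdown header separator row such as |---|:---:|
SEPARATOR_ROW = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def split_row(line: str) -> list:
    """Split a markdown pipe row into stripped cell strings."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def to_html_table(prediction: str) -> str:
    """Return an HTML table for TEDS, whichever format the model emitted."""
    if "<table" in prediction.lower():
        return prediction  # already HTML: score it as-is instead of discarding it
    rows = [
        split_row(line)
        for line in prediction.splitlines()
        if line.strip().startswith("|") and not SEPARATOR_ROW.match(line.strip())
    ]
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


print(to_html_table("| Name | Year |\n|---|---|\n| Ola | 1865 |"))
```

Normalizing to HTML rather than to markdown keeps the structure (rows and cells) that TEDS compares, and leaves already-HTML output exactly as the model produced it.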

May 23, 2026: Added GPT-5.5 with auto resolution + medium reasoning (0.4609) and high resolution + low reasoning (0.3924). Both clear the best GPT-5.4 variant (0.3424). Added Gemini 3.1 Flash Lite at minimal (0.6214) and low (0.5819) reasoning, replacing the previous medium-reasoning entry (0.6067). Audited every OpenRouter-backed model's pricing against the live OpenRouter API: corrected Claude Opus 4.6 ($15/$75 → $5/$25), Gemini 2.5 Flash ($0.15/$0.60 → $0.30/$2.50), Kimi K2.5 ($0.60/$2.50 → $0.40/$1.90), Llama 4 Maverick ($0.27/$0.85 → $0.15/$0.60), Qwen3 VL 30B, Qwen3.5 9B, Qwen3.5 Plus (2026-04-20), and the Qwen3.6 27B/35B/Flash line. Fixed a long-standing bug in call_openai: reasoning-capable models (GPT-5 / o3 family) were given only 4096 completion tokens, so medium-reasoning calls burned the entire budget on hidden reasoning tokens and returned empty output; the budget now scales to 32K when reasoning is on.
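The budget fix can be pictured as a small selection function. This is a sketch, not the actual `call_openai` code: the function name is hypothetical, "32K" is taken to mean 32,768 tokens, and every effort level other than `none` (including `minimal`) is assumed to count as reasoning being on.

```python
from typing import Optional

BASE_COMPLETION_TOKENS = 4096
# Assumption: "32K" taken as 32,768 tokens.
REASONING_COMPLETION_TOKENS = 32_768


def completion_budget(reasoning_effort: Optional[str]) -> int:
    """Return a completion-token cap large enough for hidden reasoning.

    For reasoning models, hidden reasoning tokens count against the same
    completion limit as the visible answer, so a small cap can be used up
    before any transcription is emitted.
    """
    if reasoning_effort is None or reasoning_effort == "none":
        return BASE_COMPLETION_TOKENS  # reasoning off: the old cap suffices
    return REASONING_COMPLETION_TOKENS  # minimal/low/medium/high: reasoning on
```

In an OpenAI Chat Completions request, the returned value would go in `max_completion_tokens` alongside the `reasoning_effort` setting.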

May 21, 2026: Added Gemini 3.5 Flash with minimal (0.6022) and low (0.6095) reasoning, a modest gain over Gemini 3 Flash (0.5920 / 0.5995). Pricing tripled to $1.50/$9.00 (from $0.50/$3.00), so cost-performance is meaningfully worse despite the score bump.

May 14, 2026: Various updates. Audited every model's API pricing against its native source (Google, Anthropic, OpenAI, OpenRouter, DashScope, xAI). Gemini 2.5 Flash $0.30/$2.50 (was $0.15/$0.60, per Google); Claude Opus 4.6 $5/$25 (was $15/$75); Kimi K2.5 $0.40/$1.90 (was $0.60/$2.50); Llama 4 Maverick $0.15/$0.60 (was $0.27/$0.85); Grok 4.2 Fast $0.20/$0.50 (was $2/$10); Qwen3 VL 30B $0.13/$0.52 (was $0.10/$0.40). Filled previously-unknown prices: Qwen3.6 Plus $0.33/$1.95, Mistral Small 2603 $0.15/$0.60, Gemma 4 31B $0.12/$0.37, Gemma 4 26B $0.06/$0.33, GPT-5.4 Mini $0.75/$4.50, GPT-5.4 Nano $0.20/$1.25. Cost-performance chart entries refreshed accordingly.

March 17, 2026: 64 models evaluated. Added Mistral Small 2603 (0.3140), GPT-5.4 Mini (0.4401) and GPT-5.4 Nano (0.2982) with auto resolution and no reasoning, Datalab fast (0.4849) and Datalab accurate (0.4908).

March 15, 2026: 59 models evaluated. Added GPT-5 Mini and GPT-5 Nano with both default (medium) and minimal reasoning variants. Clarified all GPT model names to show resolution and reasoning settings (e.g. "GPT-5.4 (high res., med. reason.)"). Reducing reasoning dramatically improves smaller models: GPT-5 Mini jumps from 0.3866 to 0.5106, GPT-5 Nano from 0.0546 to 0.3045. Added Seed 2.0 Pro (0.5831), dots.ocr (0.4651), Grok 4.2 Fast (0.2433). Added Qwen3.5 9B via OpenRouter (0.4612), Qwen3.5 4B/2B/0.8B via Ollama, and LightOnOCR-2 via Ollama (0.2720). Updated chart to auto-generate from scores. Fixed GPT pricing from official OpenAI docs.

March 12, 2026: Launched v3 with 280 full-page document images across 5 regions (W. Europe, E. Europe, E. Asia, S. Asia, MENA) and 3 formats (HW Text, Print Text, HW Table). 38 models evaluated. Added Mistral OCR, DeepSeek-OCR, DeepSeek-OCR2, OlmOCR-2, PaddleOCR-VL-0.9B, Nemotron-Nano-12B, ERNIE 4.5 VL 28B, and Qwen3 VL 30B. Added batch Gemini pricing to the cost-performance chart. Removed GPT-5.3 Codex (incompatible with chat completions API).

March 5, 2026: Added LightOnOCR-2-1B via Ollama (0.2185) and Qwen3.5-2B (no reasoning) via Ollama (0.3682). Both run locally with quantized GGUF weights. Small images padded to 224px minimum for Ollama compatibility.

March 3, 2026: Added Gemini 3.1 Flash Lite (0.6546), Gemma 3 27B (0.2069), Gemma 3 4B (0.1808), and Nemotron-Nano 12B (0.2071). Re-benchmarked OlmOCR-2 with the correct prompt (0.3678, up from 0.223). Re-benchmarked Nemotron-Nano with reasoning disabled (0.2071, up from 0.1359).

March 2, 2026: Added Qwen3.5 small models via HuggingFace Inference Endpoints: Qwen3.5-9B (0.4469), Qwen3.5-4B (0.4108), and Qwen3.5-0.8B (0.3294). All run with reasoning disabled.

March 1, 2026: Added Nano Banana 2 (0.5727), FireRed-OCR (0.2592). Added Gemini 3 Flash (minimal reasoning) variant.

February 27, 2026: Added Seed 2.0 Mini (0.5257), ERNIE 4.5 VL 424B (0.4424), Llama 4 Maverick (0.3730), PaddleOCR-VL (0.2933), GLM-OCR (0.2660), Molmo-2 8B (0.0944). Added LLM post-processing (cleanup) experiments for several models.

February 26, 2026: Expanded benchmark with new handwriting and table samples. Europe share reduced from 58% to 44% of samples. Added Tesseract v5, LayoutParser, EfficientOCR, PP-OCRv5, and Mistral OCR. Added cost vs performance scatter plot.

February 25, 2026: Expanded benchmark with new handwriting samples. Launched v2 scoring with macro-averaging across Region, Period, and Format axes. Added Claude Sonnet 4.6, Claude Opus 4.6, and GPT-5.3 Codex.

February 24, 2026: Added five Qwen models via Alibaba Cloud Model Studio: Qwen3.5-27B (0.4340), Qwen3.5-35B (0.4296), Qwen3.5-Flash (0.4280), Qwen3.5-122B (0.4264), and Qwen-VL-OCR (0.3525). All Qwen3.5 variants scored within a tight 0.426 to 0.434 range, with the smaller 27B model slightly outperforming the 122B. Qwen-VL-OCR, Alibaba's dedicated OCR model, underperformed the general-purpose Qwen3.5 models.

February 19, 2026: Added Gemini 3.1 Pro (low), which takes the #1 spot at 0.5965. Fixed a bug in the evaluation script where models that wrapped output in markdown code fences (e.g. ```markdown ... ```) had their entire response stripped instead of just the fence markers.
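The intended behavior after the fix can be sketched as follows: remove only the opening and closing fence markers and keep what is inside. The pattern assumes the whole response is wrapped in a single fence, with an optional language tag such as `markdown`; the names are illustrative.

```python
import re

# A whole response wrapped in one fence: ```lang, newline, content, ```
FENCED = re.compile(r"^\s*```[\w-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(output: str) -> str:
    """Remove only the surrounding fence markers, keeping the content inside."""
    match = FENCED.match(output)
    return match.group(1) if match else output


print(strip_code_fence("```markdown\n| a | b |\n```"))  # | a | b |
```

Responses that are not fully fenced are returned unchanged, so ordinary transcriptions are never touched.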

February 16, 2026: Added Qwen3.5-397B in two variants: thinking disabled (0.4484) and thinking enabled (0.4598).

February 15, 2026: Added API cost columns, Gemini 2.5/2.0 Flash, ERNIE 4.5 VL, Llama 4 Maverick, dots.ocr-1.5, Qwen3-VL-8B, DeepSeek-OCR2, Kimi K2.5, Datalab, PP-OCRv5. Fixed max_tokens for thinking models.

Last updated: June 24, 2026