
chandra

GitHub Repo · Pretty sure · multilingual OCR is genuinely hard
https://github.com/datalab-to/chandra

Vision transformer OCR that actually handles math, tables, and 90 languages. The README shows real benchmarks instead of vibes, and the model ships with both local and hosted inference.

Slop 15% · Signal 20% · Science 65%

Chandra is a real OCR model with substantive technical work: multilingual support (90+ languages), handwriting, math and tables, and layout preservation. Benchmarks are specific (olmocr scores, a custom multilingual benchmark). The Science score reflects that it's an engineering effort, not a research contribution: solid execution of existing VLM techniques. Slop is low because the README backs its claims with examples and concrete comparisons. Signal is modest because the primary value accrues to the hosted API; ...

6078 stars · Python · 2026-03-18 · 169 days old
