opendataloader-pdf

GitHub Repo · Pretty sure · accessibility compliance is a real market
https://github.com/opendataloader-project/opendataloader-pdf

Serious PDF parser with genuine benchmarks and an accessibility compliance pedigree. It does not wrap an API; it ships deterministic extraction plus an AI hybrid mode. The accessibility angle is novel and regulatory-driven, not marketing theater.

Slop 25% · Signal 50% · Science 25%

This is credible: it posts the top benchmark score (0.907 vs. 0.882 for docling) on real 200-PDF test sets, it collaborates with the PDF Association and veraPDF, and a deterministic local mode with a hybrid fallback is a sound architecture. The accessibility angle (auto-tagging to produce Tagged PDF) solves a genuine $50–200/doc manual remediation problem with regulatory teeth (EAA, ADA). The trade-offs are stated honestly (Q2 2026 timeline, enterprise PDF/UA export upsell). The Java dependency and per-JVM-spawn penalty are implementation warts, not dealbreakers....

5491 stars · Java · 2026-03-19 · 310 days old
