markitdown

GitHub Repo · Pretty sure · Microsoft shipping boring infrastructure
https://github.com/microsoft/markitdown

Microsoft's document-to-Markdown converter, which actually solves a plumbing problem that LLM pipelines had. Unsexy but functional.

Slop 25% · Signal 60% · Science 15%

MarkItDown is a genuinely useful tool solving a real problem: converting PDFs, images, audio, and other document formats to LLM-friendly Markdown. The approach is straightforward: preserve document structure (tables, headings, lists) rather than flattening everything to text. The implementation supports a wide range of formats via optional dependency groups (clean design choice), has a plugin system, and integrates with both Azure Document Intelligence and LLM Vision for OCR/descriptions. Th...
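To make the "preserve structure rather than flatten" point concrete, here is a minimal standard-library sketch. It is not MarkItDown's code or API: the function names `flatten_rows` and `rows_to_markdown_table` and the sample rows are hypothetical, chosen only to show what is lost when a table is flattened and what a Markdown pipe table keeps.

```python
from typing import Sequence


def flatten_rows(rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table to plain text; row and column boundaries are lost."""
    return " ".join(str(cell) for row in rows for cell in row)


def rows_to_markdown_table(rows: Sequence[Sequence[str]]) -> str:
    """Render rows as a Markdown pipe table, treating the first row as the header.

    Hypothetical helper for illustration only, not part of MarkItDown.
    """
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def render(row: Sequence[str]) -> str:
        # Escape literal pipes so they do not split a cell, then pad or
        # truncate the row to the header width.
        cells = [str(c).replace("|", "\\|") for c in row][:width]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(r) for r in body)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [["Quarter", "Revenue"], ["Q1", "10"], ["Q2", "12"]]
    print(flatten_rows(sample))
    print(rows_to_markdown_table(sample))
```

The flattened form gives a model a bare run of tokens with no way to tell which number belongs to which quarter, while the pipe table keeps the header-to-cell relationship in a few extra characters.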

102,302 stars · Python · 2026-03-30 · 514 days old
