📄 Microsoft Launches Universal Document Converter
PDF, Word, Excel, PowerPoint, images, audio, YouTube URLs. Microsoft's MarkItDown converts all of these into clean Markdown that your AI can actually use.

It is a lightweight, open source (MIT) Python library with 87 thousand stars on GitHub. It works from the command line, via API, and in Docker, and it even has an MCP server for direct integration with Claude Desktop. The AutoGen team built it and is using it in production.

For anyone working with RAG (systems that feed documents to AIs), this solves the preprocessing hell. No custom parsers, no fragile pipelines. Run pip install markitdown and you're done.
🚨 Microsoft just quietly dropped a tool that turns ANY document into LLM-ready data in seconds.

It's called MarkItDown, a lightweight Python library that converts PDFs, Word, Excel, PowerPoint, images, audio, and YouTube URLs into clean Markdown your LLM can actually use.

No custom parsers. No brittle pipelines. No preprocessing hell.

Built by the AutoGen team and battle-tested across 87K GitHub stars.

The numbers don't lie:
→ pip install markitdown and you're converting files in under 60 seconds
→ 10+ file formats supported out of the box
→ Native MCP server for direct Claude Desktop integration

And it works everywhere:
→ Command line: markitdown file.pdf > doc.md
→ Python API: 3 lines of code
→ Docker
→ Azure Document Intelligence for enterprise OCR

100% open source. MIT license.

This is the document preprocessing tool your RAG pipeline has been waiting for: LLM-ready output without the LLM-ready headache.
Source: @hasantoxr on X
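The "Python API: 3 lines of code" claim can be sketched roughly as follows. This is a minimal illustration, not the project's official example: it assumes the `MarkItDown().convert(...)` call and the `text_content` attribute as described in the project's README, and the helper names `convert_to_markdown` and `save_markdown` are hypothetical, introduced here only for illustration.

```python
from pathlib import Path


def convert_to_markdown(source, converter=None):
    """Return the Markdown text for `source`.

    `converter` is any object with a `convert(path)` method whose result
    exposes a `text_content` attribute. When omitted, a MarkItDown
    instance is used (assumes `pip install markitdown` has been run).
    """
    if converter is None:
        # Imported lazily so the helper can also be used with a stand-in
        # converter when the markitdown package is not installed.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(str(source))
    return result.text_content


def save_markdown(source, out_dir, converter=None):
    """Hypothetical helper: write the converted Markdown next to other
    pipeline inputs as `<stem>.md` and return the output path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (Path(source).stem + ".md")
    target.write_text(convert_to_markdown(source, converter), encoding="utf-8")
    return target
```

With the package installed, `save_markdown("file.pdf", "converted")` would be the Python counterpart of the command-line form quoted in the post. Accepting an optional converter keeps the conversion step swappable, which can be handy when testing a RAG ingestion pipeline without real documents.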
