
What It Does
Converts compliance documents (DOCX, PDF, PPTX, XLSX) into Markdown and JSON so you can query them programmatically instead of Ctrl+F'ing through 500-page PDFs.
Why
FedRAMP is pushing toward measurement-based compliance. That means moving from “do I have this document?” to “what can I measure from this document?” This demo shows how to get legacy docs into formats you can actually work with.
Tools Compared
- Pandoc - Fast, reliable, well-established
- MarkItDown - LLM-optimized, handles many formats
- Docling - Deep document understanding, good with tables
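To make the comparison concrete, here is a minimal sketch of how the Pandoc path could be driven from Python for a DOCX input. The helper names `build_pandoc_command` and `convert_docx` are hypothetical and not taken from the demo; the sketch assumes `pandoc` is installed and on `PATH`, and it covers DOCX only, since each tool supports a different set of input formats.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(src: Path, dest: Path) -> list[str]:
    """Build a pandoc invocation that converts DOCX to GitHub-flavored Markdown.

    Hypothetical helper: the demo's real invocation may use other options.
    """
    return [
        "pandoc", str(src),
        "--from", "docx",
        "--to", "gfm",
        "--output", str(dest),
    ]


def convert_docx(src: Path, dest: Path) -> None:
    """Run pandoc, failing loudly if the binary is missing or exits non-zero."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_pandoc_command(src, dest), check=True)
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command before anything runs, which helps when comparing output across tools.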
What Gets Extracted
- NIST 800-53 control references
- Document metadata
- Named entities (roles, systems, standards)
- FedRAMP 20x Key Security Indicator mappings
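As an illustration of the first item, the sketch below pulls NIST 800-53 control references out of converted Markdown text with a regular expression and emits them as JSON. It is an assumption about how such extraction could work, not the demo's actual code: the `extract_controls` name, the pattern, and the JSON shape are all hypothetical, and the family list assumes the 20 control families in 800-53 Rev. 5.

```python
import json
import re

# Assumed: the 20 control families defined in NIST SP 800-53 Rev. 5.
FAMILIES = (
    "AC", "AT", "AU", "CA", "CM", "CP", "IA", "IR", "MA", "MP",
    "PE", "PL", "PM", "PS", "PT", "RA", "SA", "SC", "SI", "SR",
)

# Matches IDs such as "AC-2", "AC-2(1)", or "SC-7 (3)".
CONTROL_RE = re.compile(
    r"\b(" + "|".join(FAMILIES) + r")-(\d{1,2})(?!\d)(?:\s*\((\d{1,2})\))?"
)


def extract_controls(text: str) -> list[str]:
    """Return the unique control IDs found in text, sorted by family and number."""
    found = set()
    for family, number, enhancement in CONTROL_RE.findall(text):
        # An enhancement of 0 stands for the base control (no enhancement).
        found.add((family, int(number), int(enhancement) if enhancement else 0))
    return [
        f"{family}-{number}" + (f"({enh})" if enh else "")
        for family, number, enh in sorted(found)
    ]


if __name__ == "__main__":
    sample = (
        "Account management follows AC-2 and AC-2(1). "
        "Boundary protection is covered by SC-7 (3); see also AC-2."
    )
    print(json.dumps({"controls": extract_controls(sample)}, indent=2))
```

A pattern like this can over-match, for example when a control ID is immediately followed by an unrelated parenthesized number, so treat the results as candidates to review rather than authoritative mappings.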
Deployment
- GitHub Actions (auto-runs on document push)
- Docker
- Local Python/Bash





