Legacy Document Transformation Demo

Ethan Troy
Author
hacker & writer

What It Does

Converts compliance documents (DOCX, PDF, PPTX, XLSX) into Markdown and JSON so you can actually query them programmatically instead of Ctrl+F’ing through 500-page PDFs.

Why

FedRAMP is pushing toward measurement-based compliance. That means moving from “do I have this document?” to “what can I measure from this document?” This demo shows how to get legacy docs into formats you can actually work with.

Tools Compared

  • Pandoc - Fast, reliable, well-established
  • MarkItDown - LLM-optimized, handles many formats
  • Docling - Deep document understanding, good with tables

What Gets Extracted

  • NIST 800-53 control references
  • Document metadata
  • Named entities (roles, systems, standards)
  • FedRAMP 20x Key Security Indicator mappings
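To give a feel for the first item, here is a minimal sketch of pulling NIST 800-53 control references out of converted Markdown. It is not the demo's actual extraction code, which this page does not show. The regex, the `extract_controls` helper, and the sample sentence are all assumptions made for illustration. The family list is the 20 control families from 800-53 Rev. 5.

```python
import json
import re

# NIST SP 800-53 Rev. 5 control family identifiers.
FAMILIES = {
    "AC", "AT", "AU", "CA", "CM", "CP", "IA", "IR", "MA", "MP",
    "PE", "PL", "PM", "PS", "PT", "RA", "SA", "SC", "SI", "SR",
}

# Assumed shape: two-letter family, hyphen, 1-2 digit number,
# optional enhancement in parentheses, e.g. "AC-2" or "AC-2(3)".
CONTROL_RE = re.compile(r"\b([A-Z]{2})-(\d{1,2})\b(?:\s?\((\d{1,2})\))?")


def extract_controls(text: str) -> list[str]:
    """Return unique control IDs in order of first appearance (hypothetical helper)."""
    seen: dict[str, None] = {}
    for family, number, enhancement in CONTROL_RE.findall(text):
        if family not in FAMILIES:
            continue  # skip look-alikes such as "IT-5"
        control_id = f"{family}-{number}"
        if enhancement:
            control_id += f"({enhancement})"
        seen.setdefault(control_id, None)
    return list(seen)


# Made-up sample text, not taken from any real compliance document.
sample = "The ISSO reviews accounts per AC-2 and AC-2(3); boundary protection follows SC-7."
print(json.dumps({"controls": extract_controls(sample)}, indent=2))
```

Filtering on the family list keeps the pattern from flagging unrelated tokens that happen to look like control IDs. A real pipeline would likely need to handle more variants, such as ranges or lowercase references.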

Deployment

  • GitHub Actions (auto-runs on document push)
  • Docker
  • Local Python/Bash
