Adaptive PDFs
- AI
- Accessibility
- Security
- Developer Tools
The post demonstrates an "adaptive PDF" that uses PDF replacement text (the ActualText mechanism) and related structure so that a human sees a normal designed document, while text-extraction tools pull out cleaner, more structured content such as Markdown. The point is not that the visible PDF changes depending on the reader; it is that the machine-readable layer can differ from what is rendered on the page.
The discussion quickly moved away from the demo itself and toward the long-standing mess around PDFs. Commenters noted that the format has supported embedded structure, attachments, JavaScript, and accessibility tags for years, yet most authoring tools still emit files that look fine and extract badly. A few corrected the post's claim that LaTeX cannot produce tagged PDF: modern LaTeX tooling can, and public-sector accessibility rules already require semantic tagging in many cases. The sharper takeaway was that LLM use did not create this problem; it turned a niche accessibility and document-engineering issue into an operational one for anyone feeding PDFs into automated systems.
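To make the mechanism concrete, here is a minimal, hand-built one-page PDF in Python. It assumes the replacement text is carried by an ActualText entry on a marked-content sequence, which is how the PDF specification defines replacement text; it is not the post's implementation. `build_pdf` and `pdf_literal` are hypothetical names, the text is assumed to be ASCII, and the file is untagged (no structure tree).

```python
def pdf_literal(text: str) -> str:
    """Escape text for a PDF literal string (ASCII-only assumption)."""
    return (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("(", "\\(")
        .replace(")", "\\)")
        .replace("\n", "\\n")
    )


def build_pdf(visible: str, actual: str) -> bytes:
    """Hypothetical helper: one page that paints `visible` but whose
    marked-content span declares `actual` as its replacement text."""
    # The BDC/EMC pair wraps the painted text in a /Span whose property
    # list carries /ActualText. Extractors that honor it report `actual`.
    content = (
        f"/Span <</ActualText ({pdf_literal(actual)})>> BDC\n"
        f"BT /F1 24 Tf 72 720 Td ({pdf_literal(visible)}) Tj ET\n"
        "EMC\n"
    ).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: fixed 20-byte entries, object 0 is always free.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing `build_pdf("Quarterly Report", "# Quarterly Report")` to a file should show "Quarterly Report" on the page, while an extractor that honors ActualText should report the Markdown heading instead. Support for ActualText varies between tools, which is the practical reason to test extractors rather than assume their behavior.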
If your product ingests PDFs with AI or automation, do not trust extracted text as ground truth. Build pipelines that validate extractor behavior, prefer tagged or accessibility-friendly PDFs when you control generation, and assume prompt injection or hidden-text tricks will show up in real workflows.
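One cheap way to start validating inputs is a pre-ingestion scan that flags PDFs whose extracted text may diverge from what is rendered. The sketch below is a hypothetical heuristic, not an established tool: `scan_pdf`, `needs_review`, and the marker list are illustrative assumptions. It only decodes uncompressed and FlateDecode streams, and it will miss names written with `#` hex escapes or data behind other filters. A hit is a reason to compare extracted text against a rendering, not proof of an attack; ActualText in particular is a legitimate accessibility feature.

```python
from __future__ import annotations

import re
import zlib

# Stream bodies: data between "stream<EOL>" and "endstream". The lookbehind
# keeps the tail of "endstream" from being read as the start of a new stream.
STREAM_RE = re.compile(rb"(?<![A-Za-z])stream\r?\n(.*?)endstream", re.DOTALL)

# Hypothetical indicator list: things worth a look before trusting extraction.
MARKERS = {
    # Replacement text: extractors may report this instead of the painted glyphs.
    "actual_text": re.compile(rb"/ActualText\b"),
    # Text render mode 3 neither fills nor strokes: invisible but extractable.
    "invisible_text": re.compile(rb"(?<![\d.])3\s+Tr\b"),
    # Document-level or action JavaScript.
    "javascript": re.compile(rb"/(?:JavaScript|JS)\b"),
    # File attachments.
    "embedded_file": re.compile(rb"/EmbeddedFiles?\b"),
}


def decoded_chunks(pdf: bytes):
    """Yield the raw file, then every stream body that inflates as zlib data."""
    yield pdf
    for match in STREAM_RE.finditer(pdf):
        try:
            # decompressobj ignores trailing bytes such as the EOL before
            # "endstream" instead of raising on them.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data; its raw bytes were already scanned
        yield inflated


def scan_pdf(pdf: bytes) -> dict[str, int]:
    """Count each marker across the raw file and all inflated streams."""
    hits = {name: 0 for name in MARKERS}
    for chunk in decoded_chunks(pdf):
        for name, pattern in MARKERS.items():
            hits[name] += len(pattern.findall(chunk))
    return hits


def needs_review(pdf: bytes) -> list[str]:
    """Names of markers found; an empty list means nothing was flagged."""
    return [name for name, count in scan_pdf(pdf).items() if count]
```

In a pipeline, a non-empty result from `needs_review(open(path, "rb").read())` could route the file to a slower path, for example rendering the page and comparing OCR output with the extracted text.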
- sgaud.com
- Discuss on HN