Unlimited OCR: One-shot long-horizon parsing
- AI
- Open Source
- Developer Tools
- Education
Baidu has released code and a paper for Unlimited OCR, a system aimed at one specific bottleneck in modern vision-language OCR. When these models transcribe long PDFs, they cache keys and values for every token they have already generated. That cache, the KV cache, grows with each output token, so it eats VRAM and slows decoding, and production systems usually fall back to crude chunking by page or crop. Unlimited OCR replaces chunking with what it calls a reference sliding-window setup: the model keeps full access to the original document image while attending only to a short, recent slice of its own output. The pitch is simple: keep global visual context, stop hoarding text history, and make long-horizon parsing feasible on smaller hardware.
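To make the idea concrete, here is a minimal sketch of the attention pattern described above: every output token can see all image tokens, but only the last few text tokens. It is an illustration under stated assumptions, not the project's code. The function names, the `window` parameter, and the token counts are all hypothetical, and the real system may differ in how it builds or applies the mask.

```python
def sliding_window_mask(num_image_tokens, num_text_tokens, window):
    """Build mask[q][k]: may text query q attend to key k?

    Keys are laid out as [image tokens..., text tokens...].
    Assumption: image tokens are always visible; text tokens are
    visible only if they fall in the last `window` positions up to
    and including the query itself (causal sliding window).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    total_keys = num_image_tokens + num_text_tokens
    mask = []
    for q in range(num_text_tokens):
        row = [False] * total_keys
        # Global visual context: every image token stays visible.
        for k in range(num_image_tokens):
            row[k] = True
        # Recent text only: positions q - window + 1 .. q.
        start = max(0, q - window + 1)
        for t in range(start, q + 1):
            row[num_image_tokens + t] = True
        mask.append(row)
    return mask


def cached_positions(num_image_tokens, generated, window=None):
    """Key/value positions held after `generated` output tokens.

    window=None models an ordinary full cache; otherwise only the
    most recent `window` text positions are kept.
    """
    kept_text = generated if window is None else min(generated, window)
    return num_image_tokens + kept_text


if __name__ == "__main__":
    mask = sliding_window_mask(num_image_tokens=3, num_text_tokens=5, window=2)
    for row in mask:
        print("".join("x" if visible else "." for visible in row))

    # Hypothetical sizes, chosen only to show the scaling difference.
    print(cached_positions(1000, 50_000))              # full cache
    print(cached_positions(1000, 50_000, window=512))  # windowed cache
```

With a full cache, the number of stored positions grows linearly with the transcript; with a window, it is capped at the image tokens plus the window size, which is why the approach is pitched at smaller GPUs.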
If you process long, messy documents today by page-splitting and stitching, this is worth testing because it targets exactly that engineering pain. It also signals where document AI is heading: less brittle page-by-page OCR, more streaming parsers that preserve cross-page context without requiring giant GPUs.
- github.com