An entire Herculaneum scroll has been read for the first time

AI
History
Science
Open Data
Imaging

The post links to Scroll Prize’s announcement and preprint, which claim the first full reading of an unopened Herculaneum scroll, one of the carbonized papyrus rolls buried by Vesuvius in 79 AD. The team used high-power synchrotron CT scans to capture the internal structure, then virtually segmented and unwrapped the tightly crushed layers and detected faint traces of carbon-based ink. The recovered text is not a dramatic lost epic. It appears to be a philosophical treatise on ethics in a Stoic context, with the name Aristocreon pointing to a 2nd-century BC work tied to the school of Chrysippus.

If you work with ML, this is a clean case study of a problem where the bottleneck is labeled data, tooling, and domain expertise rather than model novelty. If you care about archives, archaeology, or scientific imaging, expect more value for the foreseeable future from workflows that combine better sensors, careful human-in-the-loop reconstruction, and open datasets than from fully automated systems.

June 25, 2026
scrollprize.org

Discussion mood

Strongly excited and admiring. People saw this as a rare, tangible use of ML that produces new historical evidence, and the enthusiasm was amplified by direct answers from a project team member. The main note of caution was that the process is still slow, expensive, and heavily dependent on human annotation, so the breakthrough is real but not yet close to push-button automation.

Key insights

Ground truth is the actual engine

The result depends on a labor pipeline more than a magical model. Humans manually trace layer boundaries, refine virtual unwrappings, and label ink textures on rendered papyrus. Some readable text can even appear from simple physical rendering before any ML is applied. That changes the story from 'AI read a scroll' to 'custom imaging plus a large annotation operation made ML usable at all.'

Treat this as a human-in-the-loop imaging system, not an autonomous model demo. If you want similar wins in your own domain, invest first in annotation workflows, intermediate visualizations, and tools experts can correct quickly.

Attribution:

verditelabs #1 #2 #3 #4

Automation works only on the easy parts

The current pipeline is uneven across the scrolls. Cleaner sections can yield entire columns and even complete character recovery, while damaged or warped sections still collapse to zero readable text. The first fully read scroll was also unusually favorable because it was small and happened to have readable ink. That means the headline marks a milestone, not a solved production process for the whole archive.

Do not extrapolate one flagship success into linear throughput. Plan for a long tail where each next artifact may be much harder, and measure progress by coverage on difficult cases, not just best-case demos.

Attribution:

verditelabs #1 #2 #3 #4
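
The advice above to measure coverage on difficult cases can be made concrete with a small calculation. The following is a minimal sketch, not the project's actual metric: the difficulty buckets, the idea of an "expected character" count per segment, and every number are hypothetical and only illustrate how a strong best case can hide a weak long tail.

```python
from collections import defaultdict

# Hypothetical per-segment records:
# (difficulty bucket, expected characters, characters actually read).
# All values are invented for illustration and are not project data.
segments = [
    ("clean", 1000, 950),
    ("clean", 800, 800),
    ("warped", 1200, 300),
    ("damaged", 900, 0),
    ("damaged", 600, 60),
]


def coverage_by_bucket(records):
    """Return {bucket: fraction of expected characters recovered}."""
    expected = defaultdict(int)
    recovered = defaultdict(int)
    for bucket, n_expected, n_read in records:
        expected[bucket] += n_expected
        recovered[bucket] += n_read
    return {bucket: recovered[bucket] / expected[bucket] for bucket in expected}


per_bucket = coverage_by_bucket(segments)

# Pooled coverage across all segments, regardless of difficulty.
overall = sum(read for _, _, read in segments) / sum(exp for _, exp, _ in segments)

# The gap between the best and worst bucket is what a flagship demo hides.
best_case = max(per_bucket.values())
hardest_case = min(per_bucket.values())

for bucket, fraction in sorted(per_bucket.items()):
    print(f"{bucket:>8}: {fraction:.1%}")
print(f" overall: {overall:.1%}")
```

Reporting the hardest bucket alongside the best one keeps a single favorable scroll from standing in for the whole archive.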

Beam time and hardware are hard bottlenecks

This is not compute-only archaeology. The team says only about 30 scrolls have been scanned so far, largely because the work needs extremely powerful synchrotron beamlines that are expensive and tightly scheduled. They paid for scan time from donations and internal funding. That puts a physical cap on how fast the corpus can expand, even if the software keeps improving.

If you are thinking about scale, the limiting resource may be access to specialized instruments rather than better models. Partnerships, funding, and scheduling for scarce hardware are part of the product here.

Attribution:

verditelabs #1 #2 #3 #4

The first full text already broke one expectation

A commenter cited a papyrologist friend who said the villa would likely yield mostly Epicurean philosophy, which fits the traditional picture of the library. But commenters pointed out that this recovered scroll appears Stoic, and one of the first three is not Epicurean either. That does not prove the collection is broadly diverse, but it weakens the assumption that the unread rolls are just more of the same.

Watch for changes in collection-level interpretation, not just isolated translations. Even a few early exceptions can change what historians prioritize scanning and how they model the library behind these artifacts.

Attribution:

Matticus_Rex #1
kome #1
verditelabs #1

This is old-school ML, not LLM theater

Several people praised the project as one of the few AI stories that clearly improves the world, and the team member noted that much of the segmentation stack comes from medical imaging. Another commenter pushed back on the idea that this is unusual for AI, arguing that useful narrow ML has always existed and only feels overshadowed because LLMs dominate attention. The useful frame is that this project belongs to the long tradition of domain-specific vision and reconstruction systems, not to chatbots doing archaeology cosplay.

When evaluating AI bets, separate perception from capability. High-value ML may look boring, specialized, and deeply tied to domain workflows, which is often a better sign than general-purpose hype.

Attribution:

giancarlostoro #1
verditelabs #1
cyberpunk #1
delusional #1

Against the grain

The headline overstates the novelty

What is new is not that a Herculaneum scroll has been read at all. Some were physically opened and studied long ago. The specific breakthrough is reading an entire unopened scroll digitally and non-destructively. That narrower claim still matters a lot, but it corrects the impression that scholars had been staring at an entirely unread library until now.

Be precise when repeating this story internally or publicly. The real advance is non-destructive digital recovery of a complete text from a fragile, unopened artifact, not the first human understanding of Herculaneum texts.

Attribution:

suddenlybananas #1
verditelabs #1
IAmBroom #1
legitster #1

Most finds may add detail, not revolution

The likeliest outcome is not a constant stream of civilization-upending discoveries. For this period, a fair amount is already known, and many recovered works may be familiar genres, school texts, or philosophical material from one wealthy Roman library. The payoff could still be substantial, but it is more likely to come from mundane records, variant readings, and missing pieces of known traditions than from spectacular hidden truths.

Set expectations around cumulative scholarly value. The biggest business and research lesson is that unlocking a large archive can be worthwhile even when most individual items refine knowledge instead of overturning it.

Attribution:

empath75 #1 #2 #3

In plain English

CT

Computed tomography, an imaging method that builds a 3D internal picture from many X-ray measurements.

Epicurean

Relating to Epicureanism, an ancient philosophical school associated with pleasure understood as freedom from fear and distress.

ESRF

European Synchrotron Radiation Facility, a major research center in France that provides powerful synchrotron X-ray beams.

ground truth

Trusted labeled data used as the reference standard for training or evaluating a model.

ML

Machine learning, a family of algorithms that learn patterns from data to make predictions or classifications.

papyrologist

A scholar who studies ancient texts written on papyrus.

Stoic

Relating to Stoicism, an ancient Greek and Roman school of philosophy focused on reason, ethics, and self-mastery.

synchrotron

A large scientific machine that accelerates particles to produce extremely bright X-rays for detailed imaging.

Reference links

Project resources

Scroll Prize announcement
Main announcement describing the first full reading of an unopened Herculaneum scroll
Scroll Prize preprint
Technical paper behind the announcement
Scroll Prize GitHub repository
Project code and related materials linked from the post
Scroll Prize data browser for PHerc Paris 4 segments
Interactive example commenters used to inspect segmentation and ink overlays
Scroll Prize data fragments explanation
Explains how fragment data and visible ink evidence support training and validation
AWS Open Data registry for Vesuvius Challenge scrolls
Public access to scan data cited by the project team member
Scroll Prize models on Hugging Face
Public release of trained models mentioned in the comments

Background on archaeology and ancient texts

Mausoleum of Qin Shi Huang
Example in the excavation debate about waiting for better preservation and imaging methods
List of lost literary works
Used to illustrate the range of ancient works that no longer survive
Are there more surviving ancient writings in Greek or Latin
Cited in discussion of how little of classical literature survives
The bawdy graffiti of Pompeii and Herculaneum
Offered as evidence that everyday ancient writing could be closer to social media than high literature

Related imaging and technical references

Xerox scanners switching written numbers when scanning
Warning example that document-processing systems can produce plausible but false outputs
From tomographic reconstruction to automatic text recognition
Reference to prior work on reading closed books via tomography
The Whitworth three plates method
Linked in a side discussion about how precise stone surfaces can be made by repeated grinding