Amateur may have cracked Linear A

The post claims that Tom di Mino, an amateur linguistics enthusiast with an AI engineering background, may have cracked Linear A. The approach treats some Linear A signs as sharing phonetic values with their Linear B counterparts, then tests whether the resulting word patterns fit an extinct Semitic language related to Hebrew. The writeup says Claude Code helped build Python tooling to query and organize the GORILA and SigLA corpora, then run large numbers of simulations to estimate whether the matches could be explained by chance. What is missing is what everyone wanted most: there is no public paper, no full table of proposed sound values, no released translation list for the claimed 300 words, and no code or prompts to inspect. That made the central question less "did AI solve Linear A" and more "is there enough here to evaluate anything at all".

Treat this as a watch item, not a breakthrough. If you work around AI-assisted research, the useful signal is not "LLMs solved an ancient language" but that coding assistants may speed up corpus wrangling and hypothesis testing in expert-heavy fields where validation remains brutally human and evidence-constrained.

June 19, 2026
aiclambake.com

Discussion mood

Interested but strongly skeptical. People liked the idea of AI-assisted tooling for a hard humanities problem, but the lack of a paper, data, translation tables, and independent validation made the claim feel much closer to a familiar amateur decipherment announcement than to a confirmed breakthrough.

Key insights

The corpus is too small and too weird

Linear A survives in such a fragmentary form that the usual tests for a decipherment are hard to apply. Most of what remains are tiny administrative lists, seal marks, and a single recurring ritual formula, so any proposed reading is being fit to scraps that may include abbreviations and may not even all represent the same language. That makes elegant-looking matches much less persuasive than they would be in a richer corpus.

If you evaluate any future claim here, ask first what portion of the corpus it explains outside the libation formula and short list headers. A system that works only on the most reused phrase is not a decipherment.
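
One way to make that question concrete is a simple coverage ratio. The following is a minimal sketch, assuming inscriptions are available as lists of transliterated sign-groups and that a proposal can be reduced to the set of sign-groups it claims to read. The function name, the data layout, and the placeholder tokens are all hypothetical and are not taken from GORILA, SigLA, or the unreleased work.

```python
def coverage_outside_formula(
    inscriptions: dict[str, list[str]],
    explained: set[str],
    formula_words: set[str],
) -> float:
    """Fraction of word tokens outside the formula that the proposal claims to read."""
    tokens = [
        word
        for words in inscriptions.values()
        for word in words
        if word not in formula_words
    ]
    if not tokens:
        return 0.0
    return sum(word in explained for word in tokens) / len(tokens)


# Toy data: made-up placeholder tokens, not real Linear A sign-groups.
toy_inscriptions = {
    "tablet_1": ["W1", "W2", "W3"],
    "tablet_2": ["F1", "F2", "W4"],
    "vessel_1": ["F1", "F2"],
}
toy_formula = {"F1", "F2"}          # words belonging to the reused formula
toy_explained = {"F1", "F2", "W1"}  # words the hypothetical proposal reads

all_tokens = [w for ws in toy_inscriptions.values() for w in ws]
overall = sum(w in toy_explained for w in all_tokens) / len(all_tokens)
outside = coverage_outside_formula(toy_inscriptions, toy_explained, toy_formula)
print(f"overall coverage: {overall:.3f}")
print(f"coverage outside the formula: {outside:.3f}")
```

In the toy data the proposal looks respectable overall only because the formula repeats. Once formula tokens are excluded, most of the remaining words are unexplained.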

Attribution:

stratocumulus0
Tuna-Fish

The Semitic case looks under-motivated

The headline-grabbing move relies on stripping down one word until a possible Semitic root appears, while leaving major parts of that word and the surrounding sentence unexplained. That is exactly the sort of selective matching that undeciphered scripts invite. Commenters also noted that if Minoan Crete had a Semitic written language in active contact with Greek, stronger traces in Greek loans and place names should be easier to point to than they are.

Watch for whether the eventual paper shows systematic sound correspondences and grammar, not just lexical lookalikes. A handful of plausible roots is cheap. A consistent language-wide mapping is the bar.

Attribution:

yorwba

Real verification needs predictions, not vibes

Past decipherments became convincing when they generated readings that later matched new evidence, or at least made precise claims that could fail. With Linear A there is almost no fresh text to hold back for testing, so verification has to come from a rigorous step-by-step argument and internal consistency across the whole corpus. Without that, the field is left with something closer to cryptanalysis below the unicity distance, where many different keys can seem to work.

Any serious release should include explicit falsifiable predictions, confidence levels, and failed alternatives. If everything can be explained after the fact, nothing has really been tested.
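
The unicity-distance analogy can be made concrete with Shannon's estimate, which divides the entropy of the key space by the per-character redundancy of the plaintext language. The sketch below works through the textbook case of a simple substitution cipher over English, assuming roughly 1.5 bits of entropy per English letter (a commonly quoted approximation). It says nothing quantitative about Linear A, where the underlying language and its redundancy are unknown.

```python
import math


def unicity_distance(key_entropy_bits: float, redundancy_bits_per_char: float) -> float:
    """Shannon's estimate: characters of ciphertext needed to pin down a unique key."""
    if redundancy_bits_per_char <= 0:
        raise ValueError("redundancy must be positive")
    return key_entropy_bits / redundancy_bits_per_char


alphabet_size = 26
# A simple substitution key is one of 26! permutations of the alphabet.
key_entropy = math.log2(math.factorial(alphabet_size))
# Assumption: English carries about 1.5 bits per letter, so the rest is redundancy.
assumed_entropy_per_letter = 1.5
redundancy = math.log2(alphabet_size) - assumed_entropy_per_letter

u = unicity_distance(key_entropy, redundancy)
print(f"key entropy: {key_entropy:.1f} bits")
print(f"redundancy: {redundancy:.2f} bits per letter")
print(f"unicity distance: about {u:.0f} letters")
```

Below that length, several keys can yield plausible plaintext. The analogy in the discussion is that a short corpus in an unknown language leaves a decipherer in the same position.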

Attribution:

red_admiral
canjobear

The AI story is toolmaking, not machine decipherment

The credible version of the AI angle is mundane and useful. Claude Code appears to have helped build scripts for corpus parsing, sign co-occurrence analysis, and simulations that estimate whether observed matches beat chance. That is very different from an LLM free-associating a translation. It puts the interesting part in workflow acceleration for a hypothesis-driven human, not in treating the model as an oracle.

For teams doing research, this is a better template than hype about autonomous discovery. Use coding agents to compress the boring setup and expand the search space, then keep the inferential burden on transparent methods.
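
Since no code has been released, the following is only a generic sketch of what a chance-match simulation can look like, not the method used in the claimed decipherment. It assumes words are tuples of sign identifiers, a proposal assigns each sign a syllable, and a match means the transliterated word appears in a candidate lexicon. The null model shuffles the sound values across signs. All names and toy data are hypothetical.

```python
import random


def count_matches(words, sign_values, lexicon):
    """Number of words whose transliteration appears in the candidate lexicon."""
    return sum("".join(sign_values[s] for s in word) in lexicon for word in words)


def permutation_p_value(words, sign_values, lexicon, trials=2000, seed=0):
    """Estimate how often random sign-to-sound assignments match at least as well."""
    rng = random.Random(seed)
    observed = count_matches(words, sign_values, lexicon)
    signs = list(sign_values)
    values = [sign_values[s] for s in signs]
    at_least_as_good = 0
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)  # null model: same sounds, random assignment to signs
        null_values = dict(zip(signs, shuffled))
        if count_matches(words, null_values, lexicon) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (1 + at_least_as_good) / (1 + trials)


# Toy data: invented signs, syllables, and lexicon entries.
toy_values = {"s1": "ka", "s2": "ta", "s3": "ri", "s4": "mu", "s5": "ne", "s6": "po"}
toy_words = [("s1", "s2"), ("s3", "s4"), ("s5", "s6"), ("s2", "s3")]
toy_lexicon = {"kata", "rimu", "nepo"}

observed, p = permutation_p_value(toy_words, toy_values, toy_lexicon)
print(f"observed matches: {observed}, estimated p-value: {p:.4f}")
```

The choice of null model carries most of the weight. Shuffling sound values is one option among several, and a lenient matching rule or a large lexicon can make chance matches far more common than this toy setup suggests.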

Attribution:

simonw
peterfirefly
Kosturdistan

Known-language anchors still dominate decipherment

Several examples from Egyptian, Maya, Akkadian, and even Cypriot scripts reinforced the same point. Successful decipherments usually get traction from a bilingual text, a descendant language, or at least a known language family. Linear A has none of those anchors in any firm form. That does not make progress impossible, but it means any claim has to overcome a much higher evidentiary bar than the blog post clears.

Do not generalize from this story to "AI can decode lost languages." In domains with no external anchor, the bottleneck is often missing information, not missing compute.

Attribution:

simonw

Against the grain

Early circulation before publication is normal

A few readers pushed back on the demand for an immediate preprint and argued that sharing a promising result before formal release is not automatically shady. The interesting part is that the work reportedly includes simulations and a draft manuscript already circulating among specialists, which at least distinguishes it from a pure blog-post fantasy. That does not validate the decipherment, but it does make the right next step expert review rather than instant dismissal.

Keep skepticism pointed at the evidence, not at the fact that the work surfaced first in an informal venue. Some real results do leak into public view before they are packaged for academia.

Attribution:

GavinMcG
m0llusk
doublepg23

A Semitic language can still write vowels

The objection that a Semitic language would not use a syllabary overstates how tightly language family constrains script type. Consonantal roots are central to Semitic morphology, but vowels carry a lot of lexical and grammatical information, so a script that marks syllables is not inherently disqualifying. Akkadian, a Semitic language, was written in cuneiform that is largely syllabic. That line of attack is weaker than the deeper problems about corpus size and selective matching.

If you are stress-testing the claim, spend less time on blanket script-family assumptions and more on whether the proposed readings produce consistent morphology and phonology across many inscriptions.

Attribution:

mcswell

In plain English

Akkadian

An ancient Semitic language of Mesopotamia written in cuneiform.

Claude Code

Anthropic's coding-focused AI assistant used to help write and work with software tools.

corpus

The full body of surviving texts available for study.

GORILA

A major scholarly corpus and reference work collecting Linear A inscriptions.

libation formula

A recurring Linear A ritual inscription that appears in multiple examples and is one of the most studied texts in the corpus.

Linear A

An undeciphered Bronze Age script used on Crete, usually associated with the Minoan civilization.

Linear B

A later script related in appearance to Linear A that was deciphered and shown to write Mycenaean Greek.

LLM

Large language model, a type of AI system trained on huge amounts of text to generate human-like responses.

Maya

A family of Mesoamerican languages and the civilization whose script was deciphered in part through links to living descendant languages.

Minoan

Referring to the Bronze Age civilization centered on Crete before Mycenaean Greek dominance.

phonetic values

The speech sounds that scholars think written symbols represent.

Semitic

A language family that includes Hebrew, Arabic, Aramaic, and Akkadian.

SigLA

A digital database of Linear A inscriptions used by researchers.

substrate vocabulary

Words in a language that are thought to come from an older language previously spoken in the same region.

syllabary

A writing system in which each symbol usually represents a syllable rather than a single consonant or vowel.

unicity distance

In cryptography, the rough amount of ciphertext needed before there is enough information to uniquely determine the key.

Reference links

Background on related scripts and languages

Idalion bilingual
Example of a bilingual inscription used in decipherment of the Cypriot syllabary
Eteocypriot language
Parallel example of an undeciphered pre-Greek language written in a known script
Indus script
Another undeciphered script raised as a possible future target for AI-assisted analysis
Yuri Knorozov
Reference for the Maya decipherment example and the role of descendant languages
La Mojarra Stela 1
Example of a script with too little evidence for a confident decipherment
La Mojarra Stela 1 schematic
Visual reference for the undeciphered Isthmian script example

Specific linguistic references

Kupirijo in museum
Evidence cited for a Greek word of Semitic derivation attested in Linear B
Ancient Greek Κύπρος entry
Used to discuss Cyprus and possible Semitic loan relationships
Hebrew נווה entry
Cited in criticism of the proposed Semitic root match
Biblexika navah H5115
Reference offered to support the Biblical Hebrew root discussed in the claim
Hebrew Academy word entry
Reference for pronunciation and meaning of the Hebrew root under discussion
Bible Hub Hebrew 5116
Another lexical reference used to defend the Hebrew comparison

Core discussion references

Claude Shannon, A Mathematical Theory of Communication / entropy paper mirror
Cited to argue that unknown-script unknown-language decipherment faces hard information limits
xkcd 2151: A/B
Referenced for the "Linear A/B testing" joke
lineara.xyz
Suggested visual resource for seeing Linear A and related material
Reddit classics discussion
Linked as an external skeptical reaction from people interested in classics
Everything is a Remix
Shared during the side argument about AI, originality, and credit