HN Debrief

Amateur may have cracked Linear A

  • AI
  • History
  • Research
  • Languages

The post claims that Tom di Mino, an amateur linguistics enthusiast with an AI engineering background, may have cracked Linear A. His approach treats some Linear A signs as sharing phonetic values with their Linear B counterparts, then tests whether the resulting word patterns fit an extinct Semitic language related to Hebrew. The writeup says Claude Code helped build Python tooling to query and organize the GORILA and SigLA corpora, then run large numbers of simulations to estimate whether the matches could be explained by chance. What is missing is exactly what everyone wanted most: there is no public paper, no full table of proposed sound values, no released translation list for the claimed 300 words, and no code or prompts to inspect. That made the central question less "did AI solve Linear A" and more "is there enough here to evaluate anything at all".
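
One way the chance-estimation step described above could be set up is a permutation test: count how many words match a candidate lexicon under the proposed sign values, then check how often a random reshuffling of those values does at least as well. The sketch below is illustrative only and is not the author's code, which has not been released. The sign IDs, sound values, word list, lexicon, and the consonant-skeleton matching rule are all invented placeholders.

```python
import random

VOWELS = set("aeiou")


def consonant_skeleton(syllables):
    """Drop vowels from a sequence of syllable values, e.g. ['ku', 'ro'] -> 'kr'."""
    return "".join(ch for syl in syllables for ch in syl if ch not in VOWELS)


def count_matches(words, sign_values, lexicon):
    """Count words whose consonant skeleton under sign_values is in lexicon."""
    hits = 0
    for word in words:
        skeleton = consonant_skeleton(sign_values[sign] for sign in word)
        if skeleton in lexicon:
            hits += 1
    return hits


def permutation_p_value(words, sign_values, lexicon, trials=2000, seed=0):
    """Estimate how often a shuffled sign-to-value mapping matches as well.

    Null model (an assumption of this sketch): the same sound values are
    randomly reassigned among the same signs.
    """
    rng = random.Random(seed)
    observed = count_matches(words, sign_values, lexicon)
    signs = list(sign_values)
    values = [sign_values[s] for s in signs]
    at_least_as_good = 0
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        null_map = dict(zip(signs, shuffled))
        if count_matches(words, null_map, lexicon) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (1 + at_least_as_good) / (1 + trials)


# Hypothetical toy data: placeholder sign IDs, values, words, and roots.
sign_values = {"S1": "ka", "S2": "ro", "S3": "mi",
               "S4": "ta", "S5": "nu", "S6": "pe"}
words = [("S1", "S2"), ("S3", "S4"), ("S5", "S6", "S1"), ("S2", "S3")]
lexicon = {"kr", "mt", "npk"}

observed, p = permutation_p_value(words, sign_values, lexicon)
print(f"observed matches: {observed}, estimated p-value: {p:.4f}")
```

A p-value from a test like this is only as meaningful as its null model, so any real release would need to justify the shuffling scheme and the matching rule as carefully as the matches themselves.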

Treat this as a watch item, not a breakthrough. If you work around AI-assisted research, the useful signal is not "LLMs solved an ancient language" but that coding assistants may speed up corpus wrangling and hypothesis testing in expert-heavy fields where validation is still brutally human and evidence-constrained.

Discussion mood

Interested but strongly skeptical. People liked the idea of AI-assisted tooling for a hard humanities problem, but the lack of a paper, data, translation tables, and independent validation made the claim feel much closer to a familiar amateur decipherment announcement than to a confirmed breakthrough.

Key insights

  1. The corpus is too small and too weird

    Linear A survives in such a fragmentary form that the usual standards for decipherment barely apply. Most of what remains are tiny administrative lists, seal marks, and a single recurring ritual formula, so any proposed reading is being fit to scraps that may include abbreviations and may not even all represent the same language. That makes elegant-looking matches much less persuasive than they would be in a richer corpus.

    If you evaluate any future claim here, ask first what portion of the corpus it explains outside the libation formula and short list headers. A system that works only on the most reused phrase is not a decipherment.

      Attribution:
    • stratocumulus0 #1 #2
    • Tuna-Fish #1
  2. The Semitic case looks under-motivated

    The headline-grabbing move relies on stripping down one word until a possible Semitic root appears, while leaving major parts of that word and the surrounding sentence unexplained. That is exactly the sort of selective matching that undeciphered scripts invite. Commenters also noted that if Minoan Crete had a Semitic written language in active contact with Greek, stronger traces in Greek loans and place names should be easier to point to than they are.

    Watch for whether the eventual paper shows systematic sound correspondences and grammar, not just lexical lookalikes. A handful of plausible roots is cheap. A consistent language-wide mapping is the bar.

      Attribution:
    • yorwba #1 #2
  3. Real verification needs predictions, not vibes

    Past decipherments became convincing when they generated readings that later matched new evidence, or at least made precise claims that could fail. With Linear A there is almost no fresh text to hold back for testing, so verification has to come from a rigorous step-by-step argument and internal consistency across the whole corpus. Without that, the field is left with something closer to cryptanalysis below the unicity distance, where many different keys can seem to work.

    Any serious release should include explicit falsifiable predictions, confidence levels, and failed alternatives. If everything can be explained after the fact, nothing has really been tested.

      Attribution:
    • red_admiral #1 #2
    • canjobear #1
  4. The AI story is toolmaking, not machine decipherment

    The credible version of the AI angle is mundane and useful. Claude Code appears to have helped build scripts for corpus parsing, sign co-occurrence analysis, and simulations that estimate whether observed matches beat chance. That is very different from an LLM free-associating a translation. It puts the interesting part in workflow acceleration for a hypothesis-driven human, not in treating the model as an oracle.

    For teams doing research, this is a better template than hype about autonomous discovery. Use coding agents to compress the boring setup and expand the search space, then keep the inferential burden on transparent methods.

      Attribution:
    • simonw #1
    • peterfirefly #1
    • Kosturdistan #1 #2
  5. Known-language anchors still dominate decipherment

    Several examples from Egyptian, Maya, Akkadian, and even Cypriot scripts reinforced the same point. Successful decipherments usually get traction from a bilingual text, a descendant language, or at least a known language family. Linear A has none of those anchors in a firm form. That does not make progress impossible, but it means any claim has to overcome a much higher evidentiary bar than the blog post clears.

    Do not generalize from this story to "AI can decode lost languages." In domains with no external anchor, the bottleneck is often missing information, not missing compute.

      Attribution:
    • simonw #1
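
The check suggested in the first insight, asking how much of the corpus a proposal explains outside the libation formula, can be expressed as a per-genre coverage figure. This is a minimal sketch under stated assumptions: the genre labels, word IDs, and set of "explained" words are hypothetical placeholders, not data from GORILA, SigLA, or the post.

```python
from collections import Counter


def coverage_by_genre(inscriptions, readings):
    """Return {genre: fraction of word tokens that have a proposed reading}."""
    total = Counter()
    explained = Counter()
    for genre, words in inscriptions:
        for word in words:
            total[genre] += 1
            if word in readings:
                explained[genre] += 1
    return {genre: explained[genre] / total[genre] for genre in total}


# Hypothetical toy corpus: (genre, list of word IDs) per inscription.
inscriptions = [
    ("libation_formula", ["w1", "w2", "w3"]),
    ("libation_formula", ["w1", "w2", "w4"]),
    ("admin_list", ["w5", "w6", "w7", "w8"]),
    ("admin_list", ["w9", "w1"]),
]
# Hypothetical set of word IDs the proposal claims to read.
readings = {"w1", "w2", "w3"}

for genre, fraction in coverage_by_genre(inscriptions, readings).items():
    print(f"{genre}: {fraction:.0%} of tokens explained")
```

A large gap between the formula's coverage and everything else would be the warning sign the insight describes.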

Against the grain

  1. Early circulation before publication is normal

    A few readers pushed back on the demand for an immediate preprint and argued that sharing a promising result before formal release is not automatically shady. The interesting part is that the work reportedly includes simulations and a draft manuscript already circulating among specialists, which at least distinguishes it from a pure blog-post fantasy. That does not validate the decipherment, but it does make the right next step expert review rather than instant dismissal.

    Keep skepticism pointed at the evidence, not at the fact that the work surfaced first in an informal venue. Some real results do leak into public view before they are packaged for academia.

      Attribution:
    • GavinMcG #1
    • m0llusk #1
    • doublepg23 #1
  2. A Semitic language can still write vowels

    The objection that a Semitic language would not use a syllabary overstates how writing systems work. Consonantal roots are central to Semitic morphology, but vowels carry a lot of lexical and grammatical information, so a script that marks syllables is not inherently disqualifying. That line of attack is weaker than the deeper problems about corpus size and selective matching.

    If you are stress-testing the claim, spend less time on blanket script-family assumptions and more on whether the proposed readings produce consistent morphology and phonology across many inscriptions.

      Attribution:
    • mcswell #1

In plain English

Akkadian
An ancient Semitic language of Mesopotamia written in cuneiform.
Claude Code
Anthropic's coding-focused AI assistant used to help write and work with software tools.
corpus
The full body of surviving texts available for study.
GORILA
A major scholarly corpus and reference work collecting Linear A inscriptions.
libation formula
A recurring Linear A ritual inscription that appears in multiple examples and is one of the most studied texts in the corpus.
Linear A
An undeciphered Bronze Age script used on Crete, usually associated with the Minoan civilization.
Linear B
A later script related in appearance to Linear A that was deciphered and shown to write Mycenaean Greek.
LLM
Large language model, a type of AI system trained on huge amounts of text to generate human-like responses.
Maya
A family of Mesoamerican languages and the civilization whose script was deciphered in part through links to living descendant languages.
Minoan
Referring to the Bronze Age civilization centered on Crete before Mycenaean Greek dominance.
phonetic values
The speech sounds that scholars think written symbols represent.
Semitic
A language family that includes Hebrew, Arabic, Aramaic, and Akkadian.
SigLA
A digital database of Linear A inscriptions used by researchers.
substrate vocabulary
Words in a language that are thought to come from an older language previously spoken in the same region.
syllabary
A writing system in which each symbol usually represents a syllable rather than a single consonant or vowel.
unicity distance
In cryptography, the rough amount of ciphertext needed before there is enough information to uniquely determine the key.
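
To make the unicity-distance definition concrete, Shannon's estimate is U = H(K) / D, where H(K) is the key entropy in bits and D is the redundancy of the plaintext language in bits per character. The example below works through the textbook case of a simple substitution cipher over English, not Linear A. The redundancy of 3.2 bits per letter is a commonly quoted approximation and is treated here as an assumption.

```python
import math


def unicity_distance(key_count, redundancy_bits_per_char):
    """Shannon's estimate U = H(K) / D, assuming all keys are equally likely."""
    key_entropy_bits = math.log2(key_count)
    return key_entropy_bits / redundancy_bits_per_char


# Simple substitution over 26 letters has 26! possible keys.
substitution_keys = math.factorial(26)

# Assumed redundancy of English: about 3.2 bits per letter.
english_redundancy = 3.2

u = unicity_distance(substitution_keys, english_redundancy)
print(f"about {round(u)} letters of ciphertext")  # about 28 letters
```

Below that length, many different keys yield plausible plaintext, which is the sense in which the analogy is applied to a small undeciphered corpus.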

Reference links

Background on related scripts and languages

  • Idalion bilingual
    Example of a bilingual inscription used in decipherment of the Cypriot syllabary
  • Eteocypriot language
    Parallel example of an undeciphered pre-Greek language written in a known script
  • Indus script
    Another undeciphered script raised as a possible future target for AI-assisted analysis
  • Yuri Knorozov
    Reference for the Maya decipherment example and the role of descendant languages
  • La Mojarra Stela 1
    Example of a script with too little evidence for a confident decipherment
  • La Mojarra Stela 1 schematic
    Visual reference for the undeciphered Isthmian script example
