HN Debrief The signal in the discussion

Cessation of public development of Kefir C compiler

Open Source
AI
Programming
Developer Tools
Copyright

Kefir is an independently built C compiler that commenters describe as unusually small, correct, and capable enough to pass GCC’s torture tests, which made the shutdown feel like more than the loss of a niche side project. In the announcement, the author says they still enjoy the hard parts of compiler work, but no longer feel justified publishing that effort publicly because the biggest beneficiaries now look like companies vacuuming up code for large language models. That landed with a lot of sympathy. Several people said they have already stopped publishing code, art, or writing for the same reason, or have put sites behind logins because AI crawlers ignore robots.txt and hammer small servers.

If more maintainers start treating public release as optional instead of default, AI data extraction stops being a copyright fight and becomes a supply problem for future open source and developer tooling.

26 May, 2026
kefir.protopopov.lv

Discussion mood

Mostly frustrated and resigned. People admired Kefir and were unhappy to lose an impressive compiler, but the stronger mood was broader disillusionment that AI scraping has broken the informal reciprocity many creators thought they were signing up for when publishing publicly.

Key insights

01 The key shift is not legal doctrine but the creator’s mental model of what "copying" means.
One commenter argued that older open source norms assumed copying required obvious reuse, so licenses and reciprocity expectations lined up with reality. LLMs break that intuition by reproducing the functional value of public work while laundering its origins, which lets code contribute to closed commercial systems without attribution, sharing, or even user awareness of where it came from.

The damage here is motivational before it is legal. When reuse becomes invisible, the old social incentives to publish stop working.
- rspeele #1 #2
02 Private retreat can become a structural moat, not just an individual protest.
Commenters noted that Kefir is effectively a one-person project, so closing development likely ends the public line outright. More broadly, if creators stop posting new work or pull old work down, incumbent model vendors keep their already-scraped corpus while everyone else faces a thinner future commons. That does not bring back open source. It creates closed knowledge networks and advantages the companies that extracted first.

The first-order risk is not only fewer public projects. It is a more concentrated AI market built on a one-time harvest of open culture.
- tocariimaa #1
- lesostep #1
- irdc #1
03 The strongest concrete objection was not abstract training theory but output behavior.
Commenters pointed to models returning broken but recognizable GPL code fragments and to examples like Copilot reproducing Quake code, arguing that real systems do sometimes emit license-bound material without attribution. That weakens the clean claim that model use is merely "learning" and makes the enforcement gap feel immediate rather than philosophical.

For many developers, the issue is settled by empirical leakage. If models can regurgitate licensed code, the compliance problem is not hypothetical.
- binaryturtle #1
- LtWorf #1
04 Several comments widened this beyond licensing into a quality and innovation problem.
Public code and writing used to signal that someone had invested thought, which gave artifacts social value beyond raw utility. Cheap synthetic output weakens that signal, floods the channel, and makes sharing less rewarding. A minority pushed back that automating boilerplate is exactly the point and that most code was never especially original anyway, but even that view conceded the economics of publishing have changed because output itself is no longer scarce.

AI changes the value of publishing by making code abundant and provenance blurry. That hits reputation, discovery, and willingness to share, even if productivity rises.
- rurban #1
- irdc #1
- rgoulter #1
- f6v #1

Against the grain

01 The cleanest pro-LLM position was that free software has always permitted broad downstream use, and model training is use, not redistribution.
This view treats LLMs as closer to a human learning patterns from public code than to a transpiler or archive copying a specific program. On that reading, trying to carve out AI as a forbidden use case is the actual break from FOSS norms, not the training itself.

If you believe open source is about freedom to use rather than control over outcomes, anti-training arguments look like a category error.
- Gormo #1 #2 #3
02 One commenter suggested AI is upsetting not only because of scraping or licensing.
It is also exposing an uncomfortable truth about the audience for language tools and deep technical work. That hints at a separate source of demoralization. If the market now rewards convenience and generated output over craftsmanship, maintainers may lose motivation even without any legal or ethical fight about training data.

Some of the burnout may be cultural, not just extractive. AI can make maintainers feel alienated from the people they were building for.
- kazinator #1
03 A harder-line minority said the whole complaint is economically and legally misplaced.
They argued that others benefiting from your code does not exploit you, that the GPL explicitly allows commercial use, and that model outputs should only trigger enforcement when they actually reproduce protected code. In that frame, LLMs extend the reach of open source rather than betray it, and the real concern should be access to the models, not access to the training data.

This view sees the problem as enclosure of AI products, not training on public code. Open source values are preserved if the resulting tools remain broadly available.
- sneak #1
- CamperBob2 #1
- Rochus #1 #2

Reference links

Story and project links

Kefir cessation announcement
Original post explaining why public development is ending
Kefir source repository
Source link shared for the project being discussed

AI scraping and creator response

Website Protection Act proposal
A proposed legal framework inspired by anti-spam fax rules to charge abusive crawlers
Variety on Tony Gilroy withholding Andor scripts over AI fears
Example from another creative field of authors declining to publish material because of model training concerns

Evidence and analogies in the licensing debate

Reddit post on Copilot reproducing Quake code
Used as evidence that code models can emit recognizable licensed source verbatim
Science article on historical shift in horse riding style
Shared in an analogy about whether new techniques still emerge after a technology transition

Broader trust context

United Nations World Social Report 2024
Cited to support the claim that trust is declining more broadly, not only online

Projects and books mentioned

Book on LLMs and human creativity
Mentioned by a commenter describing why they published a book commercially instead of as a public blog series
Oberon project
Example GPL project cited by a commenter arguing that the GPL has always allowed broad commercial use
Luon project
Another GPL project cited in the same argument about licensing expectations
Micron project
Another GPL project cited in the same argument about licensing expectations