HN Debrief

πFS

  • Programming
  • Mathematics
  • Storage
  • Developer Tools
  • AI

The GitHub repo hosts a novelty filesystem that presents a classic gag as software: instead of storing your file, it “stores” offsets into the digits of π and reconstructs the bytes from there. The catch is the whole point. This only works if π is a normal number, which is suspected but not proven, and even then the index of a string in a random-looking sequence is usually enormous. Several commenters grounded the joke in plain information theory. If a file is long enough to be interesting, the first place you should expect to find it in a random sequence is so far out that describing that position costs about as much information as the file. The repo dodges that by splitting files into tiny chunks. In practice it encodes each byte separately, which turns the idea into a guaranteed expansion scheme rather than a compressor. People also connected it to older versions of the same trick, like Borges’s Library of Babel, the Sloot coding myth, and “illegal number” arguments. The common landing point was that πFS is useful as a teaching prop. It makes the hidden cost of metadata painfully concrete, and it exposes how often people smuggle in unproven assumptions about π when they talk as if infinite digits automatically contain every possible file in a usable way.
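
To make the byte-by-byte scheme concrete, here is a minimal Python sketch. It is not πFS's code or on-disk format. It assumes each byte is looked up as a two-digit string in the hexadecimal expansion of π, that offsets count hex digits after the point starting at 0, and that each offset is stored as a 16-bit big-endian integer; the function names are hypothetical. Because every byte maps to one fixed offset, the encoder collapses into a 256-entry table, and every input byte becomes two output bytes.

```python
# Hypothetical sketch of per-byte "pi indexing"; NOT the πFS code or file format.
# Assumptions: offsets count hex digits after the point (0-based), and each
# offset is stored as a 16-bit big-endian integer.


def pi_hex_digits(n: int) -> str:
    """Return the first n hex digits of pi after the point (Machin's formula)."""
    guard = 10                    # extra digits that absorb truncation error
    scale = 16 ** (n + guard)     # fixed-point scaling factor

    def arctan_inv(x: int) -> int:
        # scale * arctan(1/x) = scale * sum over k of (-1)^k / ((2k+1) * x^(2k+1))
        total, power, k = 0, scale // x, 0
        while power:
            term = power // (2 * k + 1)
            total += -term if k % 2 else term
            power //= x * x
            k += 1
        return total

    # pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    fraction = pi_scaled - 3 * scale  # drop the integer part "3"
    return format(fraction, "x").zfill(n + guard)[:n]


def build_byte_table(start: int = 4096, limit: int = 1 << 16):
    """Map each byte value to the first offset of its two-hex-digit spelling."""
    n = start
    while n <= limit:
        digits = pi_hex_digits(n)
        table = {}
        for value in range(256):
            pos = digits.find(f"{value:02x}")
            if pos != -1:
                table[value] = pos
        if len(table) == 256:
            return table, digits  # every offset is < n <= 2**16
        n *= 2
    raise RuntimeError("not every byte value was found within the digit limit")


def encode(data: bytes, table: dict) -> bytes:
    # Each input byte becomes a 2-byte offset, so output is always 2x the input.
    return b"".join(table[b].to_bytes(2, "big") for b in data)


def decode(blob: bytes, digits: str) -> bytes:
    out = bytearray()
    for i in range(0, len(blob), 2):
        pos = int.from_bytes(blob[i:i + 2], "big")
        out.append(int(digits[pos:pos + 2], 16))  # read two hex digits back
    return bytes(out)


if __name__ == "__main__":
    table, digits = build_byte_table()
    message = b"hello, pi"
    packed = encode(message, table)
    assert decode(packed, digits) == message
    print(len(message), "bytes in,", len(packed), "bytes out")  # 9 bytes in, 18 bytes out
```

The table is the entire encoder: nothing about the input beyond individual byte values ever reaches π, which is why a format like this can only expand data.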

Treat this as a vivid demo of why magical compression claims collapse once you count the metadata. If you build or buy anything that promises huge savings from clever indexing into a shared corpus, ask how many bits the lookup keys actually cost and what assumptions about the source are still unproven.
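
The cost of a lookup key can be estimated directly. This is a minimal Python sketch that assumes the corpus behaves like independent fair random bits, which is exactly the unproven part for π. It uses Conway's correlation formula for the expected number of bits read until a given pattern first appears, then counts the bits needed to write an offset that large. The function names are hypothetical.

```python
import random


def expected_wait(pattern: str) -> int:
    """Expected number of fair random bits read until `pattern` first appears.

    Conway's correlation formula: add 2**i for every i at which the
    length-i prefix of the pattern equals its length-i suffix.
    Assumes the source is i.i.d. fair bits (unproven for pi's digits).
    """
    k = len(pattern)
    return sum(2 ** i for i in range(1, k + 1) if pattern[:i] == pattern[k - i:])


def address_bits(pattern: str) -> int:
    """Bits needed to write any offset smaller than the expected wait."""
    return (expected_wait(pattern) - 1).bit_length()


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (8, 32, 128):
        pattern = "".join(rng.choice("01") for _ in range(k))
        # The i == k term alone is 2**k, so the address is never shorter than k bits.
        print(f"{k} data bits -> {address_bits(pattern)} address bits")
```

Since the i = k term always contributes 2^k, the expected first-match position for a k-bit pattern is at least 2^k, and writing a position that far out takes at least k bits, about as many as the pattern itself.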

Discussion mood

Amused and approving. Most people treated it as a good math-programming joke, then immediately used it to explain why indexing into π does not buy real compression and may not even be well-defined without unproven assumptions about π’s normality.

Key insights

  1. The implementation expands every byte

    The repo is not merely impractical in theory. It is wasteful in code. Commenters noticed that the current format writes 16 bits for each 8-bit input byte, and because it stores bytes independently the encoder could be reduced to a tiny 256-entry lookup table. That strips away any illusion of deep compression and shows the project is a deliberately literalized joke about metadata overhead.

    When evaluating a compression or deduplication idea, inspect the concrete encoding format before the theory. A toy implementation can accidentally reveal the real economics faster than the concept page does.

      Attribution:
    • nyc_pizzadev #1
    • amluto #1
    • wavemode #1
  2. Random-sequence search kills compression fast

    Treating π like random digits gives the right intuition for why this fails. A 128-bit pattern should show up only after about 2^128 candidate positions on average, so the offset you need to record is itself about 128 bits long. The arithmetic in the Stack Exchange quote and follow-up explanation turned the joke into a clean back-of-the-envelope rule. Early matches that would deliver real savings are rare and mostly uninteresting. A counting argument makes this exact for a k-bit pattern: only 2^(k−s) offsets fit in k − s bits, and each offset names a single k-bit pattern, so at most one pattern in 2^s can be stored with a saving of s bits.

    If a scheme says “the data is already somewhere in a huge corpus,” estimate the expected first-match position. If the answer grows exponentially with pattern length, the address budget will erase the win.

      Attribution:
    • thangalin #1
    • csunoser #1
    • adzm #1
  3. The whole premise leans on unproven normality

    The repo talks as if π contains every finite digit string in a usable way, but that requires properties like disjunctiveness or normality that have not been proved for π in any integer base. Several people pushed on this because it is exactly where a clever joke can turn into a false mathematical claim. Infinity alone is not enough. Plenty of infinite irrational expansions fail to contain all finite patterns.

    Be careful with “contains everything” claims built on infinite sequences. Separate what is proved from what is folklore before you use the idea in an argument, a product pitch, or a legal analogy.

      Attribution:
    • windward #1
    • keithnz #1
    • glitchc #1
    • koolala #1
  4. It rebuts the “illegal numbers” handwave

    One useful frame was legal and philosophical rather than technical. People sometimes argue that because every image or file can be encoded as a number, regulating specific data is absurd. πFS makes the missing piece obvious. The meaningful artifact is not the existence of the file somewhere inside a vast number space. It is the compact recipe that extracts it, and that recipe is effectively the data in another form.

    If someone claims a harmful file is “just a number,” focus on the executable representation and retrieval path. The legal and operational question is about usable access, not abstract existence in a huge mathematical object.

      Attribution:
    • windward #1
    • charles_f #1
  5. Sloot’s demo likely exploited reuse, not magic compression

    The Sloot Digital Coding System came up as a close cousin to the same fantasy. One commenter gave the most plausible account of how the famous demo could have worked. Store scan lines in a database, compose frames from line lookups, and videos from frame lookups. That would make multiple split-screen streams and smooth scrubbing look impressive on 1990s hardware without violating information theory. The trick is aggressive reuse of repeated structure, not impossible universal compression.

    When a compression demo seems to beat first principles, look for hidden dictionaries and constrained workloads. You may still find a clever engineering trick, just not the miracle being advertised.

      Attribution:
    • ndiddy #1
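
The scan-line explanation amounts to two levels of deduplication. Here is a minimal Python sketch of that idea using a tiny made-up "video" whose frames are tuples of scan-line strings; the data and function names are hypothetical, and this is not a reconstruction of the actual Sloot Digital Coding System. The savings come entirely from repeated lines and frames, and the dictionaries still have to be stored somewhere.

```python
from typing import Dict, List, Tuple

Frame = Tuple[str, ...]  # hypothetical: a frame is a tuple of scan lines


def build_dictionaries(video: List[Frame]):
    """Deduplicate scan lines, then deduplicate frames expressed as line ids."""
    line_ids: Dict[str, int] = {}
    frame_ids: Dict[Tuple[int, ...], int] = {}
    playlist: List[int] = []
    for frame in video:
        # A new line or frame gets the next id; repeats reuse the existing id.
        key = tuple(line_ids.setdefault(line, len(line_ids)) for line in frame)
        playlist.append(frame_ids.setdefault(key, len(frame_ids)))
    lines = list(line_ids)    # dicts keep insertion order, which matches the ids
    frames = list(frame_ids)
    return lines, frames, playlist


def rebuild(lines: List[str], frames: List[Tuple[int, ...]], playlist: List[int]) -> List[Frame]:
    """Compose frames from line lookups and the video from frame lookups."""
    return [tuple(lines[i] for i in frames[f]) for f in playlist]


if __name__ == "__main__":
    sky, grass, sun = "~" * 16, "#" * 16, "~~~~~~OO~~~~~~~~"
    video = [(sky, sun, grass, grass), (sky, sky, grass, grass)] * 50
    lines, frames, playlist = build_dictionaries(video)
    assert rebuild(lines, frames, playlist) == video
    raw = sum(len(line) for frame in video for line in frame)
    # prints: 6400 raw chars -> 3 unique lines, 2 unique frames
    print(raw, "raw chars ->", len(lines), "unique lines,", len(frames), "unique frames")
```

On footage with no repetition, every line is unique and the dictionaries are as large as the raw data, which is the constrained-workload caveat in the takeaway above.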

Against the grain

  1. Compression still explains why LLMs feel powerful

    A few people used the joke as a bridge to a broader idea rather than a takedown. They argued that while πFS fails because exact-address encoding carries the full information load, modern language models succeed at a different task. They are lossy compressors of structure and regularity. That is why relatively compact models can generate coherent text despite the combinatorial size of language spaces. The comparison is loose, but it is a useful way to think about what models actually store.

    For AI products, separate exact recall from compressed world knowledge. Use models where preserving the gist is valuable, and do not mistake that for faithful storage or retrieval of original data.

      Attribution:
    • jamwise #1
    • janalsncm #1
    • ainch #1

In plain English

normal number
A number whose digits are evenly spread in a given base b: every block of k digits shows up with long-run frequency 1/b^k, so every finite digit sequence appears, infinitely often, with the expected frequency.
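
“Infinite and non-repeating” is weaker than “contains every string.” A small Python illustration with hypothetical helper names: the decimal 0.101001000100001… never repeats (its runs of zeros keep growing, so it is irrational), yet it never contains the digit 2 or the block 11. Champernowne’s constant 0.123456789101112…, which is proven normal in base 10, does contain every finite digit string, because any string s appears inside the integer written as 1 followed by s.

```python
from itertools import count, islice


def growing_gaps_digits():
    """Digits of 0.1 01 001 0001 ...: a 1 after ever longer runs of 0 (hypothetical helper)."""
    for gap in count(0):
        yield from "0" * gap
        yield "1"


def champernowne_digits():
    """Digits of Champernowne's constant 0.123456789101112... (hypothetical helper)."""
    for n in count(1):
        yield from str(n)


def prefix(digits, n: int) -> str:
    """First n digits of an infinite digit generator, as a string."""
    return "".join(islice(digits, n))


if __name__ == "__main__":
    sparse = prefix(growing_gaps_digits(), 10_000)
    champ = prefix(champernowne_digits(), 10_000)
    for block in ("2", "11", "314"):
        print(block, block in sparse, block in champ)  # each line: <block> False True
```

A finite prefix can only show that a block is present; the absence of 2 and 11 in the first expansion follows from how its digits are generated, not from the search.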

Reference links

Math and information theory references

Related cultural and historical analogies

  • The Library of Babel (PDF)
    Borges’s story was cited as the literary version of the same “all texts already exist” idea.
  • Sloot Digital Coding System
    Referenced as a notorious earlier claim of impossible compression and discussed as likely relying on heavy dictionary reuse.
  • Tom7 Harder Drive
    Shared as another joke engineering project in the same spirit as πFS.
  • Service Model
    Mentioned because its Library Archive section has a similar vibe to πFS and Library of Babel ideas.

Related tools and thread-search references