HN Debrief

Show HN: ZeroFS – A log-structured filesystem for S3

  • Infrastructure
  • Open Source
  • Storage
  • Developer Tools

ZeroFS is an open source filesystem that stores both data and metadata in S3 and exposes them through familiar filesystem interfaces. The pitch is POSIX-like semantics on top of cheap object storage, using a log-structured layout with SlateDB under the hood. It supports NFSv3 for compatibility, plus a richer 9P-based path for features that would otherwise have required implementing NFSv4. What grabbed attention was not the architecture alone but the marketing around it. Several readers immediately challenged the landing page claim of sub-millisecond writes “at rest in S3,” saying the published benchmark measures buffered write latency, which is not durable persistence unless fsync is included. The author replied that ZeroFS follows normal filesystem semantics, where write() is buffered and fsync() is the durability boundary, and agreed the wording should be tightened.

If you are evaluating object-storage-backed filesystems, ignore headline latency numbers until you see fsync semantics, failover behavior, and request-volume benchmarks for your workload. The practical divide is now clear: use direct S3 when you can, and only add a filesystem layer when legacy POSIX compatibility is worth the performance and cost tradeoffs.

Discussion mood

Interested but skeptical. People liked the general direction of a serious object-storage-backed filesystem, but the mood was dragged down by benchmark wording, AI-sounding marketing copy, and a lack of convincing data on durability, failover, request cost, and small-I/O performance versus established alternatives.

Key insights

  1.

    The benchmark fight was really about durability semantics

    What readers forced into the open is that the key product claim lives or dies on the difference between buffered writes and durable writes. ZeroFS may well behave like ext4 or XFS, where write() is cheap and fsync() is the real commit point, but the site phrasing made it sound like the fast number already meant data was safely in S3. That is not a cosmetic mistake. For storage buyers it changes the meaning of the benchmark from “fast persistence” to “fast buffering with honest fsync.”

    Treat storage benchmarks as meaningless until you know exactly what operation is timed and what durability event it corresponds to. If you publish numbers for your own system, label write(), flush, and fsync() separately so buyers do not have to reverse-engineer your claims.

      Attribution:
    • rockwotj #1 #2
    • xyzzy_plugh #1
    • Eikon #1
    • zenoprax #1
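
A minimal sketch of the labeling advice above, assuming a POSIX-like system and Python's standard library. It times write() and fsync() as separate numbers so neither can be mistaken for the other. The helper name `time_write_and_fsync` and the file name `probe.bin` are hypothetical. This is not the ZeroFS benchmark, and it says nothing about what fsync() means on any particular backend.

```python
import os
import tempfile
import time


def time_write_and_fsync(path, payload, rounds=100):
    """Return (mean write() seconds, mean fsync() seconds) over `rounds`."""
    write_total = 0.0
    fsync_total = 0.0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(rounds):
            t0 = time.perf_counter()
            os.write(fd, payload)  # typically returns once the data is buffered
            t1 = time.perf_counter()
            os.fsync(fd)           # returns once the OS reports the data as durable
            t2 = time.perf_counter()
            write_total += t1 - t0
            fsync_total += t2 - t1
    finally:
        os.close(fd)
    return write_total / rounds, fsync_total / rounds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        w, f = time_write_and_fsync(os.path.join(tmp, "probe.bin"), b"x" * 4096)
        # Report the two operations on separate lines of the result, never as one figure.
        print(f"write(): {w * 1e6:.1f} us   fsync(): {f * 1e6:.1f} us")
```

Pointing the path at a mount of the filesystem under test gives the two figures a reader would need to interpret a "write latency" headline.
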
  2.

    S3 request economics can erase the value proposition

    The deeper objection was not just latency. It was that object storage punishes designs that turn filesystem traffic into lots of small remote operations. A 128 KiB fetch size may be reasonable for locality, but readers kept circling back to the point that at S3 scale the number of read and write requests, not the bytes moved, tends to drive the cost. SlateDB might reduce some of that pressure by coalescing and optimizing access patterns, but until request counts and retries are shown for real workloads, the architecture still carries a potentially expensive hidden tax.

    Model request volume and not just bandwidth before adopting any filesystem layer over S3 or GCS. Ask for per-workload counts for GET, PUT, retries, and compaction traffic, because that is where cloud storage bills and tail latency usually break the story.

      Attribution:
    • coxley #1
    • throw1234567891 #1
    • karakanb #1
    • aniketsaini777 #1
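
The request-volume arithmetic above can be sketched as follows. This is a rough model, not a billing calculator: the function names are hypothetical, the price is a parameter you must fill in from your provider's current price list, and the sketch ignores retries, compaction traffic, and caching, which are exactly the numbers the discussion asked to see.

```python
import math

KIB = 1024
GIB = 1024 ** 3


def request_count(total_bytes, op_size_bytes):
    """Number of requests needed to move total_bytes in op_size_bytes chunks."""
    if op_size_bytes <= 0:
        raise ValueError("op_size_bytes must be positive")
    return math.ceil(total_bytes / op_size_bytes)


def request_cost(count, price_per_1000):
    """Request charges for `count` requests at a caller-supplied price per 1,000."""
    return count / 1000 * price_per_1000


if __name__ == "__main__":
    # Reading 1 GiB with the 128 KiB fetch size mentioned in the discussion,
    # compared with the same volume fetched in 4 KiB operations.
    print(request_count(1 * GIB, 128 * KIB))  # 8192
    print(request_count(1 * GIB, 4 * KIB))    # 262144
```

The same byte volume costs 32 times as many requests at 4 KiB as at 128 KiB, which is why per-workload request counts matter more than bandwidth figures.
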
  3.

    The real use case is legacy POSIX compatibility

    A clear framing emerged around where systems like this belong. If you are building something new, making the application speak S3 directly is still the safer and usually cleaner design because object stores do not behave like disks. The filesystem abstraction earns its keep when you have existing software that expects POSIX semantics and cannot be rewritten cheaply. That is why s3fs was dismissed as not comparable for serious semantics, while ZeroFS got attention for trying to provide something stricter.

    Use object-storage-backed filesystems as compatibility infrastructure, not as the default storage layer for greenfield systems. If your workload can be made object-store-aware, that route will usually be simpler, faster, and easier to reason about.

      Attribution:
    • tmach32 #1
    • the8472 #1
    • rapatel0 #1
  4.

    Ceph is the performance bar, not the idea of S3

    One practitioner claim cut through the abstract debate by saying local tests with a disk-backed S3 setup were nowhere near Ceph, especially for the small-I/O patterns that filesystems and block devices live on. That comparison matters because a filesystem is judged on random and metadata-heavy workloads, not just large sequential transfers. The follow-up asking whether this meant CephFS or an RBD-backed filesystem also highlighted how easy it is to make fuzzy benchmark claims in storage if the test shape and stack are not nailed down.

    Demand apples-to-apples comparisons against the exact deployment mode you would otherwise choose, such as CephFS versus RBD plus ext4. Sequential throughput screenshots are not enough if your workload is metadata-heavy, random, or latency-sensitive.

      Attribution:
    • dangoodmanUT #1
    • neverartful #1
  5.

    High availability now hinges on failover plumbing

    The useful product update was that metadata is stored in the bucket and that replicated mode now offers automatic failover, which moves ZeroFS closer to being a real alternative to JuiceFS rather than a single-node experiment. But that did not end the concern. Once failover exists, operational details become the next bottleneck, such as how NFS clients are redirected and whether something like HAProxy is needed in front.

    Do not stop at “supports failover” when evaluating distributed storage. Map the full client path, including VIPs, proxies, stale handles, and recovery behavior, because that is what determines whether HA works in production or only on a diagram.

      Attribution:
    • ChocolateGod #1 #2
    • Eikon #1
  6.

    Choosing NFSv3 was a scope tradeoff

    The NFS version question exposed a sensible design decision. NFSv4 would add a large stateful protocol surface with compound operations and delegations, which is expensive to implement correctly. ZeroFS instead leans on 9P with extensions for richer semantics and keeps NFSv3 mostly as the compatibility path. That makes the protocol choice look less outdated and more like an explicit attempt to keep complexity under control.

    When a storage product exposes an older protocol, check whether it is a limitation or a deliberate compatibility layer hiding a better native interface. Protocol scope often predicts implementation risk more accurately than feature checklists do.

      Attribution:
    • chillfox #1
    • Eikon #1
    • rockwotj #1

Against the grain

  1.

    The marketing copy may be the bigger adoption problem

    A lot of negativity attached to the project was triggered by presentation, not by the filesystem design itself. The site read to many people like unedited LLM output, with explanatory text that sounded like prompt residue instead of crisp product communication. A few readers still liked some of the transparency, such as explaining asciinema playback behavior, and the practical advice was not “stop” but “rewrite the copy by hand and keep the evidence-driven posture.” That shifts part of the story from technical trust to communication hygiene.

    If you are launching developer infrastructure, spend real effort on voice and wording because buyers use that as a proxy for engineering seriousness. Keep the proof links and public CI, but cut anything that sounds like the model explaining itself.

      Attribution:
    • dan_sbl #1
    • xx_ns #1
    • Eikon #1
    • abtinf #1
    • felooboolooomba #1
  2.

    Massive fan-out could enable unusual access patterns

    One speculative point cut against the generally defensive tone by noting that object storage can effectively spread tiny reads and writes across huge numbers of drives. That hints at workloads where aggregate parallelism matters more than single-operation latency, and where a filesystem layer over S3 could support patterns that local-disk mental models miss. It does not rescue the current benchmark claims, but it does suggest the architecture might be more interesting for novel distributed access patterns than for pretending to be a normal fast disk.

    If you explore this class of system, do not benchmark only against local filesystem instincts. Test high-parallel fan-out workloads too, because that is where object-storage-backed designs might show strengths that random small-I/O comparisons miss.

      Attribution:
    • gvkhna #1
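
The intuition that aggregate parallelism can matter more than single-operation latency follows from Little's law: sustained throughput is roughly the number of operations in flight divided by the latency of each one. A minimal sketch, with a hypothetical function name and a latency figure chosen only for illustration, not measured on any system:

```python
def aggregate_ops_per_second(in_flight, latency_seconds):
    """Little's law estimate: throughput = concurrency / per-operation latency."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return in_flight / latency_seconds


if __name__ == "__main__":
    assumed_latency = 0.05  # hypothetical 50 ms per remote operation
    for in_flight in (1, 100, 1000):
        print(in_flight, aggregate_ops_per_second(in_flight, assumed_latency))
```

The model assumes latency stays flat as concurrency rises, which throttling and request limits will eventually break, so it is an upper bound to test against rather than a prediction.
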

In plain English

9P
A network filesystem protocol from Plan 9 that exposes files and services through a simple remote interface.
asciinema
A tool for recording and replaying terminal sessions as text rather than video.
Ceph
An open source distributed storage system that provides object, block, and filesystem interfaces.
CephFS
The filesystem interface provided by Ceph.
ext4
A widely used Linux filesystem.
fsync
A system call that forces buffered file changes to be written to durable storage before returning success.
HAProxy
A widely used open source load balancer and proxy server.
JuiceFS
A filesystem that uses object storage for data and a separate metadata service for filesystem state.
KiB
Kibibyte, a unit of 1,024 bytes.
NFS
Network File System, a protocol that lets a computer access files over a network as if they were local.
NFSv3
Version 3 of the Network File System protocol, an older, simpler, and mostly stateless version.
NFSv4
Version 4 of the Network File System protocol, a newer, stateful version that is more feature-rich but also more complex.
POSIX
Portable Operating System Interface, a standard set of filesystem and operating system behaviors that Unix-like software expects.
RBD
RADOS Block Device, Ceph's virtual block storage layer that can be formatted with a normal filesystem.
S3
Simple Storage Service, an object storage interface popularized by Amazon and widely implemented by other storage systems.
s3fs
A tool that mounts S3 buckets as filesystems, usually with weaker semantics than a native disk filesystem.
SeaweedFS
A distributed storage system that can present filesystem-like interfaces over object-backed storage.
SlateDB
An embedded storage engine mentioned by commenters as part of ZeroFS's design, intended to reduce and organize object-store operations.
XFS
A high-performance filesystem commonly used on Linux systems.

Reference links

Competing and related storage systems

  • FiberFS
    Mentioned as a newly launched S3-based filesystem to compare against ZeroFS.
