HN Debrief

Corrupting a ZFS File on Purpose

  • Infrastructure
  • Storage
  • Open Source

The post is a hands-on experiment in forcing corruption into a ZFS-backed file and observing how ZFS responds. The useful takeaway was not the exact mechanics of `dd` and block offsets, but the reminder that ZFS is built around end-to-end verification. It checks that the data returned for a given block is the data that block was supposed to contain, not just that the drive returned something internally consistent.

If you rely on plain RAID or drive firmware to keep data safe, you are missing failure modes that only filesystem-level checksums can catch. For archival or large critical datasets, pair ZFS-style integrity checks with redundancy and periodic verification instead of assuming modern disks will surface every error cleanly.

Discussion mood

Mostly positive on ZFS and on the experiment as a learning tool. The strongest mood was pragmatic respect for end-to-end checksums, driven by personal stories of silent corruption, bad firmware, and recovery from ugly real-world storage failures.

Key insights

  1.

    Filesystem checksums catch failures drive ECC cannot

    Drive-level ECC only confirms that the bits read from a sector are consistent with the error-correcting code stored alongside them. It cannot prove the firmware wrote the correct payload, stored it at the correct LBA, or returned the block the OS actually asked for. That is why ZFS can flag corruption even when the disk reports a clean read, including cases like the Samsung 840 EVO queued TRIM bug, where the drive reported no errors to the layers above it.

    Do not treat successful block reads as proof your data is intact. If the data matters, use a filesystem or verification layer that validates content against higher-level checksums.

      Attribution:
    • throw0101c #1
    • matja #1
    • ssl-3 #1
  2.

    Archival storage still needs its own redundancy

    Long-stored disks can develop a few bad sectors that slip past normal handling, and independent hashes are often the only reason the damage gets noticed. PAR2 or RAR recovery records can repair scattered corruption, but commenters were clear that sidecar redundancy is not a substitute for full duplicate copies on separate media and in different locations.

    For cold storage, keep per-file verification data and maintain at least one independent duplicate. Use PAR2 for repairable drift, not as your only disaster plan.

      Attribution:
    • adrian_b #1 #2
    • ramses0 #1
    • wongarsu #1
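One way to keep per-file verification data is a hash manifest stored outside the archive it describes. The sketch below is an assumption about how that could look, not a tool named in the discussion; `build_manifest` and `verify_manifest` are hypothetical helper names. It only detects damage. Repair still needs PAR2-style recovery data or a duplicate copy.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose hash changed."""
    root = Path(root)
    damaged = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_sha256(target) != expected:
            damaged.append(rel)
    return damaged
```

Build the manifest once when the archive is written, keep it with each duplicate copy, and rerun `verify_manifest` on a schedule so damage is noticed while another copy is still good.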
  3.

    ZFS can survive terrible hardware combinations

    One detailed report described a RAIDZ setup built from external USB SMR disks, which is close to a worst-case stack for reliability. Even there, the filesystem usually preserved data through controller issues, dropped drives, corrupted files, and damaged metadata, with recovery possible after manual repair steps. That is a strong vote for ZFS resilience, though not for copying that architecture.

    If you are already stuck with flaky storage, ZFS can buy you recovery headroom. It is still cheaper to avoid fragile USB controllers and SMR-heavy designs than to depend on heroics later.

      Attribution:
    • guardiangod #1
  4.

    Multiple copies on one disk can help

    ZFS's ability to keep more than one copy of a block on a single disk sounds odd until you optimize for sector failure rather than whole-drive death. For workloads where isolated block errors are more common than total device loss, extra copies on the same disk can reduce corruption exposure without requiring a second device for a mirror.

    Match redundancy to the failure mode you actually expect. If you care about localized media errors on a single device, block-level duplication may be worth considering alongside pool-level redundancy.

      Attribution:
    • BuildTheRobots #1
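In ZFS this behavior is controlled by the `copies` dataset property. The toy sketch below is not ZFS code. It only illustrates why a second copy plus a checksum turns a localized error into a repairable one. `read_with_copies` is a hypothetical name.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).digest()


def read_with_copies(copies, expected):
    """Return the first copy that matches the checksum and overwrite bad copies.

    `copies` is a mutable list standing in for duplicate blocks on one disk.
    Returns (data, number_of_copies_repaired).
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies failed verification")
    repaired = 0
    for index, candidate in enumerate(copies):
        if checksum(candidate) != expected:
            copies[index] = good  # toy self-heal from the verified copy
            repaired += 1
    return good, repaired


payload = b"important block"
expected = checksum(payload)

# One of the two on-disk copies has a damaged byte.
copies = [b"importXnt block", payload]
data, repaired = read_with_copies(copies, expected)
```

If every copy sits on the same failed drive, there is nothing left to verify against, which is why this complements pool-level redundancy and does not replace it.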

Against the grain

  1.

    Random byte flips may model the wrong failure

    The experiment's corruption method may not look like a typical failing disk, because real devices often detect unreadable sectors and raise I/O errors instead of quietly returning garbage. That does not undercut ZFS, but it does mean a more realistic test would include truncation, holes, or forced read failures rather than only hand-edited bytes.

    When you test storage recovery, simulate the failures your stack is likely to produce. Include outright read errors and missing sectors, not just silent corruption.

      Attribution:
    • ralferoo #1
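A small fault-injection harness along these lines might look like the sketch below. It works only on scratch files in a temporary directory and covers byte flips, zeroed ranges (a stand-in for holes), and truncation. Forced read errors need device-level tooling and are not modeled here. All function names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def flip_byte(path, offset):
    """Invert every bit of the byte at `offset`."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        original = handle.read(1)
        handle.seek(offset)
        handle.write(bytes([original[0] ^ 0xFF]))


def zero_range(path, offset, length):
    """Overwrite a range with zeros, similar to a sector that reads back empty."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        handle.write(b"\x00" * length)


def truncate_to(path, size):
    with open(path, "r+b") as handle:
        handle.truncate(size)


def run_fault_matrix(payload):
    """Apply each fault to a fresh scratch file; report whether the hash noticed."""
    faults = {
        "flip_byte": lambda p: flip_byte(p, 10),
        "zero_range": lambda p: zero_range(p, 0, 512),
        "truncate": lambda p: truncate_to(p, len(payload) // 2),
    }
    results = {}
    for name, inject in faults.items():
        with tempfile.TemporaryDirectory() as scratch:
            path = os.path.join(scratch, "victim.bin")
            with open(path, "wb") as handle:
                handle.write(payload)
            baseline = sha256_of(path)
            inject(path)
            results[name] = sha256_of(path) != baseline
    return results


# Deterministic 4 KiB payload so the run is repeatable.
results = run_fault_matrix(bytes(range(256)) * 16)
```

Each entry in `results` records whether a content hash noticed that fault. The same matrix can be pointed at a test pool's backing file to compare how the filesystem reports each case.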
  2.

    The writing style annoyed more than it informed

    A noticeable side conversation argued that the post leaned too hard on dramatic phrasing and suspense for what was really a straightforward technical walkthrough. The complaint was not about one word choice. It was about a broader grandiose tone that some readers now associate with LLM-assisted writing and find exhausting in explanatory posts.

    If you publish technical writeups for engineers, keep the prose tight and concrete. A clear lab notebook voice will usually travel better than a theatrical one.

      Attribution:
    • anonymous_user9 #1
    • calcifer #1
    • rcxdude #1
    • eigencoder #1

In plain English

dd
A Unix command-line tool used for low-level copying and editing of raw data blocks.
ECC
Error-correcting code, the redundant bits a drive stores with each sector so it can detect and correct some read errors such as bit flips.
LBA
Logical Block Addressing, the numbered block locations a computer uses to read and write sectors on a disk.
PAR2
Parchive version 2, a file format and toolset that creates recovery data so damaged files can be verified and repaired.
RAIDZ
ZFS's RAID-like storage layout that uses parity across multiple drives to survive disk failures.
SMR
Shingled magnetic recording, a hard drive technique that overlaps adjacent tracks to increase capacity at the cost of slow, unpredictable rewrite performance.
SSD
Solid-state drive, a storage device based on flash memory that is much faster than a mechanical hard disk for many workloads.
TRIM
A command that tells an SSD which blocks are no longer in use so the drive can manage flash storage more efficiently.
ZFS
A filesystem and volume manager designed for data integrity, snapshots, and pooled storage, known for end-to-end checksums and self-healing.

Reference links

File recovery and integrity tools

  • Parchive
    Referenced as a way to add repair data for archive files so corrupted files can be reconstructed.

Related experiments