HN Debrief

DNS is for people, not for IT infrastructure

  • Infrastructure
  • Networking
  • Security
  • Developer Tools

The post argues that DNS is mainly a convenience for humans and that internal infrastructure should often skip it. The proposed alternative is to inject IP addresses directly into configs or distribute /etc/hosts with tools like Ansible or pyinfra, on the theory that this removes a fragile dependency from the critical path and makes recovery easier when DNS-related incidents happen.

If you are fighting internal DNS outages, the fix is usually simpler DNS, better dependency design, and reliable fallback paths, not replacing DNS with fleet-wide config pushes. Treat name resolution as a core control plane and harden it like one.

Discussion mood

Overwhelmingly negative. Readers saw the proposal as a reinvention of DNS with worse scaling, worse failure modes, and less support for the real jobs internal name systems already do.

Key insights

  1. DNS exists because hosts files failed

    The historical point changes the whole frame. DNS was created to replace centrally distributed HOSTS.TXT because copying name mappings to every machine stopped working even at a few hundred hosts. Modern networks move faster, not slower, so tying failover and service changes to rapid hosts-file pushes recreates the exact scaling problem DNS was invented to solve.

    If someone proposes fleet-wide hosts-file distribution, treat it as a design smell and ask why standard DNS patterns are not being used. The burden of proof is on the replacement, because the old failure mode is well known.

      Attribution:
    • JdeBP #1
    • nemothekid #1
    • fulafel #1
  2. The outages cited were control-plane failures

    The article's own examples undermine its conclusion. Facebook and DynamoDB were discussed as cases where DNS amplified an upstream failure, not cases where DNS itself was the root problem. That matters because swapping out DNS does not remove the need for some system that publishes and updates the canonical answer. It only changes which control plane can poison the whole estate when it breaks.

    Read postmortems for the failed dependency chain, not for the most visible protocol in the blast radius. Fix circular dependencies and unsafe update systems before you rip out a standard component.

      Attribution:
    • necovek #1
    • colechristensen #1
    • louwrentius #1
  3. Push to every host multiplies operational risk

    Moving from a few resolvers to thousands of endpoints turns one managed database into thousands of partially updated copies. That introduces missed updates, offline hosts, rollout lag across regions, coordination problems between operators, and edge cases where software ignores /etc/hosts entirely. The proposed simplification is only simpler on a whiteboard.

    Prefer centralized, purpose-built resolution with local caching over endpoint-by-endpoint state distribution. Every time you push config to the whole fleet for a naming change, assume you are widening your failure surface.

      Attribution:
    • kube-system #1
    • bravetraveler #1
    • ranger207 #1
    • kassner #1
  4. Internal DNS does more than A records

    A lot of the value disappears if you reduce DNS to human-friendly labels. Internal systems lean on forwarding, PTR records, SRV records, MX records, virtual hosting, ACME, Kerberos, SSHFP, and consistent answers across devices you do not fully control. You can hand-wave away some of that in a tiny server-only setup, but once the environment gets heterogeneous the missing pieces pile up fast.

    Before replacing internal DNS, inventory every feature riding on it, including the quiet ones. Many migrations fail because teams only count hostname lookups and forget the rest of the stack.

      Attribution:
    • arter45 #1
    • davkan #1
    • necovek #1
  5. Use DNS less in recovery paths, not everywhere

    The most useful refinement was not to abolish DNS but to keep critical break-glass paths independent of it. People suggested fixed addresses via VRRP, anycast, keepalived, or a bastion pinned in /etc/hosts so you still have a way in when resolution is broken. That preserves DNS for normal operation while avoiding total lockout during rare control-plane incidents.

    Design one or two recovery paths that do not require normal name resolution. That gets you most of the resilience benefit without rebuilding your entire service discovery layer.

      Attribution:
    • trumpdong #1
    • throw0101a #1
    • ryanshrott #1
    • throwway120385 #1
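
The break-glass idea from the last insight can be sketched in a few lines of Python. This is a minimal illustration, not something proposed in the thread: the function name `resolve_with_fallback`, the `PINNED_ADDRESSES` table, the hostname, and the IP address are all hypothetical placeholders. Normal resolution is tried first, and a small pinned table is consulted only when the lookup fails.

```python
import socket

# Hypothetical pinned addresses for break-glass hosts. The name and IP are
# placeholders (192.0.2.0/24 is reserved for documentation).
PINNED_ADDRESSES = {"bastion.internal.example": "192.0.2.10"}


def resolve_with_fallback(hostname, port, pinned=PINNED_ADDRESSES,
                          resolver=socket.getaddrinfo):
    """Return (address, source): normal resolution first, pinned entry second."""
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        return infos[0][4][0], "dns"
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        if hostname in pinned:
            return pinned[hostname], "pinned"
        raise  # No recovery path was defined for this name.
```

Keeping the pinned table to one or two entries is the point: it covers the way in during an incident without turning into a second, fleet-wide naming system.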

Against the grain

  1. Small static server fleets can get by

    For relatively small environments with infrequent address changes, avoiding internal DNS for server-to-server traffic can work tolerably well. If bootstrapping is handled by DHCP, PXE, TFTP, or HTTP, and the network is narrow in scope, static mappings may be good enough and easier for one team to reason about. That is a narrow operational choice, not a general design rule.

    If your fleet is small and stable, you can accept simpler static naming, but write down the limits that make it viable. Revisit the choice as soon as churn, heterogeneity, or failover frequency rises.

      Attribution:
    • adrian_b #1
    • simonjgreen #1
    • JackSlateur #1
  2. The anti-DNS instinct comes from real pain

    The temptation to bypass DNS is not irrational. Production DNS loops, bad dependency chains, and resolver misconfigurations are miserable to debug, so wanting a simpler escape hatch is understandable. The better answer is usually local fallback entries or router-level local DNS registration, not abandoning DNS as the naming layer.

    If your team keeps reaching for hosts-file hacks, treat that as a signal that your DNS operations need cleanup. Add fallback mechanisms and observability before frustration turns into architecture.

      Attribution:
    • ryanshrott #1
    • throwway120385 #1
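
As one hedged example of the observability mentioned above, a lookup probe can time a single resolution and record whether it succeeded. The function name `probe_resolution` and the shape of the returned dictionary are assumptions made for illustration. Note that `socket.getaddrinfo` goes through the operating system's resolution path, which may include /etc/hosts and local caches, so this measures what applications experience rather than one specific DNS server.

```python
import socket
import time


def probe_resolution(hostname, resolver=socket.getaddrinfo, clock=time.monotonic):
    """Time one lookup and report success, latency in ms, and any error text."""
    start = clock()
    try:
        resolver(hostname, None)
        ok, error = True, None
    except OSError as exc:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        ok, error = False, str(exc)
    elapsed_ms = (clock() - start) * 1000.0
    return {"hostname": hostname, "ok": ok, "elapsed_ms": elapsed_ms, "error": error}
```

Run on a schedule against a handful of important internal names, the results give a baseline, so slow or failing lookups show up as data before they become an argument for hosts-file workarounds.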

In plain English

/etc/hosts
A local file on many operating systems that maps hostnames to IP addresses without querying DNS.
ACME
Automatic Certificate Management Environment, the protocol used by systems like Let's Encrypt to issue and renew TLS certificates automatically.
Ansible
An automation tool that pushes configuration and commands to many machines over the network.
anycast
A routing technique where the same IP address is advertised from multiple locations so traffic reaches one of them automatically.
DHCP
Dynamic Host Configuration Protocol, the system that automatically gives devices IP addresses and other network settings.
DNS
Domain Name System, the distributed naming service that translates human-readable hostnames into network addresses and publishes other records about them, such as mail routing and service locations.
HOSTS.TXT
The central host mapping file used before DNS became the standard naming system for the Internet.
Kerberos
A network authentication system that often relies on DNS to locate identity services.
PXE
Preboot Execution Environment, a way for computers to boot over the network.
pyinfra
A Python-based infrastructure automation tool used to configure remote systems.
resolver
The software component that looks up DNS answers on behalf of applications or operating systems.
service discovery
The process of finding which network address currently provides a given service.
SSHFP
A DNS record type that publishes SSH host key fingerprints so clients can verify servers.
TFTP
Trivial File Transfer Protocol, a simple file transfer protocol often used in network boot processes.
VRRP
Virtual Router Redundancy Protocol, a method for multiple machines to share a virtual IP address for failover.

Reference links

Tools and software mentioned

  • CoreDNS
    Suggested as an easier DNS server option for internal infrastructure.
  • dnsmasq
    Mentioned as a lightweight DNS option if BIND feels heavyweight.