HN Debrief

The iPad was on Tailscale: a WebRTC debugging story

  • Infrastructure
  • Networking
  • Programming
  • Developer Tools

The post is a debugging write-up about a WebRTC app that worked everywhere except on one iPad, which showed a blank page. The root cause was not one bug but two independent assumptions colliding. webrtc-rs used a hardcoded initial MTU, did no path MTU discovery, and kept retransmitting at the same size. Tailscale, on the IPv6 path involved here, treated packets carrying a Fragment header as an unknown protocol, so the default-deny rule dropped them. Small control traffic still got through, so every obvious health check looked fine while the actual payload path was dead.

If you ship real-time or peer-to-peer networking over overlays like Tailscale, add explicit large-packet tests and fragmented-traffic checks to your diagnostics instead of trusting pings, health checks, or small control messages. Also audit any WebRTC stack you depend on for MTU defaults, path MTU discovery, and message-size handling before a rare edge case turns into a silent outage.
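As a rough illustration of what an explicit large-packet test could look like, the sketch below sends UDP datagrams of several sizes to an echo endpoint and records which sizes come back intact. The names `start_echo_server` and `probe_sizes`, the chosen sizes, and the timeout are all assumptions made for this example and do not come from the post. A real check would run the echo side on the far end of the overlay and use the address family of the path under test (for example `socket.AF_INET6`).

```python
import socket
import threading


def start_echo_server(host="127.0.0.1", family=socket.AF_INET):
    """Start a UDP echo server on an ephemeral port; return (socket, port)."""
    srv = socket.socket(family, socket.SOCK_DGRAM)
    srv.bind((host, 0))

    def loop():
        while True:
            try:
                data, addr = srv.recvfrom(65535)
            except OSError:
                return  # socket was closed
            srv.sendto(data, addr)

    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def probe_sizes(host, port, sizes, timeout=1.0, family=socket.AF_INET):
    """Send one datagram per size; map size -> True if an identical echo returned."""
    results = {}
    with socket.socket(family, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for size in sizes:
            payload = bytes(size)  # zero-filled payload of the requested length
            try:
                s.sendto(payload, (host, port))
                data, _ = s.recvfrom(65535)
                results[size] = data == payload
            except OSError:  # includes timeouts and "message too long"
                results[size] = False
    return results
```

On a healthy path every size reports `True`. A path that silently drops fragments would be expected to show small sizes passing and larger ones failing, which is the pattern that tiny health checks never reveal.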

Discussion mood

Strongly positive about the write-up and sympathetic to the bug hunt. The dominant mood was equal parts admiration for the debugging and dread from engineers who have seen MTU and fragmentation failures hide behind healthy-looking small-packet checks.

Key insights

  1. IPv6 fragment filtering breaks classification

    The packet filter problem likely comes from a real implementation constraint, not random sloppiness. Later IPv6 fragments do not include transport-layer port numbers, so an ACL engine that wants to match on ports cannot classify them the normal way. That makes Tailscale's behavior easier to understand, but not easier to excuse. Older systems handled this with virtual reassembly, connection tracking, or at least first-fragment matching, all of which would have avoided turning a normal fragmented UDP flow into a silent deny.

    If your product enforces ACLs on tunneled traffic, inspect how it handles non-initial IPv6 fragments before assuming UDP support is complete. A narrow classification shortcut can create app-level outages that only appear under specific path sizes.

      Attribution:
    • inigyou #1
    • syllogistic #1
  2. WebRTC stacks still rely on hardcoded size limits

    The bug was not an isolated Rust implementation fluke. Comments connected it to older Pion SCTP work and to long-standing browser data channel limits where oversized messages have historically failed badly or silently. The useful pattern is that WebRTC often ships with conservative constants, partial exposure of negotiated maxMessageSize, and weak path adaptation. That leaves application developers patching around transport behavior with their own size caps instead of trusting the stack to discover and respect the path.

    Treat message sizing as an application concern even when using mature WebRTC stacks. Set conservative limits, surface negotiated sizes in telemetry, and test across libraries and browsers instead of assuming interoperability covers edge-case packet sizing.

      Attribution:
    • Sean-Der #1 #2
    • syllogistic #1 #2
    • cyanydeez #1
  3. Healthy-looking checks can miss a dead data path

    Small probes are exactly why MTU failures waste so much time. A ping, auth request, or startup handshake can succeed while the first real payload crosses the fragmentation line and disappears. One commenter saw the same pattern with HTTP/3 and QUIC in the iOS simulator, where the workaround was simply to block the UDP path. That underlines how often teams end up routing around MTU bugs because the usual diagnostics never hit the failing packet size.

    Add staged network checks that exercise the same protocol and payload sizes as production traffic. If you only test reachability with tiny packets, you are blind to the failure mode that actually hurts users.

      Attribution:
    • katericksonnow #1
    • cyberax #1
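To make the classification constraint from the first insight concrete, here is a minimal sketch of a port-based classifier looking at raw IPv6 packets. It is not Tailscale's filter. The function and helper names are hypothetical, and the parser assumes the Fragment header (Next Header value 44) sits directly after the fixed 40-byte IPv6 header, ignoring other extension headers. It shows that a first fragment still exposes UDP ports, while a non-initial fragment carries none.

```python
import struct

IPPROTO_UDP = 17
IPPROTO_FRAGMENT = 44      # IPv6 Fragment extension header
IPV6_HEADER_LEN = 40
FRAGMENT_HEADER_LEN = 8


def classify_ipv6(packet: bytes):
    """Return (kind, ports) where ports is (src, dst) or None."""
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return "other", None
    next_header = packet[6]
    offset = IPV6_HEADER_LEN
    fragmented = False
    if next_header == IPPROTO_FRAGMENT:
        if len(packet) < offset + FRAGMENT_HEADER_LEN:
            return "other", None
        next_header, _reserved, off_flags, _ident = struct.unpack_from(
            "!BBHI", packet, offset
        )
        # Upper 13 bits are the fragment offset in 8-byte units.
        if off_flags >> 3 != 0:
            # No transport header here, so there are no ports to match on.
            return "non-initial-fragment", None
        fragmented = True
        offset += FRAGMENT_HEADER_LEN
    if next_header == IPPROTO_UDP and len(packet) >= offset + 4:
        ports = struct.unpack_from("!HH", packet, offset)
        return ("udp-first-fragment" if fragmented else "udp"), ports
    return "other", None


def build_ipv6(next_header: int, payload: bytes) -> bytes:
    """Test helper: fixed IPv6 header with all-zero addresses plus payload."""
    header = struct.pack("!IHBB", 6 << 28, len(payload), next_header, 64)
    return header + bytes(32) + payload


def build_fragment_header(next_header, offset_units, more, ident) -> bytes:
    """Test helper: 8-byte Fragment header."""
    off_flags = (offset_units << 3) | int(more)
    return struct.pack("!BBHI", next_header, 0, off_flags, ident)
```

A filter built this way has to decide what to do with the `non-initial-fragment` case. Tracking the first fragment's verdict by its identification value, or reassembling before matching, are the kinds of approaches the comments describe; denying by default is what produces the silent drop.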

Against the grain

  1. This is a rare corner case

    The case for calling this a major Tailscale flaw drew pushback. Fragmented UDP over IPv6 on an overlay, combined with a hardcoded WebRTC packet size, is unusual enough that it can stay invisible in normal traffic for a long time. That does not make the failure acceptable, but it does explain why a popular product could ship with it and why most users never noticed.

    Prioritize fixes based on where your own stack sits in this combination. If you do not rely on large UDP payloads over IPv6 overlays, this is something to track and regression-test, not necessarily an immediate fire drill.

      Attribution:
    • happyopossum #1
    • syllogistic #1
  2. Fragmentation itself is the underlying design problem

    Another take was that the deeper issue is IP fragmentation, not just either implementation bug. Fragmentation and path MTU discovery have a long history of bad interactions, and modern stacks that drop oversized traffic instead of adapting keep recreating the same class of failure. That framing shifts attention away from this one incident and toward avoiding fragmentation-dependent designs altogether.

    Prefer protocols and libraries that actively size packets to the path and avoid fragmentation in the first place. If your design depends on fragmented UDP working cleanly across diverse networks, expect recurring edge failures.

      Attribution:
    • Veserv #1
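One way to act on the sizing advice above is to derive a per-datagram payload budget from the path MTU and split application messages to fit. The sketch below is only the arithmetic. It assumes IPv6 with no extension headers and uses the IPv6 minimum link MTU of 1280 bytes as a conservative default. The `extra_overhead` parameter is a placeholder for DTLS, SCTP, or tunnel overhead, whose real values depend on the stack and are not given in the discussion. Both function names are hypothetical.

```python
IPV6_MIN_MTU = 1280     # every IPv6 link must support at least this (RFC 8200)
IPV6_HEADER_LEN = 40    # fixed IPv6 header, no extension headers
UDP_HEADER_LEN = 8


def udp_payload_budget(path_mtu: int = IPV6_MIN_MTU, extra_overhead: int = 0) -> int:
    """Largest UDP payload that fits in one unfragmented IPv6 packet."""
    budget = path_mtu - IPV6_HEADER_LEN - UDP_HEADER_LEN - extra_overhead
    if budget <= 0:
        raise ValueError("overhead leaves no room for payload")
    return budget


def chunk_message(message: bytes, max_chunk: int) -> list:
    """Split a message into pieces no larger than max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return [message[i:i + max_chunk] for i in range(0, len(message), max_chunk)]
```

With the defaults, `udp_payload_budget()` gives 1280 - 40 - 8 = 1232 bytes, so a 3000-byte message splits into chunks of 1232, 1232, and 536 bytes. Reassembly, ordering, and loss handling are left out and would still be needed on top of an unreliable transport.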

In plain English

ACL
Access control list, a set of rules that decides which network traffic is allowed or denied.
HTTP/3
HTTP version 3, the version of the web protocol that runs over QUIC instead of TCP.
IPv6
Internet Protocol version 6, the newer addressing system for devices on the Internet.
MTU
Maximum Transmission Unit, the largest packet size a network path or link can carry without fragmentation.
QUIC
A modern transport protocol built on UDP that is commonly used for HTTP/3.
SCTP
Stream Control Transmission Protocol, a transport protocol used by WebRTC data channels to provide ordered or unordered delivery with configurable reliability.
Tailscale
A mesh virtual private network service that connects devices over encrypted tunnels using the WireGuard protocol.
TCP
Transmission Control Protocol, a connection-oriented network protocol that handles retransmission, ordering, and flow control.
UDP
User Datagram Protocol, a connectionless network protocol that sends packets without built-in delivery or retransmission guarantees.
WebRTC
Web Real-Time Communication, a set of protocols and APIs for peer-to-peer audio, video, and data transfer.
