HN Debrief

How we run Firecracker VMs inside EC2 and start browsers in less than 1s

  • Infrastructure
  • Security
  • Developer Tools
  • AI
  • Cloud

The post explains how Browser Use moved from slower, operationally awkward browser infrastructure to Firecracker microVMs on AWS, then chipped away at launch latency with snapshotting, lazy memory loading through userfaultfd, huge pages, CPU pinning, and a few smaller Chromium startup fixes. The claim is that each browser gets VM-grade isolation, starts in under a second, and can still support sticky browser profiles, long-running sessions, and anti-detection tweaks that matter for web agents and automation. A key detail the post underplays is that this only became feasible on non-metal EC2 very recently. AWS added nested virtualization on certain virtualized instance families in February 2026, which removed the old need to run on bare metal if you wanted Firecracker inside EC2.

If you need short-lived browser sessions with strong isolation, Firecracker snapshots are now more practical on standard EC2 because AWS recently added nested virtualization on some instance types. But the bigger product question is not raw startup time. It is whether your workload really needs stealth browsers and long-lived sessions, or whether a much simpler Lambda or container setup will do.
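
One rough way to frame that comparison is a break-even calculation. The sketch below is illustrative only: the `Workload` fields, both function names, and every price argument are hypothetical inputs you would fill in yourself, not quoted AWS rates or figures from the post.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    sessions_per_month: int      # number of browser sessions
    seconds_per_session: float   # average session duration in seconds
    memory_gb: float             # memory allocated to each browser


def per_invocation_cost(w, price_per_gb_second, price_per_request):
    """Cost when you pay only while a session runs (Lambda-style billing)."""
    compute = (w.sessions_per_month * w.seconds_per_session
               * w.memory_gb * price_per_gb_second)
    return compute + w.sessions_per_month * price_per_request


def always_on_cost(hosts, price_per_host_hour, hours_per_month=730):
    """Cost of a fleet that stays up regardless of load (EC2-style billing)."""
    return hosts * price_per_host_hour * hours_per_month
```

Comparing the two results for your own session counts and durations shows roughly where per-invocation billing stops being cheaper. It says nothing about stealth, persistent profiles, or session length limits, which may decide the question before cost does.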

Discussion mood

Mixed. The infra work around Firecracker, snapshotting, and isolation impressed people, but the company’s stealth-browser and residential-proxy angle triggered a lot of hostility because many readers see it as bot-enablement that shifts cost onto site operators.

Key insights

  1. Nested virtualization is the hidden enabler

    What makes this setup newly practical is not just Firecracker tuning. It is AWS finally allowing nested virtualization on some regular EC2 instance families. That change lets you run Firecracker microVMs inside EC2 VMs instead of paying for metal, which is a huge operational shift. The catch is that capacity is still constrained and some people report odd KVM instability, so this is not yet boring commodity infrastructure.

    If you want to copy this design, first check whether your target AWS regions and instance families actually have enough nested virtualization capacity. Treat it as an emerging platform feature, not something you can assume will scale smoothly without AWS help.

      Attribution:
    • sudb #1
    • gregpr07 #1
    • roboben #1 #2
    • thundergolfer #1
    • Reformedot #1
  2. Lambda already covers simpler browser jobs

    For screenshot APIs and short stateless work, packaging Chromium into AWS Lambda can get you most of the practical benefits with far less machinery. The trade-off is straightforward: Lambda cold starts exist, but reuse of warm instances smooths them out at volume. Browser Use’s custom stack earns its keep when you need long sessions, low-level host control, persistent profiles, and pricing that favors minute-scale automation over per-invocation screenshots.

    Do not start with Firecracker because it sounds advanced. If your browser work finishes in seconds and does not need stealth or long-lived state, price out Lambda first and keep the architecture simple.

      Attribution:
    • timojeajea #1 #2
    • Reformedot #1 #2
  3. MicroVMs solve a real isolation problem

    The strongest case for Firecracker is security isolation, not just speed. Browsers execute hostile code all day, and container isolation depends on the host kernel staying intact. Several comments argued that for internet-facing browser fleets, that is too weak a boundary to trust on its own. Firecracker gives each browser its own guest kernel and also makes full VM snapshot and rollback part of the design, which is hard to match with ordinary containers.

    If your product runs untrusted pages for many customers, treat containers as an operational convenience, not your final security story. Budget for a stronger isolation boundary before you scale.

      Attribution:
    • SomaticPirate #1
    • arianvanp #1
    • WhyNotHugo #1
    • simonreiff #1
    • mike-grant #1
    • roboben #1
    • rvz #1
  4. Chromium was chosen for stealth, not elegance

    Switching to a smaller browser engine like Lightpanda could cut memory and improve startup, but it breaks the business goal. The company is optimizing for undetectable automation, and Chromium is still the browser that can be modified to look most like the real thing. That means they are accepting Chromium’s bloat because stealth compatibility beats clean-sheet performance.

    Pick the browser engine that matches your bottleneck. If detection resistance is central, you may be forced into Chromium and all its overhead. If not, a lighter engine can simplify everything.

      Attribution:
    • hobofan #1
    • Reformedot #1
  5. Warm pools are a stopgap, not the end state

    Several people suggested the obvious answer to startup latency: keep browsers or VMs warm. The more interesting response was why that is still unsatisfying. Warm pools burn resources continuously, cope badly with bursts when the traffic shape changes, and get messy when customers need different browser flags, fingerprints, or features. A snapshot taken after Chromium has already started would preserve the speed benefit without carrying a pool management problem forever.

    Warm pools are fine when your workload is narrow and predictable. Once per-session customization starts to matter, snapshot-based startup becomes a better long-term direction than just keeping more idle capacity around.

  6. userfaultfd is doing the heavy lifting

    One technical detail that stood out to systems people was the use of userfaultfd for lazy memory loading. That lets the host control how guest memory gets populated on page fault, which is exactly the kind of trick that makes snapshot resume fast without pulling every page into RAM upfront. It is a reminder that the headline latency win is coming from low-level memory behavior, not just from 'using Firecracker.'

    If you are chasing sub-second startup from snapshots, focus on memory restoration strategy as much as on the snapshot mechanism itself. Faster resume often comes from loading less, not from booting faster.

      Attribution:
    • CompuIves #1
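
The "loading less" idea from the userfaultfd insight can be illustrated with a toy model. This is a pure-Python simulation of demand paging, not real userfaultfd and not Browser Use's implementation: `LazyGuestMemory`, `make_snapshot`, and the 4096-byte `PAGE_SIZE` are assumptions made for the example. A page is read from the snapshot file only the first time something touches it, which is the role a fault handler plays during a lazy restore.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this toy model


class LazyGuestMemory:
    """Toy demand paging: a page is read from the snapshot file only on
    first access, the way a fault handler would supply it."""

    def __init__(self, snapshot_path):
        self._file = open(snapshot_path, "rb")
        self._pages = {}   # page index -> bytes already resident
        self.faults = 0    # number of simulated page faults served

    def read(self, addr, length):
        out = bytearray()
        end = addr + length
        while addr < end:
            page, offset = divmod(addr, PAGE_SIZE)
            if page not in self._pages:
                self._pages[page] = self._load_page(page)
            chunk = min(end - addr, PAGE_SIZE - offset)
            out += self._pages[page][offset:offset + chunk]
            addr += chunk
        return bytes(out)

    def _load_page(self, page):
        self.faults += 1
        self._file.seek(page * PAGE_SIZE)
        return self._file.read(PAGE_SIZE)

    def close(self):
        self._file.close()


def make_snapshot(num_pages):
    """Write a fake snapshot where every byte of page i equals i % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i % 256]) * PAGE_SIZE)
    return path


if __name__ == "__main__":
    path = make_snapshot(1024)
    mem = LazyGuestMemory(path)
    mem.read(5 * PAGE_SIZE + 10, 4)   # touches a single page
    print(mem.faults, "of 1024 pages loaded")
    mem.close()
    os.remove(path)
```

In the toy run, a 4-byte read pulls in one page out of 1024. A real restore works on guest memory through the kernel rather than a Python dict, but the cost model is the same: resume time tracks the pages actually touched, not the snapshot size.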

Against the grain

  1. Stealth browsers still look like bot infrastructure

    The harshest reaction was not about architecture at all. It was about the product being sold. Once a company advertises bypassing anti-bot systems and leans on residential proxies, many readers stop seeing a neutral browser platform and start seeing infrastructure for scraping and abuse. That framing changes how the technical achievement lands. The better the stealth works, the more cost gets pushed onto everyone trying to run a public website.

    If your product depends on stealth automation, expect infrastructure buyers and partners to judge the business model as much as the tech. Plan for trust, policy, and reputational risk, not just throughput and latency.

      Attribution:
    • losteric #1
    • cute_boi #1
    • GrinningFool #1
    • sroussey #1
    • eab- #1
  2. There are legitimate uses with no API alternative

    A credible counterpoint is that many everyday automation tasks are benign and still get blocked. People cited change detection, price monitoring, link checking, software they already pay for but can only access through a browser, and even privacy cleanup on directory sites. In that framing, stealth is not about abuse. It is a workaround for a web that refuses structured access unless you are a giant partner.

    If you are building against third-party sites, separate low-rate utility automation from bulk extraction in your product and policy design. That distinction affects user trust and may determine whether customers see the tool as necessary or predatory.

      Attribution:
    • baby_souffle #1
    • nateb2022 #1
    • dagi3d #1
    • mystifyingpoi #1
    • MayCXC #1
    • stogot #1
  3. The benchmark story is still too vague

    Some readers were unconvinced by the writeup because the numbers are not broken down cleanly enough. The post names a few optimizations and shows big speedups, but it does not fully account for the path from 9.8 seconds to 400 ms or explain the profiling process in enough detail to reproduce the work. That leaves the piece feeling more like a marketing narrative than a rigorous engineering report.

    If you publish performance claims to win technical credibility, include the instrumentation, the per-step deltas, and what remains unexplained. Otherwise smart readers will assume the missing detail hides the real story.

      Attribution:
    • amarshall #1
    • Dibby053 #1
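
The kind of per-step accounting readers asked for in the benchmark critique could be produced with something as small as the sketch below. It is a hypothetical helper, not the instrumentation Browser Use used: `StepTimer`, its `clock` parameter, and the step names in the usage note are invented for illustration.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Record how long each named step takes and report the remainder."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.steps = []   # list of (name, seconds) in execution order

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.steps.append((name, self._clock() - start))

    def report(self, total_seconds):
        """Return per-step rows plus the part of the total left unaccounted for."""
        measured = sum(seconds for _, seconds in self.steps)
        return list(self.steps) + [("unexplained", total_seconds - measured)]
```

Wrapping each phase, for example `with timer.step("restore_snapshot"):` followed by `with timer.step("launch_chromium"):`, and then calling `timer.report(end_to_end_seconds)` gives a table in which the "unexplained" row makes any gap between the measured steps and the headline number visible.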

In plain English

API
Application Programming Interface, a way for software to call another service programmatically.
Chromium
The open source browser project that Google Chrome is built on.
cold start
The extra startup delay when a serverless function or service instance has to start from scratch instead of reusing an already running instance.
container
A packaged application environment that shares the host operating system kernel while isolating processes and dependencies.
CPU pinning
Binding a process or virtual CPU to specific physical processor cores to reduce scheduling overhead and improve consistency.
EC2
Amazon Elastic Compute Cloud, Amazon Web Services' virtual server product.
Firecracker
An open source virtual machine monitor from Amazon that runs lightweight microVMs on top of KVM, with stronger isolation than containers.
huge pages
A memory management feature that uses larger-than-normal memory pages to reduce overhead and improve performance.
KVM
Kernel-based Virtual Machine, the Linux feature for hardware-assisted virtualization.
Lambda
AWS Lambda, Amazon Web Services' serverless compute service that runs code on demand without managing servers.
microVM
A very small virtual machine designed to start quickly and use few resources.
nested virtualization
Running a virtual machine inside another virtual machine, so a guest machine itself acts like a host for more guests.
residential proxies
Proxy servers that route traffic through home internet connections so requests look like they come from ordinary consumer devices.
snapshotting
Saving the full state of a virtual machine or process so it can be restored quickly later.
userfaultfd
A Linux kernel feature that lets a user-space program handle page faults and decide when and how memory pages are loaded.

Reference links

Cloud infrastructure and virtualization references

  • AWS nested virtualization announcement
    Used to explain that Firecracker inside regular EC2 only became possible recently on certain instance families.
  • copy.fail
    Referenced as an example resource related to kernel and container breakout concerns.

Alternative browser infrastructure products

  • web-access-mcp
    Shared as a simpler browser subprocess setup for web access and automation.
  • Lightpanda
    Suggested as a lighter browser engine with better CPU and memory behavior, though weaker for stealth.
  • shellbox.dev
    Mentioned as a simpler alternative to running Firecracker inside EC2.
  • docker-android
    Referenced in a discussion about running Android browsers for reinforcement learning workloads.

Bot access and payment models

  • Cloudflare Pay Per Crawl
    Raised as a possible litmus test for whether stealth browser platforms support compensated bot access instead of bypassing controls.

Related cloud browser history