HN Debrief

Google Copybara: moving code between repositories

  • Open Source
  • Developer Tools
  • Infrastructure
  • Programming

Copybara is Google’s tool for syncing code between repositories while applying scripted transformations. It preserves enough history to keep authorship and blame useful, and it supports workflows where one codebase has to exist in two different shapes. The obvious case is Google’s internal monorepo exporting projects to GitHub, but commenters pointed out that the reverse direction is just as important: importing upstream open source into a monorepo, keeping private forks close to upstream, or translating code between different build and import conventions. Transformation, not replication, is the point. Copybara is not a fancy replacement for submodules, subtrees, or plain repo mirroring. It exists because mirrors are often not exact mirrors.

If you only need a mirror, use simpler Git tooling. Reach for Copybara when you must keep code in two places with deterministic rewrites such as monorepo layout changes, build file translation, or stripping internal-only content.
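
To make "deterministic rewrite" concrete, here is a minimal Python sketch of a path mapping that moves a project out of a monorepo layout and drops internal-only files. It is not Copybara's API (Copybara expresses rewrites in its own configuration); the prefix `third_party/myproject`, the `internal` directory convention, and both function names are hypothetical and exist only for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical layout: the monorepo keeps the project under this prefix,
# while the public repository keeps the same files at its root.
INTERNAL_PREFIX = PurePosixPath("third_party/myproject")


def export_path(internal_path: str) -> Optional[str]:
    """Map a monorepo path to its public path, or None if it is not exported."""
    path = PurePosixPath(internal_path)
    if "internal" in path.parts:
        return None  # assumed convention: "internal" directories stay private
    try:
        return str(path.relative_to(INTERNAL_PREFIX))
    except ValueError:
        return None  # outside the project, so not part of the export


def import_path(public_path: str) -> str:
    """Inverse mapping: place a public path back under the monorepo prefix."""
    return str(INTERNAL_PREFIX / public_path)
```

Because the same input always yields the same output, the export can be re-run safely, and `import_path(export_path(p)) == p` holds for every exported file. The stripped files have no public counterpart, which is one reason a single source of truth matters.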

Discussion mood

The mood was mostly positive among people who have wrestled with monorepos, open sourcing internal code, or tracking upstream forks. The enthusiasm was tempered by two recurring complaints: Copybara is slow, and bidirectional workflows become painful unless the sync is tightly controlled and mostly one-way.

Key insights

  1. 01

    History mapping uses rewritten commits

    Copybara keeps useful provenance without preserving the original commit SHAs. It cherry-picks and rewrites commits, then records the source revision in a GitOrigin-RevId trailer on each rewritten commit. That explains why blame survives while SHA-based tooling does not, and why bidirectional sync only works cleanly when both sides stay close enough for those rewritten histories to be correlated.

    Do not design downstream automation around matching commit SHAs across repos. If you adopt Copybara, plan explicit provenance handling around commit trailers and define a single source of truth early.

      Attribution:
    • rnagulapalle #1
    • eddd-ddde #1
  2. 02

    Bidirectional sync only works with strict discipline

    Two-way workflows are viable only when the mirrors stay nearly current and the transforms are deterministic enough to invert. The workable model is not free-form editing in two places. It is export from one source of truth, keep the other side synchronized, then import change requests back through the inverse transform before divergence grows.

    If you need contributions from both sides, treat the secondary repo as a review surface, not an independent line of development. Put automation around frequent syncs and reject workflows that let the repos drift.

      Attribution:
    • dmoy #1 #2
    • eddd-ddde #1
  3. 03

    Josh is the stronger modern comparison

    Josh was singled out as more than a subtree replacement. It can dynamically expose monorepo directories as separate repos, and commenters said Rust chose it because subtree performance falls apart on larger repositories. That places Copybara in a narrower category. It is a transformation engine first, while Josh is attractive when the hard part is slicing and presenting a monorepo cleanly.

    If your problem is repo slicing and scalable monorepo views, evaluate Josh before defaulting to Copybara. If your problem is build-system and path rewriting across boundaries, Copybara still fits better.

      Attribution:
    • wasting_time #1
    • vlovich123 #1
    • IshKebab #1
    • MarkSweep #1
  4. 04

    The open source repo is a mirror

    The repo’s oddities make more sense once you realize the public Git repository is downstream of Google’s internal systems. External pull requests are accepted, but they are merged internally and re-exported. Piper is not literally Perforce, even if it is API-compatible, and Google uses Critique rather than Gerrit for code stored there. That explains why public tooling support can feel partial or strange around a project that clearly matters inside Google.

    Expect friction when extending Google-origin open source projects that are managed from an internal monorepo. Before building on one, check whether the public repo is truly authoritative or just an export surface.

      Attribution:
    • dijit #1
    • bruckie #1
    • Arainach #1
    • lima #1
  5. 05

    Performance is a real operational constraint

    People who used Copybara in production warned that speed is not a footnote. Remote-to-remote runs and larger histories can be painfully slow, to the point that some teams replaced it with custom Bash plus git-replace and git-filter-repo. Using local repositories for both origin and destination was the concrete tip that came up for bulk exports.

    Test Copybara on your actual history size before committing to the workflow. If throughput matters, benchmark local repo runs and keep a fallback path with lower-level Git tooling.

      Attribution:
    • veyh #1
    • nbobko #1
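
The provenance handling recommended in the first insight can be sketched in a few lines of Python. This is an illustrative sketch, not part of Copybara: it assumes you already have `(destination_sha, commit_message)` pairs from your own tooling, and the function names `origin_rev` and `build_provenance` are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches a trailer line of the form "GitOrigin-RevId: <revision>".
TRAILER = re.compile(r"^GitOrigin-RevId:\s*(\S+)\s*$", re.MULTILINE)


def origin_rev(message: str) -> Optional[str]:
    """Return the source revision recorded in a commit message, if any."""
    matches = TRAILER.findall(message)
    # If the trailer appears more than once, prefer the last occurrence.
    return matches[-1] if matches else None


def build_provenance(commits: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each source revision to the destination commit that carries it."""
    mapping: Dict[str, str] = {}
    for dest_sha, message in commits:
        rev = origin_rev(message)
        if rev is not None:
            mapping[rev] = dest_sha
    return mapping
```

Downstream automation can then look up a source revision in this mapping instead of assuming the same SHA exists in both repositories.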

Against the grain

  1. 01

    One-way extraction is the sane use

    The most practical use is not long-term synchronized coexistence. It is carving a subproject out of a bigger repository, preserving enough history for blame, then moving ongoing development to the new repo. That turns Copybara from infrastructure into a migration tool and avoids the hardest class of sync problems entirely.

    If your real goal is repo separation, use Copybara once and then stop syncing. That will usually beat years of maintaining a fragile cross-repo pipeline.

      Attribution:
    • klodolph #1
  2. 02

    Simple mirrors should avoid Copybara

    For plain replication between Git hosts, Copybara adds complexity you do not need. Basic vendor mirroring in GitLab or other Git tools is easier to reason about and less exposed to Google-specific maintenance risk. Copybara only starts paying rent when you need systematic transformations on the way through.

    Write down the exact transforms you need before choosing this tool. If the list is empty or trivial, use ordinary mirroring and keep the operational surface small.

      Attribution:
    • the_dude_ #1
    • dmoy #1
  3. 03

    A library may still be cheaper

    Some commenters pushed back on the whole premise. If multiple repos share code, extracting a real library or separate repo often produces a cleaner system than perpetual copy-based syncing. Copybara solves a real monorepo boundary problem, especially inside Google, but outside that context it can be a sign you are postponing a cleaner dependency model.

    Do not let Copybara become an excuse to dodge module boundaries. Revisit whether a package, service, or standalone repo would lower long-term coordination cost.

      Attribution:
    • zem #1 #2
    • krick #1
    • klodolph #1

In plain English

BUILD file
A file that declares how code should be built and tested in systems such as Bazel or Blaze.
Critique
Google’s internal code review tool for code stored in Piper.
Gerrit
A web-based code review system often used with Git repositories.
Git subtree
A Git feature for embedding and synchronizing one repository or directory inside another repository.
git-filter-repo
A Git history rewriting tool used to extract, rename, or clean up parts of a repository.
git-replace
A Git command that records replacement refs so that Git transparently substitutes one object, such as a commit, for another, for example when grafting histories together.
GitOrigin-RevId
A commit message trailer that Copybara adds to record the original source revision after rewriting a commit.
Josh
A tool for synchronizing and projecting parts of a monorepo into separate repository views.
Jujutsu
A newer version control system, often abbreviated as JJ, that can interoperate with Git while offering different workflows.
monorepo
A single repository that stores many projects or components together instead of splitting them across separate repos.
Perforce
A commercial version control system commonly used for large codebases, especially in game development.
Piper
Google’s internal source control system for its monorepo, designed to be compatible with Perforce APIs but implemented separately.
