HN Debrief

Sem: New primitive for code understanding – not LSPs, but entities on top of Git

  • Developer Tools
  • Programming
  • AI
  • Open Source
  • Infrastructure

Sem is a command-line tool for code understanding that sits alongside Git and works at the level of code entities rather than lines. It parses functions, classes, and methods with Tree-sitter, builds dependency graphs across files, and offers commands like impact analysis and structural diffs. The pitch is that this is more useful than raw line diffs for both humans and coding agents because it can answer questions like “what code and tests are downstream of this changed function?” instead of just showing edited hunks.
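
The "what is downstream of this changed function?" question amounts to a reverse traversal of a dependency graph. The following is a minimal sketch of that idea, not Sem's implementation: the entity names, the `DEPENDS_ON` table, and the `tests.` prefix convention are all hypothetical and exist only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical entity graph: each key depends on (calls or imports) the listed entities.
DEPENDS_ON = {
    "api.handle_request": ["parser.parse_config"],
    "cli.main": ["api.handle_request"],
    "tests.test_parser": ["parser.parse_config"],
    "tests.test_cli": ["cli.main"],
    "utils.slugify": [],
}


def reverse_edges(depends_on):
    """Invert the graph so each entity maps to its direct dependents."""
    dependents = defaultdict(set)
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(entity)
    return dependents


def impacted_by(changed, depends_on):
    """Return every entity that transitively depends on `changed` (breadth-first)."""
    dependents = reverse_edges(depends_on)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


impacted = impacted_by("parser.parse_config", DEPENDS_ON)
# Assumed convention: test entities live under a "tests." prefix.
tests_to_run = sorted(e for e in impacted if e.startswith("tests."))
print(sorted(impacted))
print(tests_to_run)  # ['tests.test_cli', 'tests.test_parser']
```

A line diff would show only the edited hunk in `parser.parse_config`; the traversal also surfaces `cli.main` and `tests.test_cli`, which never mention the changed function directly.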

If you are evaluating coding-agent tooling, the interesting part here is not the benchmark claim. It is the idea of giving humans and models a structural map of a codebase so they can target review, testing, and refactors more precisely. Also, treat install flows that touch Git config or hooks as product-critical UX, because even optional shell setup will trigger backlash if it is unclear.

Discussion mood

Mostly positive about the underlying idea of entity-level diffs and dependency graphs, with clear annoyance at the website and setup wording that made optional Git integration look like a hijack. Skepticism focused less on the tool itself and more on oversold AI benchmark claims and unclear install UX.

Key insights

  1. Structural context as agent harness input

    Using a codebase graph to constrain and validate model behavior is the practical idea here. Instead of dumping whole files into a model or isolating one function with no surroundings, Sem can expose signatures, dependencies, and blast radius so an agent sees the minimum relevant structure and can be checked against it after making changes.

    If you are building internal coding agents, focus on feeding them dependency graphs and explicit change boundaries before chasing another memory layer. This is a concrete way to cut context bloat and review scope.

      Attribution:
    • rohanucla #1
    • hankbond #1
  2. Caching makes large repos plausible

    The performance story only became credible once the dependency graph stopped being rebuilt from scratch. The SQLite-backed topology cache lets unchanged files be skipped, which is why the author claims warm runs went from unusable to near-interactive on TypeScript repos of 71K to 100K files.

    Ask whether graph tools have incremental indexing and warm-cache behavior before trialing them in a monorepo. Without that, they will feel impressive in demos and get abandoned in daily work.

      Attribution:
    • rohanucla #1
  3. Tree-sitter beats regex for cross-file analysis

    The jump from ad hoc grep or regex to parser-backed analysis is what lets this survive real codebases. Aliased imports, re-exports, and nested scopes break text matching quickly, while Tree-sitter gives Sem a syntax tree that can support one structural model across languages without depending on full language servers.

    If you are tempted to build lightweight repo intelligence with regex, expect to hit a ceiling fast. Parser-backed extraction is the point where dependency and impact features stop collapsing on edge cases.

      Attribution:
    • rohanucla #1
    • Scaevolus #1
  4. The same idea extends to data pipelines

    One of the more interesting follow-ons was applying structural diffs to data artifacts, not just source code. The useful questions are not row-level deltas but semantic ones like whether a key stops being unique, a join starts fanning out, or a model changes grain several steps downstream. That reframes “diff” as impact on meaning, not just changed values.

    If your product touches analytics engineering or data platforms, there is room for code-style impact analysis on SQL and derived datasets. Teams care about downstream semantic breakage more than raw file diffs.

      Attribution:
    • cpard #1
    • gwerbin #1 #2
  5. Entity graphs could split monolithic AI commits

    A compelling adjacent use case is revision surgery after an agent produces one giant mixed commit. Structural change data could help tools like Jujutsu separate unrelated edits into smaller orthogonal revisions that humans can then reorder or squash into sensible units.

    Look beyond review and testing. Structural diffs may also improve commit hygiene and patch decomposition, which is becoming a bigger problem as agents generate large messy changesets.

      Attribution:
    • rohanucla #1
    • jiggunjer #1
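
The caching insight above, skipping files that have not changed, can be sketched with a content-hash table in SQLite. This is an assumption-laden illustration rather than Sem's actual cache: the `file_index` schema, the function names, and the toy `extract_entities` parser (a stand-in for real Tree-sitter parsing) are all hypothetical.

```python
import hashlib
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache. The schema is illustrative, not Sem's."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_index ("
        "path TEXT PRIMARY KEY, digest TEXT NOT NULL, entities TEXT NOT NULL)"
    )
    return conn


def extract_entities(source):
    """Placeholder for real parsing: collect names from lines starting with 'def '."""
    names = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("def ") and "(" in stripped:
            names.append(stripped[4:stripped.index("(")])
    return names


def index_files(conn, files):
    """Re-parse only files whose content hash changed; return the paths re-parsed."""
    reparsed = []
    for path, source in files.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        row = conn.execute(
            "SELECT digest FROM file_index WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # unchanged since the last run: skip parsing
        entities = ",".join(extract_entities(source))
        conn.execute(
            "INSERT OR REPLACE INTO file_index (path, digest, entities) "
            "VALUES (?, ?, ?)",
            (path, digest, entities),
        )
        reparsed.append(path)
    conn.commit()
    return reparsed


conn = open_cache()
files = {"a.py": "def load():\n    pass\n", "b.py": "def save():\n    pass\n"}
cold = index_files(conn, files)         # first run parses everything
warm = index_files(conn, files)         # nothing changed, nothing parsed
files["b.py"] = "def save():\n    return 1\n"
incremental = index_files(conn, files)  # only b.py is re-parsed
print(cold, warm, incremental)
```

The warm run still hashes every file, so its cost scales with repository size, but hashing is far cheaper than parsing and graph construction. A real tool might also compare modification times first to avoid reading unchanged files at all.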

Against the grain

  1. Setup UX poisoned first impressions

    The backlash was not about the core analysis engine. It was about readers believing the tool takes over `git diff` and hooks itself into their workflow before trust is earned. Even after clarification, the `setup` and `unsetup` command names still felt too invasive for something that should be an explicit Git config snippet or alias.

    When a developer tool touches shell hooks or version control defaults, make opt-in boundaries painfully obvious on the landing page. Ambiguity here will suppress adoption no matter how good the underlying capability is.

      Attribution:
    • jawns #1
    • znnajdla #1 #2
    • OJFord #1
  2. The benchmark claim weakens the pitch

    The “2.3x more accurate” marketing line made people suspicious because the tasks were tailored to Sem's own entity model rather than real engineering work. That pushed attention away from the much more believable value proposition, which is better code navigation and impact analysis.

    Sell structural tooling on workflows users already recognize, like review scope and regression targeting. Benchmark claims tied to your own abstractions will be read as marketing inflation unless the tasks map cleanly to production use.

      Attribution:
    • onlyrealcuzzo #1
    • awoimbee #1
    • rohanucla #1
  3. Examples need to prove human value

    Not everyone bought that this solves a meaningful problem out of the box. The convincing case was the concrete monorepo example where a changed parser function fans out through aliased imports and barrel files, because that shows where plain grep and line diffs actually fail for humans, not just for models.

    If you are launching developer infrastructure, lead with one painful real-world workflow and show the before and after. Abstract claims about smarter diffs are too easy to dismiss.

      Attribution:
    • docheinestages #1
    • rohanucla #1
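
The aliased-import case mentioned above is easy to reproduce in miniature. The sketch below uses Python's standard `ast` module as a stand-in for Tree-sitter, purely so the example runs without extra dependencies; the `SOURCE` snippet and the helper names `grep_callers` and `resolved_calls` are hypothetical. It handles only `from ... import ... as ...` aliases and direct calls by name, which is enough to show where text matching misses a caller.

```python
import ast
import re

# Hypothetical module that calls parse_config only through an alias.
SOURCE = """
from parser import parse_config as load_settings

def start():
    return load_settings("app.toml")
"""


def grep_callers(source, name):
    """Text matching: does `name` appear in call position anywhere?"""
    return bool(re.search(rf"\b{re.escape(name)}\s*\(", source))


def resolved_calls(source):
    """Return qualified names called in `source`, following import aliases."""
    tree = ast.parse(source)
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                local_name = alias.asname or alias.name
                aliases[local_name] = f"{node.module}.{alias.name}"
    calls = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Map the local name back to its origin when it was imported.
            calls.add(aliases.get(node.func.id, node.func.id))
    return calls


print(grep_callers(SOURCE, "parse_config"))  # False: the call site uses the alias
print(resolved_calls(SOURCE))                # {'parser.parse_config'}
```

Re-exports through barrel files add one more hop of the same kind: the alias table has to be followed across files instead of within one.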

In plain English

CLI
Command-line interface, a text-based way to operate software by typing commands.
Git
A version control system developers use to track code changes and switch between branches of a project.
Jujutsu
A version control tool, often used with Git repositories, that changes how history editing and conflict handling work.
monorepo
A single repository that contains multiple projects, packages, or services together.
SQLite
A lightweight embedded database that stores data in a single file.
Tree-sitter
An incremental parsing library, widely used by editors, that builds syntax trees for many programming languages in real time.

Reference links

Related code intelligence tools and concepts

  • Kythe
    Mentioned as a comparable project for cross-language code indexing and dependency understanding.
  • Taint checking
    Referenced while discussing extending Sem from structural call graphs into data flow analysis.

Project and ecosystem references

  • Sem project page
    The main submitted project describing entity-based code understanding on top of Git.

Data versioning and artifact diffing

  • DVC
    Raised as a desirable integration point for better data diff tooling on large artifacts.