Launch HN: Expanse (YC P26) – Unlock Wasted GPU Capacity

AI
Infrastructure
Developer Tools
Startups

Expanse says GPU and HPC clusters waste huge amounts of capacity because users request far more walltime, memory, and compute than their jobs usually need. The product installs alongside SLURM or Kubernetes, reads submission scripts, source code, and live node telemetry, then predicts resource needs, likely failures, and code-level fixes before the job runs. Its key claim is that this is not an LLM wrapper but a cluster-specific multimodal model that learns how a particular environment behaves, because the same workload can perform very differently across hardware topologies.

If you operate expensive shared compute, the biggest near-term win may be better prediction and visibility around memory, walltime, and bursty job phases rather than smarter scheduling alone. If you build in this market, security posture and deployment model need to be legible up front or buyers will dismiss you before they get to the technical value.
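As a rough illustration of what visibility into over-requesting can look like, the sketch below parses the `--time` and `--mem` directives from a SLURM submission script and compares them with what a job actually consumed. It is not Expanse's method. The helper names and the observed figures are hypothetical; in practice the observed values might come from scheduler accounting, such as the `Elapsed` and `MaxRSS` fields reported by `sacct`. Only the long-form directives are handled.

```python
import re

# Multipliers to megabytes; Slurm treats a bare number as megabytes.
_MEM_TO_MB = {"": 1, "K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def parse_walltime(text):
    """Convert a Slurm --time value to seconds.

    Handles: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
    """
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in text.split(":")]
        parts += [0] * (3 - len(parts))      # pad to hours, minutes, seconds
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in text.split(":")]
        if len(parts) == 1:                  # minutes only
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:                # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        else:                                # hours:minutes:seconds
            hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_mem_mb(text):
    """Convert a Slurm --mem value such as 64G or 4000 to megabytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory value: {text!r}")
    return float(int(match.group(1)) * _MEM_TO_MB[match.group(2)])


def parse_request(script):
    """Extract the requested walltime and memory from #SBATCH lines."""
    request = {}
    pattern = r"^#SBATCH\s+--(time|mem)(?:=|\s+)(\S+)"
    for name, value in re.findall(pattern, script, re.MULTILINE):
        if name == "time":
            request["time_s"] = parse_walltime(value)
        else:
            request["mem_mb"] = parse_mem_mb(value)
    return request


# Hypothetical submission script and observed usage.
script = """#!/bin/bash
#SBATCH --time=2-00:00:00
#SBATCH --mem=64G
python train.py
"""
requested = parse_request(script)
observed = {"time_s": 6 * 3600, "mem_mb": 9000}

# A ratio well above 1 means the job asked for far more than it used.
ratios = {key: requested[key] / observed[key] for key in requested}
print(ratios)
```

A single ratio per job is only a starting point, since it says nothing about how usage varies over the run.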

June 1, 2026
news.ycombinator.com

Discussion mood

Mostly positive on the technical problem and skeptical on the business and deployment details. People who use clusters recognized over-allocation as real and painful, but they pressed on whether the product can handle security expectations, fit how HPC users actually behave, and deliver value when incentives to optimize are weak.

Key insights

Burst-shaped jobs create hidden waste

A lot of wasted capacity sits inside individual jobs, not in obviously idle machines. Real workloads like genomics pipelines swing between CPU-heavy, memory-heavy, and IO-bound phases, but schedulers usually force users to reserve the peak profile for the whole run. That makes a job look fully allocated even when the hardware is only lightly used for large chunks of its walltime.

Look for waste inside long-running jobs before assuming the main problem is cluster-wide placement. Profiling resource usage over time is likely to unlock more capacity than relying on static per-job averages.
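A minimal sketch of that idea, assuming per-phase usage samples are available for a job: it compares the flat peak reservation against what each phase actually used and reports the unused fraction of reserved core-seconds and GB-seconds. The `Sample` type, the function name, and the three-phase numbers are hypothetical and chosen only to mirror the CPU-heavy, memory-heavy, and IO-bound pattern described above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    seconds: float    # how long this phase (or sampling window) lasted
    cpu_cores: float  # cores actually busy during the window
    mem_gb: float     # memory actually in use during the window


def in_job_waste(samples, reserved_cores, reserved_mem_gb):
    """Return the unused fraction of the reservation, per resource."""
    total_seconds = sum(s.seconds for s in samples)
    if total_seconds <= 0:
        raise ValueError("samples must cover a positive duration")
    # Usage is capped at the reservation so overshoot cannot hide waste.
    used_core_s = sum(min(s.cpu_cores, reserved_cores) * s.seconds for s in samples)
    used_gb_s = sum(min(s.mem_gb, reserved_mem_gb) * s.seconds for s in samples)
    return {
        "cpu_waste": 1 - used_core_s / (reserved_cores * total_seconds),
        "mem_waste": 1 - used_gb_s / (reserved_mem_gb * total_seconds),
    }


# Hypothetical pipeline with three one-hour phases.
phases = [
    Sample(seconds=3600, cpu_cores=16, mem_gb=8),   # CPU-heavy
    Sample(seconds=3600, cpu_cores=2, mem_gb=64),   # memory-heavy
    Sample(seconds=3600, cpu_cores=1, mem_gb=6),    # IO-bound
]

# The user must reserve the peak of each resource for the whole run.
waste = in_job_waste(phases, reserved_cores=16, reserved_mem_gb=64)
print(waste)
```

In this made-up case the job hits its full reservation on both resources at some point, yet roughly 60 percent of the reserved core-seconds and GB-seconds go unused.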

Attribution:

mbreese #1 #2

User incentives fight utilization gains

Researchers usually care about shortest time to result and least operational hassle. If the cluster does not bill them directly for waste, a sloppy bash pipeline that runs today often beats a carefully decomposed workflow that uses fewer resources. That is why over-allocation persists even when everybody knows it is inefficient.

Products in this space should reduce tuning work for users instead of expecting behavior change. If you run a shared cluster, pair recommendations with policy or pricing levers if you want utilization to actually move.

Attribution:

mbreese #1

Security posture must be obvious immediately

The sharpest commercial feedback came from someone who read the homepage and docs and still assumed risky telemetry egress and SaaS dependency. Expanse replied that deployments are air-gapped, data stays in the customer environment, and the daemon is not required for jobs to run. The gap between those two readings is the important signal. Enterprise buyers will reject the product on first impression if the architecture is not unmistakable.

For infrastructure sold into sensitive environments, lead with deployment boundaries, data flow, and failure modes before the optimization story. Put the architecture diagram where buyers can see it in the first minute.

Attribution:

mike_d #1
ismaeel_bashir #1

Against the grain

Cloud providers already do placement optimization

Large clouds and newer GPU providers are not ignoring this problem. They already use oversubscription and smarter placement to squeeze more out of fleets, so the easy wins may be gone in environments that own the full stack. That shifts the strongest use case toward clusters where user-level job requests and on-prem workflow habits create waste the provider cannot automatically smooth away.

Do not assume every low-utilization compute environment needs a new prediction layer. Separate managed cloud fleets from research and enterprise clusters where scheduling input quality is the real bottleneck.

Attribution:

nostrebored #1
aleksiy123 #1

Low utilization is not always waste

Some spare capacity is intentional. Operators may hold back headroom for disaster recovery, failover, or future demand spikes. Expanse said its measurements target waste inside already allocated user jobs rather than reserved idle capacity, which is an important distinction because headline utilization numbers can otherwise overstate the addressable problem.

When evaluating utilization products, ask whether they reduce over-requesting inside jobs or just count capacity that was intentionally reserved. Those are different problems with different buyers.
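The distinction can be made concrete with simple accounting. The sketch below splits a cluster's non-productive GPU-hours into intentional headroom, unallocated capacity, and capacity that was allocated to jobs but never used. All names and figures are hypothetical, and the split assumes headroom and job allocations do not overlap.

```python
def utilization_breakdown(total, headroom, allocated, used):
    """Split idle GPU-hours into three buckets.

    total     -- GPU-hours the cluster could deliver in the period
    headroom  -- GPU-hours held back on purpose (failover, demand spikes)
    allocated -- GPU-hours granted to user jobs
    used      -- GPU-hours those jobs actually kept busy
    """
    if not (0 <= used <= allocated and headroom + allocated <= total):
        raise ValueError("inconsistent inputs")
    idle = total - used
    in_job_waste = allocated - used
    return {
        "headline_utilization": used / total,
        "intentional_headroom": headroom,
        "unallocated": total - headroom - allocated,
        "in_job_waste": in_job_waste,
        # Share of all idle hours caused by over-requesting inside jobs.
        "in_job_share_of_idle": in_job_waste / idle if idle else 0.0,
    }


# Hypothetical month on a 1000 GPU-hour cluster.
report = utilization_breakdown(total=1000, headroom=200, allocated=700, used=350)
print(report)
```

With these numbers the headline figure is 35 percent utilization, but only 350 of the 650 idle GPU-hours come from over-requesting inside jobs; the other 300 are reserved or simply unscheduled.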

Attribution:

flounder3 #1
ismaeel_bashir #1

In plain English

GPU

Graphics processing unit, a chip originally designed for graphics that is now widely used to train and run AI models.

HPC

High-performance computing, the use of very powerful computers for simulation, modeling, and other demanding workloads.

IO

Input and output, meaning data reads and writes to storage or network devices.

Kubernetes

An open-source system for deploying and managing software containers across clusters of machines.

LLM

Large language model, a type of AI system trained on huge amounts of text to generate and analyze language.

multimodal

Able to work with more than one kind of input or output, such as text and images.

SLURM

Simple Linux Utility for Resource Management, a common job scheduler for HPC clusters that queues and allocates compute resources.

telemetry

Operational data collected from systems, such as metrics, logs, and traces, used for monitoring and investigation.

VPC

Virtual Private Cloud, an isolated virtual network inside a public cloud provider such as AWS.

Reference links

Company and product references

Expanse website
The company homepage for the product being launched.

Founder evaluation reference

Expanse LLM evaluation post on X
Linked in the launch text as additional detail on the claimed benchmark against general-purpose models.

Related tooling mentioned by commenters

AgentNative traffic shaper post on LinkedIn
A commenter suggested this open source traffic shaping project as potentially relevant to request prediction and load control.