The Compute 100

Omen AI raised a $31M Series A led by Nava Ventures and CRV on Monday to monitor coolant health in liquid-cooled GPU clusters and catch contamination before it becomes a crisis.

The problem: as AI chips run hotter, operators increase water content in glycol-water cooling mixtures to absorb more heat — but higher water concentrations accelerate bacterial growth. When contamination builds undetected, the only fix is a full system flush that takes a rack offline for five to six hours at a cost of millions of dollars per event. Current practice is scheduled maintenance, meaning problems compound before they're found.

Omen's solution is a miniature spectrometer deployed inline with existing cooling infrastructure, monitoring fluid chemistry and microbial load in real time. The system flags bacterial growth and component wear — pumps, seals — before they cascade into a shutdown. The company has roughly a dozen data center customers, including TensorWave, the AMD-based AI compute cloud, whose president said the industry is currently "flying blind" on coolant health.
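Omen has not published how its system decides when to raise a flag. One generic way to turn a stream of inline readings into early alerts is a statistical control chart. The sketch below is an EWMA (exponentially weighted moving average) chart on a single scalar metric. It assumes a hypothetical "microbial-load index" sampled at fixed intervals and a baseline from a known-healthy period. The class and parameter names are illustrative, not Omen's.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class CoolantDriftAlarm:
    """Hypothetical EWMA control chart for one scalar coolant metric,
    e.g. a microbial-load index sampled at fixed intervals."""
    baseline: list[float]   # readings from a period known to be healthy
    alpha: float = 0.2      # EWMA weight on the newest reading (0 < alpha <= 1)
    k: float = 3.0          # alarm width in standard deviations of the EWMA

    def __post_init__(self) -> None:
        self.mu = mean(self.baseline)
        sigma = stdev(self.baseline)
        # Steady-state std of an EWMA of i.i.d. noise is sigma * sqrt(alpha / (2 - alpha)).
        self.limit = self.k * sigma * math.sqrt(self.alpha / (2 - self.alpha))
        self.ewma = self.mu

    def update(self, reading: float) -> bool:
        """Fold in one reading; return True if the smoothed level has left the
        (two-sided) band around the healthy baseline."""
        self.ewma = self.alpha * reading + (1 - self.alpha) * self.ewma
        return abs(self.ewma - self.mu) > self.limit


alarm = CoolantDriftAlarm(baseline=[1.0, 1.1, 0.9, 1.0, 1.05, 0.95])
print([alarm.update(r) for r in [1.0, 1.02, 0.98, 1.5]])  # only the last reading trips it
```

Smoothing is the design choice here. A single noisy sample is damped, but a sustained shift, such as a bacterial population that keeps growing between scheduled maintenance windows, moves the average past the limit within a few readings.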

The market grows with AI capex: every gigawatt of GPU capacity now being deployed runs on liquid cooling that no one is actively monitoring in real time. Omen AI was founded in 2024 and has raised $40M total. The round was announced by founder and CEO Zachary Laberge.

Private Companies

I-Pulse CHIPS Act

I-Pulse signed a $250M CHIPS Act R&D award with the Department of Commerce on June 25 for silicon carbide (SiC) power semiconductors designed for extreme-environment applications. The BHP-backed startup's core technology is high-voltage pulsed power that fractures rock for geothermal drilling. The award targets the SiC switching semiconductors needed to deliver those pulses at scale. No domestic supplier can currently produce these components in volume at the performance thresholds I-Pulse requires. The CHIPS Act funding covers R&D through commercialization, in partnership with federal laboratories and universities. The deal extends federal semiconductor strategy beyond fab incentives into SiC power switching, a segment where Chinese suppliers currently hold an advantage.

Public Companies

Qualcomm Customer Win

Qualcomm unveiled the Dragonfly C1000 server CPU at its June 24 Investor Day. It also disclosed Meta as its first enterprise customer, under a multi-generation agreement to power Meta's next-generation server fleet. The C1000 is a 250-plus-core ARM-based server CPU targeting agentic AI orchestration workloads: high-throughput sequential reasoning and context switching at scale. On that profile, fast CPUs have structural advantages over GPUs. Mark Zuckerberg appeared at the Investor Day to confirm the deal. The chip ships in H2 2028, and Qualcomm is targeting $15B in data center revenue by 2029. One of the world's largest AI infrastructure operators is now making a multi-generation bet on ARM over x86 for its server CPU layer. That puts Intel's and AMD's data center CPU businesses at risk as the incumbents. The transition is long-dated, but it is now a signed contract rather than a forecast.

Emerging

A paper published June 23 on arXiv, CompressKV: Efficient KV Cache Compression for Long-Context LLM Inference, demonstrates a 32–50× reduction in KV cache memory footprint while retaining 97% of full-cache performance. The approach selectively compresses the key-value cache that transformer models maintain during inference. Instead of storing the keys and values for every context token in full precision, CompressKV identifies which tokens carry the most attention weight and discards or quantizes the rest at generation time. As a result, inference over very long contexts (tens of thousands of tokens) becomes feasible on hardware that would otherwise run out of HBM capacity. The timing matters: as the SK Hynix and Samsung HBM4 supply picture shows, high-bandwidth memory is the physical constraint binding AI inference at scale. Techniques that cut per-inference memory requirements by more than an order of magnitude can relax that constraint directly. More likely, they let the same HBM allocation serve substantially more concurrent inference requests, which is the measure that determines inference economics.
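The paper's exact selection rule and bit widths are not described here. The sketch below therefore covers only the general pattern under stated assumptions. It sizes an uncompressed fp16 cache for a hypothetical model (80 layers, 8 KV heads of dimension 128, 32k-token context). It ranks tokens by an assumed accumulated-attention score and keeps the top slice in fp16. It quantizes the next slice to int8 and evicts the rest. The function names and the 1%/2% split are illustrative choices, not CompressKV's.

```python
import math


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Uncompressed KV cache for one sequence: a key and a value vector
    (hence the factor of 2) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def plan_compression(attn_scores: list[float], keep_frac: float,
                     quant_frac: float) -> tuple[set[int], set[int]]:
    """Rank tokens by accumulated attention weight. The top keep_frac stay in
    full precision, the next quant_frac get quantized, the rest are evicted."""
    n = len(attn_scores)
    order = sorted(range(n), key=lambda i: attn_scores[i], reverse=True)
    n_keep = max(1, round(keep_frac * n))
    n_quant = round(quant_frac * n)
    return set(order[:n_keep]), set(order[n_keep:n_keep + n_quant])


def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric per-vector int8 quantization: integer codes plus one scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # all-zero vector -> scale 1.0
    return [round(x / scale) for x in vec], scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    return [c * scale for c in codes]


def compression_ratio(keep_frac: float, quant_frac: float,
                      full_bytes: int = 2, quant_bytes: int = 1) -> float:
    """Memory saving vs. an all-fp16 cache, ignoring per-vector scale overhead."""
    return full_bytes / (keep_frac * full_bytes + quant_frac * quant_bytes)


# Hypothetical 80-layer model, 8 KV heads of dim 128, 32k-token context, fp16.
full = kv_cache_bytes(80, 8, 128, 32_768)
print(full / 2**30)                    # 10.0 (GiB per sequence)
print(compression_ratio(0.01, 0.02))   # about 50x under these assumed fractions
```

The arithmetic shows why the constraint binds. Under these assumed dimensions, one 32k-token sequence needs 10 GiB of KV cache before compression. A roughly 50× cut brings that to about 200 MiB, which is where the "more concurrent requests per HBM allocation" argument comes from.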

DeepSeek released DSpark on June 27, a speculative decoding inference framework that delivers 60–85% faster per-user generation speed on DeepSeek-V4 without retraining the base model. The mechanism: a lightweight draft module proposes candidate token sequences, and the full V4 model then verifies them in parallel rather than generating every token autoregressively. DeepSeek open-sourced the framework and training code under an MIT license as DeepSpec. The infrastructure read: a speedup of this size means the same GPU fleet serves materially more concurrent users. That is a serving-layer answer to the throughput problem, and it requires no additional HBM or silicon. For operators running frontier models at scale, competitive edge is increasingly determined by the inference stack above the hardware. DeepSeek's public release sets a benchmark every major inference provider now has to match or explain away.
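The draft-then-verify loop can be sketched in a few lines. This is the simplest greedy variant of speculative decoding, not DSpark's actual implementation. It models each "model" as a hypothetical function from a token prefix to its greedy next token. The draft proposes `k` tokens, and the target keeps the longest matching prefix plus one token of its own. Sampling-based versions use a rejection-sampling acceptance rule instead. The helper `expected_tokens_per_pass` is the standard estimate from Leviathan et al. (2023), assuming each draft token is accepted independently with probability `alpha`.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target scores all k positions. A real system does this in one
        #    batched forward pass; the loop below only emulates that pass.
        target_passes += 1
        emitted: List[int] = []
        for tok in proposal:
            correct = target(seq + emitted)
            emitted.append(correct)       # always keep the target's own token
            if tok != correct:
                break                     # first mismatch ends this round
        else:
            # Every draft token matched, so the same pass yields one bonus token.
            emitted.append(target(seq + emitted))
        seq.extend(emitted)
    return seq[:len(prompt) + max_new], target_passes


def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target pass if each draft token is accepted
    independently with probability alpha: (1 - alpha**(k+1)) / (1 - alpha)."""
    if alpha == 1.0:
        return k + 1.0
    return (1 - alpha ** (k + 1)) / (1 - alpha)


target = lambda s: (3 * s[-1] + 1) % 17                  # toy "big" model
draft = lambda s: 0 if len(s) % 3 == 0 else target(s)    # toy draft, sometimes wrong
tokens, passes = speculative_decode(target, draft, [5], max_new=20)
print(passes, "target passes for 20 tokens")
print(expected_tokens_per_pass(0.8, 4))                 # about 3.36 tokens per pass
```

The output is token-for-token identical to decoding with the target alone, which is why no retraining is needed. The draft changes only how many expensive target passes it takes to get there.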

Watch This Week

This Week — SambaNova Round Close

SambaNova's ~$1B funding round at a $10B valuation, first reported by The Information on June 25, is expected to finalize shortly. Watch for the formal press announcement and whether any new strategic investors are named. An anchor hyperscaler or chip company would signal commercial intent beyond financial return.

This Week — Qualcomm-Modular Regulatory Filing

Qualcomm's $3.9B acquisition of Modular, signed June 24, now enters regulatory review. The timeline to close (H2 2026) depends on antitrust treatment. A software-only AI stack has no direct hardware market share. However, the deal's explicit framing as a CUDA alternative may draw scrutiny over whether it strengthens Qualcomm's ability to bundle software with its data center chips.

The Compute 100 — Monday June 29, 2026
