Technology thesis · Semiconductors & Chips

Medium conviction · Growth

Inference-Optimized Semiconductors

Inference is now the majority of AI compute spend, and the real threat from custom silicon (hyperscaler TPUs and Trainium, plus Cerebras) is to Nvidia’s inference pricing power, not its training moat.

Position maintained continuously · last reviewed Jun 24, 2026

The thesis

Core thesis

As AI shifts from training to deployment, inference becomes the dominant workload and the largest share of compute cost over a model's lifetime. Hyperscaler custom silicon (Google's Ironwood TPU, AWS Trainium/Inferentia, Microsoft Maia, Meta MTIA), together with wafer-scale and dataflow insurgents such as Cerebras and SambaNova, is what really pressures Nvidia's inference pricing power; its training moat is not the point of attack. The contest is increasingly about cost-per-token at production scale, where domain-specific designs can undercut general-purpose GPUs on cost.
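The cost-per-token argument reduces to simple arithmetic: what an accelerator costs to run per hour, divided by how many tokens it serves in that hour. The Python sketch below is illustrative only; the function name and every price, throughput and utilization figure are hypothetical placeholders, not measured or quoted numbers for any real chip.

```python
# Minimal sketch of serving cost per million tokens.
# All figures below are HYPOTHETICAL placeholders, not real prices or benchmarks.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Return USD per one million generated tokens.

    hourly_cost_usd: all-in cost of running one accelerator for an hour
        (assumed to bundle amortised capex, power and hosting).
    tokens_per_second: sustained output throughput at full load.
    utilization: fraction of the hour the accelerator is actually serving.
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical comparison: a faster but pricier general-purpose GPU versus a
# slower but cheaper domain-specific ASIC.
gpu = cost_per_million_tokens(hourly_cost_usd=6.0, tokens_per_second=4000)
asic = cost_per_million_tokens(hourly_cost_usd=3.0, tokens_per_second=3000)
print(f"GPU:  ${gpu:.3f} per 1M tokens")   # prints "GPU:  $0.417 per 1M tokens"
print(f"ASIC: ${asic:.3f} per 1M tokens")  # prints "ASIC: $0.278 per 1M tokens"
```

Under these made-up inputs the ASIC serves tokens about a third more cheaply despite 25% lower throughput. That is the shape of the thesis: a chip does not need to win on raw speed if its hourly cost falls faster than its throughput.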

State of the art (2026)

By mid-2026 inference, not training, is the centre of gravity in AI silicon, and the competitive picture has split. Nvidia's Blackwell (B200, roughly 2.5x H100 inference throughput) still accounts for 60–75% of inference accelerators, but custom silicon is eating into its margins. Google's seventh-generation Ironwood TPU underpins Anthropic's commitment to around one million chips, roughly 1GW of capacity in 2026, with a further 3.5GW via Broadcom from 2027. AWS Trainium, Microsoft Maia and Meta MTIA are all scaling internally. Among merchant insurgents, Cerebras floated in May 2026 (CBRS, ~$5.55B raised) on the back of its OpenAI cloud deal, while Groq, SambaNova, d-Matrix and Etched chase specialised decode economics. Etched's Sohu transformer ASIC has still not shipped in volume.

The rest of the file

Everything below is live inside CanaryIQ


Signal stack

Evidence stacked leading → lagging

10 signals

talent

research

patent

expert

operational

market

Technology-native KPIs

Metrics that predict trajectory, tracked over time

4 tracked

AI Inference Chip Market

NVIDIA Inference GPU Share

Inference vs. Training Spend

Custom ASIC Programs

Landscape map

Who builds what — and who depends on whom

75 players · 6 layers

Catalyst calendar

Dated events that will move the position

8 ahead

Technology roadmap

Milestones on the path to maturity

9 milestones

Watchlists

Companies, people and papers — each with a remove-by condition

20 · 19 · 2

Companies · 20

People · 19

Decision frameworks

The same call, framed for your desk


Public Equity

PE / VC

Corporate Leader

Thesis changelog

When our view changed, and why

5 updates

Change our mind

2 disconfirming conditions
