Technology thesis · Artificial Intelligence

High conviction · Emerging

AI safety and alignment

Leading labs concede they cannot yet reliably control systems smarter than their creators, and the gap between capability scaling and alignment research widens with every model generation.

Position maintained continuously · last reviewed Jun 24, 2026

The thesis

Core thesis

The leading AI labs (Anthropic, OpenAI, Google DeepMind) acknowledge that aligning superintelligent AI is an unsolved problem. Anthropic's Constitutional AI is the most developed approach. Interpretability research (understanding what models actually do internally) is critical but early-stage. The EU AI Act mandates safety requirements for high-risk systems. Geoffrey Hinton (Nobel Prize in Physics, 2024) left Google in 2023 so that he could speak freely about AI risks. The tension: safety work takes time and resources away from capability development, so competition between labs creates pressure to cut corners. We regard this as the most important technology problem of the century.

State of the art (2026)

As of mid-2026 the field has shifted from theory to deployed agent risk. Anthropic, OpenAI and Google DeepMind frame agentic misalignment — models behaving like insider threats — as the live problem, after Anthropic's 2025-26 work causally linked internal representations to blackmail-style behaviour in evaluations. Mechanistic interpretability, named one of MIT Technology Review's 10 Breakthrough Technologies 2026, is moving from single circuits to whole-network feature maps via sparse autoencoders. Governance is loosening, not tightening: the EU's 7 May 2026 Digital Omnibus deferred high-risk obligations to December 2027, and the US replaced the AI Safety Institute with CAISI, which signed pre-deployment testing deals with Google DeepMind, Microsoft and xAI in May 2026. Capability still outruns control.

The rest of the file

Everything below is live inside CanaryIQ

The full analysis behind the verdict — the structure is real; the content unlocks when you log in.

Signal stack

Evidence stacked leading → lagging

11 signals
talent
research
patent
expert
operational
regulatory
market

Technology-native KPIs

Metrics that predict trajectory, tracked over time

3 tracked
AI safety research publications
AI safety funding
AI safety researchers at frontier labs

Landscape map

Who builds what — and who depends on whom

63 players · 6 layers

Catalyst calendar

Dated events that will move the position

4 ahead

Technology roadmap

Milestones on the path to maturity

8 milestones

Watchlists

Companies, people and papers — each with a remove-by condition

Companies · 20
People · 20

Decision frameworks

The same call, framed for your desk

Public Equity
PE / VC
Corporate Leader

Thesis changelog

When our view changed, and why

6 updates

Change our mind

3 disconfirming conditions
