AI agents

AI agents are in production as of 2026; reliability in unstructured environments – not raw model capability – now decides which enterprise use cases scale through 2027.

Position maintained continuously · last reviewed Jun 24, 2026

The thesis

Core thesis

Agentic AI – systems that autonomously plan, use tools, and execute multi-step tasks – is the defining AI paradigm shift of 2026. Anthropic's Claude, OpenAI's GPT-5.5, Google's Gemini 3, and orchestration frameworks such as LangChain/LangGraph power most deployments. Enterprise agents handle code development, customer service, legal review, financial analysis, and admin support. But reliability in unstructured, consequential environments remains the constraint that decides which use cases scale past supervised pilots. The EU AI Act's high-risk classification for autonomous agents in employment, credit, and legal use cases was deferred from August 2026 to December 2027 under the Digital Omnibus, though GPAI enforcement powers still begin in August 2026.

State of the art (2026)

By mid-2026 agentic AI has crossed from demo to revenue. Anthropic's Opus 4.8 (May 2026) posts 84% on the Online-Mind2Web computer-use benchmark and ships Claude Managed Agents that run in customer-controlled sandboxes against private MCP servers. Cognition (Devin), having absorbed Windsurf, runs near a $492M revenue run-rate after cutting pricing to $20/month plus usage. Sierra, Bret Taylor's customer-service agent firm, raised $950M in May 2026 at a $15.8B valuation on roughly $150M ARR. The open question is no longer capability but reliability and unit economics in unstructured, consequential workflows – the variable that decides which use cases scale past supervised pilots into default deployment.
