Nobody is winning enterprise AI yet.

Microsoft Copilot's enterprise ROI is widely reported as disappointing. Pilots that didn't convert. Productivity numbers that didn't move the way the deck said they would. Renewals that came back smaller than the original deal.

The vendors building competitive products are quietly elated. Their decks now write themselves: "Microsoft tried it. It didn't work. Try us instead."

We are not going to do that.

Copilot is the canary, not the failure

Microsoft has the distribution. They are inside Outlook, Word, Excel, Teams, SharePoint, and the directory that knows who reports to whom. They have the security posture, the enterprise contracts, the procurement relationships, and the budget to throw at the problem until it works. If Copilot couldn't get clean ROI on a chatbot-plus-productivity layer wired into the seat every employee already has, the problem is not the implementation. The problem is the shape of enterprise AI as everyone has been trying to build it.

That's the hard part to say out loud. Most of the companies competing in this space — including this one — have shipped products that bet on broadly the same shape. Wrap a model in some chat scaffolding. Connect it to the documents. Let people ask it things. Hope the answers are good enough often enough to pay for themselves.

The Copilot ROI dip is the first big public data point telling us that the shape might be wrong. Or at least, not enough on its own.

Anyone in this market who tells you they have the formula is selling you something. We are eighteen months into "enterprise AI" being a thing. There is no playbook. There are no winners yet. There are only people running honest experiments and people pretending to have answers.

The vibes-driven failure mode

We wrote about this from the engineering side in "Vibes don't ship. Systems do." and from the regulatory side in "Vibes don't comply." The same pattern shows up here, on the buyer side: a product gets bought because the demo was impressive, wired into a workflow that already worked, and judged by feel rather than by a measurable change in the loop the workflow was part of.

"Saves us hours" is not a measurement. It is a vibe. When the renewal conversation arrives and the CFO asks how many hours and on what, the answer is rarely structured enough to defend. The pilot enthusiasm meets a procurement spreadsheet, and the spreadsheet wins.

This isn't a Microsoft mistake. It's an entire-industry mistake. We have all been selling and buying chat-shaped AI into enterprise workflows that don't have a measurable loop attached. Without a loop, the AI's contribution is a feeling. Feelings don't survive the second budget cycle.

What we think the shape might actually be

We have a hypothesis. We are not going to dress it up as a proven model — we don't have the years of evidence yet, and we wouldn't believe anyone who claimed otherwise.

The hypothesis: enterprise AI returns are concentrated in places where the AI is wired into a measurable operating loop, not bolted onto an existing workflow.

The chat-on-top-of-Office shape spreads thin across every workflow and never owns one. The result is exactly what Copilot's ROI numbers suggest: real value in scattered places, not enough to justify the seat price across the whole org. The alternative shape is narrower and deeper. Pick the loops that matter — the ones where the org is already trying to make a decision, ship a thing, govern a process. Wire AI into the orientation phase of those loops. Make the loop itself measurably faster, tighter, or more honest. Then count the loop as the unit of return.

We call this governed orientation. The "governed" part is what protects the loop from the vibes-driven failure — every input is attributed, every output carries confidence, every promotion through a quality gate is recorded. The "orientation" part is what concentrates the AI in the place it can actually move the dial — between observing what's happening and deciding what to do about it.
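To make the "governed" half concrete, here is a minimal sketch of what attribution, confidence, and a recorded quality gate could look like as data. It is an illustration under assumptions, not a description of any shipped implementation: the names Claim and QualityGate, the 1-to-5 integer confidence scale, and the pass rule are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Claim:
    """One output of the orientation step (hypothetical structure)."""
    text: str
    sources: tuple[str, ...]  # attribution: identifiers of the inputs used
    confidence: int           # assumed scale: 1 (lowest) to 5 (highest)


@dataclass
class QualityGate:
    """Promotes claims that meet a threshold and records every decision."""
    name: str
    min_confidence: int
    log: list[dict] = field(default_factory=list)

    def promote(self, claim: Claim) -> bool:
        # A claim passes only if it is attributed and confident enough.
        passed = bool(claim.sources) and claim.confidence >= self.min_confidence
        # Record the decision whether or not the claim passed,
        # so the gate's history can be audited later.
        self.log.append({
            "gate": self.name,
            "claim": claim.text,
            "sources": claim.sources,
            "confidence": claim.confidence,
            "passed": passed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return passed
```

The point of the sketch is that a rejection leaves the same kind of record as a promotion. When the renewal conversation arrives, the log is what stands in for "it felt useful."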

Whether this hypothesis is the right one is something the next twenty-four months will tell us. We are running it as our own bet. We will say so when the numbers move and we will say so when they don't.

What we don't know

This is the section every vendor leaves out, and the one that matters most.

We don't know if governed orientation is the right shape for every kind of organisation. It might fit professional-services firms (the bet we are building toward) better than it fits manufacturing, retail, or healthcare. We don't know if our specific implementation of governance — five-tier confidence, brain-level isolation, write gates that cannot be bypassed — is the right balance between rigour and friction. We don't know if the operators who would benefit most from this shape can actually adopt it without an integrator on hand.

We don't know how much of the current enterprise AI struggle is shape-of-product, how much is change-management, how much is genuinely the wrong technology for the job, and how much is just the dip every new technology category goes through before adoption patterns stabilise. The honest answer is "some of each, in proportions nobody has measured." If a vendor tells you they know the proportions, ask them how they measured.

We don't know what enterprise AI looks like in 2028. Nobody does. The people who will be most right will be the ones who kept their orientation tight enough to update fast when the data arrived. The people who will be most wrong will be the ones who locked in a thesis at the height of the hype and shipped it forward without ever revisiting.

"Frontier markets reward the discipline to update, not the certainty to commit. The question isn't whether your AI thesis is right today. It is whether your loop is tight enough to notice when it stops being right."

The frontier ethic

Three commitments, made in public so we can be held to them.

We will not claim certainty we don't have. When something works in our deployment data, we will say so and link to the data. When something doesn't, we will say so. When we change a thesis, we will say what changed our mind. When a competitor's piece of the picture is correct and ours is wrong, we will say that too. Admitting uncertainty costs the brand something in the short term; being caught having claimed false certainty costs it more in the long term.

We will compete on discipline, not prophecy. Anyone can say where AI will be in three years. The hard thing is to say where it is right now and what specifically you are doing about it tomorrow. We would rather lose a deal to a vendor who outperforms us on operating discipline than win one by selling a future we can't deliver.

We will treat the buyer's loop as the unit of analysis. Not seats. Not licences. Not "AI deployed enterprise-wide." When we propose work, we will name a specific loop in your operation, a specific thing we expect to change about it, and a specific way we expect to measure that change. If we can't name one, we won't propose the work.
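As a rough illustration of the third commitment, a proposal can be written down as a named loop, an expected change, and a measurement, then checked against before-and-after observations. Everything in this sketch is an assumption made for the example: the LoopProposal fields, the choice of median cycle time as the metric, and the idea of expressing the target as a fractional reduction.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class LoopProposal:
    """A proposal that names the loop, the change, and the measurement."""
    loop: str                # the specific loop in the buyer's operation
    expected_change: str     # what we expect to change about it
    metric: str              # how the change will be measured
    target_reduction: float  # e.g. 0.25 means a 25% lower median


def measured_reduction(before: list[float], after: list[float]) -> float:
    """Fractional drop in the median of the metric, before versus after."""
    baseline = median(before)
    if baseline <= 0:
        raise ValueError("baseline median must be positive")
    return (baseline - median(after)) / baseline


def met_target(proposal: LoopProposal,
               before: list[float],
               after: list[float]) -> bool:
    """True if the observed reduction reaches the proposal's target."""
    return measured_reduction(before, after) >= proposal.target_reduction
```

A proposal might then read LoopProposal("contract review", "shorter first-pass turnaround", "median days per review", 0.25), with the before and after samples taken from the buyer's own records. If no such line can be filled in, that is the signal not to propose the work.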

Where this leaves Copilot

Microsoft will probably figure out their next move in 2026 — they have the resources to absorb the dip and pivot. Copilot v2 (or whatever they call it) will likely be narrower, more loop-aware, and harder to point at as a generic disappointment. By the time that lands, the second wave of enterprise AI will be underway, and the discourse will have moved on.

What the dip leaves behind is bigger than Copilot. It leaves a procurement memory: "we tried that, it didn't pay back." That memory will set the bar for everything that follows, including everything we ship. The vendors who survive the next eighteen months will be the ones who can answer the procurement question — what specifically changed in our operation, by how much, measured how — without flinching.

That is the bet we are running. We don't know if we are right. We do know we won't be the ones telling you we are.

If you are mid-rollout and feeling the dip, or pre-rollout and trying to avoid it, we would rather think it through with you than sell to you. The conversations on this side of the Copilot reckoning are more honest than the ones on the other side — and the honest conversations are where the real decisions get made.

Frontier · not feature

Compete on discipline, not prophecy.

A governed loop wired into the part of your operation where AI can actually move the dial — measured by what changes in the loop, not by how the chat felt.