A handful of companies in the world train frontier language models. Everyone else builds the system that makes those models useful — the scaffolding that decides what an agent remembers, what it's allowed to do, and which systems it can actually touch.
That scaffolding has a name now. The field has landed on the harness. The model is the engine. The harness is everything else: prompts, memory, tools, identity, authority, and the control flow that ties them together. Two agents running on the same model with different harnesses are not the same agent. They produce different work, make different mistakes, and expose different risks.
The market has started paying attention. Manus, the Chinese agent platform, has publicly rebuilt its harness five times in under a year, each rebuild driven not by a model upgrade but by what the harness was doing wrong. Meta's reported acquisition of the company is, in effect, a bet on harness work, not model work. Vercel has described cutting roughly 80% of the tools from one of its own agents, on the thesis that fewer, better tools give agents more leverage than more tools. When a senior engineer on LinkedIn summarised the shift as "if you're not the model, you're the harness", it stuck because it was already true.
ORCA is a harness. This is a tour of what that means when you stop treating it as plumbing and start treating it as the product.
The five layers
An honest agent architecture has five layers, not one. Each fails independently, with its own failure mode, its own observability signal, and its own governance surface.
Serving. How the agent is reached — authenticated transport, identity, rate limiting, audit. In ORCA every agent call is authenticated at the edge with enterprise-grade identity, scoped to a role, and logged. If the serving layer is wrong, nothing above it matters, because the wrong caller has the wrong reach.
Orchestration. How work is decomposed, scheduled, and recovered. A named workforce of agents runs to a declarative schedule, with heartbeats and reasoning capture on every turn. The orchestrator decides who does what, and when, and it has to survive a model timeout, a network blip, or a tool that simply refuses to answer.
Context. What the agent knows at the moment it acts. This is where most harnesses quietly fail. Context is not a single prompt; it is the composite of injected memory, retrieved knowledge, live operational signal, and recent conversation. Get this wrong and the agent behaves like an amnesiac with opinions.
Tools. What the agent can actually do in the world. Vercel's 80% cull was not a tidying exercise; it was a recognition that every extra tool is a distractor for the model and a risk for the operator. A small, sharp, well-named tool set outperforms a large one.
LLM core. The model itself. Interchangeable by design. ORCA routes different kinds of work to different classes of model — frontier reasoning for heavy lifting, efficient models for bulk cognition, research-tuned models for sweeps — and can swap any of them without touching the layers above. The core is a dial, not a deity. The harness is what stays constant when the dial moves.
A useful test: if you could swap the model underneath your agent tomorrow and the behaviour would fundamentally change, you don't have an agent — you have a prompt. The harness is what makes the agent itself a durable object.
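The "dial, not a deity" idea can be sketched as a routing table that the rest of the harness calls through. This is a minimal illustration, not ORCA's implementation: the `Router` class, the work-class labels, and the stub models are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is just a callable from prompt to text (an assumption);
# a real client would wrap a provider API behind the same shape.
Model = Callable[[str], str]


@dataclass
class Router:
    # Maps a class of work (hypothetical labels) to the model serving it.
    routes: Dict[str, Model]

    def run(self, work_class: str, prompt: str) -> str:
        if work_class not in self.routes:
            raise KeyError(f"no model routed for work class: {work_class}")
        return self.routes[work_class](prompt)

    def swap(self, work_class: str, model: Model) -> None:
        # Swapping a model changes only this table, never the callers.
        self.routes[work_class] = model


def frontier_stub(prompt: str) -> str:
    return f"[frontier] {prompt}"


def efficient_stub(prompt: str) -> str:
    return f"[efficient] {prompt}"


router = Router(routes={"reasoning": frontier_stub, "bulk": efficient_stub})
first = router.run("bulk", "summarise ticket")
router.swap("bulk", frontier_stub)  # turn the dial
second = router.run("bulk", "summarise ticket")
```

The caller's code is identical before and after the swap; only the routing table moved.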
IMPACT — the six pillars of a working harness
The five layers are the architecture. IMPACT is the operating discipline — six questions every harness has to answer, regardless of how it's built. Intent, Memory, Planning, Authority, Control flow, Tools. Skip any one of them and you get a different kind of broken agent.
Intent. What is the agent for? Not a job title, not a personality, but a scope — what it owns, what it escalates, what it refuses. In ORCA every agent has a named role and a line manager. One runs the day as chief of staff; another owns delivery; another watches security posture; another reads code and surfaces candidates for improvement. If you asked any of them "is this yours?" the answer would always be the same. That's intent.
Memory. What the agent knows, structured by half-life. Short-term memory is the current conversation and the current plan. Long-term memory is the brain — personal, practice, engagement, support, governance — each with its own access rules and its own confidence threshold. In ORCA the brain-first discipline is not a suggestion: before any agent responds, a current brain extract is injected into its workspace, and the agent is required to search before it stores. Search, decide, store if new. Every agent. Every turn.
Planning. How the agent decomposes a goal into steps it can actually execute. Good planning looks like a ranked list with decision points, not a monologue. In the ORCA harness, the chief of staff plans the day across the rest of the workforce; delivery agents plan the sprint across the week; an adversarial auditor plans a daily challenge pass against what's already stored. Planning failures are the most visible kind of harness failure — you can see the shape of a plan that doesn't fit its own goal.
Authority. What the agent is permitted to decide unilaterally, what it must propose, what it is forbidden from touching. Authority is enforced in the serving layer, not in the prompt. Agents write only to their own personal brain; promotion to the shared practice brain requires a line manager to approve; promotion to governance requires the Founder. Specialists see fewer tools than line managers, who see fewer than the Founder. The model is never in charge of its own permissions.
Control flow. What happens when something breaks. Every agent call has a timeout, a retry strategy, a heartbeat, and a place to log reasoning. A research run that exceeds its budget is killed and re-queued. A frontier-model call that fails falls back to a local model rather than staying silent. Control flow is the difference between an agent that stops and an agent that stops politely.
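A minimal sketch of that retry-then-fall-back shape, under stated assumptions: the model client is assumed to enforce its own deadline and raise `TimeoutError`, and `call_with_fallback`, the stub models, and the log format are hypothetical names for illustration only.

```python
import time
from typing import Callable, List, Optional

Model = Callable[[str], str]


def call_with_fallback(
    primary: Model,
    fallback: Model,
    prompt: str,
    retries: int = 2,
    backoff_s: float = 0.0,
    log: Optional[List[str]] = None,
) -> str:
    """Try the primary model a bounded number of times, then fall back."""
    log = log if log is not None else []
    for attempt in range(1, retries + 1):
        try:
            result = primary(prompt)
            log.append(f"primary ok on attempt {attempt}")
            return result
        except Exception as exc:  # includes TimeoutError from the client
            log.append(f"primary failed on attempt {attempt}: {exc}")
            time.sleep(backoff_s * attempt)  # linear backoff between tries
    log.append("falling back to local model")
    return fallback(prompt)


def always_times_out(prompt: str) -> str:
    # Stand-in for a frontier call that exceeds its deadline.
    raise TimeoutError("model call exceeded its deadline")


def local_model(prompt: str) -> str:
    return f"[local] {prompt}"


trace: List[str] = []
answer = call_with_fallback(always_times_out, local_model, "status?", log=trace)
```

The log is passed in rather than printed so that every failed attempt ends up in the same trace as the answer.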
Tools. What the agent can actually do. ORCA converged on a small set of generic tools (find, store, think, reveal, act, support) after collapsing a much larger set of specific ones. The result is a narrower surface for the model to misuse and a simpler one for the operator to govern. Every tool is role-gated at the server; an invalid operation is refused at the edge, not negotiated in the prompt.
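Role-gating at the server can be sketched in a few lines. The six tool names come from the text above; everything else is an assumption made for illustration, including which role is granted which tool, the `dispatch` function, and the `ToolRefused` exception.

```python
from typing import Callable, Dict, FrozenSet

TOOLS = ("find", "store", "think", "reveal", "act", "support")

# Hypothetical grants: the text says specialists see fewer tools than line
# managers, who see fewer than the Founder; the exact split is assumed.
GRANTS: Dict[str, FrozenSet[str]] = {
    "specialist": frozenset({"find", "store", "think"}),
    "line_manager": frozenset({"find", "store", "think", "reveal", "support"}),
    "founder": frozenset(TOOLS),
}


class ToolRefused(Exception):
    """Raised by the server when a call is outside the caller's grants."""


def dispatch(
    role: str,
    tool: str,
    handlers: Dict[str, Callable[[str], str]],
    payload: str,
) -> str:
    # The check runs before any handler, so the model never negotiates it.
    if tool not in TOOLS:
        raise ToolRefused(f"unknown tool: {tool}")
    if tool not in GRANTS.get(role, frozenset()):
        raise ToolRefused(f"role {role!r} may not call {tool!r}")
    return handlers[tool](payload)


# Stub handlers that echo the call, standing in for real tool code.
handlers = {name: (lambda payload, n=name: f"{n}:{payload}") for name in TOOLS}

allowed = dispatch("specialist", "find", handlers, "open incidents")
try:
    dispatch("specialist", "act", handlers, "restart service")
    refused = False
except ToolRefused:
    refused = True
```

An unknown role resolves to an empty grant set, so the default is refusal.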
Short-term memory vs. long-term memory
Every serious agent harness has to reconcile two kinds of memory, and they fail for different reasons.
Short-term memory is the working set — the conversation, the current task, the transient observations. It is fast, cheap, and disposable. The failure mode is overflow: stuff too much in, and the model loses the thread. The discipline is ruthless pruning of what is actually relevant to the next action.
Long-term memory is the brain. It is slow, expensive to keep clean, and dangerous if wrong. The failure mode is drift: stale entries pretending to be current, contradictions accumulating, signal diluted by noise. The discipline is an auditor — a separate agent whose only job is to challenge what is already stored.
ORCA handles these differently on purpose. Short-term memory lives in the agent's own workspace — hot, local, pruned on every turn. Long-term memory lives in a governed vector store, behind the write gate, tokenised on the way in, challenged continuously once there. The harness doesn't pretend these are the same substance; it gives each one its own code path and its own governance.
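The two code paths can be sketched side by side. This is a toy under loud assumptions: a bounded `deque` stands in for relevance-based pruning, and `difflib` string similarity stands in for a vector search. The class names and the 0.9 threshold are hypothetical.

```python
from collections import deque
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


class WorkingSet:
    """Short-term memory: small, local, and pruned automatically."""

    def __init__(self, max_items: int = 3) -> None:
        # Assumption: keep only the most recent items; a real harness
        # would prune by relevance to the next action, not just recency.
        self.items: deque = deque(maxlen=max_items)

    def observe(self, note: str) -> None:
        self.items.append(note)


class Brain:
    """Long-term memory: search first, store only what is new."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.entries: List[str] = []
        self.threshold = threshold

    def search(self, text: str) -> Tuple[Optional[str], float]:
        best, best_score = None, 0.0
        for entry in self.entries:
            # String similarity as a stand-in for embedding similarity.
            score = SequenceMatcher(None, entry.lower(), text.lower()).ratio()
            if score > best_score:
                best, best_score = entry, score
        return best, best_score

    def store_if_new(self, text: str) -> bool:
        _, score = self.search(text)
        if score >= self.threshold:
            return False  # already known; do not store a duplicate
        self.entries.append(text)
        return True


ws = WorkingSet(max_items=2)
for note in ("a", "b", "c"):
    ws.observe(note)

brain = Brain()
stored_first = brain.store_if_new("Deploys freeze on Fridays")
stored_again = brain.store_if_new("deploys freeze on fridays")
stored_other = brain.store_if_new("Customer X renewed in March")
```

The short-term path forgets by construction; the long-term path refuses to grow unless the search comes back empty-handed.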
Guardrails are not prompts
"A prompt that says 'do not output PII' stops nothing. A pipeline that cannot accept PII stops everything. The difference is not one of emphasis — it is one of architecture."
Every serious harness eventually reaches the same conclusion: safety language in prompts is theatre. The model will occasionally, under pressure, do the thing you told it not to do — especially when "the thing" is a plausible continuation of what it has been asked to write. The only reliable guardrail is one the model cannot route around.
In ORCA the guardrails are code. The write gate rejects any entry that fails schema, confidence, freshness, classification, or PII screening — in that order, with no override. Identifiers are tokenised before content reaches memory; the raw values live in a separate encrypted vault with its own retention schedule. The tool layer refuses unauthorised operations at the server, not at the model. The model can propose anything it likes; the harness decides what actually happens.
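An ordered write gate of that kind might look like the following sketch. The check order and the brain names follow the text; the thresholds, the email-only identifier pattern, the token format, and all function names are assumptions made for illustration, not ORCA's actual rules.

```python
import re
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumption: emails stand in for "identifiers"; a real screen covers more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BRAINS = {"personal", "practice", "engagement", "support", "governance"}


@dataclass
class Entry:
    text: str
    confidence: float
    observed_at: datetime
    classification: str


def tokenise(text: str, vault: Dict[str, str]) -> str:
    """Swap identifiers for tokens; raw values go to a separate vault."""
    def repl(match: "re.Match[str]") -> str:
        token = f"<id:{len(vault)}>"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(repl, text)


# Checks run in this order; thresholds (0.7, 30 days) are hypothetical.
CHECKS = (
    ("schema", lambda e, now: isinstance(e.text, str) and bool(e.text.strip())),
    ("confidence", lambda e, now: e.confidence >= 0.7),
    ("freshness", lambda e, now: now - e.observed_at <= timedelta(days=30)),
    ("classification", lambda e, now: e.classification in BRAINS),
    ("pii", lambda e, now: EMAIL.search(e.text) is None),
)


def write_gate(entry: Entry, now: datetime) -> Optional[str]:
    """Return the name of the first failed check, or None if accepted."""
    for name, passes in CHECKS:
        if not passes(entry, now):
            return name
    return None


now = datetime(2026, 4, 1, tzinfo=timezone.utc)
vault: Dict[str, str] = {}
raw = Entry("Contact alice@example.com about renewal", 0.9, now, "practice")
clean = replace(raw, text=tokenise(raw.text, vault))
stale = Entry("Old rumour", 0.2, now - timedelta(days=90), "practice")

raw_verdict = write_gate(raw, now)
clean_verdict = write_gate(clean, now)
stale_verdict = write_gate(stale, now)
```

There is no override parameter on `write_gate`, which is the point: the only way past the gate is an entry that passes every check.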
This is the unglamorous half of agent work, and it is the half that determines whether the agent can be deployed in a regulated setting at all.
What ORCA's harness actually does, today
The abstract tour is useful; the operational shape is more useful. As of April 2026 the harness is doing the following things in production:
A named workforce of agents, each with an enterprise identity, a role, a line manager, and a narrow remit. One briefs the Founder in the morning and wraps in the evening. One watches security posture. One reads the codebase and surfaces candidates for improvement. One runs a daily research sweep and only stores what the brain didn't already know.
A library of skills mounted into every agent's workspace — governed memory access, security posture, live operational telemetry, compliance posture, structured content, source of truth for code, self-improvement, and an overnight synthesis pass. Skills are code, not prompts; they expose a narrow interface and enforce their own error handling.
A small, generic tool surface at the gateway, role-gated on the server. Every call is authenticated. Every response is logged. The set is deliberately narrow because the alternative — a sprawling tool catalogue — degrades both model behaviour and operator oversight.
Integrations with the systems the organisation already runs on — observability, security, compliance, delivery, finance, customer relationships, source control, governance posture. The integrations are the agent's hands; the things they are not allowed to touch are as important as the things they are.
A crash-safe memory substrate. Long-term memory lives behind a governed write gate, on storage that is built to survive unclean shutdown. The memory layer has to be trusted before anything above it can be. Choosing the right substrate was a harness decision, not a model decision.
Observability end-to-end. Live dashboards for the platform, telemetry from every host, structured logs, and reasoning capture on every agent turn. When an agent misbehaves the answer is not "ask the model what happened"; it is a trace.
Why this matters
The organisations that win the next decade of AI-assisted work are not the ones with the best prompts. They are the ones whose harnesses are good enough that the choice of model becomes a tuning decision, not a strategic one.
That is what we are building. Not a smarter chatbot. Not a cleverer prompt library. A harness — one that remembers, governs, acts, and improves — wrapped around whichever model is best at the moment.
If you're not the model, you're the harness. The harness is the product.