Day zero — ORCA HQ Blog

We shipped our first customer install in late April. By the end of day zero we had logged twenty-seven distinct case-law entries — each one a thing the installer should have done and didn't, or did and shouldn't have. The product is better for it. The honest version of why is the point of this piece.

The deterministic deployment promise

For the six months before that install, we had been telling ourselves — and prospective customers — that the platform deploys deterministically. Run the installer, watch the script provision Azure resources, get a working instance at the end. No "it depends." No "you'll need an Azure expert." No "let's get on a call and screen-share for an afternoon." This is what the marketing said. This is what the runbook said. This is what the dry-run on a fresh tenant said.

The word "deterministic" is doing a lot of work in that sentence. We meant it. We had built the installer with that bar in mind. We had a fresh-tenant test that ran every release. We thought we were ready.

The first real customer is not a fresh tenant. The first real customer is a tenant with policies you didn't anticipate, with leftover resources from previous experiments, with email domain quirks that don't match your test fixtures, with managed-identity assignments your test environment never exercised, with security-baseline settings that block half the things your script tries to do. The "deterministic" promise survives contact with the test environment. It survives the dry-run. It does not survive contact with reality on its first try.

What actually happened on day zero

The installer ran. Most of it worked. Then the failures arrived in waves, and each wave was instructive in a different way.

Wave one — assumptions baked into the test environment. We assumed the deployer's machine had certain shell tools available. We assumed network access to certain Microsoft endpoints would not be blocked by an enterprise firewall. We assumed the Azure tenant would allow operations that some tenants disallow by default. Each of these assumptions had been correct on every test machine we had ever run the installer on. None were robust to the actual customer's environment. Wave one taught us that "works on my machine" generalises to "works on every machine that looks like mine" — and that's a very narrow universe.

Wave two — leftover state. The customer tenant had a half-deployed previous experiment that left orphaned resources, role assignments to identities that no longer existed, and a Key Vault soft-deleted under the same name we needed to create. Our installer assumed a clean slate. We had to build leftover-resource detection — and a clear "here's what we found, here's what we recommend you do about it" — into the installer's preflight phase. This wasn't a bug fix. It was a category of behaviour we hadn't designed for.

Wave three — sequence-of-operations bugs that only show up under real load. We had a step that bound Key Vault secrets to a Container App and then immediately set environment variables that referenced those secrets. In our test environment, the binding had always propagated by the time the env-var step ran. In the customer's tenant, it didn't, and the env-var step failed because it referenced a secret the platform didn't yet know was bound. The fix was to add an explicit propagation check between the two steps. The lesson was that ordering invariants you can't see in your tests will surface in your customers' tenants.

Wave four — the things the installer did "correctly" that confused the deployer. Several steps logged warnings that looked like errors. One step succeeded but in a way that the deployer reasonably interpreted as failure. The installer's UX (such as it was) failed several deployers' reasonable expectations. This wasn't an installer bug — every step did what it was designed to do — but the experience said "something went wrong" when nothing had. We had to rewrite the operator-facing log output to make success look like success and failure look like failure, with no ambiguity in between.

How we caught it without losing the customer

Two practices made the difference between "embarrassing first install" and "first install that proved the model."

The case-law brain. Every problem we hit on day zero was written into a structured knowledge entry within hours of being resolved. By the end of the install we had twenty-seven entries — each one named, classified, dated, and pointed at a specific commit or runbook step. When we sat down two days later to plan the fix-it sprint, we didn't have to reconstruct what went wrong. We had the case law. The next person who hits the same issue — whether it's another install team, a customer trying to self-troubleshoot, or our own engineer six months from now — will find it indexed and searchable. Most teams that ship a first customer install have the war stories in someone's head. We had them in the brain, with attribution, by the second day.

The honest call afterwards. We did not pretend the install had gone smoothly. The customer knew it hadn't gone smoothly because they were there for every minute of it. The conversation the next morning was a debrief, not a performance. We named every issue, told them what we were going to fix and on what timeline, and committed to a follow-up release that would prevent each one from happening to the next customer. They appreciated it. The relationship is stronger for the failures than it would have been for a clean install — because they saw how we operate when things break, not just when they work.

The first customer install is not the dress rehearsal for future installs. It is the loudest signal you will ever get about what your product actually is, separated from what you have been telling yourself it is. Every team should be ready to throw away the version that survived their tests and ship the version their first customer reveals.

What the fix-it release looked like

Eight weeks of intermittent work — fast at the start, slower as the issues got more architectural — produced an installer release that closed twenty-nine of the thirty case-law entries. (One was deferred to the following release on the basis that the fix would have changed the customer-facing API and we wanted to land the API change in a deliberate window.)

The mechanics of the release matter less than the discipline of how it was scoped. Each closed case-law entry was tied to a specific commit or sequence of commits. Each commit named the case-law identifier in its message. The release notes listed every closed entry with a link back to the underlying brain record. The audit trail from "first customer hit this" to "this is the commit that fixed it" is intact and queryable, three months later, by anyone with access to the brain.

This is not how most teams ship installer releases. Most teams ship a version bump, a changelog written from memory, and an implicit promise that the bugs are gone. We ship a version bump, a changelog generated from the case-law-to-commit mapping, and an explicit list of "here is what was wrong, here is the commit that fixed it, here is the test that prevents regression." It costs more in discipline. It buys credibility you cannot construct any other way.

What the case-law brain caught that nothing else would have

A particular kind of issue, lurking in the structure of how we write installers: the silent assumption. A line of code that has been correct on every machine we have run it on, that fails the moment it meets a sufficiently different environment, and that is never tested in isolation because it is part of a longer flow. These are the bugs that don't show up in unit tests, integration tests, or end-to-end smoke tests, because they only manifest in the seam between the installer and a real-world Azure tenant.

If we hadn't recorded each one as a structured brain entry — with the file, the line, the symptom, the diagnosis, the fix, and a one-line lesson — most of them would have been forgotten within a week. The next install would have hit some of them again, treated them as new problems, lost a few hours rediscovering the diagnosis, and moved on. The third install would have hit a different combination, and the cycle would have repeated. With the case-law in place, the second install hit only the issues that the first had not encountered. The third hit fewer still. The shape of the curve is what tells you a knowledge base is working.

"A team that captures what their first install taught them, structurally, gets better with every install. A team that doesn't, learns the same lessons every time, more slowly, with no compounding signal."

What we'd do differently from day one

The honest list, in order of regret.

We'd run the installer against a tenant we did not control. Not as the final pre-release test — as the first one. Every test fixture we had was a tenant we had built up to look like a customer's. The first time the installer met a tenant we had not pre-shaped, every assumption we had not noticed making got surfaced. We should have done that test in week one of the installer's existence, not in the days before the customer call.

We'd have built leftover-resource detection in from the start. The "fresh tenant" assumption was a fiction we maintained because it made the installer code simpler. The real world has leftover state. Every installer should be built for it on day one. The cost of retrofitting this into the installer was higher than the cost of designing for it would have been.

We'd have written the case-law schema before the first install, not during it. We invented the schema under pressure as the issues piled up. Some early entries are missing fields the later ones gained. Some are duplicated. The structure tightened as we used it — but the entries written under pressure carry less metadata than the ones written calmly. A schema designed in advance, even a thin one, would have left us with cleaner records of the early issues.

We'd have set the customer's expectation explicitly. The customer knew the install was going to be the first. They were generous about it. We should still have framed it more clearly: "We expect the install to surface things our test environment didn't. Here is what we will do when each of them happens. Here is the timeline for the fix-it release that follows." Saying it out loud changes the relationship from "watch the install go right" to "watch how we operate when things go wrong" — and the second is genuinely the more important demonstration.

Why this is on the public blog

Two reasons. The first is that pretending first installs go smoothly is a tax on the next vendor's first install — every prospective customer arrives with an expectation set by everyone who came before, and the more vendors who tell the truth about how messy day zero is, the more honest the field gets. The second is that we would rather attract customers who appreciate the truth about how we operate than customers who are buying a marketing version that doesn't survive contact.

If you are running an installer that is about to meet its first customer — talk to us. We have lessons we'd be happy to share, and a brain entry we'd be happy to send you for each one. The honest conversations on this side of the day-zero reckoning are more useful than the polished case studies on the other side, and we are still close enough to ours to remember exactly how it felt.

Day zero.