My tests passed. The client redrew everything by hand.

We shipped the automatic duct-routing feature with the whole test suite green. Every run it produced was valid: axis-aligned, correctly sized, airflow adding up along the trunk. By every check we had written, it worked.

The contractor printed the output and redrew every duct by hand.

That's the moment worth sitting with. Not because the feature was broken in any way the tests could see — but because "the tests pass" and "the person who does this for a living accepts the output" turned out to be two completely different claims, and only one of them mattered.

The bug was a single coordinate

The engine anchored the main trunk at the air handler's position. That sounds reasonable — the air handler is where the air comes from — but the air handler sits at the edge of the floor plan. Anchoring the trunk there meant every branch had to travel from that edge, across the whole plan, to reach its register. The drawing was a valid duct system and also one no contractor would ever build: a spine jammed against one wall with long branches raking across the entire floor.

You can measure it. Total branch length came out around 2,140px; a tiny 7-CFM closet register was running a 450px branch to reach the spine. A human doesn't draw it that way. A human runs the trunk down the middle of the registers it serves, so the branches are short and balanced on both sides.

The fix was one word. Instead of anchoring the trunk at the air handler's coordinate, anchor it at the median of the register coordinates. Total branch length dropped to 840px. The closet branch became a stub. Nothing else changed.

"Passing" and "accepted" are different claims

Here's the uncomfortable part: the tests weren't wrong. They tested exactly what I had asked them to test — geometry is orthogonal, trunk airflow is non-increasing, branches fan to both sides, elbows land on corners. All true, before and after the fix. The output was correct the entire time.

What the suite didn't assert was the thing that actually defined the feature: the drawing has to look like what a contractor would draw. It didn't assert that because, when I wrote it, I didn't know that was the requirement. I knew the physics. I didn't yet know the objective.

The redraw wasn't a complaint. It was the specification — handed over late, in the only language that would have made it unambiguous: someone doing the job the way it's supposed to be done.

Senior work is discovering the objective function

Most of what looks like "engineering skill" on a green-field problem is actually this: figuring out what you're really optimizing for. The junior version of the job is optimizing the objective you were given. The senior version is noticing that the stated objective ("produce a valid duct layout") is a proxy for a different, unstated one ("produce the layout an expert would have drawn") — and going to find the real one before the client has to draw it for you.

You find it by watching the work get used, not by reading the spec. The spec said "size and route the ducts." It did not say "minimize total branch length and center the trunk on its registers," because the contractor didn't think of that as a requirement — it's just how ducts are drawn. It was invisible to them and unknown to me, which is exactly the kind of requirement that only surfaces when a real user touches the output.

What we changed

Once you know the objective, it stops being fuzzy taste and becomes a test like any other. The redraw became a one-line assertion: total branch length under 1,100px, checked on every run, forever. The lesson generalized past that one bug into how we build:

Assert the property an expert would recognize, not the coordinates. "Returns these exact points" passes while the feature is broken. "Branches are short and balanced on both sides" is the thing a contractor actually cares about, and it stays meaningful as the internals change.
Treat the first expert redraw as the real spec. When someone who knows the domain reworks your output by hand, that rework is the requirement. Capture it as a check before you fix the code, so it can never regress.
Get the output in front of a real user early, because the objective function you assumed and the one that matters are rarely the same — and the gap between them is where the actual work lives.

The engine that made this fixable is a pure function — plain numbers in, plain geometry out, no UI — so the whole thing could be reproduced, measured, and asserted without ever opening the canvas. That's what let a one-word change be verified with confidence instead of hope. The full build is written up in the US HVAC contractor case study.

Green tests told me the code did what I asked. The redraw told me what to ask. The second one was the job.

My tests passed. The client redrew everything by hand.

The bug was a single coordinate

"Passing" and "accepted" are different claims

Senior work is discovering the objective function

What we changed

Want this kind of thinking on your product?