Reset to Zero — Temper Line

I walked into the manufacturing plant in Tijuana to see someone erasing the “This facility has gone ____ days without an accident” board and writing in a zero.

I was a mechanical design engineer in for a site visit, walking the floor to see first parts off the line for a product I’d designed. As I walked in and met my colleagues on the manufacturing side, they let me know a PEM press operator had lost part of his ring finger building parts for my product.

A PEM is a small threaded stud you push into sheet metal with a press. The press is set up to be safe: two palm buttons mounted apart from each other, plus a foot pedal to ensure the operator’s body is outside of the press. The press hits the PEM twice with a couple thousand PSI while the operator safely watches with all his limbs away from the machinery.

The operator who lost his finger had bent a piece of steel pipe into a U-shape, propped it across the two palm buttons, and pressed both with one hand. His other hand was holding the part on the anvil. His foot hit the pedal, the press fired, the part shifted, he reached in to reset it, and the ram smashed his ring finger at the last knuckle.

Why was the operator holding the part with his hand?

Because the part wasn’t sitting flat on the anvil.

It was a sheet metal part with a long stiffening rib along one edge. We assumed the rib would be made on a press brake, a single hard fold across an otherwise flat sheet that made it easy to control warping.

The manufacturing engineers used a roller tool on a turret punch to form the rib instead. Same dimension, different process. The rib came out within spec, but the rolling process introduced a real warp. The sheet bowed a few millimeters across its length, end to end.

Nobody measured flatness at the component level. The part went through too many operations for that to make sense: punch, deburr, multiple forming operations, PEM insertions, and powder coating. The plan was to check flatness at the end of the line, after the part was welded into its enclosure, where the assembly itself would constrain it back to a tolerance that mattered to the customer. From an inspection standpoint, that was the right place to measure.

It also meant that at the PEM step, the warp was invisible to the system. The drawing said flat. The punches and bends were to spec. The end-of-line check would say in spec. So the PEM press operator received parts that were marked good and in spec but wouldn’t sit flat in his machine.

A warped part rocks on a flat anvil. A rocking part doesn’t seat a PEM correctly. The fastener goes in crooked, or doesn’t bottom out, and you get a reject.

To make the part stay flat at the PEM step, somebody had to hold it steady despite the safety features designed into the press.

So the operator bent a steel pipe and circumvented the safeguards on his machine.

First, this was a system failure to provide a safe working environment to production staff. Second, the operator made a judgment error. The risk of bodily harm to meet production deadlines isn’t acceptable. I’ve held on to this story for 10 years as I’ve grappled with the respectful way to extract learning from it without trivializing or demeaning the real cost of these mistakes.

The gut punch from rewinding this event was that every decision upstream of that moment was rational from where the person was standing, and none of them were visible to the seats around them.

I drew a flat part with a rib formed on a press brake. The drawing was internally consistent.

The manufacturing engineers changed the forming process to one their equipment was already running. From their seat, they delivered a part that passed every inspection step it was supposed to pass.

The quality plan measured flatness where the customer would feel it, at the end of the line. Every measurement the quality team took passed.

The operator received parts that wouldn’t sit flat on a press that required two hands. He had a quota and a part that needed three hands to build successfully. He bent a pipe to have a hand free to steady the warped sheet metal.

I had no idea any of this was happening until I walked in the door and learned it had been exactly zero days since an accident. The production line was in another country. The flow of information between the team manufacturing my designs and me had been compressed through four or five layers of reporting, from drawings to assembly notes and production instructions to inspection and quality reports, none of which had a column for “don’t put yourself at risk of bodily harm to build this.” I couldn’t have walked the floor any sooner, and I couldn’t have known this was happening without being there in person.

Each step was rational from where the person was standing. The cascade is only visible from outside, looking back, after something has failed.

In the first piece I wrote about how organizations perform certainty they don’t have. Roadmaps presented as commitments. Research called “de-risking.” Dashboards built to make decisions look good rather than tell us whether they were wrong.

This is what that performance produces.

The drawing performed certainty: flat part. The inspection plan performed certainty: flatness measured where it mattered. The safety system performed certainty: two buttons, a foot pedal, hands clear of the press. The accident board performed certainty: zero. Every layer of the system was telling the layer above it that everything was fine.

Everything was fine right up until a piece of pipe defeated all of it.

Years later, in software, I watched the same thing happen from a different seat. No accident board. No floor I couldn’t walk. Just a quieter version of the same compression, running for years before anyone could see it.

I was a product manager at an enterprise SaaS company. The professional services team had been running a set of longitudinal research programs: surveys that asked the same question across a dozen variables, in loops, the kind of data structure that’s a nightmare to analyze raw. Our engineering team built a custom data pipeline off the core platform to make the analysis work by “stacking” the looped questions into clean rows (Question 1 variable A, Question 2 variable B) so the analysts could actually run their cuts. It was bespoke and worked for the original use case.

The programs sold. They sold so well that leadership decided to make it a standard product line.

Nobody who made that decision understood the pipeline. From the executive seat, the metric said “this offering generates revenue, scale it.” That was true at the level the metric was measured. What was invisible was that the offering was held up by a custom pipeline that didn’t share the platform’s infrastructure, configurability, or engineering velocity.

So we scaled it the only way the architecture allowed: by hiring people. A large services delivery team built each program by hand, because the pipeline couldn’t be configured the way the platform could. Each delivery was bespoke and each program was managed, and we were accountable for services and software margins. When we built new platform features, turning them on for these customers meant our services team doing the work, which ate into the margins we were measured on. So the services team kept building on the bespoke pipeline as it was, and feature development stalled.

The pipeline got more load-bearing and the engineering team had less ability to change it. The business cracked under its own maintenance weight, not in a single failure, the way a PEM press fires, but slowly. Each quarter slightly worse than the last, more unhappy customers, slower growth, less engineering investment, services teams burning out with no way to course correct. By the time it was visible at the executive level, the cascade had been running for years.

Here’s the thing: each of these decisions, taken on its own, is good practice. Build a custom solution when the platform can’t support the use case. Productize something that’s already selling. Staff up delivery to scale it. Protect your margins. You’d find each of these in a playbook. They aren’t workarounds, they’re the right moves. The cascade didn’t come from bad decisions, it came from good decisions that couldn’t see each other.

That’s what makes the software version harder to catch. In manufacturing, the cascade left physical evidence in a bent piece of steel pipe and an accident board reset to zero. The press operator’s workaround was visible the moment you walked the floor. In software, the trail is buried in Slack threads, slide decks, services contracts, in the muscle memory of people who’ve been around long enough to know which fields you don’t touch and which customers you don’t surprise. By the time the trail is visible, it’s already infrastructure.

But the deeper difference is this: the press operator knew he was working around the system. He bent the pipe because the system failed him and he needed a solution. When our software business showed signs of cracking, everyone was following the playbook to work around the challenges. That’s the scarier version — not a cascade of unsanctioned workarounds, but a cascade of best practices that quietly compound until the weight is structural and the cost is sunk.

These are the consequences nobody puts on a slide. A team that burns out shipping the wrong thing for a year. A product that becomes too rigid to change because the rigidity is somebody’s livelihood now. A business that slowly suffocates under decisions that were all, individually, correct.

The cascade is what happens when you pretend the gap between the work and the report isn’t there, when every layer hardens its outputs into certainty for the layer above until the whole system is brittle. A blade that’s all hard edges, with no temper line, shatters. A business that’s all confident outputs does the same, just slowly.

The solution isn’t closing the gap. It’s staying close enough to the work to feel where the compression is happening, and not mistaking business as usual for the truth.

This is the second in a series about the boundaries between structured and unstructured knowledge in product development.