Years ago I was part of a project where we paid a Big Three consulting firm to build a data product for us. It cost serious money. It took months. When they handed it over, it didn’t work. Not “needs some polish” didn’t work. Didn’t work in ways that mattered. We sat around a table beating our heads against the wall trying to make it function under real conditions.

Eventually I put all the data on the table and said: this didn’t work. Full stop.

We moved on. But I never forgot what it felt like to take someone’s word for “it’s ready” and be wrong.


So when I got close to shipping OIM, I had a voice in the back of my head. The doubting engineer. The one who’s seen enough delivered software to know that “it works in dev” is not the same as “it works.”

OIM processes raw ERP exports and builds a dimensional data model on a laptop: no server, no cloud, no database administrator. That is a real promise. I needed to know it was actually true before I handed it to a customer.

The stress harness is what I built to answer that question.


The design is a 25-run adversarial test built around four phases. Not 25 random runs. 25 runs that simulate a skeptical early adopter putting the product through real conditions.

Phase 1 — Warm-up (runs 1-6). Small loads. First-ever run on a blank slate. Re-run the same data to test deduplication. Clean slate again. The behavior a cautious user exhibits when they are not sure they trust the software yet.

Phase 2 — Normal cadence (runs 7-16). Steady weekly drops at scale. 1 million rows. 5 million rows. Back to 1 million. The rhythm of a real production environment.

Phase 3 — The surprise (run 17). A massive 25-million-row historical backfill. The thing nobody plans for but every customer eventually does. “We want to load the last two years.” This is where software breaks.

Phase 4 — Recovery (runs 18-25). Small drops after the big dump. Back to normal cadence. Does the system recover cleanly or does the large run leave junk behind?
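
For concreteness, here is a minimal sketch of how a run plan like this can be encoded as plain data. The names (Run, PLAN) and the structure are mine, reconstructed from the results table below, not OIM’s actual harness code:

```python
from dataclasses import dataclass

# Hypothetical encoding of the 25-run plan; names are illustrative, not OIM's.
@dataclass(frozen=True)
class Run:
    number: int    # 1..25
    phase: str     # "warm-up", "normal cadence", "surprise", "recovery"
    target: int    # rows in the simulated ERP export
    action: str    # "clean" = fresh drop, "keep" = re-run the same file (dedup test)

PLAN = [
    Run(1, "warm-up", 50_000, "clean"),
    Run(2, "warm-up", 50_000, "keep"),    # identical file to run 1
    Run(3, "warm-up", 50_000, "clean"),
    Run(4, "warm-up", 250_000, "clean"),
    Run(5, "warm-up", 250_000, "keep"),
    Run(6, "warm-up", 50_000, "clean"),
    # runs 7-16: weekly cadence at 1M and 5M
    # run 17: the 25M historical backfill
    # runs 18-25: recovery, back to normal cadence
]
```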


Each run checks the same list: the row count out matches the input, deduplication holds (nothing doubles on a re-run), the health score lands where it should, no orphaned files are left behind, and the run log grows by exactly one row.

The harness does not accept partial credit.
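
The gate itself fits in a few lines. This is a sketch with field names I made up for illustration; the point is the single boolean, with no warnings, no weights, no partial passes:

```python
from dataclasses import dataclass, field

# Field names are illustrative, not OIM's API.
@dataclass
class RunResult:
    rows_expected: int
    rows_out: int
    duplicates_found: int              # rows that doubled on a re-run
    health_score: float
    orphaned_files: list = field(default_factory=list)
    log_rows_before: int = 0
    log_rows_after: int = 0

def run_passes(r: RunResult) -> bool:
    """All checks must hold. One failure fails the run: no partial credit."""
    return (
        r.rows_out == r.rows_expected
        and r.duplicates_found == 0
        and r.health_score > 0          # score was actually computed
        and not r.orphaned_files
        and r.log_rows_after == r.log_rows_before + 1
    )
```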


Here are the results.

| Run | Phase | Target | Action | Duration | Rows | Health Score |
|----:|-------|-------:|--------|---------:|-----:|-------------:|
| 1 | Warm-up | 50K | Clean | 14.7s | 50,585 | 89.02 |
| 2 | Warm-up | 50K | Keep (dedup) | 10.6s | 50,585 | 89.02 |
| 3 | Warm-up | 50K | Clean | 11.4s | 50,499 | 89.05 |
| 4 | Warm-up | 250K | Clean | 29.2s | 252,751 | 88.60 |
| 5 | Warm-up | 250K | Keep (dedup) | 26.8s | 252,751 | 88.60 |
| 6 | Warm-up | 50K | Clean | 12.6s | 50,442 | 88.95 |
| 7 | Normal cadence | 1M | Clean | 51.5s | 1,011,819 | 88.44 |
| 8 | Normal cadence | 1M | Keep (dedup) | 45.4s | 1,011,819 | 88.44 |
| 9 | Normal cadence | 1M | Clean | 67.5s | 1,010,862 | 88.35 |
| 10 | Normal cadence | 1M | Clean | 50.8s | 1,011,568 | 88.44 |
| 11 | Normal cadence | 5M | Clean | 181.7s | 5,056,452 | 88.24 |
| 12 | Normal cadence | 5M | Keep (dedup) | 167.9s | 5,056,452 | 88.24 |
| 13 | Normal cadence | 5M | Clean | 166.7s | 5,057,285 | 88.24 |
| 14 | Normal cadence | 1M | Clean | 65.7s | 1,011,760 | 88.36 |
| 15 | Normal cadence | 5M | Clean | 179.9s | 5,057,987 | 88.25 |
| 16 | Normal cadence | 5M | Clean | 175.7s | 5,058,227 | 88.17 |
| 17 | The surprise | 25M | Clean | 975.1s | 25,295,142 | 88.12 |
| 18 | Recovery | 1M | Clean | 74.6s | 1,010,909 | 88.43 |
| 19 | Recovery | 1M | Keep (dedup) | 68.2s | 1,010,909 | 88.43 |
| 20 | Recovery | 250K | Clean | 40.7s | 252,493 | 88.76 |
| 21 | Recovery | 5M | Clean | 178.1s | 5,060,132 | 88.23 |
| 22 | Recovery | 5M | Keep (dedup) | 160.1s | 5,060,132 | 88.23 |
| 23 | Recovery | 1M | Clean | 66.8s | 1,011,114 | 88.45 |
| 24 | Recovery | 5M | Clean | 177.4s | 5,060,288 | 88.22 |
| 25 | Recovery | 1M | Clean | 66.7s | 1,011,185 | 88.47 |

25/25 passed. 75.8 million rows processed. Total elapsed: 76 minutes.


A few things worth noting in that table.

The dedup runs hold. Run 2 re-runs the exact same 50K rows as Run 1. Same row count out. Same health score. Nothing doubles. The same logic holds at 1M (Runs 7/8) and 5M (Runs 11/12, 21/22). Deduplication on work_order_id + operation_sequence, with the latest export timestamp winning, works every time.
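
In pandas terms, that rule looks roughly like this. The column names follow the text above (work_order_id, operation_sequence, plus an assumed export_timestamp column); the code is my sketch of the rule, not OIM’s implementation:

```python
import pandas as pd

def dedupe(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Keep one row per (work_order_id, operation_sequence);
    the row from the latest export wins."""
    return (
        pd.concat(frames, ignore_index=True)
          .sort_values("export_timestamp")             # oldest first...
          .drop_duplicates(
              subset=["work_order_id", "operation_sequence"],
              keep="last",                             # ...so the latest survives
          )
    )
```

Feed the same file through a rule like this twice and you get the same frame back, which is exactly what the identical row counts and health scores in Runs 1/2 show.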

Run 17 is the one that would break a system that wasn’t designed for it. 25 million rows. 975 seconds. Not fast, but correct. The health score comes out at 88.12. Compare that to Run 1 at 89.02. Two years of production-scale history in one pass and the data quality signal barely moves.

Recovery is clean. Run 18 comes after the 25M dump. It runs a 1M clean drop. Seventy-four seconds. Health score 88.43. Nothing from the big run lingered that should have been cleared. No orphaned files. The log grows by exactly one row.


What I was really looking for was drift. Does the runtime creep? Does the health score degrade? Does something leak across runs that poisons the next one?

None of that happened.

The health score range across 25 runs is 88.12 to 89.05. Less than a point of spread across 75 million rows and four different phases. That is not an accident. That is what consistent data quality logic looks like under pressure.
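
That claim is easy to check mechanically. Here are the 25 health scores straight from the table, with the kind of drift check the harness effectively performs (the one-point threshold is my illustration):

```python
# Health scores from runs 1-25 (see the table above).
health = [
    89.02, 89.02, 89.05, 88.60, 88.60, 88.95,                 # warm-up
    88.44, 88.44, 88.35, 88.44, 88.24, 88.24,
    88.24, 88.36, 88.25, 88.17,                               # normal cadence
    88.12,                                                    # the surprise
    88.43, 88.43, 88.76, 88.23, 88.23, 88.45, 88.22, 88.47,   # recovery
]

spread = max(health) - min(health)
assert spread < 1.0, f"health score drifted by {spread:.2f} points"
print(f"{min(health):.2f} to {max(health):.2f}, spread {spread:.2f}")
# -> 88.12 to 89.05, spread 0.93
```

The same check on runtimes shows no creep either: the 5M clean runs stay between 166.7s (Run 13) and 181.7s (Run 11) from first to last, and the recovery-phase 5M runs are indistinguishable from the normal-cadence ones.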


I went back and read the notes from that consulting engagement years later. The problem was not that the consultants were incompetent. The problem was that nobody had ever put the software through conditions it would actually face. It passed the demo. It failed the floor.

The stress harness exists so that does not happen with OIM.

When I hand this to a customer and they ask, “Does this actually work?”, the answer is not “we think so.” The answer is 25/25. Seventy-six million rows. Laptop only. Every check passed.

That is what ready means.