The first iteration of what would become the OIM started as a Power BI template.

That made sense at the time. My background was in Power BI. It’s what I use every day at my job at a large electric utility.

The idea was simple: give customers a pre-built template they could drop their ERP export into and go.

Power Query was the engine. You pull the CSV from your ERP, drop it in a folder, then refresh Power BI. Familiar, no licensing cost, no new tools to learn.

It worked. Until it didn’t.


The problem wasn’t Power Query itself. The problem was the goal. OIM wasn’t supposed to be a report template. It was supposed to auto-generate an enterprise data model from whatever a customer dropped in. Dimensional modeling. Star schema. Fact and dimension tables. The full Kimball treatment, config-driven, working on any ERP export regardless of column names or structure.

Power Query is a transformation tool. It is not a data modeling engine. The more I pushed it toward what OIM needed to be, the more it fought back. Performance degraded. The logic looked like a bowl of spaghetti. And the moment you wanted to run anything at scale (10 million rows, then 25 million, then 50 million), the model started crawling.

I knew I wanted something local, fast, columnar, and SQL-native. No server. No licensing. No cloud dependency.

I found DuckDB.


DuckDB is an in-process analytical database. There is no server to install or configure, no connection string to manage. The entire database lives in a single file. You query it with standard SQL. It is purpose-built for OLAP workloads, the kind of aggregation-heavy analytical queries that make row-oriented databases beg for mercy.

The first time I ran a 50 million row query on my laptop and watched it return in 7 minutes, I knew this was the tool.

Paired with Parquet as the storage format, the architecture fell into place. Parquet is compressed, columnar, and portable. Power BI can read it. Python can read it. Excel can read it. The Bronze, Silver, and Gold layers of OIM are all Parquet files on disk, no proprietary format, no vendor lock-in.


The things I gave up: Power BI’s drag-and-drop familiarity. The assumption that the customer already has the tools.

The things I gained: 50 million rows processed locally in under 40 minutes on a laptop. Zero cloud dependencies. Zero licensing cost. A SQL-native engine that makes the data model legible and version-controllable. And a foundation that can grow without a re-architecture.

The decision wasn’t really about DuckDB vs. Power Query. It was about what OIM was actually trying to be. Once I was honest about that, the tool choice was obvious.

Pick the tool that fits the problem. Not the tool that fits your comfort zone.