← All posts

Do your users trust your data? More importantly…do YOU trust your data?

How do you handle poor data quality?

I have read Data Vault 2.0 and really liked the idea of a business fingerprint. You don’t miss anything. However, at scale that comes as a cost. To get a business finger print, take a distinct across every dimension per row. At 50M rows, that significantly impacts query speed and efficiency.

With what I was working on, I didn’t want to pay that cost, but still wanted a viable solution. Here’s what I wound up doing:

The result: It’s very easy to identify and fix the bad data. It’s easy for the analyst to surface, and the data owner to go fix it.

Bottom line: You don’t have to follow the fanciest method out there. You just have to find what works for you and your situation.