What is your data quality and validation process?


RESPONSE: Some people in the data warehousing community advocate a position that involves heavy downstream cleansing of source system data. We have found that this is a risky practice that has great potential for damaging credibility with the analyst communities that the EDW is trying to serve. The danger lies in the fact that the downstream data cleansing requires extensive subject matter expertise, is prone to error, and is not transparent to the users. When an analyst community distrusts the data from the EDW, it is very hard to win that trust back. We advocate a methodology that maintains data fidelity with the source system and provides feedback to source system data stewards about data quality issues. Once those data quality issues are fixed, the corrected data automatically flows into the EDW.

We have a data validation framework that helps us maintain the fidelity between the EDW and the source system. One of the primary features compares source table row counts and EDW row counts at varying degrees of granularity to identify when a table is out of sync with the source system.