
Don’t Fix Bad Data, Do This Instead


People don’t know what they mean when they talk about data quality.

Photo by No Revisions on Unsplash

A few years ago, our data platform team set out to pinpoint the primary concerns of our data users. We conducted a survey among the people interacting with our data platform, and unsurprisingly, the main concern they highlighted was data quality.

Our initial response, characteristic of an engineering mindset, was to build data quality tooling. We introduced an internal tool named Contessa. Although somewhat cumbersome and requiring significant manual configuration, Contessa supported checks for the traditional dimensions of data quality: consistency, timeliness, validity, uniqueness, accuracy, and completeness. After running the tool for a few months with a large number of data quality checks, we concluded that:

  • Data quality checks occasionally helped data users discover sooner that the data was compromised and could not be relied upon.
  • Despite the frequent execution of data quality checks, there was no noticeable improvement in the subjective perception of data quality.
  • For a large portion of issues, particularly those identified through automated data quality checks such as consistency or validity, no corrective action was ever taken.
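To make the dimensions named above concrete, here is a minimal sketch of what automated checks for completeness, uniqueness, validity, and timeliness might look like. It is not Contessa's API or configuration. The records, field names, and thresholds are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical order records; field names and values are illustrative only.
ORDERS = [
    {"order_id": 1, "price": 120.0, "currency": "EUR", "updated_at": datetime(2024, 1, 10, 8, 0)},
    {"order_id": 2, "price": None, "currency": "EUR", "updated_at": datetime(2024, 1, 10, 9, 0)},
    {"order_id": 2, "price": 80.0, "currency": "XXX", "updated_at": datetime(2024, 1, 9, 9, 0)},
]


def completeness(rows, field):
    """Share of rows where `field` is not missing."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, field):
    """Number of distinct values of `field` divided by the number of rows."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, allowed):
    """Share of rows whose `field` value is in the allowed set."""
    return sum(r[field] in allowed for r in rows) / len(rows)


def timeliness(rows, field, now, max_age):
    """True if the most recent row is no older than `max_age`."""
    return now - max(r[field] for r in rows) <= max_age


def run_checks(rows, now):
    """Run all checks and return a name -> result mapping."""
    return {
        "completeness(price)": completeness(rows, "price"),
        "uniqueness(order_id)": uniqueness(rows, "order_id"),
        "validity(currency)": validity(rows, "currency", {"EUR", "USD"}),
        "timeliness(updated_at)": timeliness(rows, "updated_at", now, timedelta(hours=24)),
    }


if __name__ == "__main__":
    for name, result in run_checks(ORDERS, now=datetime(2024, 1, 10, 12, 0)).items():
        print(name, result)
```

Checks like these report a score or a pass/fail flag, but they say nothing about who should act on a failure, which fits the observation that many flagged issues never led to corrective action.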

Surveys and objective measurements are useful tools, but nothing can replace a discussion over coffee and cake, as Jane Carruthers writes in her book, “The Chief Data Officer’s Playbook”. Indeed, I recommend this to anyone, as one-on-one conversations helped us discover another important angle of the problem. A few of these conversations unfolded as follows:

“Hey, you say that data quality is poor. What do you mean by that?”

#1 Pricing business analyst: “We’re working on setting the price for ancillary product X. The dataset we use is missing the actual revenue from product X for each order. We have this dataset, but it contains only the expected value of the revenue from X at the time of purchase. We can also see the actual revenue per product, but not at order granularity.”
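The analyst's complaint is about granularity, not about broken values. The sketch below uses made-up numbers to show the situation: expected revenue exists per order, while actual revenue exists only as a total per product. Every name and figure here is a hypothetical assumption, not the real dataset.

```python
# Hypothetical data: expected revenue is recorded per order,
# actual revenue only as a total per product.
expected_per_order = [
    {"order_id": 1, "product": "X", "expected_revenue": 10.0},
    {"order_id": 2, "product": "X", "expected_revenue": 12.0},
    {"order_id": 3, "product": "X", "expected_revenue": 8.0},
]
actual_per_product = {"X": 27.0}


def product_level_gap(orders, actual_totals, product):
    """Actual minus expected revenue, comparable only at product granularity."""
    expected_total = sum(
        o["expected_revenue"] for o in orders if o["product"] == product
    )
    return actual_totals[product] - expected_total


def actual_revenue_for_order(orders, order_id):
    """Return the actual revenue of one order, or None if it was never recorded."""
    for o in orders:
        if o["order_id"] == order_id:
            return o.get("actual_revenue")
    return None


if __name__ == "__main__":
    print(product_level_gap(expected_per_order, actual_per_product, "X"))
    print(actual_revenue_for_order(expected_per_order, 2))
```

Both tables could pass every completeness, validity, and uniqueness check and still not answer the question of how much each order actually earned. The product total cannot be split back into orders without an assumption about how to allocate it.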
