Methods to Create Helpful Data Tests

Data Quality dimensions

Taking the consumer’s view of data quality is a useful first step, but it may not cover the full test scope. Extensive literature reviews have addressed this gap and offer a range of data quality dimensions that are relevant to most use cases. It’s advisable to review the list with data consumers, decide together which dimensions apply, and create tests accordingly.

| Accuracy     | Format           | Comparability     |
| Reliability | Interpretability | Conciseness |
| Timeliness | Content | Freedom from bias |
| Relevance | Efficiency | Informativeness |
| Completeness | Importance | Level of detail |
| Currency | Sufficiency | Quantitativeness |
| Consistency | Usableness | Scope |
| Flexibility | Usefulness | Understandability |
| Precision | Clarity | |

You may find this list too long and wonder where to start. Data products, like any information system, can be analyzed from two perspectives: the external view and the internal view.

External view

Dimensions of external view (Created by Author)

The external view concerns the use of the data and its relation to the organization. The system is treated as a “black box” whose function is to represent the real-world system. The dimensions that fall into the external view are highly business-driven. Evaluating them can be subjective, so it’s not always easy to create automated tests for them. But let’s look at a few well-known dimensions:

  • Relevancy: The extent to which data is applicable and helpful for the analysis. Consider a marketing campaign aimed at promoting a new product. All data attributes, such as customer demographic data and purchase data, should contribute directly to the success of the campaign. Data like city weather or stock market prices is irrelevant in this case. Another example is the level of detail (granularity). If the business wants the market data at the day level but it’s delivered at the week level, then it’s not relevant or useful.
  • Representation: The extent to which data is interpretable for data consumers and the data format is consistent and descriptive. The importance of the representation layer is often neglected when assessing data quality. It covers the format of the data, which should be consistent and user-friendly, and the meaning of the data, which should be understandable. For example, consider a scenario where data is expected to arrive as a CSV file with descriptive column names, and the values are expected to be in EUR rather than in cents.
  • Timeliness: The extent to which data is fresh for data consumers. For instance, the business needs the sales transaction data with a maximum delay of 1 hour from the point of sale. This means the data pipeline must be refreshed frequently.
  • Accuracy: The extent to which data complies with business rules. Data metrics often come with complex business rules such as data mappings, rounding modes, etc. Automated tests on data logic are highly recommended, and the more, the better.
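Representation is harder to automate than the other dimensions, but the CSV scenario above can be turned into a rough check. The sketch below is only an illustration: the column names in `EXPECTED_COLUMNS` are hypothetical, and the rule that EUR amounts are always written with exactly two decimal places is an assumption that would have to be agreed with the data consumers.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical contract agreed with consumers: descriptive column names.
EXPECTED_COLUMNS = ["order_id", "customer_segment", "amount_eur"]


def check_representation(csv_text):
    """Return a list of representation problems (empty if none were found)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            amount = Decimal(row["amount_eur"])
        except (InvalidOperation, TypeError):
            problems.append(f"line {line_no}: amount is not numeric")
            continue
        # Assumption: EUR amounts carry exactly two decimal places, so a
        # bare integer such as 1999 suggests the value arrived in cents.
        if amount.as_tuple().exponent != -2:
            problems.append(f"line {line_no}: amount {amount} is not in EUR format")
    return problems
```

A check like this only catches formatting problems. Whether the column names are actually understandable still needs a review with the consumers.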

Of the four dimensions, timeliness and accuracy are the more straightforward ones to test. Timeliness can be checked by comparing the timestamp column with the current timestamp. Accuracy can be tested through custom queries.
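Both checks can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the `sales` table, its columns, and the rule `total = ROUND(quantity * unit_price, 2)` are hypothetical, and the one-hour limit is taken from the sales example above. An in-memory SQLite database stands in for the real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_DELAY = timedelta(hours=1)  # the one-hour limit from the sales example


def is_fresh(latest_loaded_at, now=None):
    """Timeliness: True if the newest record is at most MAX_DELAY old.

    Both timestamps must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return now - latest_loaded_at <= MAX_DELAY


def count_rule_violations(conn):
    """Accuracy: count rows that break a (hypothetical) business rule.

    The rule assumed here is total = ROUND(quantity * unit_price, 2);
    the small tolerance avoids false alarms from floating-point noise.
    """
    query = """
        SELECT COUNT(*) FROM sales
        WHERE ABS(total - ROUND(quantity * unit_price, 2)) > 0.005
    """
    return conn.execute(query).fetchone()[0]


# Usage with an in-memory table: the second row violates the rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quantity INTEGER, unit_price REAL, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(3, 9.99, 29.97), (2, 5.00, 10.50)],
)
violations = count_rule_violations(conn)
```

In a scheduled test, a stale timestamp or a non-zero violation count would fail the run or trigger an alert.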

Internal view

Dimensions of internal view (Created by Author)

In contrast, the internal view is concerned with how the system operates, independent of specific requirements. Its dimensions are essential regardless of the use case at hand. Dimensions in the internal view are more technology-driven, as opposed to the business-driven dimensions in the external view. This also means that the data tests depend less on consumers and can be automated most of the time. Here are a few key perspectives:

  • Quality of data source: The quality of the data source significantly impacts the overall quality of the final data. A data contract is a great initiative to ensure source data quality. As data consumers of the source, we can monitor the source data in much the same way that data stakeholders evaluate the data products.
  • Completeness: The extent to which data is retained in its entirety. As the complexity of the data pipeline increases, there’s a higher likelihood of data loss in the intermediate stages. Consider a financial system that stores customer transaction data. The completeness test ensures that all transactions pass through the entire lifecycle without being dropped. For instance, the final account balance should accurately reflect the real-world situation, capturing every transaction without omission.
  • Uniqueness: This dimension goes hand-in-hand with the completeness test. While completeness guarantees that nothing is lost, uniqueness ensures that no duplication occurs within the data.
  • Consistency: The extent to which data is consistent across internal systems on a daily basis. Discrepancies are a common data issue that often stems from data silos or inconsistent metric calculation methods. Another aspect of consistency arises between days, when data is expected to follow a steady growth pattern. Any deviation should raise a flag for further investigation.
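Completeness, uniqueness, and day-to-day consistency lend themselves to simple automated checks. The functions below are a sketch under stated assumptions: records are identified by a hypothetical ID, the inputs are plain Python lists rather than real tables, and the 20% day-over-day threshold is an arbitrary example value.

```python
from collections import Counter


def missing_ids(source_ids, target_ids):
    """Completeness: IDs present in the source but lost before the target."""
    return set(source_ids) - set(target_ids)


def duplicated_ids(ids):
    """Uniqueness: IDs that occur more than once."""
    return {record_id for record_id, count in Counter(ids).items() if count > 1}


def unusual_growth(daily_counts, max_change=0.2):
    """Consistency between days: indexes of days whose row count changed by
    more than max_change (relative to the previous day)."""
    flagged = []
    for day in range(1, len(daily_counts)):
        previous, current = daily_counts[day - 1], daily_counts[day]
        # A previous count of zero is flagged as well, since the relative
        # change cannot be computed.
        if previous == 0 or abs(current - previous) / previous > max_change:
            flagged.append(day)
    return flagged
```

For example, `unusual_growth([100, 105, 110, 60, 62])` flags only index 3, the day on which the count drops from 110 to 60. In practice the same logic is usually expressed as queries against the source and target tables.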

It’s worth noting that each dimension can be associated with one or more data tests. What matters is understanding which dimensions apply to which tables or metrics. Only then does it hold that the more tests, the better.

So far, we’ve discussed the dimensions of the external and internal views. In future data test designs, it’s important to consider both perspectives. By asking the right questions to the right people, we can improve efficiency and reduce miscommunication.
