Data quality hypotheticals about standard scenarios in huge enterprises are one thing (“oh no, the data from the 17 startups we just acquired doesn’t match”), but what are some of the real problems that a small or medium business might have to face in a DQ program?
Today’s DQ problem for SMBs is: repeating data.
This company needed to submit data to another organization. The data was account data, with each account having one number and one or more code/date pairs.
(Imagine a medical record listing a person’s procedures and when they happened, or a library record of books checked out and when.)
The company’s repeated data came out with one row per account, and the code/date pairs in columns, like this:
Account number, Code1, Date1, Code2, Date2,….CodeN, DateN.
The total number of columns in each record was determined by the account with the most code/date pairs (it was almost 100).
The recipient wanted the data to have one row per code/date pair, with repeated account numbers if necessary, like this:
Our customer had Microsoft Access and Excel as their tools, with limited IT support and little-to-no data wrangling experience.
This is a common scenario for small and medium businesses, or for independently-operating departments within a larger organization.
The company knew exactly what they needed to do, and they had the necessary data: they just didn’t have the tools to do it.
In other words, the data quality dimension that was falling short wasn’t accuracy, or completeness; it was availability, or accessibility.
(This is a good illustration of how factors outside of the data itself can contribute to data quality.)
The company tried everything they knew how to do, but couldn’t get the result they needed.
The company ended up resolving this problem using MIOvantage: because of their relationship with us, one of our data engineers was able to output their data in the form needed in just a few minutes.
This was possible because MIOvantage provides an alternative approach to the strictly-rectangular way of representing data. This feature (called repetitions) allows data to easily be shifted between the two representations in this scenario.
If you do not have access to MIOvantage, you can solve this problem using Excel and Access.
However, it will be more difficult and much slower than the MIOvantage technique.
The most efficient of these options is getting an experienced Excel/Access user to handwrite union queries that produce the needed result.
It requires the budget to hire an experienced Excel user for a short term, but will get you results most quickly, all else being equal. It’s also a solution that can be reused.
You can also achieve the desired result completely DIY in Excel using formulas, paste-special settings, and text-to-columns across multiple sheets.
However, this method is very cumbersome, becomes progressively more difficult as the amount of data increases (especially if your computer is not top-of-the-line hardware), and is less readily reusable later without updates.
You must also be able to use, or willing to learn to use, Excel formulas. However, recent updates to Excel that allow automation, specifically Office Scripts and Power Automation, do make this method more usable than in the recent past.
Finally, there are also third-party plugins for Excel and Access, some of which may be able to do this transformation directly or to simplify the DIY technique.
We can’t vouch for the reliability and/or safety of any of these products, especially if you’re working with sensitive data. Evaluate carefully if you decide to go this route.