Take a close look around your data landscape and you will find many examples of missing data.
Customer information that does not have a phone number, or addresses that are missing a city name or zip code.
Product information that lacks one or more basic characteristics such as dimensions or packaging.
Vendor information that does not have a category or flag for blocking.
With no disruptions in the way the organization manages its customers, its supply chain or its product development, data moves along the organizational conveyor belt. On the surface it is business as usual; under the covers, your enterprise data strategy and your analytics engine are losing credibility at an increasing rate.
By answering three key questions around missing data your Information Management team can build or regain credibility around your data strategy and analytics.
Question #1: Do we know we are missing data?
The old saying "what you don't know won't hurt you" can be disastrous for your data strategy. The key is to create a culture of detecting data issues early and often through a strong data ownership and stewardship program. Data stewards, with the help of IT, can begin building information awareness through periodic data quality assessments of the completeness, accuracy and integrity of data.
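As a rough illustration of what a periodic completeness assessment might look like, here is a minimal Python sketch. The customer records, the field names, and the helper functions `is_missing` and `completeness` are all hypothetical, made up for this example; a real assessment would run against your own sources and would also cover accuracy and integrity.

```python
from typing import Dict, List, Optional

# Hypothetical customer records; field names are illustrative only.
customers: List[Dict[str, Optional[str]]] = [
    {"name": "Acme Corp", "phone": "555-0100", "city": "Austin", "zip": "78701"},
    {"name": "Globex", "phone": None, "city": "Chicago", "zip": ""},
    {"name": "Initech", "phone": "555-0199", "city": None, "zip": "60601"},
    {"name": "Umbrella", "phone": "   ", "city": "Boston", "zip": "02108"},
]


def is_missing(value: Optional[str]) -> bool:
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def completeness(records: List[Dict[str, Optional[str]]],
                 fields: List[str]) -> Dict[str, float]:
    """Return the share of records with a usable value, per field."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if not is_missing(r.get(field)))
        result[field] = filled / total if total else 0.0
    return result


report = completeness(customers, ["name", "phone", "city", "zip"])
for field, ratio in report.items():
    print(f"{field}: {ratio:.0%} complete")
```

Tracking these ratios from one assessment to the next gives data stewards a simple signal of whether completeness is improving or slipping.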
Question #2: Why did we miss data?
Organizations can fail to collect complete data for many reasons. Let us take Customer data as an example and walk through some typical explanations:
"We wanted to collect only four attributes for our current application"
"Our legacy application did not have provision to collect email address"
"Our source gave us Customer information in two phases. We could not wait for the second phase so we took what we got from the first phase"
"Our application validation routines did not make gathering this information mandatory. Some of our data entry operators skipped on entering data"
"Our validation routines were not strong. Data entry team entered 999999 or AAAAAA to get through"
"We didn't know source had 4 additional attributes that could have completed our Customer data"
"We did not have information for customer description so we used the description field to capture their email address"
Sound familiar? Isn't this an interesting challenge that most of us have begun to overcome with data quality and master data management initiatives? What does that tell you about your information acquisition?
Once we know the root cause of missing values we can begin to carve out ways to resolve the issues and even impute values where appropriate.
Question #3: How can we impute missing values to complete our data?
Imputing missing values is a critical step in analytics and data mining, because missing values greatly compromise accuracy and introduce uncertainty into the resulting observations.
How you address missing values depends on the type of data and on the impact the gaps have on the larger data set.
If we are aiming at completeness of data from a business transaction standpoint, it would be appropriate to initiate a data cleansing or data enrichment program to identify and fill the gaps in data. Data Stewards are in the best position to execute this with support from IT and any third-party data provider.
Imputing missing values by applying standard statistical techniques is another approach.
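To make the statistical approach a little more concrete, here is a minimal sketch of two of the simplest techniques: filling numeric gaps with the median and categorical gaps with the most frequent value. These two are my choice for illustration only. The `impute` helper and the sample data are hypothetical, and whether such simple fills are appropriate depends on why the values are missing in the first place.

```python
from statistics import median, mode
from typing import List, Optional, Union

Value = Union[float, str]


def impute(values: List[Optional[Value]], strategy: str) -> List[Optional[Value]]:
    """Fill None entries using the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to learn from; leave the gaps untouched.
        return list(values)
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical sample data: order values (numeric) and customer segments (categorical).
order_values = [120.0, None, 80.0, 100.0, None]
segments = ["retail", "wholesale", None, "retail"]

imputed_orders = impute(order_values, "median")
imputed_segments = impute(segments, "mode")

print(imputed_orders)
print(imputed_segments)
```

It is worth keeping a flag alongside each imputed value so that downstream analysis can tell observed data from estimates.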
Exercise extreme caution when you record missing values. There is a world of difference between leaving a field blank, entering the text "N/A" and entering one or more space characters. Resist the temptation to put in a 0 or 99999 for missing values. I will write separately on the perils of this decision.
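One way to see the difference is to try to bring these variations back to a single "missing" representation. The sketch below is only an assumption about how that could be done: the `normalize_missing` helper, the list of text markers and the filler pattern are hypothetical and would need to be agreed with your data stewards, since a value such as 99999 can be legitimate in some fields.

```python
import re
from typing import List, Optional

# Assumed text markers that stand in for "no value"; extend per your own data.
TEXT_MARKERS = {"n/a", "na", "none", "null"}

# Assumed filler pattern: five or more 9s, or five or more capital As.
FILLER_PATTERN = re.compile(r"^(9{5,}|A{5,})$")


def normalize_missing(raw: Optional[str]) -> Optional[str]:
    """Map blanks, spaces, text markers and filler values to None."""
    if raw is None:
        return None
    text = raw.strip()
    if text == "":
        return None
    if text.lower() in TEXT_MARKERS:
        return None
    if FILLER_PATTERN.match(text):
        return None
    return text


raw_values: List[Optional[str]] = ["", "   ", "N/A", "999999", "AAAAAA", " Austin ", "0", None]
cleaned = [normalize_missing(v) for v in raw_values]
print(cleaned)
```

Note that "0" is deliberately left alone here: a zero can be a real measurement, which is exactly why using it to stand in for a missing value is so risky.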
As you continue to address missing values amongst your data, pause and think about one word: Insight.
Insight is not just an immediate understanding of an action or event based on information. It is also the ability to decipher the reasons for missing information and the impact of it on your organization.
Write to me about what information you found missing in your data landscape and what you are doing to impute it.
Sunday, January 9, 2011