Sunday, November 27, 2011

Sunday, January 9, 2011

Impute missing values

Take a close look around your data landscape and you will find many examples of missing data.
Customer information that does not have a phone number, or addresses that are missing a city name or zip code.
Product information that lacks one or more basic characteristics such as dimensions or packaging.
Vendor information that does not have a category or flag for blocking. 

With no disruptions in the way the organization manages its customers, supply chain or product development, data moves along the organizational conveyor belt. On the surface it is business as usual; below the covers your enterprise data strategy and your analytics engine are losing credibility at an increasing rate.

By answering three key questions about missing data, your Information Management team can build or regain credibility for your data strategy and analytics.

Question #1 :  Do we know we are missing data?
 
The old saying "what you don't know won't hurt you" can be disastrous for your data strategy. The key is to create a culture of detecting data issues early and often through a strong data ownership and stewardship program. Data stewards, with the help of IT, can begin building information awareness through periodic data quality assessments of the completeness, accuracy and integrity of data.
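As a rough illustration of what a periodic completeness assessment might look like, the sketch below computes, for each field, the share of records that carry a usable value. The record layout, the field names and the rule that `None`, empty strings and whitespace-only strings count as missing are all assumptions made for this example, not a prescribed method.

```python
from typing import Any, Dict, Iterable, List, Mapping


def is_missing(value: Any) -> bool:
    """Assumed rule: None and empty or whitespace-only strings count as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() == ""


def completeness(records: Iterable[Mapping[str, Any]],
                 fields: List[str]) -> Dict[str, float]:
    """Return, per field, the fraction of records holding a non-missing value."""
    rows = list(records)
    total = len(rows)
    report = {}
    for field in fields:
        filled = sum(1 for row in rows if not is_missing(row.get(field)))
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical customer records, invented for illustration only.
customers = [
    {"name": "Acme", "phone": "555-0100", "city": "Austin", "zip": "73301"},
    {"name": "Globex", "phone": None, "city": "Boston", "zip": ""},
    {"name": "Initech", "phone": "555-0199", "city": "  ", "zip": "60601"},
    {"name": "Umbrella", "phone": None, "city": "Denver", "zip": "80201"},
]

report = completeness(customers, ["name", "phone", "city", "zip"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

A report like this only covers completeness; accuracy and integrity checks would need their own rules, agreed with the data stewards.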

Question #2 :  Why did we miss data?

Organizations can fail to collect complete data for many reasons. Let us take Customer data and walk through some examples:

 "We wanted to collect only four attributes for our current application"
 "Our legacy application did not have provision to collect email address"
 "Our source gave us Customer information in two phases. We could not wait for the second phase so we took what we got from the first phase"
 "Our application validation routines did not make gathering this information mandatory. Some of our data entry operators skipped on entering data"
 "Our validation routines were not strong. Data entry team entered 999999 or AAAAAA to get through"
 "We didn't know source had 4 additional attributes that could have completed our Customer data"
 "We did not have information for customer description so we used the description field to capture their email address" 

Sounds familiar? Isn't this an interesting challenge that most of us have begun to overcome with data quality and master data management initiatives? What does that tell you about your information acquisition?

Once we know the root cause of missing values, we can begin to carve out ways to resolve the issues and even impute values where appropriate.


Question #3 :  How can we impute missing values to complete our data?

Imputing missing values is a critical step in analytics and data mining, as missing values greatly compromise accuracy and introduce uncertainty into the resulting observations.

How missing values should be addressed depends on the type of data and the impact the gaps have on the larger data set.

If we are aiming at completeness of data from a business transaction standpoint, it would be appropriate to initiate a data cleansing or data enrichment program to identify and fill the gaps in data. Data Stewards are in the best position to execute this with support from IT and any third-party data provider.

Imputing missing values by applying standard statistical techniques is another approach.
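To make the statistical approach a little more concrete, here is a minimal sketch of single-value imputation: missing entries are replaced with the median, mean or mode of the observed values. This is only one of many standard techniques, chosen here for brevity; the function name, the use of `None` as the missing marker and the sample data are assumptions for illustration. Keep in mind that filling every gap with one value understates the variability of the data.

```python
from statistics import mean, median, mode
from typing import Any, List, Sequence


def impute(values: Sequence[Any], strategy: str = "median") -> List[Any]:
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to learn from; return the data unchanged.
        return list(values)
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        # The most frequent value; also works for categorical data.
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical product attributes with gaps.
weights = impute([10, None, 30, 20, None], strategy="median")
packaging = impute(["box", None, "bag", "box"], strategy="mode")
print(weights)
print(packaging)
```

Whether an imputed value is acceptable at all is a business decision as much as a statistical one, so the choice of strategy is best reviewed with the data stewards for each attribute.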

Exercise extreme caution when you record missing values. There is a world of difference between leaving a field blank, entering the text "N/A" and entering one or more space characters. Resist the temptation to put in a 0 or 99999 for missing values. I will write separately on the perils of this decision.
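One way to act on this caution is to map the different disguises of "missing" onto a single explicit marker before any analysis. The sketch below does that with a small, hypothetical list of placeholder values; the list and the function name are assumptions and would have to be agreed for each data set. Note that 0 is deliberately left alone here, because it can be a legitimate value and cannot be told apart from a placeholder without business context.

```python
from typing import Any, Optional

# Hypothetical placeholders that data entry teams may have used to get
# past validation; extend or replace this set per data set.
PLACEHOLDERS = frozenset({"n/a", "na", "99999", "999999", "aaaaaa"})


def normalize_missing(value: Any) -> Optional[Any]:
    """Return None for blanks and known placeholders, else the value unchanged."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() in PLACEHOLDERS:
        return None
    return value


raw = ["Austin", "", "   ", "N/A", 999999, "AAAAAA", 0, None]
cleaned = [normalize_missing(v) for v in raw]
print(cleaned)
```

Once every gap is recorded the same way, counting and imputing missing values becomes far less error-prone.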

As you continue to address missing values amongst your data, pause and think about one word: Insight.
Insight is not just an immediate understanding of an action or event based on information. It is also the ability to decipher the reasons for missing information and the impact of it on your organization.

Write to me about what information you found missing in your data landscape and what you are doing to impute it.

Saturday, January 30, 2010

Data Rivalry

At any given time there are multiple initiatives for information management in your organization. They could be operational reporting, integration or consolidation of your data assets, data enrichment or data remediation. 

These programs are tied together with a shiny silver thread of inequity. Inequities come in various forms such as leadership, program sponsor support, availability of business users, clearly defined objectives, team cohesion, technologies and more. Successful programs are those that recognize the inequities, painstakingly evangelize them at appropriate levels, seek guidance and, when push comes to shove, simply work around the challenges presented by the inequities.

Why then is inequity a shiny silver thread? Often you hear about the negatives of inequity, but what about the flip side? Inequities bring out the best in teams in terms of creativity, agility and responsiveness, and in turn create a foundation for rivalry that benefits everyone.

Rivalry arises over access to common constraints: time, budget and resources. Let us take the example of an asset rationalization program. All of your data revolves around the big five domains of Customer, Product, Supplier, Employee and Finance. Multiple initiatives are vying for the same resources at the same time. Therein lies the opportunity to create a rivalry that harnesses the competitive nature of teams and eventually benefits the organization.

We hear about rivalries and their benefits all the time, beginning with sibling rivalries when growing up or watching them happen among our friends. Cola rivalries between PepsiCo and Coca-Cola are legendary. So why not start something within your information management programs? How about a Product vs. Customer data rivalry?

Rivalries have created, nurtured and promoted several byproducts including:
  • Establishing relationships with team members or with Customers as a criterion for success
  • Testing and implementing strategic changes in order to gain competitive advantage
  • Improving awareness and acknowledgment that things around us are different from us
  • Augmenting our emotional intelligence in dealing with inequities and our ability to resolve conflicts
  • Ingraining in us the desire to win


Try to begin a rivalry within your data organization. Watch how rapidly your architects, business leads, project managers, designers, developers, project sponsors and data stewards come together and work cohesively to achieve their goals in record time. Reward your teams when this rivalry yields positive outcomes such as accelerators, superior design patterns, efficient workflows and optimized governance. Watch out when rivalry turns into sabotage or incendiary behavior; do not hesitate to penalize it.

Next I will write on metrics that you can use to measure the effectiveness and efficacy of this rivalry.
 
Inequities are powerful. Promote "Data Rivalry" in your organization and harness the power of competitiveness of your brilliant and passionate teams. 

Remember you first heard that term here.

Saturday, October 3, 2009

Case for an Information Value Chain

Michael Porter introduced the concept of the Value Chain in his 1985 classic Competitive Advantage: Creating and Sustaining Superior Performance. He described the notion of how a product gains value by activities conducted within the chain. By assigning costs and value drivers at each step, organizations can create a framework to enable powerful analysis and insight into synchronized collaborations, both within the organization and amongst business partners. The concept has been embraced by management strategists worldwide.

The data and information required to support business functions, promote innovation, reduce costs, improve collaboration and enhance responsiveness to marketplace changes can benefit greatly when the same concept is extended to establish a business-sponsored "Information Value Chain".

The goals behind value chains (improve visibility, demonstrate value that exceeds costs and improve profit margins) allow organizations to strengthen their core competencies and manage competitive differentiators.

The major business data domains of Customer, Supplier (Vendor) and Product (Material) are candidates that should be evaluated within the value chain. In my blog on data ownership I identified roles that should participate in each of the standardized elements of the value chain. These roles should be supported by IT roles of architects, administrators, stewards and analysts.

By creating an enterprise data strategy, identifying key information management activities within the chain and enabling performance visualization through periodically evaluated metrics, an Information Value Chain will bring improved information awareness, provide actionable insight and help your organization separate information "cash cows" from investment "duds".

In the process, organizations uncover two important truths. Not all data is created equal, and not all data contributes equally to the profitability of an organization.

With these truths alone, we begin to answer a very important question. Why dedicate our resources and capital to managing all data equally if not all of it contributes equally?

Saturday, August 29, 2009

Trustability - Super KPI of your data

Organizations that view information as a strategic asset assign the ultimate responsibility for managing their data resources to data stewards. In reality, they are delegating the critical ownership of improving the "Trustability" of data.

Trustability is a super KPI (key performance indicator) of all data. When neglected, it is also the silent killer of many a data warehousing and business intelligence initiative.

Metrics of accuracy, timeliness, completeness, availability, accessibility and reliability, together with supporting elements such as proper definition and classification, contribute to trustability. The long-term usage, viability and success of any information you want to share with your information consumers, and the benefit they draw from it, depend on this super KPI.

A well-thought-out architecture, robust design, world-class statistical constructs, superlative presentation layers and leading-edge technologies will not help if end users do not have confidence in the data they are given and decide not to trust what is given to them next.

The first woman Prime Minister of India, Indira Gandhi, once said "You cannot shake hands with a clenched fist". Make trustability an important factor in your information management initiative and improve your chances of success by monitoring it closely.