Saturday, June 5, 2010

A Myth Buster Anthology of Data Warehousing

One of the strongest arguments in favor of data warehousing, which I have often heard from specialists, is that it can generate golden nuggets of information that create competitive advantage for the firm. Yet I've watched several uncomfortable discussions on how to measure return on investment in data warehousing, and I've seen a lot of frustration over large data warehouses that failed to generate a return after tens of millions of dollars had been spent. Like any other technology, data warehousing has its own share of myths, created intentionally and unintentionally by vendors as well as practitioners.

Myth #1 - A Data Warehouse can create competitive advantage
Myth #2 - A Data Warehouse is required for business intelligence
Myth #3 - Data Warehousing starting point is an enterprise data model
Myth #4 - You need both an Operational Data Store and a Data Warehouse to cover the entire spectrum of business reporting
Myth #5 - Data Warehousing requires an engineering approach
Myth #6 - Data Warehousing fails due to problems with transaction-processing systems
Myth #7 - We can't predict what questions will be asked from a data warehouse
Myth #8 - Data Warehousing improves decision-making
Myth #9 - Data Warehousing empowers front-line and business staff to do their own analysis
Myth #10 - Data Warehousing reduces overall cost of reporting on business performance and opportunities

Myth #1 - A Data Warehouse can create competitive advantage

The most often cited example of competitive advantage is Wal-Mart, which built a data warehouse and linked its decision support systems with store inventory and transportation logistics. This let it match demand at individual stores with the delivery supply chain to minimize out-of-stocks, delivery cost and excess inventory while maximizing sales volume. I don't doubt this story.

There is another often repeated but probably apocryphal story about beer and diapers at Wal-Mart. It is claimed that a serendipitous market-basket analysis showed that on Fridays six-packs of beer and diapers were being bought at the same time. Further analysis showed that when young fathers bought diapers on Fridays, they picked up a six-pack too. Wal-Mart was supposedly able to increase sales by placing diapers and six-packs close together. I've never seen any data or evidence that would validate this story.

First of all, operational efficiency is not competitive advantage. Wal-Mart's low cost positioning strategy has a large number of components that include marketing, vendor management, contract negotiation, labor and human resources policies, location selection, store management, etc. Inventory management, delivery, transportation and operational logistics are certainly an important part of this overall strategy. In and of itself, Wal-Mart's data warehouse does not create any competitive advantage. In fact, similar integration between decisions and activities can be achieved through other technological means, such as an enterprise service bus. The important point to note is that Wal-Mart's strong strategic focus on "low cost" has created a nice "fit" among all its activities, and that fit enhances Wal-Mart's competitive advantage. Wal-Mart's data warehouse is a critical part of this overall fit. But to attribute Wal-Mart's competitive advantage to its data warehouse would be no different than saying that Wal-Mart's competitive advantage comes from its vendor negotiation policies.

Without integration with business activities, a data warehouse is no different than an ATM. An ATM is a cost of entry into the consumer banking business. It does not bring any competitive advantage. Similarly, anyone can hire the consultants who designed Wal-Mart's data warehouse and build a data warehouse. But would they get the competitive advantage that comes from Wal-Mart's low cost positioning, which permeates the millions of activities that Wal-Mart performs every day? A competitive advantage is a competitive advantage because it can't be imitated.

A single insight leading to the creation of sustainable, long-term competitive advantage is a fiction, though it makes for fascinating business stories. In the real world, a single insight has to be supported by tons of hard work in analyzing and aligning thousands of activities to fit into an overall coherent business strategy to create competitive advantage. Target has carved out its market position in spite of Wal-Mart. Wal-Mart would have a tough time defeating Target's competitive advantage without compromising its own position.

Finally, as everyone knows, competitive advantage is not absolute. It is relative to industry competition. It depends upon how you position your business vis-a-vis your competitors.

Myth #2 - A Data Warehouse is required for business intelligence
Let's first clarify the vocabulary.

Among senior executives, business intelligence means information about markets, customers, suppliers, competitors and business drivers in the industry. For a business process outsourcing firm, an example is a prospective customer quietly looking to outsource some of its internal processes. The majority of this information does not come from inside the firm; it comes from OUTSIDE. This is an important distinction. Most of the time, business executives rely on their own Rolodex to gather such business intelligence. At other times, they engage with industry analysts and review publications to gather this outside information. Astute marketing departments can supplement this business intelligence with regular surveys of customer tastes and industry trends.

In the vocabulary of data warehousing, business intelligence means consolidation of internal transactional data from multiple systems, followed by dissection of that data across multiple dimensions of analysis to test hypotheses and derive some high-level conclusions about the state of the business. For example, point-of-sale data could be consolidated across multiple sales territories or lines of service and then analyzed across zip codes, categories, population demographics, etc. In fact, marketing activity tends to be highly data-driven, and good marketing departments are constantly measuring the impact of marketing campaigns on sales volumes. I've seen this data occasionally supplemented by consolidated industry data, but the majority of the data used in the data warehouse is internally generated. Therefore, business intelligence from data warehouses is predominantly internally focused on business operations, operational efficiencies and internal optimization.
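The consolidate-then-dissect pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales rows, the field names and the `roll_up` helper are all hypothetical and do not come from any real system or warehouse product.

```python
from collections import defaultdict

# Hypothetical point-of-sale rows, as if already consolidated
# from several source systems into one list.
SALES = [
    {"territory": "West", "zip": "92602", "category": "grocery", "amount": 120.0},
    {"territory": "West", "zip": "92602", "category": "apparel", "amount": 80.0},
    {"territory": "West", "zip": "92614", "category": "grocery", "amount": 45.5},
    {"territory": "East", "zip": "10001", "category": "grocery", "amount": 200.0},
    {"territory": "East", "zip": "10001", "category": "apparel", "amount": 60.0},
]


def roll_up(rows, *dimensions):
    """Sum 'amount' for each distinct combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        # The key is a tuple so that one or many dimensions work the same way.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    # The same consolidated data, dissected along different dimensions.
    print(roll_up(SALES, "territory"))
    print(roll_up(SALES, "zip", "category"))
```

The same rows answer different questions depending on which dimensions are passed in, which is all that "dissection across multiple dimensions" amounts to at this scale.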

Data warehouses cannot provide the EXTERNAL business intelligence that senior executives need, and I question the premise that a data warehouse is a sine qua non for INTERNAL business intelligence. The most cost-effective method of internal business intelligence starts with business analysis and hypothesis development. Data gathering is the next step. Once a hypothesis has been defined, data can be gathered from the subject area within the parameters of the hypothesis. You really don't need a data warehouse for that, as long as you can enable access to the data required for such analysis. The important point to remember is that you need to make data accessible rather than build a data warehouse. Technology presents several options for doing so, and a data warehouse is only one of them.

Myth #3 - Data Warehousing starting point is an enterprise data model
The starting point of any data warehousing effort is not an enterprise data model but a definition of the business operations strategy that the data warehouse will support. The data warehouse will be a component of this strategy. Therefore, you can't calculate ROI on the data warehouse in isolation. Instead, you have to work with the business teams to work out the ROI on the entire business operations strategy that the data warehouse will be supporting. Not only do you need a good definition of the business operations strategy, you also have to define the requirements in detail. Data warehousing starts from the top with a clear definition of business issues. I have seen dozens of data warehousing efforts fail because an enterprise data model was treated as the first step towards data warehousing. At the same time, I have seen several data warehousing efforts succeed because the business goals, expectations and requirements were defined in clear terms, with a business value proposition attached to each requirement. The technical team understood clearly what it was supposed to accomplish.

Generally, an enterprise data model is an expensive, time-consuming and meandering effort. You start with a conceptual business model, follow up with a logical data model and then map that to a physical data model. All this is done under the vague premise that once you have done it, you will somehow be able to get all the data you need to test any hypothesis across the enterprise. Most of the time, enterprise data models generate a pretty good piece of documentation that is understood by a couple of people in IT. Although it is validated by business analysts, most people are unable to make heads or tails of an enterprise data model.

Mark Hammond is an Orange County entrepreneur who founded Risk Data Corporation and later sold it to FICO. He built a data warehouse with a clear definition of business goals, strategy and what was needed for performance benchmarking of insurance and provider claims. The focus was clear and the product was successful. Similarly, Sean Downs founded Enclarity in California with a clear focus on cleaning up provider data in the health care industry. They are among the entrepreneurs who have created hundreds of millions of dollars in value by focusing on business strategy and leveraging their understanding of the potential of data warehousing to deliver a business solution.

Harvard Pilgrim was less successful in its enterprise data model driven data warehousing approach. Kaiser Permanente has several data warehouses. Some of them are business-problem driven and some of them are enterprise data model driven. Initiatives driven by enterprise data model have been more expensive and less successful than business-problem driven initiatives.

Myth #4 - You need both an Operational Data Store and a Data Warehouse to cover the entire spectrum of business reporting
In the vocabulary of data warehousing, an ODS is defined as a copy of transactional data. Once this data has been cleaned, normalized and combined with data from the past several years and from other business systems, it becomes a data warehouse. Creating an ODS and then building out a data warehouse leads to huge duplication of effort with feeble business justification. If we accept the premise that a data warehousing initiative must be driven by a clearly defined set of business requirements, then it is hard to show why an existing ODS could not serve those requirements. Most of the time, if you have an existing ODS, the justification for a data warehouse tends to be pretty weak from a business point of view. Therefore, the right strategy for data warehousing must eliminate the technological need to maintain both an ODS and a data warehouse.

Myth #5 - Data Warehousing requires an engineering approach
If you are a technology company building an industry-specific data warehouse, or a technology consulting firm specializing in data warehousing, then you will need an engineering approach to data warehousing. For most non-tech organizations, an engineering approach to data warehousing tends to be overkill.

An engineering approach to data warehousing starts with a conceptual data model, then a logical data model and then a physical data model, with lots of arguments among the technical folks about 3NF versus star schema, fully normalized versus fully denormalized, etc. At the same time, metadata management and master data management infrastructure is built. At the end of the day, what you get is an overengineered data warehouse that takes too much staffing for support and maintenance: in short, a white elephant. A limited approach achieves cheaper, faster and better results by focusing on ETL and reporting tools and allowing the reporting tools to drive the data organization.

Myth #6 - Data Warehousing fails due to problems with transaction-processing systems

Most of the time, the limitations of business transaction processing systems are already known before a data warehousing effort is launched. Therefore, it defies logic when a data warehousing failure is attributed to data integrity issues created by transaction processing systems. The problem lies in overengineered data warehouses. It is a sharp business analyst who provides data analysis, not a data warehouse. The most effective and economical approach to data warehousing is to keep it simple and let it show all the blemishes and warts of the transaction processing systems. The business analysts should have the tools, skills and knowledge to decide which warts and blemishes they want to remove for their analyses. Often success means a straight copy of data from transactional systems to a bunch of denormalized tables. I'm sure this will sound like blasphemy to specialists in data warehousing.
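As a rough illustration of that straight-copy approach, the sketch below uses Python's built-in `sqlite3` module. Everything in it is an assumption made for the example: the `customers` and `orders` source tables, the `rpt_orders` reporting table and the sample rows are hypothetical. The copy deliberately uses a LEFT JOIN so that blemishes in the source (a missing region, an order pointing at a customer that does not exist) stay visible in the reporting table.

```python
import sqlite3


def load_sample_source():
    """Create a tiny, hypothetical transactional schema with known blemishes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        -- Customer 2 has no region; order 12 references a customer that does not exist.
        INSERT INTO customers VALUES (1, 'Acme', 'West'), (2, 'Globex', NULL);
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 2, 75.0), (12, 99, 40.0);
    """)
    return conn


def build_reporting_copy(conn):
    """Copy the source data into one denormalized table, warts and all."""
    conn.executescript("""
        DROP TABLE IF EXISTS rpt_orders;
        CREATE TABLE rpt_orders AS
        SELECT o.order_id    AS order_id,
               o.customer_id AS customer_id,
               o.amount      AS amount,
               c.name        AS customer_name,
               c.region      AS customer_region
        FROM orders AS o
        LEFT JOIN customers AS c ON c.customer_id = o.customer_id;
    """)


if __name__ == "__main__":
    conn = load_sample_source()
    build_reporting_copy(conn)
    # The analyst, not the copy job, decides which warts to exclude.
    query = "SELECT SUM(amount) FROM rpt_orders WHERE customer_region IS NOT NULL"
    print(conn.execute(query).fetchone()[0])
```

The design choice here is that no cleansing happens during the copy. Filtering out rows with a missing region is a decision left to the analyst's query, in line with the argument above.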

Myth #7 - We can't predict what questions will be asked from a data warehouse

If you can't predict what you will ask of a data warehouse, then don't build it. This is exactly the reason why so many data warehousing efforts fail. They try to answer "unknown" questions, a very tall order under any circumstances. It becomes hard to calculate ROI or link the data warehouse with a business initiative. Such efforts are often IT-centric and a waste of precious dollars.

Myth #8 - Data Warehousing improves decision-making

In my experience, data- and analysis-driven decision making tends to be cultural in organizations. Possibly, at the top of the hierarchy is McKinsey & Co., which can take measurement to ridiculous levels, and at the other end are several small and mid-sized service organizations that have no concept of measurement-driven decision making. You can't improve decision making in organizations without a cultural shift. This is an issue that is generally bigger than the mother of all business issues. Without a clear cultural awareness of decision-making processes within an organization, a data warehouse will be like an impulse buy that gathers dust after the initial excitement.

Myth #9 - Data Warehousing empowers front-line and business staff to do their own analysis
You can lead a horse to water but you can't make it drink. In data warehousing, empowerment is taken to mean the following:
(i) Business users will be able to generate their own reports instead of sending a request to IT
(ii) IT will have less work
(iii) Business users will be able to do more analysis
(iv) Business users will be able to generate more insights into business operations

Here are the assumptions behind the empowerment:
(i) Business users are sitting idle waiting for IT to provide reports
(ii) Business users have the skills and knowledge to do extensive hypothesis testing and data analysis
(iii) IT takes more time in generating reports than the effort that business users will have to make once they have a data warehouse
(iv) The necessary organizational alignment (a euphemism for laying off IT data analysts and replacing them with business analysts) will occur after the data warehouse is ready

Empowerment won't produce the expected results unless all the assumptions are clarified and the necessary organizational changes are made. A data warehouse won't do it ipso facto.

Myth #10 - Data Warehousing reduces overall cost of reporting on business performance and opportunities
A careful measurement of the expenses involved in making data accessible to business analysts versus building a data warehouse shows that data accessibility is a much better option, unless the data warehouse is carefully designed top-down as part of an overall business strategy with clearly defined business requirements.

In fact, analysis of firms that have built "overengineered, enterprise data model-driven data warehouses to provide answers to unknown questions" shows that they had to hire many more staff to maintain and support such data warehouses.

1 comment:

  1. Interesting list of myths.


    I think the key problem you are trying to highlight is that a data warehouse is a processing solution that should be used for a clearly defined set of problems.


    However, because a data warehouse is seen as a single repository and a single source of truth, attempts are made to "leverage" extra value by enabling ad-hoc and exception processing.


    Often data, information and knowledge warehouse are used interchangeably. There is a big difference between a process-driven data warehouse and the concept of an exception-based knowledge repository. However, because modern knowledge is seen as being "electronic" and electronic means data, the terms data, information and knowledge are unfortunately used as if they were the same.


    So once you have met your processing requirements with a data/information/knowledge warehouse you will need a non-process knowledge repository.


    This knowledge repository will be for requirements outside of process, looking inside the "black boxes" scattered around the enterprise, dealing with exceptions and all those other knowledge related items that are time consuming and expensive but have to be done.


    The catch is that IT is still oriented, and rightly so, around process & procedure. Their procedure is to implement your procedure. More ITIL anyone?
