If the pundits are to be believed, big data is the retailer's nirvana, telling you the exact customer profile that will buy a specific product at a specific time of day through a specific channel at a specific price. There is, to be sure, more than a grain of truth in that, but the path from data collection to customer fulfillment is fraught with pitfalls. As anyone who has worked with Google AdWords algorithms can attest, one wrong assumption can easily send you far enough off target to lose tens of thousands of dollars in sales, a serious setback for most retailers.
Yes, retailers can reap significant benefits by uncovering meaningful patterns in their data. Consider Target's ability to predict pregnancies by analyzing customer consumption patterns. However, as we've all learned from our lives on the web: garbage in, garbage out. If you don't ask the right questions and collect the right data, the value of the information you get back will be questionable. Even the most sophisticated machine learning technology can't read minds.
Market research 101 teaches us how to frame questions that will deliver the answers we're looking for, and that's fine as long as everyone is aware of the bias being introduced. But when the results will be used to define retail strategy for a few years, bias must be eliminated as much as possible.
What causes data bias syndrome? Data bias can arise from a number of sources, all of them human in origin.
On the business operations side, bias can come from poorly defined objectives or from collecting the data you want rather than the data you need (often because it's the easier route). On the analysis side, data scientists depend on receiving data that hasn't already been skewed by assumptions coming from the business side of the house.
The scientists themselves aren't immune to generating their own bias either. They may have preconceived ideas about a particular market sector or demographic profile. Too much knowledge, especially if it's based on incorrect assumptions, can result in useful data being filtered out or analyzed using pre-existing algorithms that may not be valid for a particular scenario.
And then there's the question of data volume. The study "International Production and Dissemination of Information" estimates that the world produced 14.7 exabytes of new information in 2008, nearly triple the volume produced in 2003. Furthermore, according to an EMC study, the volume is doubling every two years. There's a big difference between quantity and quality, however.
When selecting data for unbiased analysis, it's important to distinguish between a dense data set containing many similar data points and the far more diverse range of data points present in the real world. Gaps in the data, caused either by a lack of data or by an overconcentration of similar data points, produce a condition known as data sparsity. Machine-learning algorithms fill in those gaps, and in doing so they can easily bring with them the kind of bias baggage outlined above.
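To make data sparsity a little more concrete, here is a minimal Python sketch that measures it in two ways, assuming purchases are recorded as (customer, product) pairs: the share of customer-product combinations with no observed purchase, and the customer segments with too few records to support reliable conclusions. The function names, the channel-based segments, the `min_count` threshold and the sample data are all hypothetical illustrations, not part of any particular retail analytics product.

```python
from collections import Counter


def sparsity(interactions, customers, products):
    """Fraction of (customer, product) cells with no observed purchase."""
    # Repeat purchases of the same product by the same customer count once.
    observed = {(c, p) for c, p in interactions if c in customers and p in products}
    return 1 - len(observed) / (len(customers) * len(products))


def thin_segments(records, segment_of, all_segments, min_count):
    """Segments with fewer than min_count records: gaps a model would have to fill in."""
    counts = Counter(segment_of(r) for r in records)  # missing segments count as 0
    return sorted(s for s in all_segments if counts[s] < min_count)


# Hypothetical sample data.
customers = {"c1", "c2", "c3"}
products = {"p1", "p2", "p3", "p4"}
interactions = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c3", "p1"), ("c1", "p1")]

# Each record is (customer, channel); the data is dense in "store" and thin elsewhere.
records = [("c1", "store")] * 10 + [("c2", "web")] * 6 + [("c3", "mobile")] * 2
channels = ["store", "web", "mobile", "catalog"]

print(f"sparsity: {sparsity(interactions, customers, products):.2f}")
print("thin segments:", thin_segments(records, lambda r: r[1], channels, min_count=5))
```

Here 8 of the 12 customer-product cells are empty, and the "catalog" and "mobile" channels fall below the threshold even though the "store" channel has plenty of records. A high sparsity figure or a long list of thin segments is a signal to go after more real data before letting a model extrapolate.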
So how do you remove the bias? Whether retailers hire their own data scientists or purchase applications that allow merchandisers and other business personnel to interact directly with the data, bias prevention techniques need to be built in from the outset. Follow these steps:
- Data selection: Carefully select relevant data and take the time to explore which combinations of inputs produce the most accurate results. This may include using heuristics, or educated guesswork, to fill the sparsity gaps. Combining analytics expertise with business and targeting expertise (two attributes merchandisers have in spades) greatly improves the chances of optimal results. This puts the data scientist and the operational expert on the same team, working toward the same goal, each bringing different, complementary skills.
- Data improvement: Aim to improve the data sets. Don't settle for what you have; instead, look for ways to fill data sparsity gaps with real information. A software service with an intuitive user interface that combines targeting expertise, data aggregation and the intelligent use of heuristics can also help counteract data sparsity by providing the optimum "fuel" for machine learning.
- Analytics feedback loop: Build a continuous feedback loop into the analytics process so that output is continually measured against real outcomes and tuned for maximum performance. This helps adapt algorithms to changing business requirements (in retail's case, changing seasons) and keeps machine learning fresh.
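The heuristic gap-filling and feedback-loop steps above can be illustrated together with a small Python sketch. It assumes a per-product conversion rate is being estimated: a merchandiser's educated guess acts as a prior worth `prior_weight` pseudo-views, which gives products with little data a sensible starting estimate, and each reporting period feeds observed conversions back in while a `decay` factor fades older evidence so the estimate can follow seasonal shifts. This Beta-Binomial smoothing with exponential decay is simply one common technique chosen for illustration, not the method of any particular vendor; the class name, parameters and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConversionEstimate:
    """Hypothetical per-product conversion-rate estimate with a heuristic prior."""

    prior_rate: float      # merchandiser's educated guess at the conversion rate
    prior_weight: float    # how many views' worth of confidence that guess carries
    decay: float = 0.8     # share of old evidence kept each period (1.0 = never forget)
    conversions: float = 0.0
    views: float = 0.0

    def rate(self) -> float:
        # Beta-Binomial posterior mean: the prior acts as prior_weight pseudo-views.
        return (self.prior_rate * self.prior_weight + self.conversions) / (
            self.prior_weight + self.views
        )

    def update(self, new_conversions: int, new_views: int) -> None:
        # Feedback step: fade older evidence, then add this period's outcomes.
        self.conversions = self.conversions * self.decay + new_conversions
        self.views = self.views * self.decay + new_views


# Hypothetical season: strong demand in period 1, a sharp drop in period 2.
periods = [(10, 200), (1, 200)]
adaptive = ConversionEstimate(prior_rate=0.02, prior_weight=100, decay=0.8)
static = ConversionEstimate(prior_rate=0.02, prior_weight=100, decay=1.0)

for conversions, views in periods:
    adaptive.update(conversions, views)
    static.update(conversions, views)
    print(f"adaptive={adaptive.rate():.4f}  static={static.rate():.4f}")
```

After the drop, the decayed estimate moves back toward the new reality faster than the version that weights all history equally. In practice, the decay and prior weight would themselves be revisited as part of the feedback loop rather than fixed once.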
Exploration is the key to finding the right balance of hard data and heuristics in order to drive machine learning to optimal results, so don't be afraid to boldly go where no retailer has gone before! But whatever you do, don't ignore the bias factor in your big data analytics projects. Build in safeguards against bias from the outset, and smart machine learning will be the closest thing to a silver bullet you're likely to find this side of nirvana.
Dan Darnell is the vice president of marketing and product at Baynote, a provider of personalized customer experience solutions for cross-channel retailers.