This article will give you a basic understanding of how housefile statistical models work. Additionally, it will show the various ways a marketer unfamiliar with statistical models can provide input to the development process to optimize model performance.
It seems there’s a word for every type of phobia known, including “triskaidekaphobia,” which is a fear of the number 13. Yet I was surprised I couldn’t find a word for the fear of statistics — indeed, “statisticaphobia” is a word that really should exist.
From experience, I know that catalog marketers without backgrounds in statistics often are ill at ease when participating in model-development projects. The good news is that you can be a significant contributor to the model-building process without being a statistical genius. In fact, it would be a mistake to shy away from the task, because for modeling to be successful, statistical experts want and need your marketing expertise and input from start to finish.
A customer list segmentation model typically is designed to help marketers predict the likelihood of some future customer behavior by using actual behavioral activity that customers have shown in the past. While different modeling techniques exist, the most widely used method for housefile segmentation is regression modeling.
Simply put, a regression model identifies the optimal combination of predictive variables and determines the relative significance of each. Through the use of a mathematical equation, a raw score is calculated based on numerical values associated with these predictive variables and their weights. Customers with like scores then are rolled up into ranks. For example, if Rank 1 names are the highest-scoring names with a response model, they’re predicted to be the most likely to respond.
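To make the scoring and ranking idea concrete, here is a minimal Python sketch. The variable names, weights and intercept are invented purely for illustration. In a real project the regression itself would produce the weights, and the number of ranks would be your choice.

```python
# Hypothetical predictor weights and intercept (assumptions for illustration);
# a fitted regression model would supply the real values.
WEIGHTS = {
    "orders_12mo": 0.8,
    "months_since_last_order": -0.1,
    "avg_order_value": 0.01,
}
INTERCEPT = -1.0


def raw_score(customer):
    """Raw score: the intercept plus the weighted sum of predictor values."""
    return INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())


def assign_ranks(customers, n_ranks=10):
    """Sort by raw score, highest first, and roll names up into ranks.

    Rank 1 holds the highest-scoring names.
    """
    ordered = sorted(customers, key=raw_score, reverse=True)
    ranks = {}
    for position, customer in enumerate(ordered):
        ranks[customer["id"]] = position * n_ranks // len(ordered) + 1
    return ranks


# Four made-up customer records.
customers = [
    {"id": "A", "orders_12mo": 4, "months_since_last_order": 1, "avg_order_value": 60},
    {"id": "B", "orders_12mo": 0, "months_since_last_order": 30, "avg_order_value": 40},
    {"id": "C", "orders_12mo": 2, "months_since_last_order": 6, "avg_order_value": 90},
    {"id": "D", "orders_12mo": 1, "months_since_last_order": 14, "avg_order_value": 25},
]

ranks = assign_ranks(customers, n_ranks=2)
print(ranks)
```

With only two ranks, customers A and C land in Rank 1 and customers D and B land in Rank 2. A production model would more likely use 10 or 20 ranks over the full housefile segment.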
I hope you’re still with me, because it gets easier from here. The various planning and tactical decisions surrounding the actual regression can make or break a modeling project. I’ve identified five primary model-development success factors where your participation and marketing expertise are beneficial, regardless of statistical prowess.
Define the Project
A statistician building a model needs to be in sync with marketers who will be using the model to drive campaign-management decisions. For starters, there should be a mutual understanding of how the model(s) will be used. Key discussion items among statisticians and marketers may include the following:
Agree on the behavior you’re trying to predict. Housefile models can be designed to predict virtually any behavior, such as response to a promotion, demand dollars, credit risk, returns, product category purchases, etc.
Define the housefile segments that are candidates for modeling. Traditionally, unique and separate models are built for different segments. For example, three separate models could be built to rank one-time direct-channel buyers, two-time-plus direct buyers and retail buyers.
Identify segments where models are most needed. Focus your attention on the housefile groups with the greatest potential return on the modeling effort. Usually these will be your marginal performers based on your current segmentation criteria. It may not make sense to develop response models for your pockets of best buyers if you know they’ll always be mailed regardless of the model score assigned.
Share what you know. Talk to the statisticians about what you’ve learned regarding your customers that may be relevant to the project at hand. Review historical performance results from your current selection scheme.
Establish checkpoints in the process. Don’t let the model builder run off into isolation until his or her model is done. Actions and findings should be reviewed at each step along the way to keep everything on course and to attain the best end result.
Prepare for Implementation Up Front
Don’t wait until after the model is completed — have a scheme in mind for applying the new model. Here are some steps to follow:
Map out the selection criteria that will be used. Will model score be the primary driver of your keycoding structure, or do you still want to use recency and frequency as your primary criteria for performance measurement? Will it differ from what you’re doing today?
Create a schedule for database scorings. Ideally, scorings should be done after new transactions have been added to the database and immediately prior to housefile selects.
Choose the Right Campaigns for Modeling
Since the model will rely on past customer activity to predict the future, each prior campaign needs to be carefully scrutinized for inclusion in (or exclusion from) the modeling sample. Here’s what to consider:
Look at the scope and depth of house names mailed. Names that weren’t mailed aren’t available for inclusion in a model-development sample.
Avoid anomalies, that is, situations where purchase behavior may have been significantly altered by some other event.
Look at seasonality. If you’re building a model to be used for holiday season mailings, include prior holiday campaigns in the model-development process.
Factor in the length of time that has passed since the mailing occurred. The more time gone by, the more likely it is that things have changed.
Take into account changes to merchandising, creative, promotional offers and the like.
Help Create the Best Possible Analysis Sample
Most of the heavy lifting in the development process occurs here, in the fourth success factor. The statistician will conduct a three-way match, tying the names promoted in the selected mailings to the corresponding orders from those campaigns and to all candidate model predictor variables. Brainstorming and diligence really can pay off here. Collectively ensure that you:
Consider all possible data elements. Look for any and all data that correlate to response, including:
- historical customer activity across all brands, titles and purchase channels;
- history of prior promotions;
- demographic and lifestyle overlays;
- store proximity;
- geographic and weather information; and
- cooperative database scores.
Re-create dynamic data variables. They should be “frozen” at the point in time when the names were selected to be mailed.
Verify the quality of available data. Stay away from data elements that are of low caliber. The elements must be accurate, consistent and readily available in the future for periodic rescoring of the database.
Create data elements that don’t exist today. Powerful new customer variables can be devised for input to the modeling process. For instance, set up new variables to define the number of orders within the past 12 months or the number of promotions sent since the last purchase for each customer record.
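The two ideas above, freezing dynamic data as of the selection date and devising new variables, can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the function name, the field names and the 365-day definition of “past 12 months” are hypothetical, and your database may define these things differently.

```python
from datetime import date, timedelta


def frozen_variables(order_dates, promo_dates, select_date):
    """Build predictor variables using only activity known before select_date."""
    # Ignore anything on or after the selection date, so that activity that
    # happened after the names were pulled cannot leak into the sample.
    past_orders = [d for d in order_dates if d < select_date]
    past_promos = [d for d in promo_dates if d < select_date]

    # Assumption: "past 12 months" means the 365 days before the selection date.
    window_start = select_date - timedelta(days=365)
    orders_12mo = sum(1 for d in past_orders if d >= window_start)

    # Promotions sent since the last purchase; if the customer has no prior
    # purchase, count every promotion sent before the selection date.
    if past_orders:
        last_order = max(past_orders)
        promos_since_last_order = sum(1 for d in past_promos if d > last_order)
    else:
        promos_since_last_order = len(past_promos)

    return {
        "orders_12mo": orders_12mo,
        "promos_since_last_order": promos_since_last_order,
    }


# Made-up history for one customer record.
orders = [date(2004, 3, 10), date(2004, 11, 20), date(2005, 2, 1)]
promos = [date(2004, 10, 1), date(2004, 12, 1), date(2005, 1, 5), date(2005, 3, 1)]

features = frozen_variables(orders, promos, select_date=date(2005, 1, 15))
print(features)
```

Here the February 2005 order and the March 2005 promotion are ignored because they occurred after the names were selected, so the record shows two orders in the past 12 months and two promotions since the last purchase.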
Evaluate and Test
Now you’re at the point where the new model has been built and the statistician is showing you a couple of tables suggesting that it works great. But there’s too much at stake to simply roll out a new modeling scheme. Test into it first. Here are two suggestions:
Compare old to new segmentation. Run a simple head-to-head test of your previous selection criteria against the new scheme that incorporates your model to identify and analyze the unique names selected by each. Then set up in-the-mail test keys composed of nth name selections from both unique name populations to get a good measure of the response lift and return on investment the new model is achieving.
Be cautiously optimistic. Before you have live modeled results available for future circulation planning, retain a somewhat conservative stance regarding your modeled name selects. Don’t assume that the performance results by score on future mailings will be as robust as what you saw when the development sample was modeled.
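For readers who like to see the mechanics, the head-to-head comparison described above might be sketched in Python as follows. Every customer ID, mail quantity and order count here is a made-up placeholder, not an actual result, and the helper names are hypothetical.

```python
def unique_names(old_select, new_select):
    """Return the names chosen by only one of the two selection schemes."""
    old_only = sorted(set(old_select) - set(new_select))
    new_only = sorted(set(new_select) - set(old_select))
    return old_only, new_only


def nth_name_sample(names, n):
    """Take every nth name, starting with the first."""
    return names[::n]


def response_rate(orders, mailed):
    """Orders divided by pieces mailed; zero if nothing was mailed."""
    return orders / mailed if mailed else 0.0


# Placeholder selections from the previous criteria and the model-based scheme.
old_select = ["c01", "c02", "c03", "c04", "c05", "c06"]
new_select = ["c03", "c04", "c05", "c06", "c07", "c08", "c09", "c10"]

old_only, new_only = unique_names(old_select, new_select)

# Build the in-the-mail test keys from every 2nd unique name.
old_test_key = nth_name_sample(old_only, 2)
new_test_key = nth_name_sample(new_only, 2)

# Placeholder mail results for the two test keys (illustrative numbers only).
old_rate = response_rate(orders=20, mailed=1000)
new_rate = response_rate(orders=26, mailed=1000)
lift = new_rate / old_rate - 1

print(old_test_key, new_test_key, round(lift, 2))
```

Comparing only the names unique to each scheme keeps the overlap, which both schemes would have mailed anyway, from masking the difference between them.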
Keith Pietsch is vice president of analytics and marketing at Donnelley Marketing Catalog Vision, an infoUSA company. He wrote this article at the request of Catalog Success editors. He can be reached at (952) 541-6548, or via e-mail: keith.pietsch@donnelley.infousa.com.