
A World Without Models & Overlay Data?
By: David Shepard
This article first appeared in Direct Magazine
I owe Jim Rosenfield one. There I was, fresh out of ideas for this month’s column, when I spotted the latest issue of Direct Marketing and Jim’s lead article on the future of e-commerce and relationship marketing. I won’t take you through it, but I strongly recommend it. And if you take my advice and read it, you may wonder how I could recommend an article containing the following two heretical recommendations:
Recommendation #2 of 11: Forget the pseudo-science
“If marketing research worked, new products wouldn’t fail. If data mining, fractal analysis, neural nets, etc. worked, no one would get irrelevant mailings.”
Do I agree with that? To paraphrase you-know-who, it depends on the meaning of “worked.” If by “worked” Jim meant “worked perfectly,” he would be right, but nothing works perfectly. Modeling does indeed work. (I hate the expression “data mining,” I have never used fractals, and I’m not convinced that neural nets make a predictable improvement over statistical methods.) But modeling certainly doesn’t work perfectly, and fortunately it doesn’t have to in order to be a very powerful marketing tool. So yes, people will continue to receive irrelevant mailings, but significantly fewer people will get them, because modeling works “well enough” to meaningfully reduce the number of irrelevant promotions direct marketers send. It’s almost always the case that a model can identify at least ten percent, and usually closer to twenty percent, of the names on a database that should receive fewer promotions, if any.
I remember a conference a number of years ago at which two of the industry’s leading lights were asked whether, in view of all the new technology coming on the scene, mailers would be able to stop doing mailings that produced only a 2% response rate. Naturally, under pressure to respond immediately and intelligently, both experts agreed and predicted that response rates would rise to the 10% level in the foreseeable future. And, of course, that response (no pun intended) makes no sense. As long as marketers can make money by mailing lists that pull 2%, they will continue to do so. Remember Economics 101: production increases until marginal costs exceed marginal revenues!
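The Economics 101 rule translates into a simple mailing test: a list is worth mailing as long as its expected margin per piece (response rate times contribution per order) exceeds the cost of the piece. The sketch below illustrates that arithmetic; the cost per piece, margin per order, and list response rates are hypothetical figures chosen for illustration, not numbers from this column.

```python
# Hypothetical economics for illustration only; every figure is an assumption.
COST_PER_PIECE = 0.50    # printing + postage per mailed piece, in dollars
MARGIN_PER_ORDER = 30.0  # contribution per responding order, in dollars


def break_even_rate(cost_per_piece: float, margin_per_order: float) -> float:
    """Response rate at which one more mailed piece just pays for itself."""
    return cost_per_piece / margin_per_order


def segments_to_mail(segments, cost_per_piece, margin_per_order):
    """Mail segments best-first while marginal revenue exceeds marginal cost."""
    chosen = []
    for name, rate in sorted(segments, key=lambda s: s[1], reverse=True):
        marginal_revenue = rate * margin_per_order  # expected dollars per piece
        if marginal_revenue <= cost_per_piece:
            break  # this list, and every weaker one, loses money
        chosen.append(name)
    return chosen


# Hypothetical lists with their expected response rates.
segments = [("list_a", 0.035), ("list_b", 0.020), ("list_c", 0.012)]
print(f"{break_even_rate(COST_PER_PIECE, MARGIN_PER_ORDER):.4f}")   # 0.0167
print(segments_to_mail(segments, COST_PER_PIECE, MARGIN_PER_ORDER))  # ['list_a', 'list_b']
```

With these assumed numbers the break-even rate is about 1.67%, so the 2% list still pays and keeps getting mailed, which is why a prediction of universal 10% response rates misses the point.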
So, while I don’t want to be in the position of defending the most esoteric modeling tools, I do know that relatively straightforward response modeling, essentially logistic regression, can almost always, if not always, identify customers or prospects who should receive either more or fewer promotions than they would if modeling were not part of the decision-making process.
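To make the idea concrete, here is a minimal sketch of that kind of response model: a logistic regression fitted by plain gradient descent on a synthetic customer file, then used to rank the file so the top slice can be flagged for extra promotions and the bottom slice for fewer. The two predictors (recency and frequency), the simulated data, and the 10% and 20% cut points are assumptions for illustration; in practice one would use a statistics package and validate the model on a holdout sample.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, xi):
    """Predicted probability of response for one customer."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))


# Hypothetical synthetic file: two standardized predictors, recency
# (higher = longer since last order) and frequency (higher = more orders).
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    recency, frequency = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = sigmoid(-2.0 - 1.0 * recency + 1.0 * frequency)
    X.append([recency, frequency])
    y.append(1 if rng.random() < p_true else 0)

w, b = fit_logistic(X, y)

# Rank the file by predicted response and split off the extremes.
ranked = sorted(range(len(X)), key=lambda i: score(w, b, X[i]), reverse=True)
cut = len(ranked) // 5          # 20% of the file
mail_more = ranked[:cut // 2]   # top 10%: candidates for extra promotions
mail_less = ranked[-cut:]       # bottom 20%: candidates for fewer promotions


def response_rate(ids):
    """Observed response rate among the given customer indices."""
    return sum(y[i] for i in ids) / len(ids)


print("weights:", [round(v, 2) for v in w], "intercept:", round(b, 2))
print("top 10% response rate:", round(response_rate(mail_more), 3))
print("bottom 20% response rate:", round(response_rate(mail_less), 3))
```

The model never has to be perfect: it only has to rank the file well enough that the bottom slice responds at a visibly lower rate than the top, which is the “good enough” standard argued for above.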
Recommendation #4 of 11: Build your database but only use transaction data and opt-in information
What I think Jim is recommending is that we discontinue the practice of overlaying our customer files with information contained on other databases.
Well, if we had to, we could. Face it, working with overlay data is not easy, and the results are not always that powerful or worth the cost in time and money, given the alternatives available to direct marketers.
It’s not easy because not every name on your file will match a name on the overlay file(s), and even where there is a match, not every field on the overlay file will be populated for every record, so lots of problems arise. For example, you need one model to handle the matches and another to handle the non-matches. And then what do you do about the missing data on the matched file? It gets very hairy very quickly. In our experience, census data at the block group or nine-digit ZIP level is as powerful as, and certainly easier and less expensive to use than, individual household-level demographics.
It’s also been our experience that, more often than not, overlay variables do not add significantly to the predictive power of the response and performance models we build for our clients. Nevertheless, we almost always find that these variables are useful in describing the differences between responders and non-responders, and between profitable and unprofitable customers. (Yes, it’s possible for a variable to be an interesting descriptor but not an important predictor, typically because the information it carries is already captured by other variables in the model.) This finding applies to both internal customer models and external new-customer acquisition models.
On the other hand, exact age and automobile registration data are powerful predictors. But then again, if you need to know this bit of information about your customers, do you really think they won’t tell you if you give them a reason for wanting to know? That is especially true in the Internet age, when gathering data from your customers is economical and relatively easy.
As for Jim’s nine other recommendations, and his analysis of where we’ve been and where we seem to be going, I’ll let you decide. It’s definitely thought-provoking.