Some Observations On Using Predictive Models
April 9th, 2012 by: DSA
At a recent Statistics & Modeling Course it became apparent that there were different understandings as to how a scoring model should be used.
Everyone understood that when the response rate to a promotion is the variable being modeled, the correct statistical approach is logistic regression, because the scores that result from the model (logits) can be translated into probabilities of response. What was not clear is that while the scores are indeed probabilities of response, the scores themselves depend upon the average score of all persons promoted.
In other words, if a promotion to a given universe of names pulled 2%, then a model built to predict response would assign each person a score such that the average of all scores would be 2%. What if the promo pulled 4%? Then the average of all scores would be 4%. If the promotion averaged 1%, then the average of all individual scores would equal 1%, and so on. So the obvious conclusion is that a score depends on the dataset, or the promotion, from which it was created.
Now suppose you build a model based on an August promo that pulled 3%, and you want to use the model in January. You score everyone on your database. The average of all scores will be close to 3%. What if you only want to promote to persons whose expected response rate will be above 2%? The obvious answer, actually the obvious incorrect answer, is to promote to everyone whose score is equal to or greater than 2%.
This answer is incorrect because it assumes that if the entire file were promoted in January, the average response would again be 3%. What if January is a really good month and August was a really bad month? What if promoting to the whole file would pull 5%? The assumption we have to make is that a person whose expected response in a 3% environment is 2% will do better in a 5% environment.
How much better?
The correct answer is that we are really not sure, but we assume, unless we've proven otherwise, that the improvement in response will be proportional. So, instead of using a person's raw score, it's better to think in terms of a relative score. A person's relative score is equal to their score divided by the average of all persons scored. In our example, a person with a 2% score coming from a promotion that averaged 3% would have a relative score, or index, of 2% divided by 3%, or .667. And in a 5% environment, i.e., in a month in which we expect the entire promotable universe, if contacted, to pull 5%, we would estimate this person's probability of response at 3.3% (.667 x 5%).
Conversely, if a person's expected response rate was 4%, stemming from a promotion that averaged 3%, in a 5% environment they would be expected to respond at a rate of 6.7% (4% / 3% x 5%).
The rule, then, is to think of model scores as indices or relative scores, as opposed to absolute scores. This means that before using a model you must not only score the entire promotable universe, but you must also calculate each individual's relative score or index (their score divided by the average score) and multiply this index by your own forecast of what the average response would be if the entire file were contacted.
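The rescaling described above can be sketched in a few lines of code. This is purely illustrative, with made-up numbers matching the August/January example; the function name and inputs are my own, not part of any particular scoring system.

```python
# Illustrative sketch: converting raw model scores into relative scores
# (indices) and rescaling them with a forecast of the overall response rate.

def rescale_scores(raw_scores, forecast_rate):
    """Convert raw scores to indices, then apply the forecast rate."""
    avg = sum(raw_scores) / len(raw_scores)
    indices = [s / avg for s in raw_scores]        # relative scores
    return [i * forecast_rate for i in indices]    # adjusted probabilities

# Model built on an August promo that pulled 3%; scores average 3%.
august_scores = [0.02, 0.03, 0.04]

# We forecast that a January mailing to the whole file would pull 5%.
adjusted = rescale_scores(august_scores, 0.05)
# A 2% raw score (index .667) becomes roughly 3.3%,
# and a 4% raw score (index 1.333) becomes roughly 6.7%.
```

Note that the forecast rate is an input you must supply from your own judgment about the season or the economy; the model itself cannot tell you what the whole file will pull in a new month.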
Now, what if you don't make decisions based on the expected response rate, but on some other criterion, such as contacting the top 40% of the available names? Then the above procedure is not necessary, because only the ranking of the scores matters, not their absolute values.
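To see why no rescaling is needed in the rank-based case, consider this small sketch (names and scores are hypothetical): multiplying every score by the same factor would not change which names fall in the top 40%.

```python
# Hypothetical sketch: when selection is rank-based (e.g. mail the top 40%),
# only the ordering of the scores matters, not their absolute level.

def top_fraction(names_and_scores, fraction):
    """Return the top `fraction` of names ranked by score, highest first."""
    ranked = sorted(names_and_scores, key=lambda x: x[1], reverse=True)
    cutoff = int(len(ranked) * fraction)
    return ranked[:cutoff]

universe = [("A", 0.01), ("B", 0.04), ("C", 0.02), ("D", 0.03), ("E", 0.05)]
selected = top_fraction(universe, 0.4)
# Picks E and B; rescaling every score by the same factor
# would select exactly the same names.
```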
This notion of relative scores also applies to zip code models that are used for new customer acquisition promotions. Let’s assume that you’ve built a zip code model and that for every zip code in your mailing universe you’ve calculated an expected response
rate and an expected average order, and that the zip’s overall expected dollars per name mailed is the product of the two scores. Again, assuming that the model is based on one or more promotions, the values for each zip code reflect the average
achieved across all lists in the particular mailings that were used to build both the response and the performance models.
As with individual level models, the scores associated with each zip code have to be adjusted for seasonal and other economic conditions that might increase or decrease the overall level of response and performance. But, with zip code models you have to
go one step further.
The most important factor in estimating response and performance in an acquisition model is the quality of the list itself. So, while a zip code may have an index value of 150, indicating that across all lists it's expected to do 50% better than average, it's critical to apply this index to the expected value of the entire list. In a list expected to pull 2%, mailing into a zip with an index of 150 results in an expected response rate of 3%.
This means that zip code models have to be implemented on a list-by-list basis. Some lists might be fine for all zip codes; others, because of their lower averages, can only be mailed into the top-responding or top-performing zip codes, perhaps the top one or two deciles.
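The list-by-list arithmetic can be sketched as follows. The numbers and list names are hypothetical; the point is that the same zip index produces different expected response rates depending on each list's overall quality.

```python
# Hypothetical sketch: applying a zip-code index on a list-by-list basis.

def expected_response(zip_index, list_avg_rate):
    """Expected response rate for a zip within a given list.

    zip_index is on the usual base-100 scale (150 = 50% above average);
    list_avg_rate is the list's overall expected pull (e.g. 0.02 for 2%).
    """
    return (zip_index / 100.0) * list_avg_rate

# A zip with an index of 150, mailed from two lists of different quality:
strong_list = expected_response(150, 0.02)  # 2% list -> 3.0% expected
weak_list   = expected_response(150, 0.01)  # 1% list -> 1.5% expected
# The weaker list may only be worth mailing in its very best zip deciles.
```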
What's more, in working with zip code models to be used in conjunction with many lists, the typical new customer acquisition mailing, it's not possible to be concerned only with relative rankings, i.e., let's rank all zips and delete the bottom 20%. The differences across lists are too important to ignore, and models have to be, or should be, implemented on a list-by-list basis.
In summary, the point to remember is that the initial scores resulting from scoring a file have to be adjusted to reflect the overall expected level of response or performance, and, where geographic areas are concerned, be they zip codes or lower levels of geography, adjustments also have to be made to take the quality of the list or media source into account.