Time to Revisit an Old Application… Building & Implementing Zip Code Models
March 7th, 2012 by: DSA
One of the earliest database applications was the modeling of zip code data for use in new customer acquisition promotions, whether direct mail or telemarketing. So it’s a little surprising to still hear direct marketers complain that their zip code models validated well but didn’t hold up on repeat usage. A little probing usually turns up one or more of the following reasons why the modeling effort failed.
- The modeler forgot to weight each observation, each zip code promoted, by the number of pieces mailed, or the number of calls completed.
- The modeler used dollars per name mailed as the dependent variable, rather than build separate models for response and back-end performance.
- The modeler “cherry picked” individual rows of the frequency distributions that make up the census data, as opposed to modeling the entire distribution.
- The models were not applied on a list-by-list basis taking seasonality into account.
- The modeler failed to build historical response and performance indices at the zip code, SCF or commercial cluster level.
Each of these items could cause a Zip Code Model (shorthand for a model built using the census data associated with a Zip Code) to fail. When two or more of these factors come into play, it’s very easy for a zip code model to “stop working,” assuming it ever worked in the first place. Let’s go through the items one at a time.
Not weighting for the number of names mailed or called.
This is just a simple mistake. It should be obvious that a zip code that receives 10,000 pieces of mail and has a 2% response rate should in some way count for more than a zip code that receives only 1,000 pieces. The recommended weighting scheme is to multiply the number of pieces mailed by the response rate and then by the non-response rate. The formula to remember is N*p*q, where N stands for the number of pieces mailed, p is the response rate and q is equal to (1-p).
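As a quick check on the arithmetic, the N*p*q weight can be computed directly from the mail quantity and the response count. The sketch below is illustrative only: the function name and the sample counts are assumptions, not figures from an actual campaign.

```python
def npq_weight(pieces_mailed, responses):
    """Return the N*p*q weight for one zip code."""
    if pieces_mailed <= 0:
        raise ValueError("pieces_mailed must be positive")
    p = responses / pieces_mailed   # response rate
    q = 1.0 - p                     # non-response rate
    return pieces_mailed * p * q


# Hypothetical zip codes, both responding at 2%.
big_zip = npq_weight(10_000, 200)    # 10,000 pieces mailed
small_zip = npq_weight(1_000, 20)    # 1,000 pieces mailed
print(big_zip, small_zip)
```

At the same 2% response rate, the larger zip code carries ten times the weight of the smaller one (about 196 versus about 19.6).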
Not using separate models for response and performance
One could argue that there is nothing theoretically wrong with modeling dollars per name mailed or called. But it’s been our experience that the modeling exercise produces more information and more strategic insights if response and performance are modeled separately. Then, if you wish to calculate dollars per name mailed, you can do so by multiplying the zip code’s expected response rate by a measure of the zip code’s expected revenue per responder, be it sales or payments or contribution.
The intuitive reason for separate models is that a variable such as income may be negatively correlated with response and positively correlated with performance. If dollars per name mailed is chosen as the dependent variable, the effect of income may be left out of the model because the two effects cancel each other out.
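The canceling effect is easy to see with two made-up zip codes. In the sketch below, all rates and dollar amounts are hypothetical, chosen only so that response and revenue per responder move in opposite directions.

```python
def dollars_per_name(response_rate, revenue_per_responder):
    """Expected dollars per name mailed = response rate x revenue per responder."""
    return response_rate * revenue_per_responder


# Hypothetical zip codes: the lower-income one responds more often but spends
# less per responder; the higher-income one does the opposite.
zips = {
    "lower_income":  {"response_rate": 0.030, "revenue_per_responder": 40.0},
    "higher_income": {"response_rate": 0.015, "revenue_per_responder": 80.0},
}

for name, z in zips.items():
    value = dollars_per_name(z["response_rate"], z["revenue_per_responder"])
    print(f"{name}: {value:.2f} dollars per name mailed")
```

Both zip codes come out at $1.20 per name mailed, so a model built on that single figure would see no difference between them, while separate response and performance models would each pick up the income effect.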
Not modeling complete distributions of census data
The census data comes to us in the form of frequency distributions. For example, income may be expressed as a frequency distribution with as many as 18 rows; the percent of the population earning between $24,000 and $35,000 may be one such row.
Clearly, two zip codes might have the same percentage of the population earning that amount and still be complete opposites: in one zip code most of the rest earn less than $24,000, and in the other most earn more than $35,000. To guard against this and to build stronger, more stable models, we recommend using Principal Components Analysis to model each zip code’s distribution against the distribution of the average zip code in the population being promoted.
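One plausible reading of this recommendation can be sketched in a few lines: express each zip code’s distribution as a deviation from the average distribution, then score it on the first principal component. Everything specific here is an assumption made for illustration: the three collapsed income bands, the sample shares, the unweighted average, and the use of simple power iteration in place of a statistics package.

```python
import math

# Hypothetical income distributions (share of households per band).
# Bands: under $24,000 / $24,000 to $35,000 / over $35,000.
zip_rows = {
    "zip_A": [0.60, 0.30, 0.10],   # mostly lower income
    "zip_B": [0.10, 0.30, 0.60],   # mostly higher income
    "zip_C": [0.35, 0.30, 0.35],   # evenly spread
}


def average_distribution(rows):
    """Unweighted average distribution across zip codes (an assumption)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(rows, avg):
    """Express each zip code as its deviation from the average zip code."""
    return [[x - a for x, a in zip(row, avg)] for row in rows]


def first_component(centered, iterations=100):
    """First principal component via power iteration on X^T X."""
    k = len(centered[0])
    xtx = [[sum(r[i] * r[j] for r in centered) for j in range(k)]
           for i in range(k)]
    v = [1.0 / (i + 1) for i in range(k)]   # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(xtx[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


rows = list(zip_rows.values())
centered = center(rows, average_distribution(rows))
component = first_component(centered)
scores = {name: sum(x * c for x, c in zip(row, component))
          for name, row in zip(zip_rows, centered)}
print(scores)
```

zip_A and zip_B have the same 30% share in the middle band, yet they receive component scores of opposite sign, which is exactly the distinction a single cherry-picked row cannot make. The sign of a principal component is arbitrary, so only the contrast between the scores matters.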
Not using historical indices
For companies that have a great deal of promotion history, it’s useful to create historical response and performance indices calculated at the zip code level and summarized at the Sectional Center Facility (SCF) level, or even better, summarized at a commercial cluster level, such as PRIZM or MicroVision. (Each zip code is assigned to a cluster, so it’s easy to map zip codes to clusters.)
In our experience this tactic has produced some very important and very stable variables. Sometimes the zip code indices won’t work by themselves because coverage is too thin, but the SCF and cluster segmentations usually will.
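A response index of this kind might be built as follows. This is a minimal sketch under stated assumptions: the promotion history is invented, the index is scaled so that 100 equals the overall response rate, and the SCF is taken as the first three digits of the zip code. A performance index would follow the same pattern with revenue per responder in place of response rate.

```python
from collections import defaultdict

# Hypothetical promotion history: (zip code, pieces mailed, responses).
history = [
    ("10001", 5000, 150),
    ("10002", 200, 2),
    ("60601", 4000, 60),
    ("60602", 800, 12),
]


def response_indices(rows, key):
    """Index each group's response rate against the overall rate (100 = average)."""
    mailed = defaultdict(int)
    responses = defaultdict(int)
    for zip_code, n, r in rows:
        mailed[key(zip_code)] += n
        responses[key(zip_code)] += r
    overall = sum(responses.values()) / sum(mailed.values())
    return {g: 100.0 * (responses[g] / mailed[g]) / overall for g in mailed}


zip_index = response_indices(history, key=lambda z: z)
scf_index = response_indices(history, key=lambda z: z[:3])  # SCF = first 3 digits
print(zip_index)
print(scf_index)
```

The zip code with only 200 pieces mailed gets an index based on two responses, which illustrates the thin-coverage problem; pooling it into its SCF rests the index on far more mail volume. A cluster-level index would work the same way given a zip-to-cluster lookup table, which is not shown here.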
Not implementing list by list
Finally, the worst mistake of all is to forget to implement on a list-by-list basis. This means that the first step in the implementation or rollout process is to estimate the expected response rate and the expected performance for each list segment considered for inclusion in the promotion.
After this is done, the response and performance models can be applied to each list. What you’ll find is that some lists, your very best lists, can still be mailed in their entirety; other good lists will drop 10% to 30% of the names that would otherwise be mailed or called; and the top 10% to 30% of your marginal lists, lists that might not otherwise be used at all, can now be used.
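The pattern described above can be illustrated with a small sketch. It assumes, purely for illustration, that the zip code models are expressed as indices (100 = average) that scale each list’s own expected response rate and revenue per responder, and that names are kept when expected dollars per name clears a fixed cutoff. The list figures, the indices and the $0.50 cutoff are all hypothetical, and the list-level estimates are assumed to already reflect the season of the mailing.

```python
def expected_dollars_per_name(list_response, list_revenue,
                              zip_response_index, zip_revenue_index):
    """Scale the list-level estimates by the zip code indices (100 = average)."""
    response = list_response * zip_response_index / 100.0
    revenue = list_revenue * zip_revenue_index / 100.0
    return response * revenue


def select_zips(list_response, list_revenue, zip_indices, cutoff):
    """Return the zip code groups on this list worth mailing at the cutoff."""
    return [
        z for z, (resp_idx, rev_idx) in zip_indices.items()
        if expected_dollars_per_name(list_response, list_revenue,
                                     resp_idx, rev_idx) >= cutoff
    ]


# Hypothetical zip code groups: (response index, revenue index).
zip_indices = {"A": (130, 110), "B": (100, 100), "C": (70, 90)}
cutoff = 0.50  # hypothetical minimum acceptable dollars per name mailed

best_list = select_zips(0.030, 40.0, zip_indices, cutoff)
good_list = select_zips(0.020, 35.0, zip_indices, cutoff)
marginal_list = select_zips(0.012, 30.0, zip_indices, cutoff)
print(best_list, good_list, marginal_list)
```

With these numbers the best list keeps every zip code group, the good list drops its weakest group, and the marginal list, which would fall below the cutoff on average, still contributes its strongest group.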