David Shepard Associates, Inc. Database Marketing Consultants (Marketing Strategy, Analytics & Statistical Models, Marketing Database Systems)

So What If It All Went Away?

By: David Shepard

This article first appeared in Direct Magazine


The trend is definitely not good. Privacy is not an issue that will go away, and consequently less and less data will be available for individual and/or household level overlays. If auto registration data will no longer be available except on an opt-in basis, can credit card data, except for the explicit purpose of granting credit, be far behind? Probably not. But the real question is: could direct marketers live without household level demographic and financial data? Of course they could. And if they honed their modeling skills, they might even be better off.

Let’s examine some of the ways household level data is used.

1. To build better customer response and performance models.

    With some notable exceptions, such as the acceptance of a new product targeted at a specific demographic group, customer transaction data (RFM data, product purchase data, tenure, source, and a handful of other transaction variables) is all that is necessary to build more than satisfactory response and performance models. More often than not, additional demographic variables do not produce a larger spread or a more accurate model.

2. To profile customer response and performance deciles.

    Based on the auto registration case, it may be possible to use overlay data for research purposes, and profiling would qualify as research. But, even if overlay data could not be used for profiling a Gains Chart, short telemarketing surveys could provide all the profiling data required, at a reasonable cost and perhaps with more accuracy.

3. To build new customer acquisition models to be used against response lists.

    For those not familiar with this application, the idea is to append household level data to the names coming out of a merge-purge of multiple response lists.

    These names are then scored using a model, built from prior mailings, that gives specific weights to the demographic variables it contains. Prospects whose scores fall below some cutoff (perhaps the bottom two deciles) are dropped from the promotion.

    This process is both time-consuming and complicated. It is time-consuming because the scoring and the suppression have to take place after the merge, and complicated because of a number of issues: (a) how will non-matched names be scored, and (b) how will missing data be handled within the matched population?
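To make the scoring and suppression step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular production system: the function names, the sample scores, and the policy of keeping non-matched names rather than scoring them are all hypothetical choices made for this example.

```python
def assign_deciles(scores):
    """Return a decile (1 = highest scores, 10 = lowest) for each score."""
    n = len(scores)
    # Rank positions from the highest score to the lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def suppress_bottom(names, scores, drop=2):
    """Keep every name that falls outside the bottom `drop` deciles.

    A score of None marks a name that did not match the overlay file.
    Assumption for this sketch: non-matched names are kept, not scored.
    """
    matched = [i for i, s in enumerate(scores) if s is not None]
    deciles = assign_deciles([scores[i] for i in matched])
    cutoff = 10 - drop
    dropped = {i for i, d in zip(matched, deciles) if d > cutoff}
    return [name for i, name in enumerate(names) if i not in dropped]


# Invented example: ten matched names with model scores, one non-match.
names = ["n%d" % i for i in range(11)]
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, None]
kept = suppress_bottom(names, scores)
```

Even in this toy form, the two issues raised above show up as explicit decisions in the code: something has to be done with the names that carry no score, and the decile boundaries depend on which names are counted as matched.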

Some Alternatives

Working on the assumption that we will continue to have access to the detailed census data that is collected at the block group level and re-compiled at the zip plus 4 level, direct marketers should be able to build response and performance models that, for all practical purposes (suppression of the bottom two or three deciles from a merge-purged set of prospects selected from response lists), are as effective as models built upon household level data. What’s more, these zip or zip plus 4 models are much easier and much less expensive to implement than household level models. (List owners are sent selection or suppression tapes prior to shipping their names to your merge-purge house.)

Tips for Building Zip and Zip Plus 4 Models

There are two keys to building good models based on census data. The first has to do with variable creation, the second with technique.

Companies should build their own historical response and performance indices based on past promotions and customer behavior. Working at the zip plus 4 level, it’s possible to build historical indices, or simply historical response and/or performance rates, which can then be aggregated at the 5-digit zip code level, the sectional center level, or, what’s frequently even better, at a PRIZM or MicroVision segment level. (Each commercial clustering scheme associates a demographic or lifestyle segment with a zip plus 4 code.) These historical results are then treated as potential independent variables in your response or performance models. In our experience, one or more of these historical variables will enter a model as one of its most important variables.
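A historical index of this kind is simple arithmetic: a group's response rate divided by the overall response rate, times 100. The Python sketch below shows one way to compute it. The field names (zip5, segment, mailed, responded), the segment labels, and the counts are all invented for illustration, and the function name response_indices is hypothetical.

```python
from collections import defaultdict


def response_indices(history, key):
    """Build a response index (100 = overall average) for each group.

    history: list of dicts holding 'mailed' and 'responded' counts,
             plus the grouping field named by `key`.
    """
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for row in history:
        mailed[row[key]] += row["mailed"]
        responded[row[key]] += row["responded"]
    overall = sum(responded.values()) / sum(mailed.values())
    return {g: 100 * (responded[g] / mailed[g]) / overall for g in mailed}


# Invented zip plus 4 level results from past promotions.
history = [
    {"zip4": "10001-0001", "zip5": "10001", "segment": "A",
     "mailed": 1000, "responded": 30},
    {"zip4": "10001-0002", "zip5": "10001", "segment": "B",
     "mailed": 1000, "responded": 10},
    {"zip4": "10002-0001", "zip5": "10002", "segment": "A",
     "mailed": 2000, "responded": 20},
]

by_zip = response_indices(history, "zip5")        # 5-digit zip level
by_segment = response_indices(history, "segment") # cluster segment level
```

The same zip plus 4 history feeds every level of aggregation; only the grouping key changes. Each resulting index can then be attached to a prospect record as a candidate independent variable.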

For example, a response model we built for a continuity program (the model had a top decile lift of 270) contained only three variables and two of them were historical indices.  The third variable was a Principal Component Analysis (PCA) variable that compared each zip code’s educational level with the average educational level within the entire mailing population.  Which brings us back to the subject of modeling technique – the second component of good census data models.

If you’ve dealt with census data, you know that while there are some 300 to 400 census variables, there are only about 20 major categories of data, and the categories are presented as frequency distributions. For example, Education (a major census category) is made up of four separate census variables: (1) percent of population with less than a high school degree; (2) percent with a high school degree; (3) percent with some college; (4) percent with a college degree or more. In our experience, models built on individual census variables, while much easier to build, are much less stable and perform worse than models built on a PCA of the census category.
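As a rough illustration of what a PCA variable for a census category looks like, the Python sketch below collapses the four Education percentages into a single score per zip code. It centers each zip code on the average of the mailing population and projects it onto the first principal component. This is a minimal sketch, not the procedure behind the model described above: the percentages are invented, the function name is hypothetical, the component is found by plain power iteration, and the choice to orient the score so that more college education scores higher is an assumption.

```python
import math


def first_principal_component(rows, orient_col=-1, iterations=100):
    """Return (loadings, scores) for the first principal component.

    rows: one list of category percentages per zip code.
    orient_col: column whose loading is forced positive (an assumption
                made here so that higher scores are easy to interpret).
    """
    n, k = len(rows), len(rows[0])
    # Center each column on the mean of the whole mailing population.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1)
            for b in range(k)] for a in range(k)]
    # Power iteration, started from the covariance row of the
    # highest-variance column. (A vector of ones would not work:
    # percentages that sum to 100 make it map to zero.)
    start = max(range(k), key=lambda a: cov[a][a])
    v = list(cov[start])
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has no variance")
        v = [x / norm for x in w]
    if v[orient_col] < 0:
        v = [-x for x in v]
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, scores


# Invented Education distributions for four zip codes:
# [% less than high school, % high school, % some college, % college+]
education = [
    [40, 35, 15, 10],
    [30, 35, 20, 15],
    [20, 30, 25, 25],
    [10, 25, 25, 40],
]
loadings, scores = first_principal_component(education)
```

The resulting score is a single variable that places each zip code above or below the population's average educational level, in place of four correlated percentages entered separately. In practice a statistical package would normally do this step, and more than one component may be worth keeping.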

So, what’s the bottom line? We certainly don’t wish to see the demise of individual or household level overlay data for direct marketing purposes. But should it happen, to one degree or another, if we’re smart and take advantage of the data and techniques at our disposal, we’ll be able to make up most if not all of the losses imposed upon us.