Working to Build Better Predictive Models (Pt 1)

July 6th, 2011 by: DSA


It’s pretty surprising that a recent survey of CRM practices reported that only 30%-40% of the companies surveyed use predictive regression models. By way of contrast, close to 50% were using RFM models. If statistical projection is really a better tool, if for no other reason than the obvious observation that regression models can call on variables other than RFM, why this disparity?

I don’t know. But part of the answer may have to do with modeling attempts that did not work, or did not work better than RFM.

For starters, it should be clear that for a regression model to “work better” than an RFM model, it has to incorporate variables other than the RFM variables that aid in predicting the dependent variable.

To keep things relatively simple, let’s just concentrate on response models, because most RFM models are used to predict response. Let’s further stipulate that, for the purpose of this discussion, to “work better” means to improve the “Lift”, or the ratio of responders to names promoted, at some agreed-upon depth of file.

For example, for a regression model to “work better” than an RFM model at a depth of, say, 30% of the file, the regression model would have to identify significantly more responders than the RFM model would have identified at the same depth. Also, the argument that it’s easier to score a file with a single regression equation than it is to manage an RFM process won’t count in this discussion, even though it’s true.
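To make the comparison concrete, here is a minimal sketch of counting responders at a given depth of file. The function name, the ten-name file and both sets of scores are made up for illustration; they do not come from any real campaign.

```python
def responders_at_depth(scores, responded, depth):
    """Count responders among the top `depth` fraction of the file, ranked by score."""
    # Rank names from highest score to lowest.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    # Number of names promoted at this depth (e.g. 30% of a 10-name file = 3 names).
    n_promoted = round(len(ranked) * depth)
    return sum(flag for _, flag in ranked[:n_promoted])


# Hypothetical file: 1 = responded, 0 = did not respond.
responded = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical scores from an RFM process and from a regression model.
rfm_scores = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2, 0.5, 0.4, 0.1, 0.05]
regression_scores = [0.85, 0.2, 0.9, 0.3, 0.4, 0.7, 0.1, 0.5, 0.15, 0.05]

rfm_hits = responders_at_depth(rfm_scores, responded, 0.30)
regression_hits = responders_at_depth(regression_scores, responded, 0.30)
print(rfm_hits, regression_hits)  # prints: 1 3
```

With a file this small the difference means nothing statistically. On a real file you would want a validation sample large enough to judge whether the gap is significant.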

So we get back to the question of identifying more variables, variables other than the RFM variables (Recency of purchase, Frequency of purchase and some measure of Monetary Value).

One way to do this is simply to create new variables out of RFM variables: for example, the total number of purchases or total sales divided by months on file, or divided by the number of times promoted.
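As a rough sketch, the ratios just described could be computed per customer as follows. The function and field names are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this example, not a recommendation.

```python
def derived_rfm_variables(total_purchases, total_sales, months_on_file, times_promoted):
    """Build rate variables from raw RFM-style counts for one customer."""

    def safe_ratio(numerator, denominator):
        # Assumption: a customer with a zero denominator gets 0.0 rather than an error.
        return numerator / denominator if denominator else 0.0

    return {
        "purchases_per_month": safe_ratio(total_purchases, months_on_file),
        "sales_per_month": safe_ratio(total_sales, months_on_file),
        "purchases_per_promotion": safe_ratio(total_purchases, times_promoted),
        "sales_per_promotion": safe_ratio(total_sales, times_promoted),
    }


# Hypothetical customer: 6 purchases, 300.00 in sales, 24 months on file, promoted 12 times.
print(derived_rfm_variables(6, 300.0, 24, 12))
```

Each of these ratios can then be offered to the regression as a candidate predictor alongside the raw RFM variables.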

Another key variable that frequently appears is Tenure, or the length of time a customer has been on the database. This is such an important variable that it is frequently the basis for creating separate models: one for relatively new customers, and one or more for customers who have been on the file for a longer period of time.
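A simple way to picture the tenure split is to route each customer to a segment before any model is fit. In this sketch the 12-month cutoff, the segment labels and the record layout are all assumptions chosen for illustration; the right cutoff depends on the file.

```python
def tenure_segment(months_on_file, new_customer_cutoff=12):
    """Label a customer as 'new' or 'established' by months on file."""
    return "new" if months_on_file < new_customer_cutoff else "established"


def split_by_tenure(customers, new_customer_cutoff=12):
    """Group customer records by tenure segment so each group can get its own model."""
    segments = {"new": [], "established": []}
    for customer in customers:
        label = tenure_segment(customer["months_on_file"], new_customer_cutoff)
        segments[label].append(customer)
    return segments


# Hypothetical customer records.
customers = [
    {"id": 1, "months_on_file": 3},
    {"id": 2, "months_on_file": 40},
    {"id": 3, "months_on_file": 12},
]
segments = split_by_tenure(customers)
print([c["id"] for c in segments["new"]], [c["id"] for c in segments["established"]])
```

A separate response model would then be built and scored within each segment.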

Then there is product purchase data: which particular products or product categories the customer has purchased. This data can be handled through the use of “dummy”, or 0/1 coded, variables. And, as we have mentioned in the past, the best way to handle this data is through the use of Principal Components Analysis, a technique which gets at the pattern of purchases over the entire set of purchase possibilities.
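To illustrate the idea, the sketch below dummy codes a handful of made-up purchase histories and extracts only the first principal component. It uses plain power iteration on the covariance matrix so that it runs without any libraries; in practice a statistics package would do this and return every component. The category names, customers and function names are hypothetical.

```python
import math


def dummy_code(purchases, categories):
    """One 0/1 row per customer: 1 if the customer bought from the category."""
    return [[1 if c in bought else 0 for c in categories] for bought in purchases]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component of 0/1 data."""
    n, k = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two customers")
    # Center each dummy variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the dummy variables.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0 / (j + 1) for j in range(k)]  # arbitrary, uneven starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the data at all
            break
        v = [x / norm for x in w]
    # Each customer's score is the projection of their centered row on the loadings.
    scores = [sum(r[j] * v[j] for j in range(k)) for r in centered]
    return v, scores


# Hypothetical categories and purchase histories.
categories = ["books", "music", "garden", "tools"]
purchases = [{"books", "music"}, {"books", "music"}, {"books"},
             {"garden", "tools"}, {"garden", "tools"}, {"tools"}]

loadings, scores = first_principal_component(dummy_code(purchases, categories))
print([round(x, 2) for x in loadings])
print([round(s, 2) for s in scores])
```

In this toy data the books and music loadings share one sign while garden and tools share the other, so a single component score separates the two purchase patterns. That score, rather than four separate dummy variables, is what would go into the regression.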

