Archive for the ‘Marketing Analytics & Modeling’ Category

So What If It All Went Away?

Wednesday, January 4th, 2012


The trend is definitely not good.  Privacy is not an issue that will go away and consequently there will be less and less data available for individual and/or  household level overlays.   But, the real question is — could direct marketers live without household level demographic and financial data?  Of course they could.  And, if they honed their modeling skills, they might even be better off.

Let’s examine some of the ways household level data is used.

1. To build better customer response and performance models.

With some notable exceptions, such as the acceptance of a new product targeted at a specific demographic group, customer transaction data (RFM data, product purchase data, tenure, source and a handful of other transaction variables) are all that is necessary to build more than satisfactory response and performance models. More often than not, additional demographic variables do not result in a larger spread or a more accurate model.

2. To build new customer acquisition models to be used against response lists.

For those not familiar with this application, the idea here is to append household level data to the names coming out of a merge purge of multiple response lists.

Then these names are scored using a model, built from prior mailings, that gives specific weights to the demographic variables contained in the model. Prospects with scores that are lower than some criteria (perhaps the bottom two deciles) are dropped from the promotion.

This process is both time-consuming and complicated. Time-consuming because the scoring and the suppression have to take place after the merge, and complicated because of a number of issues: (a) how will non-matched names be scored, and (b) how will missing data be handled within the matched population?
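To make the scoring and suppression step concrete, here is a minimal sketch in Python. It assumes every name already carries a model score (higher is better) and that the file divides roughly evenly into deciles. The function name, the sample names and the scores are all hypothetical, and the sketch does not address the non-matched and missing-data issues raised above.

```python
def suppress_bottom_deciles(scored_names, deciles_to_drop=2):
    """Rank prospects by model score and drop the lowest-scoring deciles.

    scored_names: list of (name_id, score) tuples; higher score = better prospect.
    """
    ranked = sorted(scored_names, key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    kept = []
    for position, (name_id, _score) in enumerate(ranked):
        decile = position * 10 // n + 1  # 1 = best decile, 10 = worst
        if decile <= 10 - deciles_to_drop:
            kept.append(name_id)
    return kept


# Hypothetical scores for 20 merged-purged names.
prospects = [(f"name_{i:02d}", i / 20) for i in range(20)]
mail_file = suppress_bottom_deciles(prospects)
print(len(mail_file), "names kept for the promotion")
```

With twenty names and two deciles dropped, the four lowest-scoring names are removed from the mailing.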

Some Alternatives

Working on the assumption that we will continue to have access to the detailed census data that is collected at the block group level and re-compiled at the zip plus 4 level, direct marketers should be able to build response and performance models that for all practical purposes (suppression of the bottom two or three deciles from a merged-purged set of prospects selected from response lists) are as effective as models built upon household level data.

What’s more these zip or zip plus 4 models are much easier and much less expensive to implement than household level models. (List owners are sent selection or suppression tapes, prior to shipping their names to your merge-purge house.)

Tips for Building Zip and Zip Plus 4 Models

There are two keys to building good models based on census data.  The first has to do with variable creation, the second with technique.

Companies should build their own historical response and performance indices based on past promotions and customer behavior. Working at the Zip Plus 4 level it’s possible to build historical indices, or simply historical response and/or performance rates, which can then be aggregated at the 5 digit Zip Code level, the Sectional Center level, or, what’s frequently even better, at a Prism or a MicroVision segment level. Each commercial clustering scheme associates a demographic or lifestyle segment with a zip plus 4 code.

These historical results are then treated as potential independent variables in your response or performance models. And, in our experience one or more of these historical variables will enter a model as one of the model’s most important variables.
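One way to construct such a historical variable is sketched below. We assume here that an "index" means the area's historical response rate divided by the overall response rate, times 100, which is a common convention but not the only one. The zip codes and counts are invented for illustration.

```python
from collections import defaultdict


def response_indices(history):
    """history: list of (area_key, mailed, responded) rows from past promotions.

    Returns {area_key: index}, where 100 means "responds at the overall rate".
    """
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for area_key, n_mailed, n_responded in history:
        mailed[area_key] += n_mailed
        responded[area_key] += n_responded
    overall_rate = sum(responded.values()) / sum(mailed.values())
    return {
        key: 100 * (responded[key] / mailed[key]) / overall_rate
        for key in mailed
    }


# Hypothetical results from three past mailings.
history = [
    ("10001", 1000, 30),
    ("10001", 1000, 30),
    ("60614", 2000, 20),
]
indices = response_indices(history)
print(indices)
```

The same function works at the Sectional Center or cluster segment level: pass the segment code as the key instead of the zip code.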

For example, a response model we built for a continuity program (the model had a top decile lift of 270) contained only three variables and two of them were historical indices. The third variable was a Principal Component Analysis (PCA) variable that compared each zip code’s educational level with the average educational level within the entire mailing population.  Which brings us back to the subject of modeling technique – the second component of good census data models.

If you’ve dealt with census data you know that while there are some 300 to 400 census variables, there are only about 20 major categories of data, and the categories are presented as frequency distributions. For example, Education (a major Census Category) is made up of four separate Census Variables: (1) percent of population with less than a High School degree; (2) percent with a High School degree; (3) percent with some College; (4) percent with a College degree, or more. In our experience, models built on individual census variables, as opposed to a PCA of the census category, while much easier to build, are much less stable and perform worse.

So, what’s the bottom line? While we certainly don’t wish to see the demise of individual or household level overlay data for direct marketing purposes, should it happen, to one degree or another, if we’re smart and take advantage of the data and techniques at our disposal, we’ll be able to make up most if not all of the losses imposed upon us.

The Benefits Of Cohort Group Reporting (Pt 2)

Tuesday, October 4th, 2011


In the first part of this post we saw that basic cohort group reporting based on enrollment period can produce very powerful insights into the customer base. Generally, direct marketers do not stop at reporting at the enrollment group level. The more common practice is to subdivide the enrollment group by major media source so that the performance of say all direct mail or all print acquired customers can be tracked.

Frequently a change in enrollment group behavior can be quickly traced to a change in the mix of new customers — customers acquired from direct mail generally perform better than customers acquired from print, and they tend to perform better than customers acquired from broadcast, who tend to perform better than customers acquired from outbound telemarketing and so on. (Remember I said generally, so your experience may differ.)

So, the usual cohort group is an enrollment group broken down by major media. Cohort reporting is of course not limited to attrition reporting. One may track overall sales, sales mix, store visits, average purchases or returns, or complaints, or anything else that’s relevant to one’s particular business.

For example a cable company may want to track upgrade or downgrade behavior as well as overall disconnect rates, and the cohort group could be traced to a particular sales territory or to a particular sales person. Companies with reward programs may want to track points earned or points redeemed by cohort group to get an early warning reading on changes in customer behavior.

Cohort group reporting can also be carried down to the keycode-enrollment group level. At this level cohort reporting is not used for overall trend analysis but for forecasting the lifetime value of individual customer groups. The weighted average projection of all of the cohort groups acquired from the same source (same keycode) represents the lifetime value of the average customer acquired from that particular source. And, this average value can be compared to the cost per order to measure the profitability of the promotion.
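The weighted average calculation is simple enough to sketch. The cohort sizes, projected values and cost per order below are invented, and we assume the weights are the number of customers in each cohort.

```python
def average_lifetime_value(cohorts):
    """cohorts: list of (customers_in_cohort, projected_value_per_customer)."""
    total_customers = sum(n for n, _ in cohorts)
    return sum(n * value for n, value in cohorts) / total_customers


# Hypothetical enrollment groups acquired under the same keycode.
keycode_cohorts = [(500, 42.0), (300, 38.0), (200, 50.0)]
ltv = average_lifetime_value(keycode_cohorts)

cost_per_order = 30.0  # assumed acquisition cost per new customer
profit_per_customer = ltv - cost_per_order
print(round(ltv, 2), round(profit_per_customer, 2))
```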

Cohort group reporting has been around since the 1970’s. In those days the reporting was done in batch mode at the end of each cycle update. A set of hard copy reports was produced and distributed to the marketing managers who were responsible for new customer acquisition and customer marketing.

Today the same information can be produced in a variety of ways including everything from the original hard copy reports to drill down exercises using star schemas to multi-dimensional presentations that represent cohort groups in three dimensional cubes floating around the front of your PC. (In fact, if you were of a mind to do so, you could probably turn your cohort reporting into a screen saver.)

The problem with the newer representations of the same old concepts is that one might not recognize the need for the drill down exercise, or the multi-dimensional presentation, so I’m partial to old fashioned hard copy reporting, updated with modern graphics that make changes in performance obvious to anyone willing to look. And, the need to look is just as important today as it always was.

The Benefits of Cohort Group Reporting (Pt 1)

Wednesday, September 7th, 2011


As more and more “new media” marketers get involved with database marketing applications, it becomes more important to remember some of the key lessons painstakingly learned by our direct marketing predecessors. Remembering to measure the performance of cohort groups is one of those lessons. (A cohort group is defined as a group of individuals with one or more common characteristics.)

In the world of direct marketing an enrollment date, or more accurately an enrollment period defines the basic cohort group. For example, all of the new customers that came on the database in the month of August, would be a cohort group. Later we will expand the definition of cohort groups to include attributes other than enrollment period.
 
Measuring the performance of cohort groups, or let’s call them enrollment groups for a little while longer, is the best way to monitor the performance of any direct marketing business that continually acquires new customers, new members, or new subscribers and is concerned with possible attrition. For all practical purposes that means all direct marketers.  Yet not all direct marketers have systems in place that monitor cohort performance. 

The chart below shows monthly attrition rates for individual enrollment groups and the average attrition rate by month for all enrollment groups (the last set of bars).  A marketer looking at this graph would immediately notice that enrollment groups 14, 15 and 16 are performing well below the average behavior of prior enrollment groups. 

Without enrollment group reporting a marketer would have to rely on monitoring trends in overall or average churn rates.  (The average churn rate is defined by the number of attriters in a period divided by the number of customers at the start of the period.) Measuring overall churn will frequently miss trends that are due to changes in acquisition strategy or competitive conditions.
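A small sketch shows how an overall churn rate can hide a problem in the newest enrollment groups. The group names and counts are hypothetical; the churn definition is the one given above (attriters in the period divided by customers at the start of the period).

```python
def churn_rate(attriters, customers_at_start):
    return attriters / customers_at_start


# Hypothetical month: (enrollment group, customers at start of period, attriters)
period = [
    ("group_13", 8000, 240),
    ("group_14", 1000, 80),
    ("group_15", 1000, 90),
]

overall = churn_rate(
    sum(attriters for _, _, attriters in period),
    sum(customers for _, customers, _ in period),
)
by_group = {
    group: churn_rate(attriters, customers)
    for group, customers, attriters in period
}
print(round(overall, 3), by_group)
```

In this made-up month the overall rate is about 4%, which looks unremarkable, while the two newest groups are losing customers at two to three times the rate of the older group.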
 

[Chart: monthly attrition rates by enrollment group]

To be continued… 

Working To Build Better Predictive Models (Pt 2)

Wednesday, August 3rd, 2011


In the first part of this discussion we outlined ways to increase the number of available predictor variables. Of course, what’s needed next is a repeatable process for identifying key variables from the host of variables that appear on our databases. Here statistical techniques like “correlation tables” and simple cross tabs, which show the relationship between potential variables and response can help. And, of course, the marketing people should always tell the modeler which variables they either know or think to be significant predictors.

However, we think the best technique for identifying potential variables is CHAID.

CHAID can be used to pictorially display the differences in response rates looking at each potential variable, one at a time. When used in this manner, the marketing person is on an equal footing with the analyst or statistician, because the results, with just a little bit of explanation, are so easy to understand. (Whether CHAID should be used beyond this point as a replacement for a regression model is a subject we won’t get into here.)

Needless to say, a CHAID can’t be done for every conceivable potential variable, so some combination of judgement and reliance on the correlation table will be required in this initial variable selection process.

Now, let’s assume for the purpose of this discussion that we identify 20 to 30 or even 50 variables, other than the basic RFM variables, that are each individually related to response. The last thing in the world we would want to do is use all of them in a model at the same time. The model would so “overfit” the data that while a Decile Analysis of the Calibration sample (the sample upon which the model was built) or even the Validation sample (the hold-out sample intended to prove the validity of the model) would look wonderful, the results of the model would never be replicated upon roll-out.

To at least some degree, this is a danger you never have to worry about, because the programs that produce regression models, if used correctly, will prevent this from happening. But, what may happen is that these very same programs (Step Wise Regression Programs) will frequently produce models that contain “too many” variables – even though the statistics describing these variables will suggest that they are significant.

When this happens, even though the Decile Analysis done on the Validation sample will look good, the model will have less than an optimum chance to hold up on roll-out promotions. To prevent this from happening, or to at least reduce the chances of this happening, we suggest “pruning away” the least significant of the significant variables and observing the effect on the Decile Analysis.

If the Decile Analysis is not significantly affected (made worse) then drop the variable. As often as not you will find that dropping the unnecessary variables actually improves the Decile Analysis: it increases the spread and removes “bumps” in the model. If all of these steps are followed, you will have a good chance of replacing your RFM models.
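For readers who have not produced a Decile Analysis themselves, here is a rough sketch. We take "spread" to mean the top decile response rate minus the bottom decile response rate, and a "bump" to mean a decile that responds better than the decile ranked above it. Both definitions are our working assumptions for this example, and the validation data is fabricated. The sketch also assumes the sample has at least ten records.

```python
def decile_analysis(scored):
    """scored: list of (score, responded) with responded equal to 0 or 1.

    Returns the response rate in each decile, best-scored decile first.
    """
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    n = len(ranked)
    totals = [0] * 10
    hits = [0] * 10
    for position, (_score, responded) in enumerate(ranked):
        d = position * 10 // n  # 0 = best decile
        totals[d] += 1
        hits[d] += responded
    return [h / t for h, t in zip(hits, totals)]


def bumps(rates):
    """Decile numbers (1-10) that out-respond the decile ranked above them."""
    return [i + 1 for i in range(1, len(rates)) if rates[i] > rates[i - 1]]


# Fabricated validation sample: 100 names, 10 per decile.
responders_per_decile = [5, 4, 3, 3, 2, 3, 1, 1, 0, 0]
validation = []
for d, k in enumerate(responders_per_decile):
    for j in range(10):
        validation.append((100 - (d * 10 + j), 1 if j < k else 0))

rates = decile_analysis(validation)
spread = rates[0] - rates[-1]
print(rates, spread, bumps(rates))
```

Rerunning this after dropping a candidate variable and comparing the spread and the list of bumps is one way to carry out the pruning test described above.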

Working to Build Better Predictive Models (Pt 1)

Wednesday, July 6th, 2011


It’s pretty surprising that a recent survey of CRM practices reported that 30%-40% of the companies surveyed indicated that they use predictive regression models. By way of contrast, close to 50% were using RFM models. If statistical projection is really a better tool, for no other reason than the obvious observation that regression models can call on variables other than RFM, why this disparity?

I don’t know.  But, part of the answer may have to do with modeling attempts that did not work, or did not work better than RFM.

For starters it should be clear that in order for a regression model to “work better” than a RFM model, the regression model has to incorporate variables other than RFM variables that aid in the prediction of the dependent variable.

To keep things relatively simple, let’s just concentrate on response models, because most RFM models are used to predict response. Let’s further stipulate that for the purpose of this discussion to “work better” means to improve the “Lift”, or the ratio of responders to names promoted at some agreed upon depth of file.

For example for a regression model to “work better” than an RFM model at a depth of say 30% of the file, the regression model would have to identify significantly more responders than a RFM model would have identified at the same depth. Also, the argument that it’s easier to score a file with a single regression equation than it is to manage a RFM process, won’t count in this discussion – even though it’s true.
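The comparison at a fixed depth of file can be sketched as follows. The ten customers, their response flags and both sets of scores are made up purely to show the mechanics, and the function name is our own.

```python
def responders_at_depth(scores, responded, depth):
    """Promote the top `depth` fraction of the file by score.

    Returns (responders found, responders per name promoted).
    Assumes depth is large enough to select at least one name.
    """
    ranked = sorted(zip(scores, responded), key=lambda row: row[0], reverse=True)
    cutoff = round(len(ranked) * depth)
    found = sum(flag for _, flag in ranked[:cutoff])
    return found, found / cutoff


# Hypothetical file of ten customers (1 = responded to the test mailing).
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
rfm_scores = [9, 3, 8, 7, 6, 5, 4, 2, 1, 0]
regression_scores = [9, 8, 2, 7, 6, 5, 4, 3, 1, 0]

rfm_result = responders_at_depth(rfm_scores, responded, 0.30)
regression_result = responders_at_depth(regression_scores, responded, 0.30)
print(rfm_result, regression_result)
```

In this toy file the regression ranking finds three responders in the top 30% and the RFM ranking finds two. On a real file the difference would also have to be tested for statistical significance.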

So, we get back to the question of identifying more variables, variables other than RFM variables (Recency of purchase, Frequency of purchase and some measure of Monetary Value).

One way to do this is simply to create new variables out of RFM variables. For example, variables such as: the total number of purchases or total sales divided by months on file or divided by the number of times promoted.
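As a quick illustration, the ratio variables mentioned above might be computed like this. The field names are hypothetical, and the sketch assumes every customer has been on file at least one month and promoted at least once.

```python
def derived_variables(total_purchases, total_sales, months_on_file, times_promoted):
    """Build ratio variables from basic RFM-style counts."""
    return {
        "purchases_per_month": total_purchases / months_on_file,
        "sales_per_month": total_sales / months_on_file,
        "purchases_per_promotion": total_purchases / times_promoted,
        "sales_per_promotion": total_sales / times_promoted,
    }


# Hypothetical customer record.
customer = derived_variables(
    total_purchases=6, total_sales=300.0, months_on_file=24, times_promoted=12
)
print(customer)
```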

Another key variable that frequently appears is Tenure, or the length of time a customer has been on the database. This is such an important variable that it is frequently the basis for creating separate models, one for relatively new customers, and one or more models for customers that have been on the file a longer period of time.

Then there is product purchase data: which particular products or product categories the customer has purchased. This variable can be handled through the use of “dummy or 0/1 coded variables”. And, as we have mentioned in the past, the best way to handle this data is through the use of Principal Components Analysis, a technique which gets at the pattern of purchases over the entire set of purchase possibilities.

Building and Monitoring Profitable, Technology-Based, Multichannel Marketing

Wednesday, June 1st, 2011

Without question there is an urgent need among direct marketers to prove that their investments in technology (databases, websites, social media, email, SEO, kiosks, call centers, catalogs and mailings, …) are more than paying for themselves.  How, then, should companies that transact and communicate with their customers through multiple channels evaluate the cost-effectiveness of their multi-channel marketing strategy?

In this article we suggest two metrics that managers should use to better understand how well their multichannel efforts are paying off:

1. Cost to serve – The customer specific marketing and servicing costs typically incurred by multichannel marketers to initiate and maintain a business relationship with individual customers. Examples are: freebies and promotions (shipping and handling costs, two for ones, cents or dollars off), fees and commissions (to affiliates, retailers, etc.), customer service and support (returns, call center support usage), loyalty costs (miles redeemed, gifts), etc., etc.

2. Realized revenue – The revenues actually realized by the company from a given customer.  This is determined by subtracting the cost to serve from the invoiced, or the contracted, price (which itself can differ by channel, retailer, or if the product was bought through an online auction).

The realized revenue from customers who routinely buy only when products are being promoted, return goods frequently, or require heavy levels of support services could be much lower than the invoiced revenue – severely impairing the lifetime attractiveness of such a customer.

As the number of channels through which customers communicate and transact with companies continues to explode, the number of offers and communications companies present to their customers has grown exponentially.

New database and CRM technologies make it possible to track customers by revisit behavior, allowing targeted promotions for newer versus existing customers, or for particular products. Additional offers still are communicated to various segments through e-mails, print and mass media ads, and direct mail pieces. 

While differences in offers have always existed, CRM technologies and new media have greatly increased the numbers of offers presented to customers. Left unmonitored, such complexity has the potential of severely increasing cost to serve, eroding realized revenues and greatly impairing profitability.

Cost-Revenue Analysis

Consider the situation described in the table below. While both customers, A and B, paid the invoiced price of $100, the realized revenues from customer A were less than half of those from customer B. Further, notice that while costs such as promotion discounts would normally be visible to the manager, others such as affiliate fees and costs of returns are often missed in assessing the value of a given customer. New database and eCRM technologies make it possible to track these costs, often at the individual customer level.

The tracking system can be implemented by building your own software to tag each cost category with a unique customer I.D. Reports such as the Cost to Serve and Realized Revenue table can then be created using standard business intelligence tools. Third party software and services are also available (Return.com, ReturnBuy, etc.) that provide software or hosted services designed to monitor customers and their return habits, grant return merchandise authorization numbers, and reduce cases of fraud. Others such as CommissionJunction and Linkshare provide services related to affiliate marketing programs.

Cost to Serve and Realized Revenue

                                    A     B
Price Paid                        100   100
Promo Discounts                    12     8
Credit Card Fees                    3     3
Shipping and Handling Discounts    25    22
Loyalty Payouts                     8    12
Affiliate Fees                     15     7
Returns                            15    10
Customer Service Contacts           7     3
Realized Revenue                   15    35
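The arithmetic behind the Cost to Serve and Realized Revenue table is simply price paid minus the sum of the cost to serve items. A short sketch using the figures from the table (the dictionary layout and key names are our own choice):

```python
price_paid = {"A": 100, "B": 100}

# Cost to serve items, taken from the table.
cost_to_serve = {
    "A": {
        "promo_discounts": 12,
        "credit_card_fees": 3,
        "shipping_and_handling_discounts": 25,
        "loyalty_payouts": 8,
        "affiliate_fees": 15,
        "returns": 15,
        "customer_service_contacts": 7,
    },
    "B": {
        "promo_discounts": 8,
        "credit_card_fees": 3,
        "shipping_and_handling_discounts": 22,
        "loyalty_payouts": 12,
        "affiliate_fees": 7,
        "returns": 10,
        "customer_service_contacts": 3,
    },
}

total_cost = {c: sum(items.values()) for c, items in cost_to_serve.items()}
realized_revenue = {c: price_paid[c] - total_cost[c] for c in price_paid}
print(total_cost, realized_revenue)
```

Customer A costs 85 to serve and customer B costs 65, which leaves realized revenues of 15 and 35.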

Once created, such a breakdown of the paid and realized prices can provide several meaningful insights.  For example, suppose now that the columns marked A and B represent the same customer (or cohort of customers), but at different points in time.

The evident improvement in realized price would, of course, represent welcome progress for the company.  But, more importantly, such a table also shows progress with respect to each of the components of cost to serve. The decreases in returns and affiliate fees probably indicate that the customer is more satisfied with the products bought, and relies less frequently on affiliate sites to find the target site.  These component level trends can then be compared versus target levels for each of the costs across time.  Necessary corrective action could then be taken to bring aberrant costs under control.

Strategic Implications

The analysis can also help the company develop the appropriate strategies for enhancing customer satisfaction and profitability. Based on the separate tables for each customer (where necessary, some of the costs could be inferred at the segment level), it is now possible to create a map such as the one shown below.  In this map, the horizontal axis represents the cost to serve and the vertical axis represents the revenues realized. Each customer can be plotted as a point in the cost-revenue space. Each of the four quadrants, then, becomes the basis for creating a segmentation scheme. 

Cost – Revenue Strategy Map

For example, the customers in the yellow “watch-out” quadrant have not yielded a great amount of realized revenue, but have cost a great deal to serve. These might be customers who demand a lot of call center services, use coupons extensively, and manage to convince the telesales rep to throw in free shipping.

They may have high invoiced revenues, might even have bought more than once, but are very expensive to maintain as customers. The company may want to consider teaching them how to use automated/online support and services. Alternately, they might be aggressive users of returns, discounts and promotions because such customers do not see real value in current offerings. Instituting “low-cost” marketing research approaches to better learn the kinds of products and services that represent real value for them should help the company improve realized revenues. But, not understanding how many such customers there are, and failing to devise the appropriate teaching, learning, or divestiture program for them will certainly prove unprofitable for the company. 

The “keep-em” customers in the top-left quadrant are obviously the most desirable. Programs aimed at retention, such as providing preferred services, and (especially in business to business applications) joint development of new products and services should be important.
 
Because the realized revenues are not out of line with the cost to serve, both of the remaining two quadrants are in balance. However, the high cost to serve customers in the top right quadrant suggests that ways of automating purchase orders (these are frequent buyers) and customer service, and replacing their use of discounts with attractive rewards for loyalty should result in significant bottom line gains. Finally, customers in the bottom left quadrant should be given incentives to increase the size of orders, or be cross-sold. But unless there is clear indication of high potential for future sales, failing to control cost to serve will have immediate negative bottom line impact.
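Assigning customers to the four quadrants could be sketched as below. The cutoffs that separate "high" from "low" are left as inputs because the article does not specify them, and the labels for the two balanced quadrants are placeholders of our own ("keep-em" and "watch-out" are the names used above).

```python
def quadrant(cost_to_serve, realized_revenue, cost_cutoff, revenue_cutoff):
    """Place one customer on the cost-revenue map."""
    high_cost = cost_to_serve >= cost_cutoff
    high_revenue = realized_revenue >= revenue_cutoff
    if high_revenue and not high_cost:
        return "keep-em"  # top left: low cost, high revenue
    if high_cost and not high_revenue:
        return "watch-out"  # bottom right: high cost, low revenue
    if high_cost and high_revenue:
        return "balanced-high"  # top right (placeholder label)
    return "balanced-low"  # bottom left (placeholder label)


# Hypothetical customers: (customer id, cost to serve, realized revenue).
customers = [("c1", 10, 80), ("c2", 85, 15), ("c3", 70, 90), ("c4", 5, 12)]
segments = {
    cid: quadrant(cost, revenue, cost_cutoff=50, revenue_cutoff=50)
    for cid, cost, revenue in customers
}
print(segments)
```

In practice the cutoffs might be set at the median or at target levels for each axis. That choice determines how many customers land in each segment, so it deserves some thought.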
 
Conclusion 
Much to the dismay of some (and gleeful satisfaction of others), technology based multichannel marketing is neither free nor easy. Because it is not free, it is imperative to understand how each customer impacts the bottom line. Fortunately, the same technology that has created so many ways of communicating with customers (each a potential money sink) also permits the marketer to record much better individual level data about the relevant costs and revenues over time. However, as many have discovered, from the gigamounds of bits generated, pulling the relevant data together to yield actionable results is not easy. The approach described here provides:

- a simple way of summarizing the relationship between marketing activities, customer responses, and the company’s bottom line. The approach emphasizes the importance of going beyond the invoiced price to the revenues actually realized from each customer

- a useful tool for monitoring and controlling the various costs incurred in selling and servicing individual customers over time

- a strategic approach for creating four distinct segments of customers which yield actionable recommendations based on the value each customer provides the company

 

Modeling Product Purchases

Tuesday, May 3rd, 2011


After the big three modeling variables, Recency, Frequency and Monetary Value, some analysts rank Product Purchase Data as the next most important potential predictive variable. I’m not sure that it’s number four on the modeling hit parade, but it’s certainly in the top ten, and for some businesses ranks in the top five.

In any event, it’s an important source of customer information, and thus the question of how to deal with it.  There are three or four choices:

1. Create a variable for each product and on each customer’s record code this variable a one (1) if the customer has purchased this product or a zero (0) if the customer has not purchased the product. This is called the Dummy Variable approach. So, if you have say forty products from which your customers can choose, you will set up forty  Dummy Variables.

2. The second approach is similar, but makes more sense. Suppose your customers can buy from each product line, or each product multiple times. It’s intuitive that it would make more sense to still set up forty variables, but instead of coding each variable a “1” or a “0”, count the number of times each customer bought each product and enter that count into the customer’s record.

3. A slight variation of this approach would be to record the dollars spent on each product, rather than just the count of the number of purchases. This approach would make more intuitive sense if the products differed significantly in price.

4. The last method is to use a technique called Principal Components Analysis, sometimes casually referred to as Factor Analysis, or as a particular type of Factor Analysis. To keep the purists happy we’ll just call it PCA.
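The first three approaches differ only in what gets stored in each product column. Here is a sketch that builds all three from the same order history; the product lines, customer ids and dollar amounts are invented.

```python
products = ["books", "music", "video"]  # hypothetical product lines

# Hypothetical order history: (customer id, product line, dollars spent).
orders = [
    ("c1", "books", 20.0),
    ("c1", "books", 25.0),
    ("c1", "video", 15.0),
    ("c2", "music", 10.0),
]


def encode(orders, products):
    """Return (dummies, counts, dollars), each keyed by customer then product."""
    counts, dollars = {}, {}
    for customer, product, amount in orders:
        counts.setdefault(customer, dict.fromkeys(products, 0))[product] += 1
        dollars.setdefault(customer, dict.fromkeys(products, 0.0))[product] += amount
    # Approach 1: 1 if the customer ever bought the product, else 0.
    dummies = {
        customer: {p: int(n > 0) for p, n in row.items()}
        for customer, row in counts.items()
    }
    return dummies, counts, dollars


dummies, counts, dollars = encode(orders, products)
print(dummies["c1"], counts["c1"], dollars["c1"])
```

With forty product lines the same code would produce forty columns per customer under each approach.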

In a PCA of product data the idea is to capture the product purchase behavior of a customer across the range of products offered. What we’re eventually hoping to discover is whether or not the purchase, or lack of purchase, of different combinations of products will give us a clue as to the future behavior of individual customers, or of groups of customers, if we are doing the analysis at the source key or at some geographic (zip code) level.

Without getting too technical (you can skip this paragraph if you like) the PCA program creates a new set of Principal Component Variables and related Principal Component Scores that can be used later on in a regular or logistic regression prediction model.

Again, let’s assume we’re working with forty product lines and we know how many times each customer has purchased each product. The program will initially generate forty Principal Components, but each PC will contain a different amount of information. In general, maybe four to eight of the forty Principal Components will contain most (70% or more) of the information contained in the entire set of PCs. And these four to eight PCs can be used in regression modeling just like any other “continuous” variable: Recency, Frequency, Monetary Value, Income, Age, etc.
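For the curious, here is a bare-bones sketch of how a first Principal Component, its share of the total information (variance), and the resulting customer scores can be obtained. To keep it dependency-free it uses power iteration on the covariance matrix, which is a stand-in for the PCA routine in a statistics package and recovers only the first component. The purchase counts are invented, with three product lines rather than forty.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, share_of_variance, scores) for the first component
    of a customers x products table of purchase counts."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the product columns.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    v = [1.0] * k  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(k) for b in range(k))
    share = eigenvalue / sum(cov[j][j] for j in range(k))
    # One score per customer: the centered purchases projected on the loadings.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, share, scores


# Invented purchase counts: six customers, three product lines.
purchases = [
    [5, 4, 0],
    [4, 5, 1],
    [3, 3, 0],
    [1, 0, 4],
    [0, 1, 5],
    [0, 0, 3],
]
loadings, share, scores = first_principal_component(purchases)
print([round(x, 2) for x in loadings], round(share, 2))
```

In this toy data the first two product lines are bought together and the third is bought instead of them, so a single component carries most of the variance. The scores list is what would be handed to the regression as a new continuous variable.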

Obviously it takes some time to transform raw product purchase data into Principal Components that can be used in scoring models, and the scoring procedures will become more difficult and so on. So, the question is, is it worth the extra effort to convert product purchase data into Principal Components?

To help answer this question we’ll look at some recent modeling results and you can decide for yourself. The problem was to predict the lifetime value of different customer groups based on all available customer data, including product purchase data.

As described above, we isolated and modeled just the product purchase data three (3) ways: (1) Using simple Dummy Variables, (2) Using Counts of the number of times each product line was purchased, and (3) Principal Components.

The quick answer is that the model built on just Dummy Variables had an R-Squared of 11% (the model explained 11% of the variation in lifetime value), the Count Approach had an R-Squared of 41% and the Principal Components method produced an R-Squared of 53%.
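As a reminder of what R-Squared measures, here is the standard calculation on a handful of made-up actual and predicted lifetime values. These numbers have nothing to do with the modeling results quoted above.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Made-up lifetime values and model predictions for four customer groups.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```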

In addition, looking at a Decile Analysis of the Residual Errors (Table 1 below) produced by each approach argues for the Principal Components method over the Counting method, and most important, the use of simple Dummy Variables is shown not to be very effective in this type of application.

Table 1

Average Error In Each Decile For Three Modeling Techniques

Working With Tricky Segments

Tuesday, April 5th, 2011


In a previous post I suggested that modelers could improve their results by splitting their datasets according to some critically important variable, such as Tenure (the length of time a customer has been on the file), and then building separate models for each major segment.

The argument being that it is intuitive that the usual set of modeling suspects (Recency, Frequency, Monetary Value, Products Purchased, Source and the whole set of Demographic Variables) will display different relationships with Response or Sales, depending upon the Tenure Segment, and that just adding Tenure as a variable, without taking interactions into account, isn’t sufficient to capture the full effect of this variable.

As if this isn’t complicated enough, I came across an article that questioned fundamental direct marketing beliefs, including the belief that there is a strong positive relationship between customer lifetime and profitability in a non-contractual relationship. In other words, they think that direct marketers think that customers that kind of hang around a long time, buying every once in a while, are profitable and every effort should be made to enhance the relationship between buyer and seller.

Of course, direct marketers who have looked closely at the data know that the costs of servicing infrequent buyers may indeed exceed the margins they yield; and the authors discovered for themselves that the simple relationship between lifetime months on file and lifetime profits is relatively weak (r = about .2 for the two groups studied).

What I did find interesting and potentially actionable was that they could divide a significant number of customers (some 9000 households were studied over a three-year period, correctly split into two cohort groups, January and February starters) into four meaningful groups:

Segment 1. Those that had relatively Long Active Lives and High Lifetime Revenue.

Segment 2. Those that had relatively Long Active Lives and Low Lifetime Revenue.

Segment 3. Those that had relatively Short Active Lives and High Lifetime Revenue.

Segment 4. Those that had relatively Short Active Lives and Low Lifetime Revenue.

The Graph below indicates that customers in Segments 1 and 3 kind of look alike, behaving in a similar fashion over their first 12 months, and then begin to separate over time. No doubt this is true; the operable question is whether this disparity can be predicted, and predicted early enough in a customer’s life so that corrective action can be taken.

The argument is that simple RFM analyses will miss this phenomenon, and that database marketers, as a consequence of their not understanding that their database consists of these segments, will overspend on the Short Life-High Revenue segment, before traditional RFM analysis will depress mailings to this segment.

So, the key question for marketers is: if this effect is widespread (if there really are customers that come in for a short while, buy a lot and then leave), can they be detected? Will modeling Tenure Segments, as suggested above and in last month’s article, capture this effect? Probably not, at least not by itself. What might work is a Principal Component Analysis of the available purchase behavior data over the last six months.

This approach might discern either a trend in dollars spent, or a trend in the particular products purchased that would indicate that the customer was displaying a pattern associated with customers that buy heavily for a short while and then switch to someone else – for reasons we can only speculate about.
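As a rough illustration of the idea, here is a minimal sketch of a Principal Component Analysis on six months of purchase data. Everything in it is an assumption made for illustration: the two made-up customer types ("steady" and "fading"), their spend levels, and the use of NumPy's singular value decomposition to do the PCA. It only shows that a component can pick up a spending trend; it does not show that such a pattern predicts defection in real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group = 100
months = 6

# Hypothetical monthly spend: "steady" buyers hover around 50 per month,
# "fading" buyers start high (120) and decline to 0 by month six.
steady = 50 + rng.normal(0, 10, size=(n_per_group, months))
fading = np.linspace(120, 0, months) + rng.normal(0, 10, size=(n_per_group, months))
spend = np.vstack([steady, fading])

# PCA via singular value decomposition of the column-centered matrix.
centered = spend - spend.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
scores = centered @ components.T  # each customer's position on each component

pc1_steady = scores[:n_per_group, 0].mean()
pc1_fading = scores[n_per_group:, 0].mean()
print("share of variance per component:", np.round(explained, 3))
print("mean PC1 score, steady vs fading:", round(pc1_steady, 1), round(pc1_fading, 1))
```

In this made-up data the first component separates the two groups clearly. With real customers the separation would have to be checked against what those customers actually did later.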

To Straighten Or Not To Straighten That Is the Question

Wednesday, February 2nd, 2011


If you’re a marketer who uses or commissions regression models, you need to understand the topic of non-linearity: what it is, why it is important, how it could improve your models, and why correcting for it doesn’t happen automatically. This article will address all of these issues. If you’ve built or used regression models to predict response or sales, you know that a regression equation looks like this:

Y = a + b1*X1 + b2*X2 + b3*X3 + … + bn*Xn

In this equation Y is the “thing” you’re trying to predict (the dependent variable)  and the X’s represent the “things” (independent variables) you know about your customers or prospects that allow you to make the predictions. Typical independent variables include performance indicators such as recency, frequency and dollar sales; demographics such as age, and income, and promotion history, such as the number of times called, etc. 

The “b’s” are called regression coefficients, and you can think of them as weights assigned to each variable in the model; the weights are generated by a regression program. The bn*Xn notation simply means that there could be up to some number (n) of variables in the model. The “a” is a constant that we can skip over for now.
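To make the equation concrete, here is a minimal sketch of how a finished model is applied to one customer. The constant, the weights and the variable names are hypothetical values chosen for illustration; they do not come from any real model.

```python
def score(customer, intercept, weights):
    """Apply Y = a + b1*X1 + ... + bn*Xn to one customer record."""
    return intercept + sum(weights[name] * customer[name] for name in weights)

# Hypothetical constant ("a") and weights ("b's"), for illustration only.
intercept = 10.0
weights = {"recency_months": -0.5, "frequency": 2.0, "dollar_sales": 0.01}

# One customer's values for the independent variables (the X's).
customer = {"recency_months": 6, "frequency": 4, "dollar_sales": 500}

predicted = score(customer, intercept, weights)
print(predicted)
```

For this customer the arithmetic is 10 - 0.5*6 + 2*4 + 0.01*500, or 10 - 3 + 8 + 5.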
 
The job of the statistician, working with a particular dataset, such as the results of a past promotion, is to discover which independent variables have a significant effect on the dependent variable and then feed this information to the regression program which will produce the regression equation.
 
One of the keys to a “good” long lasting model is to find the right set of predictive variables given the hundreds if not thousands of potential predictors from which to select.
But, in addition to finding the right variables, it’s important to determine if the relationship between a predictor variable such as AGE and a dependent variable such as SALES is best described by a simple straight line relationship, or whether some other “non-linear” relationship makes for a better, more accurate prediction.
 
When a non-linear relationship exists, it’s the job of the modeler to try different transformations of the data to determine the best fit. You as a user can tell if this has been done in one of your models if you see something like this: 

Sales = a +b1*Log of Recency +b2*Square Root of Prior Sales  

What this equation tells you is that the modeler determined that the relationship between Sales and Recency is best described by replacing Recency (number of months since the last purchase) by the log of Recency, and that the relationship between Sales and Prior Sales is best described by replacing Prior Sales by the Square Root of Prior Sales.

Exhibits 1 and 2 show how the log transformation works to straighten the relationship between Sales and Recency. Exhibit 1 is a plot of Sales against Months Since Last Purchase; Exhibit 2 is a plot of Sales against the Log of the Months Since Last Purchase. The log transformation straightens the data and results in a better fit, as indicated by the R Squared value of 1 versus an R Squared value of .86 for the original, untransformed data.
 
Exhibit 1

Exhibit 2
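The straightening effect can be reproduced with a few lines of code. This is a minimal sketch on made-up data, not the data behind the exhibits: it assumes Sales fall exactly with the natural log of Months Since Last Purchase (the 100 and 25 are arbitrary), and compares the R Squared of a straight-line fit before and after the transformation.

```python
import math

def r_squared(x, y):
    """R Squared of a one-variable straight-line fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Assumed data: sales decline with the log of months since last purchase.
months = list(range(1, 25))
sales = [100 - 25 * math.log(m) for m in months]

r2_raw = r_squared(months, sales)                          # straight line on raw months
r2_log = r_squared([math.log(m) for m in months], sales)   # straight line on log(months)
print(f"R Squared, raw months: {r2_raw:.2f}")
print(f"R Squared, log months: {r2_log:.2f}")
```

Because the made-up sales figures contain no random error, the fit on the log scale is perfect, while the same straight-line fit on the raw months is noticeably worse.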

If nothing else, the above equation (with transformations) certainly looks more impressive than the equation below, without the data transformations. 

Sales = a +b1*Recency + b2*Prior Sales 

But apart from looking impressive, the real question is: does finding the right shape of a relationship, correcting for non-linearity, or straightening, three different ways to say the same thing, really make a difference?
 
To answer this question we created two data sets. Each data set has 400 observations representing 400 customers, each of whom responded to a mailing and purchased some amount of product. As is customary, the first data set will be used to build the model, the second to test or validate the model.
 
But, to make sure that we could prove our point we cheated.  Instead of searching for variables that had a non-linear relationship with sales, and developing an equation, we started with the correct model! 

In the Correct Model each customer’s sales is determined by this formula:
Sales = 75 – 30 times the log of the number of days since last purchase + 5 times the square root of Prior Orders + .5 times the exponential value of Prior Sales/million + 6 if age is greater than 45 + a random error that ranges between –50 and +50.
 
To determine the effect of correcting for non-linearity we simply ran the data through an Excel spreadsheet and had the program calculate a regression model, using the four variables (Recency, Orders, Prior Sales and Age) but with no attempt to incorporate their known non-linear relationships.
 
The program produced the following equation.
 
Sales = 64 – .58*Recency + .20*Orders +2.61*Prior Sales + .085*Age 

The Model had an R Squared of 33%. (In other words, the simple model explained 33% of the variation in Sales.)
 
Then we ran the data through the program again, this time substituting the correct form of the relationship for the original uncorrected data.
 
The same program produced the following equation. 

Sales = 84 –29.29*the log of the number of days since last purchase + 4.39*square root of Prior Orders +.48*the exponential value of Prior Sales/million + 4.05 if age is greater than 45 

The Model’s R Squared was 79%. (Even though we knew the correct form of the only four variables affecting the model, the model was not perfect because of the random error.)
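The experiment can be re-created in outline with the sketch below. It is not our original spreadsheet: the ranges of the four variables, the use of the natural log, the rescaling of Prior Sales and the random seed are all assumptions, so the coefficients and R Squared values it prints will not match the 33% and 79% reported here. What it should show is the same direction of effect: the model built on the correctly transformed variables fits better than the one built on the raw variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Assumed predictor ranges (not given in the article).
days = rng.integers(1, 721, size=n)        # days since last purchase
orders = rng.integers(1, 21, size=n)       # prior orders
prior_sales = rng.uniform(0, 4, size=n)    # prior sales, assumed already rescaled
age = rng.integers(20, 71, size=n)
error = rng.uniform(-50, 50, size=n)       # random error between -50 and +50

# The article's Correct Model, with the natural log assumed.
sales = (75 - 30 * np.log(days) + 5 * np.sqrt(orders)
         + 0.5 * np.exp(prior_sales) + 6 * (age > 45) + error)

def fit(columns, y):
    """Ordinary least squares with a constant; returns (coefficients, R Squared)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    return coefs, 1 - residuals.var() / y.var()

# Model 1: raw variables, no attempt to correct for non-linearity.
_, r2_linear = fit([days, orders, prior_sales, age], sales)

# Model 2: the same variables in their correct (transformed) form.
coefs_correct, r2_correct = fit(
    [np.log(days), np.sqrt(orders), np.exp(prior_sales), (age > 45).astype(float)],
    sales,
)

print(f"R Squared, raw variables:         {r2_linear:.2f}")
print(f"R Squared, transformed variables: {r2_correct:.2f}")
print("coefficients, transformed model:", np.round(coefs_correct, 2))
```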
So, it would appear that knowing the correct shape of the relationship between independent variables and the dependent variable makes a huge difference, at least to a statistician. But how about the difference it makes to a direct marketer?

To answer this question we applied both models to our second data set of 400 different customers and produced the two decile analyses shown in Tables 1 and 2. 
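For readers who have not built one, a decile analysis can be sketched as follows. The scores and actual sales here are randomly generated stand-ins (an assumption for illustration), not the validation data behind Tables 1 and 2. The customers are ranked by predicted sales, cut into ten equal groups, and the average actual sales of each group is reported; the "spread" is the gap between the top and bottom deciles.

```python
import random

def decile_table(predicted, actual):
    """Rank by predicted score (best first), cut into 10 equal groups,
    and return (decile, mean predicted, mean actual) for each group."""
    ranked = sorted(zip(predicted, actual), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // 10
    rows = []
    for d in range(10):
        start = d * size
        end = (d + 1) * size if d < 9 else len(ranked)  # last group takes any remainder
        group = ranked[start:end]
        mean_pred = sum(p for p, _ in group) / len(group)
        mean_act = sum(a for _, a in group) / len(group)
        rows.append((d + 1, mean_pred, mean_act))
    return rows

# Made-up validation sample: 400 customers, actual = predicted plus random error.
rng = random.Random(0)
predicted = [rng.uniform(0, 100) for _ in range(400)]
actual = [p + rng.uniform(-50, 50) for p in predicted]

table = decile_table(predicted, actual)
for decile, mean_pred, mean_act in table:
    print(f"{decile:>2}  predicted {mean_pred:6.1f}  actual {mean_act:6.1f}")

spread = table[0][2] - table[-1][2]
print(f"spread between top and bottom decile: {spread:.1f}")
```

A better model shows a larger spread and actual values that track the predicted values more closely from decile to decile.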


 

As you can see by comparing Tables 1 and 2, the Correct Model results in a greater spread and a closer fit and is therefore the better model. But don’t draw the wrong conclusions from this example. In the real world the search for the correct relationship is not done just to get a better fit. In fact that is a relatively weak reason for going through all the work that it takes to find and correct for non-linearity. In the real world, many relationships are so non-linear that these important variables will not appear in a regression model at all… unless their non-linearity is first identified and then corrected for. 

Why is that? Because regression programs expect linear relationships, and a relationship that is in fact very strong but very non-linear may be missed entirely by an analyst just running data through a regression program. (And, most importantly, the regression programs don’t do this automatically by themselves; this work has to be done by an analyst working with the data.)

So, how does the analyst discover these non-linear relationships? By using a number of graphical techniques and/or CHAID. The lesson for the direct marketer is that these non-linear relationships exist. We find one or two in nearly every model we do. If you don’t see them in yours, that does not mean they are not there; they may just have been overlooked, and your models could be significantly improved.

One last note, correcting for non-linearity is a central part of what statisticians call Exploratory Data Analysis (EDA). This practice is recommended even when the modeling technique does not assume that the relationships it’s being asked to analyze are linear. For example, artificial neural net solutions do not assume linear relationships.

Nevertheless, straightening complicated non-linear relationships prior to submission of data to the neural net is a commonly recommended procedure. It makes it easier for the Net to arrive at a reliable solution, and there’s nothing wrong with that.   

Separate Models for Separate Segments?

Tuesday, January 4th, 2011


One of the ways in which you can improve your modeling results is to look for segments within your customer database that have different relationships to potentially predictive variables such as  Recency, Frequency, Monetary Value and Products purchased.

The trick is to determine if the strength of the relationship is equally strong across all segments, or whether the strength of the relationship differs from segment to segment.

For example, let’s suppose you believe that your sales are correlated with two variables; we’ll call them variables X1 and X2. What you might do is ask your statistician to draw a sample of data, create a Scatter Diagram so that you can see the relationship and calculate the Correlation Coefficient so that you can quantify the relationship as well as visualize it. We did that for a dataset we created for this article.

So far so good. Your hunch was correct: your sales (Y) are positively correlated with X1 and also with X2. And while the correlations are not great (.7 to .9), they are not weak (.1 to .3) either. They are moderate, .45 and .64. (The absolute value of a correlation coefficient cannot be less than 0 or more than 1.)

Now that you’ve discovered two variables that are related to sales, you would want to build a two variable regression model of the form Y = a + b1*X1 + b2*X2. Using the same data set that produced the above results, you have your statistician run the data through a Regression procedure and produce the following results.

Y = 31.5 + 9.2*X1 + 6.7*X2 with an R-Squared of 59%.

Not Bad. Our simple two variable example produced an equation or a model which explains 59% of the difference we see among our customers’ behavior.

Suppose it now dawned upon you that while your customers’ sales were correlated with variables X1 and X2, your customer file was really made up of three distinct segments that you call Young, Middle and Old, and that you suspect that the relationship between sales and X1 and X2 might not be the same for each segment.

What could you do?
Since you’ve identified three segments you could use this information in your model. How? Have your statistician create two new “Dummy Variables” and code your young customers DY and your middle aged customers DM. You don’t need to code your old customers DO, because if they are not Young (Coded DY) or Middle (coded DM) then they must be in the segment called Old. Your statistician runs the data through the regression program again and arrives at the following equation:

Y = 428 + 8.4*X1 + 7.6*X2 – 539.5*DY – 804.4*DM and R-Squared goes to 86%.
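Here is a minimal sketch of the dummy variable step. The data are made up and are not the dataset behind the equation above: it assumes three segments with the same weights on X1 and X2 but different baseline sales (20, 60 and 100 are arbitrary). The variables DY and DM are coded 1 for Young and Middle customers and 0 otherwise, with Old as the segment left out.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
segment = rng.choice(["Young", "Middle", "Old"], size=n)
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)

# Assumed data: same slopes in every segment, but a different baseline level.
baseline = {"Young": 20.0, "Middle": 60.0, "Old": 100.0}
y = (np.array([baseline[s] for s in segment])
     + 8 * x1 + 7 * x2 + rng.normal(0, 5, size=n))

# Dummy variables: 1 if the customer is in the segment, 0 otherwise.
dy = (segment == "Young").astype(float)
dm = (segment == "Middle").astype(float)

def fit(columns, y):
    """Ordinary least squares with a constant; returns (coefficients, R Squared)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, 1 - (y - X @ coefs).var() / y.var()

_, r2_plain = fit([x1, x2], y)
coefs, r2_dummies = fit([x1, x2, dy, dm], y)

print(f"R Squared without dummies: {r2_plain:.2f}")
print(f"R Squared with DY and DM:  {r2_dummies:.2f}")
print("constant, b1, b2, DY, DM:", np.round(coefs, 1))
```

The weights on DY and DM measure how far the Young and Middle baselines sit from the Old baseline, which is absorbed into the constant.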

Your hunch was correct: the segments do behave differently. Note, though, that the dummy variables only give each segment its own baseline level of sales; the weights on X1 and X2 are still shared by all three segments. Your statistician now suggests that the results could be improved even more if we looked for the interaction between the segment identifiers and the individual variables themselves. You have no idea what this means but it sounds good, so you try it, and this is what you come up with.

Y = 4 + 7*X1 + 13*X2 –1*DY +1*DM -2*DY*X1 –5*DY*X2 +4*DM*X1 -10*DM*X2 and R-Squared =100%

What happened? What happened is that we discovered, in our made up example, that each segment behaves differently with regard to variables X1 and X2. And, that by understanding the relationship between X1 and X2 and sales in each segment we were able to build, in this artificial case, a perfect model! Of course in real life you will never be able to build anything close to a perfect model.

But the lesson to be learned is that if you suspect that different demographic or lifestyle or attitudinal segments might display different relationships with regard to your key performance variables, try building separate models for each segment.

Building separate models, as opposed to building one equation with all dummy and interaction variables, as we did above, is a simpler solution and one that is more likely to be understood and less prone to implementation errors.
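The two approaches can be compared directly. The sketch below generates error-free sales from the interaction equation given earlier in this article; the ranges of X1 and X2, the segment mix and the random seed are assumptions. It then fits one combined equation with dummy and interaction variables, and three separate equations, one per segment.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
segment = rng.choice(["Young", "Middle", "Old"], size=n)
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
dy = (segment == "Young").astype(float)
dm = (segment == "Middle").astype(float)

# Error-free sales generated from the article's interaction equation.
y = (4 + 7 * x1 + 13 * x2 - 1 * dy + 1 * dm
     - 2 * dy * x1 - 5 * dy * x2 + 4 * dm * x1 - 10 * dm * x2)

def ols(columns, y):
    """Ordinary least squares with a constant; returns the coefficients."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# One equation with all dummy and interaction variables.
combined = ols([x1, x2, dy, dm, dy * x1, dy * x2, dm * x1, dm * x2], y)

# Three separate equations, one per segment.
separate = {}
for name in ["Young", "Middle", "Old"]:
    mask = segment == name
    separate[name] = ols([x1[mask], x2[mask]], y[mask])

print("combined:", np.round(combined, 2))
for name, coefs in separate.items():
    print(f"{name:>6}: constant, b1, b2 =", np.round(coefs, 2))
```

Each separate model is simply the combined equation with the dummies switched on for that segment: Old is 4 + 7*X1 + 13*X2, Young is 3 + 5*X1 + 8*X2, and Middle is 5 + 11*X1 + 3*X2. The three short equations carry the same information as the long one, which is why they are easier to explain and to implement.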

