So What If It All Went Away?

January 4th, 2012 by: DSA


The trend is definitely not good.  Privacy is not an issue that will go away and consequently there will be less and less data available for individual and/or  household level overlays.   But, the real question is — could direct marketers live without household level demographic and financial data?  Of course they could.  And, if they honed their modeling skills, they might even be better off.

Let’s examine some of the ways household level data is used.

1. To build better customer response and performance models.

With some notable exceptions, such as the acceptance of a new product targeted at a specific demographic group, customer transaction data (RFM data, product purchase data, tenure, source and a handful of other transaction variables) are all that is necessary to build more than satisfactory response and performance models. More often than not, additional demographic variables do not produce a larger spread or a more accurate model.

2. To build new customer acquisition models to be used against response lists.

For those not familiar with this application, the idea here is to append household level data to the names coming out of a merge purge of multiple response lists.

Then these names are scored using a model, built from prior mailings, that gives specific weights to the demographic variables contained in the model. Prospects with scores that are lower than some criteria (perhaps the bottom two deciles) are dropped from the promotion.

This process is both time-consuming and complicated. It is time-consuming because the scoring and the suppression have to take place after the merge, and complicated because of a number of issues: (a) how will non-matched names be scored, and (b) how will missing data be handled within the matched population?
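The suppression step described above can be sketched in a few lines. This is a minimal illustration, not a production scoring job: the function name, the made-up scores and the choice of dropping exactly two deciles are all assumptions, and it sidesteps the harder questions of non-matched names and missing data.

```python
def suppress_bottom_deciles(scored_names, deciles_to_drop=2):
    """Rank prospects by model score and drop the lowest-scoring deciles.

    scored_names: list of (name_id, score) pairs; a higher score means a
    better prospect. Returns the names kept for the promotion.
    """
    ranked = sorted(scored_names, key=lambda pair: pair[1], reverse=True)
    keep_count = len(ranked) * (10 - deciles_to_drop) // 10
    return ranked[:keep_count]


# Hypothetical merge-purge output: 20 names with made-up scores.
prospects = [(f"name_{i:02d}", i / 20) for i in range(20)]
kept = suppress_bottom_deciles(prospects, deciles_to_drop=2)
print(len(kept), "names kept out of", len(prospects))
```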

Some Alternatives

Working on the assumption that we will continue to have access to the detailed census data that is collected at the block group level and re-compiled at the zip plus 4 level, direct marketers should be able to build response and performance models that for all practical purposes (suppression of the bottom two or three deciles from a merged-purged set of prospects selected from response lists) are as effective as models built upon household level data.

What’s more, these zip or zip plus 4 models are much easier and much less expensive to implement than household level models. (List owners are sent selection or suppression tapes prior to shipping their names to your merge-purge house.)

Tips for Building Zip and Zip Plus 4 Models

There are two keys to building good models based on census data.  The first has to do with variable creation, the second with technique.

Companies should build their own historical response and performance indices based on past promotions and customer behavior. Working at the Zip Plus 4 level, it’s possible to build historical indices, or simply historical response and/or performance rates, which can then be aggregated at the 5-digit Zip Code level, at the Sectional Center level, or, what’s frequently even better, at a Prism or MicroVision segment level. Each commercial clustering scheme associates a demographic or lifestyle segment with a zip plus 4 code.

These historical results are then treated as potential independent variables in your response or performance models. And, in our experience one or more of these historical variables will enter a model as one of the model’s most important variables.
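As a rough sketch of how such a historical index might be computed, the example below totals past mailings by segment and expresses each segment’s response rate relative to the overall rate (100 = average). The segment codes, the row layout and the index formula are our assumptions for illustration; your own history file will look different.

```python
from collections import defaultdict


def response_indices(mail_history):
    """Return {segment: index}, where 100 means the overall response rate.

    mail_history: iterable of (segment, pieces_mailed, responses) rows, one
    per past promotion and segment. The segment key could be a Zip Plus 4,
    a 5-digit zip, or a cluster code.
    """
    mailed = defaultdict(int)
    responded = defaultdict(int)
    for segment, pieces, responses in mail_history:
        mailed[segment] += pieces
        responded[segment] += responses

    overall_rate = sum(responded.values()) / sum(mailed.values())
    return {
        segment: 100 * (responded[segment] / mailed[segment]) / overall_rate
        for segment in mailed
    }


# Made-up history for three hypothetical cluster segments.
history = [
    ("S01", 10_000, 300),
    ("S02", 10_000, 200),
    ("S03", 20_000, 300),
    ("S01", 10_000, 300),
]
indices = response_indices(history)
for segment, index in sorted(indices.items()):
    print(segment, round(index, 1))
```

The resulting index would then be appended to each prospect record by segment and offered to the model as a candidate variable.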

For example, a response model we built for a continuity program (the model had a top decile lift of 270) contained only three variables and two of them were historical indices. The third variable was a Principal Component Analysis (PCA) variable that compared each zip code’s educational level with the average educational level within the entire mailing population.  Which brings us back to the subject of modeling technique – the second component of good census data models.

If you’ve dealt with census data you know that while there are some 300 to 400 census variables, there are only about 20 major categories of data, and the categories are presented as frequency distributions. For example, Education (a major Census Category) is made up of four separate Census Variables: (1) percent of population with less than a High School degree; (2) percent with a High School degree; (3) percent with some College; (4) percent with a College degree, or more. In our experience, models built on individual census variables, as opposed to a PCA analysis of the census category, are much easier to build but much less stable, and they perform worse.
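To make the idea concrete, here is a small sketch that collapses the four Education percentages into a single PCA score per zip code. It is only an illustration under stated assumptions: the education figures are invented, and the first component is found by simple power iteration so the example needs no statistical package. In practice you would use the PCA routine in whatever statistics software you already have.

```python
def first_principal_component(rows, iterations=200):
    """Return (weights, scores) for the first principal component.

    rows: one list of percentages per zip code, e.g. the four Education
    variables. Uses power iteration on the covariance matrix, so it needs
    no third-party libraries.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    # Centering compares each zip with the average of the whole file.
    centered = [[row[j] - means[j] for j in range(k)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
        for a in range(k)
    ]

    weights = [1.0] + [0.0] * (k - 1)  # arbitrary nonzero starting vector
    for _ in range(iterations):
        product = [sum(cov[a][b] * weights[b] for b in range(k)) for a in range(k)]
        norm = sum(v * v for v in product) ** 0.5
        weights = [v / norm for v in product]

    scores = [sum(r[j] * weights[j] for j in range(k)) for r in centered]
    return weights, scores


# Invented data: percent with < High School, High School, some College,
# College or more, for five hypothetical zip codes.
education = [
    [30, 40, 20, 10],
    [10, 30, 30, 30],
    [20, 35, 25, 20],
    [5, 20, 30, 45],
    [25, 40, 20, 15],
]
weights, scores = first_principal_component(education)
print([round(s, 1) for s in scores])
```

The single score per zip code, rather than the four raw percentages, is what would be offered to the regression. The sign of a principal component is arbitrary, so only the ordering of the scores matters.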

So, what’s the bottom line? While we certainly don’t wish to see the demise of individual or household level overlay data for direct marketing purposes, should it happen, to one degree or another, we’ll be able to make up most if not all of the losses imposed upon us if we’re smart and take advantage of the data and techniques at our disposal.

Do You Really Want To Wait For A Data Warehouse?

December 6th, 2011 by: DSA


We were recently asked to critique a plan to have a database system developed that would support database marketing programs. The recommended solution included upgrading a corporate data warehouse that would be the source of data required to support a marketing decision support system. The plan called for a three-tiered relational architecture. The time frame: 18 to 24  months. 

The essential requirements of the system included the ability to: profile, model, segment and score customers, plan, design and execute promotions, track results and provide management reports.
 
In order to accomplish these limited objectives what’s required is: the identification, collection and storage of relevant marketing data from a variety of internal operational systems and external data sources, including mail houses, telemarketing suppliers, and data providers, and the development of the required modeling, promotion and reporting systems.
 
Given all of the direct marketing systems available today (both relational and non-relational or proprietary) there’s no need (in our opinion) to spend years building a direct marketing database and decision support system from the ground up that would meet the requirements outlined above, unless of course you enjoy doing that kind of thing.

So, why did the plan we reviewed suggest that it would take one to two years to implement? 
First of all, it could take a year or more just to upgrade the corporate data warehouse, let alone the associated data marts and end user software tools.

Why it takes so long has to do with the amount of change required in the operational systems providing data to the data warehouse.  For example, if a number of source systems contain name and address data, then it will be the job of the warehouse to first identify which fields are considered attributes of a customer, then identify which customers can be defined as duplicates, and then based on some agreed upon logic determine the “correct” name and address for that customer. 

Whether this “correct” name is fed back to each input or source system, or whether the logic in the source system is changed to access the data warehouse for name and address data, is an open question. The central fact is that operational systems have to be modified to take advantage of the benefits of the consolidation. The extent to which common data is found in more than one input file determines the work that has to be done in the data warehouse and in modifications to the operational systems.
 
Hence, the decision to integrate with a data warehouse as opposed to building a simple marketing database (defined here as a one way consolidation of data from multiple sources, without a feedback mechanism to the source operational systems) caused what was initially a relatively simple request to explode into a complicated, time consuming and expensive project.
 
What’s more, because the repository was large and could be used for other activities in addition to marketing, a secondary repository, just containing the data needed for marketing was envisioned — a data mart or marketing database.
 
Finally, in order to meet performance demands for decision support and management reporting, a third data level, containing summary data and the tools required to access summary data, was required.  In other words, because the volumes were large (a database with more than 5 million customers is considered large) and because the database management system was relational, using the detailed data available in the marketing database would not be practical; it could take hours to answer even relatively simple queries.

Looking at it objectively, what had started as a request to support marketing, grew inadvertently or otherwise into a much larger project.  Couple this factor with the choice of relational technology and you will wind up with a two or a three-year project. 

So, how to get around this problem?  Assuming you are forced or choose to use a relational database management system as your base platform, as opposed to one of the many proprietary systems designed for direct marketing applications, you can eliminate the need for a data warehouse/data mart configuration by limiting the data going into the warehouse to just the data needed for marketing purposes.

Just make sure that the interfaces are one-way feeds from the legacy systems to the database. Thus the data warehouse and the data mart become one and the same thing. Another approach might be to have a relational warehouse and use a proprietary system as the platform for the second tier, eliminating the need for summary data and the third tier tools necessary to access data within acceptable performance standards.

The third choice is to use one of the proprietary platforms as your marketing database, significantly reducing your development time and benefiting from its high performance characteristics. Of course, in order to go proprietary you will probably have to overcome the objections of your IT department, which may insist on using a specific relational product regardless of its performance characteristics, because they have settled on that relational standard for their operational systems and don’t wish to support another piece of technology.

Prototype Databases – Try before you buy!

November 3rd, 2011 by: DSA

“I’m sorry, I know you need this customer profile report, but it requires adding a new data source.  The marketing database design specs were frozen in order to make the system delivery date. If we add this new requirement, the system will be delayed for 2 more months.  You want to explain that to the boss?”

How many of us have heard these words?

The majority of companies implementing a new marketing database system will end up doing a major redesign, sometimes called “Phase 2”, within the first 12 months… and they are the lucky ones. Many other companies cannot afford the budget or staff time to do a “Phase 2”, so their marketers have to live with the system as-is.  In defense of the system implementers, there comes a time when you have to stop building requirements and start building the system if you are going to meet your target date.

Sometimes the problem is that the development schedule did not include sufficient time to collect end user requirements and reconcile them into a comprehensive set of  system requirements.  But, most of the time the problem is that the marketing and implementation teams have an insufficient base of experience to draw upon.

In short, they don’t fully understand what type of data is available, how that data would be combined and updated, what new kinds of marketing programs the data would support, how they would go about implementing those new marketing programs and how best to track and evaluate results. In this situation, defining requirements is much more of a process than an event. The more that new capabilities are considered, the more new opportunities are seen… and this never happens all at once.

In implementing marketing databases, there is usually a push by senior management to get the database running A.S.A.P. so that the system can start generating a return on its investment. As a result, many companies fall victim to the old problem where there is not enough time to do it right the first time, but there is enough time to do it twice.  Shifting to a two-step strategy, where a prototype database is built before the final system, can help alleviate many of these issues.

A prototype database is essentially a “throw away” system that is built quickly, supports most of the required database functions, doesn’t have to worry about being efficient and represents a short term (not a long term) investment.  A prototype database provides immediate operational benefits while helping to build a base of knowledge to use in defining requirements for the final system.  This system can also be used to develop “proof of concept” information to support the decision to fund a marketing database system.

Depending on available data sources, a prototype database can immediately provide information to support strategic planning through demographic profiles, purchase activity profiles, customer value, customer segments,  etc. The system can also be used tactically to implement promotions, track promotion history, track promotion results and do backend responder analysis. A prototype database offers many significant advantages.

Fast startup.  Since the goal is to get a quick “80% solution”, not to implement a complete, efficient long term production system, the prototype database does not need all of the same operational processes as the final system.

Flexible implementation.  Prototype databases use a large number of ad-hoc processes to make them flexible, since efficiency is not a key issue. Therefore, it is much easier to include new sources of data, add new calculated fields, process new list selection instructions, produce new reports, etc.

Hands-on experience with the data.  Building the prototype database gives a preview of which data is available from which internal and external sources and how it can be combined. This involves investigating file matching criteria (customer number, telephone number, name and address, etc.), determining how data fields from different sources should be combined or reconciled against each other and understanding how each field is coded and what the codes mean. During this process, each data field is profiled to review data formats, identify data errors and define options for resolving errors.

Experience using a marketing database system.  Besides the production-oriented issues identified above, the end users in the marketing department will start gaining hands-on experience of a different nature. Once the database is built, the marketing staff will start understanding what types of data are available.

They will also gain considerable experience with using the data effectively to run their business. Very often, the new range of possibilities will force a re-thinking of how marketing does its job, i.e., the business process.  Through a combination of planning and trial and error, new customer segmentations will be created, new list selection criteria will be implemented, new reports will be created and marketing program success will be defined differently.

We have to recall that the speed and flexibility of implementing prototype databases come at a price. These systems rely on a number of ad-hoc processes, manual intervention and outside services.  Prototype databases are often based on batch processing, with an experienced IT staff member writing programs for each update step, list selection and requested report. This results in slower than desired turnaround times.

Depending on the type of software used to implement the prototype database, the system may or may not support direct access by end users along with the ability to directly use marketing-oriented software tools.  Finally, depending on the ease of gaining access to data sources, the prototype database may not include all of the data that is desired by the marketing staff.

The Benefits Of Cohort Group Reporting (Pt 2)

October 4th, 2011 by: DSA


In the first part of this post we saw that basic cohort group reporting based on enrollment period can produce very powerful insights into the customer base. Generally, direct marketers do not stop at reporting at the enrollment group level. The more common practice is to subdivide the enrollment group by major media source so that the performance of say all direct mail or all print acquired customers can be tracked.

Frequently a change in enrollment group behavior can be quickly traced to a change in the mix of new customers — customers acquired from direct mail generally perform better than customers acquired from print, and they tend to perform better than customers acquired from broadcast, who tend to perform better than customers acquired from outbound telemarketing and so on. (Remember I said generally, so your experience may differ.)

So, the usual cohort group is an enrollment group broken down by major media. Cohort reporting is of course not limited to attrition reporting. One may track overall sales, sales mix, store visits, average purchases or returns, or complaints, or anything else that’s relevant to one’s particular business.

For example a cable company may want to track upgrade or downgrade behavior as well as overall disconnect rates, and the cohort group could be traced to a particular sales territory or to a particular sales person. Companies with reward programs may want to track points earned or points redeemed by cohort group to get an early warning reading on changes in customer behavior.

Cohort group reporting can also be carried down to the keycode-enrollment group level. At this level cohort reporting is not used for overall trend analysis but for forecasting the lifetime value of individual customer groups. The weighted average projection of all of the cohort groups acquired from the same source (same keycode) represents the lifetime value of the average customer acquired from that particular source. And, this average value can be compared to the cost per order to measure the profitability of the promotion.
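The arithmetic in the paragraph above is simple enough to show directly. The sketch below assumes each cohort group from a keycode already has a projected lifetime value per customer; the figures and the cost per order are invented for illustration.

```python
def source_lifetime_value(cohorts):
    """Weighted average projected lifetime value across cohort groups.

    cohorts: list of (customers_acquired, projected_value_per_customer)
    pairs, one per enrollment group acquired under the same keycode.
    """
    total_customers = sum(count for count, _ in cohorts)
    total_value = sum(count * value for count, value in cohorts)
    return total_value / total_customers


# Made-up projections for three enrollment groups from one keycode.
keycode_cohorts = [(1_000, 60.0), (500, 48.0), (500, 72.0)]
ltv = source_lifetime_value(keycode_cohorts)

cost_per_order = 40.0  # hypothetical acquisition cost per new customer
profit_per_customer = ltv - cost_per_order
print(ltv, profit_per_customer)
```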

Cohort group reporting has been around since the 1970’s. In those days the reporting was done in batch mode at the end of each cycle update. A set of hard copy reports was produced and distributed to the marketing managers who were responsible for new customer acquisition and customer marketing.

Today the same information can be produced in a variety of ways including everything from the original hard copy reports to drill down exercises using star schemas to multi-dimensional presentations that represent cohort groups in three dimensional cubes floating around the front of your PC. (In fact, if you were of a mind to do so, you could probably turn your cohort reporting into a screen saver.)

The problem with the newer representations of the same old concepts is that one might not recognize the need for the drill down exercise, or the multi-dimensional presentation, so I’m partial to old fashioned hard copy reporting, updated with modern graphics that make changes in performance obvious to anyone willing to look. And, the need to look is just as important today as it always was.

The Benefits of Cohort Group Reporting (Pt 1)

September 7th, 2011 by: DSA


As more and more “new media” marketers get involved with database marketing applications, the more important it becomes to remember some of the key lessons painstakingly learned by our direct marketing predecessors.  Remembering to measure the performance of cohort groups is one of those lessons.  (A cohort group is defined as a group of individuals with one or more common characteristics.)

In the world of direct marketing an enrollment date, or more accurately an enrollment period defines the basic cohort group. For example, all of the new customers that came on the database in the month of August, would be a cohort group. Later we will expand the definition of cohort groups to include attributes other than enrollment period.
 
Measuring the performance of cohort groups, or let’s call them enrollment groups for a little while longer, is the best way to monitor the performance of any direct marketing business that continually acquires new customers, new members, or new subscribers and is concerned with possible attrition. For all practical purposes that means all direct marketers.  Yet not all direct marketers have systems in place that monitor cohort performance. 

The chart below shows monthly attrition rates for individual enrollment groups and the average attrition rate by month for all enrollment groups (the last set of bars).  A marketer looking at this graph would immediately notice that enrollment groups 14, 15 and 16 are performing well below the average behavior of prior enrollment groups. 

Without enrollment group reporting a marketer would have to rely on monitoring trends in overall or average churn rates.  (The average churn rate is defined by the number of attriters in a period divided by the number of customers at the start of the period.) Measuring overall churn will frequently miss trends that are due to changes in acquisition strategy or competitive conditions.
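A small made-up example shows why the overall figure can hide a problem. Here the churn rate follows the definition just given, and the customer counts are invented: the newest enrollment group churns at three times the rate of the older groups, yet the overall rate barely moves because the new group is a small share of the file.

```python
def churn_rate(attriters, customers_at_start):
    """Attriters in the period divided by customers at the start of it."""
    return attriters / customers_at_start


# Made-up figures: (customers at start of period, attriters in period).
enrollment_groups = {
    "older groups": (90_000, 2_700),
    "newest group": (10_000, 900),
}

overall = churn_rate(
    sum(lost for _, lost in enrollment_groups.values()),
    sum(start for start, _ in enrollment_groups.values()),
)
by_group = {
    name: churn_rate(lost, start)
    for name, (start, lost) in enrollment_groups.items()
}
print(f"overall {overall:.1%}")
for name, rate in by_group.items():
    print(f"{name} {rate:.1%}")
```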
 

[Chart: monthly attrition rates by enrollment group]

To be continued… 

Working To Build Better Predictive Models (Pt 2)

August 3rd, 2011 by: DSA


In the first part of this discussion we outlined ways to increase the number of available predictor variables. Of course, what’s needed next is a repeatable process for identifying key variables from the host of variables that appear on our databases. Here statistical techniques like “correlation tables” and simple cross tabs, which show the relationship between potential variables and response can help. And, of course, the marketing people should always tell the modeler which variables they either know or think to be significant predictors.

However, we think the best technique for identifying potential variables is CHAID.

CHAID can be used to pictorially display the differences in response rates looking at each potential variable, one at a time. When used in this manner, the marketing person is on an equal footing with the analyst or statistician, because the results, with just a little bit of explanation, are so easy to understand. (Whether CHAID should be used beyond this point as a replacement for a regression model is a subject we won’t get into here.)

Needless to say, a CHAID can’t be done for every conceivable potential variable, so some combination of judgement and reliance on the correlation table will be required in this initial variable selection process.

Now, let’s assume for the purpose of this discussion that we identify 20 to 30 or even 50 variables, other than the basic RFM variables, that are each individually related to response. The last thing in the world we would want to do is use all of them in a model at the same time. The model would so “overfit” the data that while a Decile Analysis of the Calibration sample (the sample upon which the model was built) or even the Validation sample (the hold-out sample intended to prove the validity of the model) would look wonderful, the results of the model would never be replicated upon roll-out.

To at least some degree, this is a danger you never have to worry about, because the programs that produce regression models, if used correctly, will prevent this from happening. But, what may happen is that these very same programs (Step Wise Regression Programs) will frequently produce models that contain “too many” variables – even though the statistics describing these variables will suggest that they are significant.

When this happens, even though the Decile Analysis done on the Validation sample will look good, the model will have a less than optimal chance of holding up on roll-out promotions. To prevent this from happening, or to at least reduce the chances of this happening, we suggest “pruning away” the least significant of the significant variables and observing the effect on the Decile Analysis.

If the Decile Analysis is not significantly affected (made worse), then drop the variable. As often as not you will find that dropping the unnecessary variables actually improves the Decile Analysis: it increases the spread and removes “bumps” in the model. If all of these steps are followed, you will have a good chance of replacing your RFM models.
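For readers who want to see what is being compared before and after pruning, here is a minimal sketch of a Decile Analysis. The definitions of “spread” (top decile rate minus bottom decile rate) and “bump” (a decile that responds better than the one ranked above it) are our working assumptions for the example, and the sample is fabricated so the result is easy to follow.

```python
def decile_analysis(scored, n_groups=10):
    """Response rate per decile, best-scored decile first.

    scored: list of (score, responded) pairs, where responded is 0 or 1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_groups
    rates = []
    for d in range(n_groups):
        start = d * size
        end = (d + 1) * size if d < n_groups - 1 else len(ranked)
        group = ranked[start:end]
        rates.append(sum(flag for _, flag in group) / len(group))
    return rates


def count_bumps(rates):
    """Number of deciles that respond better than the decile above them."""
    return sum(1 for prev, cur in zip(rates, rates[1:]) if cur > prev)


# Fabricated validation sample: 100 names, 10 per decile.
responders_per_decile = [5, 4, 3, 3, 2, 2, 1, 2, 1, 0]
sample = []
for d, responders in enumerate(responders_per_decile):
    for j in range(10):
        score = 100 - (d * 10 + j)
        sample.append((score, 1 if j < responders else 0))

rates = decile_analysis(sample)
spread = rates[0] - rates[-1]
bumps = count_bumps(rates)
print(rates, spread, bumps)
```

Running the same analysis on the model with and without a candidate variable gives the comparison described above: keep the smaller model if the spread holds and the bumps do not increase.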

Working to Build Better Predictive Models (Pt 1)

July 6th, 2011 by: DSA


It’s pretty surprising that a recent survey of CRM practices reported that 30%-40% of the companies surveyed indicated that they use predictive regression models.  By way of contrast, close to 50% were using RFM models. If statistical projection is really a better tool, for no other reason than the obvious observation that regression models can call on variables other than RFM, why this disparity?

I don’t know.  But, part of the answer may have to do with modeling attempts that did not work, or did not work better than RFM.

For starters, it should be clear that in order for a regression model to “work better” than an RFM model, the regression model has to incorporate variables other than RFM variables that aid in the prediction of the dependent variable.

To keep things relatively simple, let’s just concentrate on response models, because most RFM models are used to predict response. Let’s further stipulate that for the purpose of this discussion to “work better” means to improve the “Lift”, or the ratio of responders to names promoted at some agreed upon depth of file.

For example, for a regression model to “work better” than an RFM model at a depth of say 30% of the file, the regression model would have to identify significantly more responders than an RFM model would have identified at the same depth. Also, the argument that it’s easier to score a file with a single regression equation than it is to manage an RFM process won’t count in this discussion, even though it’s true.
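The comparison at a fixed depth of file can be sketched as follows. The ten customers, their response flags and both sets of scores are invented; the point is only to show the mechanics of counting responders in the top 30% under each ranking.

```python
def responders_at_depth(scored, depth):
    """Count responders among the top `depth` fraction of the ranked file.

    scored: list of (score, responded) pairs, responded being 0 or 1.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * depth)
    return sum(flag for _, flag in ranked[:cutoff])


# Invented file of ten customers, three of whom responded.
responded = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
rfm_scores = [9, 8, 3, 7, 6, 2, 5, 4, 1, 0]
regression_scores = [9, 2, 8, 3, 4, 7, 5, 6, 1, 0]

rfm_hits = responders_at_depth(list(zip(rfm_scores, responded)), 0.30)
regression_hits = responders_at_depth(list(zip(regression_scores, responded)), 0.30)
print(rfm_hits, regression_hits)
```

In this fabricated case the regression ranking finds all three responders in the top 30% of the file and the RFM ranking finds one. On a real file the difference would of course need to be large enough to be significant.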

So, we get back to the question of identifying more variables, variables other than RFM variables (Recency of purchase, Frequency of purchase and some measure of Monetary Value).

One way to do this is simply to create new variables out of RFM variables. For example, variables such as: the total number of purchases or total sales divided by months on file or divided by the number of times promoted.
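A sketch of those ratio variables, with hypothetical field names, might look like this. It assumes months on file and times promoted are both greater than zero.

```python
def derived_rfm_variables(total_purchases, total_sales, months_on_file, times_promoted):
    """Build rate variables from basic RFM counts (field names are illustrative)."""
    return {
        "purchases_per_month": total_purchases / months_on_file,
        "sales_per_month": total_sales / months_on_file,
        "purchases_per_promotion": total_purchases / times_promoted,
        "sales_per_promotion": total_sales / times_promoted,
    }


# Made-up customer: 6 purchases worth $300 over 24 months and 12 promotions.
customer = derived_rfm_variables(6, 300.0, 24, 12)
print(customer)
```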

Another key variable that frequently appears is Tenure, or the length of time a customer has been on the database. This is such an important variable that it is frequently the basis for creating separate models, one for relatively new customers, and one or more models for customers that have been on the file a longer period of time.

Then there is product purchase data: which particular products or product categories has the customer purchased? This variable can be handled through the use of “dummy” or 0/1 coded variables. And, as we have mentioned in the past, the best way to handle this data is through the use of Principal Components Analysis, a technique which gets at the pattern of purchases over the entire set of purchase possibilities.
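For anyone who has not worked with 0/1 coding, the idea is simply one column per product category. The category names below are invented for the example.

```python
def dummy_code(purchased, categories):
    """Return one 0/1 flag per category, in the order given by `categories`."""
    return [1 if category in purchased else 0 for category in categories]


# Hypothetical product categories and one customer's purchase history.
categories = ["books", "music", "video", "gifts"]
flags = dummy_code({"music", "gifts"}, categories)
print(dict(zip(categories, flags)))
```

Each flag can enter a model on its own, or the full set of flags across all customers can be the input to a Principal Components Analysis.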

Building and Monitoring Profitable, Technology-Based, Multichannel Marketing

June 1st, 2011 by: DSA

Without question there is an urgent need among direct marketers to prove that their investments in technology (databases, websites, social media, email, SEO, kiosks, call centers, catalogs and mailings, …) are more than paying for themselves.  How, then, should companies that transact and communicate with their customers through multiple channels evaluate the cost-effectiveness of their multi-channel marketing strategy?

In this article we suggest two metrics that managers should use to better understand how well their multichannel efforts are paying off:

1. Cost to serve: The customer-specific marketing and servicing costs typically incurred by multichannel marketers to initiate and maintain a business relationship with individual customers.  Examples are: freebies and promotions (shipping and handling costs, two for ones, cents or dollars off), fees and commissions (to affiliates, retailers, etc.), customer service and support (returns, call center support usage), loyalty costs (miles redeemed, gifts), etc.

2. Realized revenue: The revenues actually realized by the company from a given customer.  This is determined by subtracting the cost to serve from the invoiced, or the contracted, price (which itself can differ by channel, retailer, or if the product was bought through an online auction).

The realized revenue from customers who routinely buy only when products are being promoted, return goods frequently, or require heavy levels of support services could be much lower than the invoiced revenue, severely impairing the lifetime attractiveness of such a customer.

As the number of channels through which customers communicate and transact with companies continues to explode, the number of offers and communications companies present to their customers has grown exponentially.

New database and CRM technologies make it possible to track customers by revisit behavior, allowing targeted promotions for newer versus existing customers, or for particular products. Additional offers still are communicated to various segments through e-mails, print and mass media ads, and direct mail pieces. 

While differences in offers have always existed, CRM technologies and new media have greatly increased the numbers of offers presented to customers. Left unmonitored, such complexity has the potential of severely increasing cost to serve, eroding realized revenues and greatly impairing profitability.

Cost-Revenue Analysis

Consider the situation described in the table below. While both customers, A and B, paid the invoiced price of $100, the realized revenues from customer A were less than half as much as those from customer B ($15 versus $35). Further, notice that while costs such as promotion discounts would normally be visible to the manager, others such as affiliate fees and costs of returns are often missed in assessing the value of a given customer. New database and eCRM technologies make it possible to track these costs, often at the individual customer level.

The tracking system can be implemented by building your own software to tag each cost category with a unique customer I.D. Reports such as the Cost to Serve and Realized Revenue table can then be created using standard business intelligence tools. Third party software and services are also available (Return.com, ReturnBuy, etc.) that provide software or hosted services designed to monitor customers and their return habits, grant return merchandise authorization numbers, and reduce cases of fraud. Others such as CommissionJunction and Linkshare provide services related to affiliate marketing programs.

Cost to Serve and Realized Revenue

                                     A     B
Price Paid                         100   100
Promo Discounts                     12     8
Credit Card Fees                     3     3
Shipping and Handling Discounts     25    22
Loyalty Payouts                      8    12
Affiliate Fees                      15     7
Returns                             15    10
Customer Service Contacts            7     3
Realized Revenue                    15    35
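The bottom line of the table is just the price paid less the sum of the cost-to-serve items. A short sketch using the figures from the table makes the calculation explicit; the dictionary layout and function name are our own choices for illustration, not a prescribed report design.

```python
PRICE_PAID = {"A": 100, "B": 100}

# Cost-to-serve components taken from the table, per customer.
COST_TO_SERVE = {
    "A": {
        "Promo Discounts": 12,
        "Credit Card Fees": 3,
        "Shipping and Handling Discounts": 25,
        "Loyalty Payouts": 8,
        "Affiliate Fees": 15,
        "Returns": 15,
        "Customer Service Contacts": 7,
    },
    "B": {
        "Promo Discounts": 8,
        "Credit Card Fees": 3,
        "Shipping and Handling Discounts": 22,
        "Loyalty Payouts": 12,
        "Affiliate Fees": 7,
        "Returns": 10,
        "Customer Service Contacts": 3,
    },
}


def realized_revenue(price_paid, costs):
    """Price paid minus the total cost to serve."""
    return price_paid - sum(costs.values())


realized = {
    customer: realized_revenue(PRICE_PAID[customer], COST_TO_SERVE[customer])
    for customer in PRICE_PAID
}
print(realized)
```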

Once created, such a breakdown of the paid and realized prices can provide several meaningful insights.  For example, suppose now that the columns marked A and B represent the same customer (or cohort of customers), but at different points in time.

The evident improvement in realized price would, of course, represent welcome progress for the company.  But, more importantly, such a table also shows progress with respect to each of the components of cost to serve. The decreases in returns and affiliate fees probably indicate that the customer is more satisfied with the products bought, and relies less frequently on affiliate sites to find the target site.  These component level trends can then be compared versus target levels for each of the costs across time.  Necessary corrective action could then be taken to bring aberrant costs under control.

Strategic Implications

The analysis can also help the company develop the appropriate strategies for enhancing customer satisfaction and profitability. Based on the separate tables for each customer (where necessary, some of the costs could be inferred at the segment level), it is now possible to create a map such as the one shown below.  In this map, the horizontal axis represents the cost to serve and the vertical axis represents the revenues realized. Each customer can be plotted as a point in the cost-revenue space. Each of the four quadrants, then, becomes the basis for creating a segmentation scheme. 

Cost – Revenue Strategy Map

For example, the customers in the yellow “watch-out” quadrant have not yielded a great amount of realized revenue, but have cost a great deal to serve. These might be customers who demand a lot of call center services, use coupons extensively, and manage to convince the telesales rep to throw in free shipping.

They may have high invoiced revenues, might even have bought more than once, but are very expensive to maintain as customers. The company may want to consider teaching them how to use automated/online support and services. Alternately, they might be aggressive users of returns, discounts and promotions because such customers do not see real value in current offerings. Instituting “low-cost” marketing research approaches to better learn the kinds of products and services that represent real value for them should help the company improve realized revenues. But, not understanding how many such customers there are, and failing to devise the appropriate teaching, learning, or divestiture program for them will certainly prove unprofitable for the company. 

The “keep-em” customers in the top-left quadrant are obviously the most desirable. Programs aimed at retention, such as providing preferred services, and (especially in business to business applications) joint development of new products and services should be important.
 
Because the realized revenues are not out of line with the cost to serve, both of the remaining two quadrants are in balance. However, for the customers in the top right quadrant, the high cost to serve suggests that automating purchase orders (these are frequent buyers) and customer service, and replacing their use of discounts with attractive rewards for loyalty, should result in significant bottom line gains. Finally, customers in the bottom left quadrant should be given incentives to increase the size of orders, or be cross-sold. But unless there is clear indication of high potential for future sales, failing to control cost to serve will have immediate negative bottom line impact.
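
As a rough sketch, the four-quadrant assignment described above might look like the following in Python. The cut points, the sample customers, and the labels for the two balanced quadrants are assumptions for illustration; only "keep-em" and "watch-out" are names used in the text.

```python
def quadrant(cost_to_serve, realized_revenue, cost_cut, revenue_cut):
    """Place one customer on the cost-revenue map.

    cost_cut and revenue_cut are the lines that divide the map into
    quadrants; where to draw them is a business decision (assumed here).
    """
    high_cost = cost_to_serve >= cost_cut
    high_revenue = realized_revenue >= revenue_cut
    if high_revenue and not high_cost:
        return "keep-em"              # top left
    if high_cost and not high_revenue:
        return "watch-out"            # bottom right
    if high_cost and high_revenue:
        return "balanced-high-cost"   # top right (label is hypothetical)
    return "balanced-low-cost"        # bottom left (label is hypothetical)


# Hypothetical customers: (cost to serve, realized revenue).
customers = {"c1": (20, 90), "c2": (85, 15), "c3": (80, 85), "c4": (10, 12)}
for name, (cost, revenue) in sorted(customers.items()):
    print(name, quadrant(cost, revenue, cost_cut=50, revenue_cut=50))
```

Counting the customers that land in each quadrant gives the size of each group, which the text argues is the first thing to know before designing a program for it.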
 
Conclusion 
Much to the dismay of some (and gleeful satisfaction of others), technology based multichannel marketing is neither free nor easy. Because it is not free, it is imperative to understand how each customer impacts the bottom line. Fortunately, the same technology that has created so many ways of communicating with customers (each a potential money sink) also permits the marketer to record much better individual level data about the relevant costs and revenues over time. However, as many have discovered, from the gigamounds of bits generated, pulling the relevant data together to yield actionable results is not easy. The approach described here provides:

- a simple way of summarizing the relationship between marketing activities, customer responses, and the company’s bottom line. The approach emphasizes the importance of going beyond the invoiced price to the revenues actually realized from each customer

- a useful tool for monitoring and controlling the various costs incurred in selling and servicing individual customers over time

 - a strategic approach for creating four distinct segments of customers which yield actionable recommendations based on the value each customer provides the company

 

Modeling Product Purchases

May 3rd, 2011 by: DSA


After the big three modeling variables, Recency, Frequency and Monetary Value, some analysts rank Product Purchase Data as the next most important potential predictive variable. I’m not sure that it’s number four on the modeling hit parade, but it’s certainly in the top ten, and for some businesses it ranks in the top five.

In any event, it’s an important source of customer information, which raises the question of how to deal with it. There are three or four choices:

1. Create a variable for each product, and on each customer’s record code this variable a one (1) if the customer has purchased the product or a zero (0) if the customer has not. This is called the Dummy Variable approach. So, if you have, say, forty products from which your customers can choose, you will set up forty Dummy Variables.

2. The second approach is similar, but makes more sense. Suppose your customers can buy from each product line, or each product multiple times. It’s intuitive that it would make more sense to still set up forty variables, but instead of coding each variable a “1” or a “0”, count the number of times each customer bought each product and enter that count into the customer’s record.

3. A slight variation of this approach would be to record the dollars spent on each product, rather than just the count of the number of purchases. This approach would make more intuitive sense if the products differed significantly in price.

4. The last method is to use a technique called Principal Components Analysis, sometimes casually referred to as Factor Analysis, or as a particular type of Factor Analysis. To keep the purists happy we’ll just call it PCA.
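
The first three codings can be produced from the same transaction file in one pass. The sketch below is only illustrative: the transaction layout, the product names, and the function name are made up for the example.

```python
from collections import defaultdict

# Hypothetical transactions: (customer id, product line, dollars spent).
TRANSACTIONS = [
    ("c1", "books", 20.0),
    ("c1", "books", 35.0),
    ("c1", "music", 15.0),
    ("c2", "video", 60.0),
]
PRODUCTS = ["books", "music", "video"]


def encode_purchases(transactions, products):
    """Return dummy, count and dollar variables for every customer."""
    counts = defaultdict(lambda: dict.fromkeys(products, 0))
    dollars = defaultdict(lambda: dict.fromkeys(products, 0.0))
    for customer, product, amount in transactions:
        counts[customer][product] += 1        # approach 2: purchase counts
        dollars[customer][product] += amount  # approach 3: dollars spent
    # Approach 1: a dummy is simply "count greater than zero".
    dummies = {c: {p: int(n > 0) for p, n in row.items()}
               for c, row in counts.items()}
    return dummies, dict(counts), dict(dollars)


dummies, counts, dollars = encode_purchases(TRANSACTIONS, PRODUCTS)
print(dummies["c1"], counts["c1"], dollars["c1"])
```

Note that the dummy variables throw away information the counts retain, which is one intuitive reason to expect them to perform worse in a model.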

In a PCA of product data the idea is to capture the product purchase behavior of a customer across the range of products offered. What we’re eventually hoping to discover is whether or not the purchase, or lack of purchase, of different combinations of products will give us a clue as to the future behavior of individual customers, or of groups of customers if we are doing the analysis at the source key or at some geographic (zip code) level.

Without getting too technical (you can skip this paragraph if you like) the PCA program creates a new set of Principal Component Variables and related Principal Component Scores that can be used later on in a regular or logistic regression prediction model.

Again, let’s assume we’re working with forty product lines and we know how many times each customer has purchased each product. The program will initially generate forty Principal Components, but each PC will contain a different amount of information. In general, maybe four to eight of the forty Principal Components will contain most (70% or more) of the information contained in the entire set of PCs. And these four to eight PCs can be used in regression modeling just like any other “continuous” variable: Recency, Frequency, Monetary Value, Income, Age, etc.
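
For readers who want to see the mechanics, here is a minimal sketch of that step using NumPy. It assumes a customer-by-product matrix of purchase counts, keeps just enough components to reach a 70% share of the variance, and uses simulated data with six product lines rather than forty. The function name and the data are hypothetical, and a commercial PCA routine may standardize or rotate the variables differently.

```python
import numpy as np


def principal_components(counts, min_share=0.70):
    """Return PC scores and the variance share of each retained component."""
    X = np.asarray(counts, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each product column
    # SVD of the centered matrix: rows of Vt are the component directions,
    # ordered from most to least variance explained.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    share = s ** 2 / np.sum(s ** 2)           # share of variance per component
    # Smallest number of components whose cumulative share reaches min_share.
    k = int(np.searchsorted(np.cumsum(share), min_share)) + 1
    k = min(k, len(share))
    scores = Xc @ Vt[:k].T                    # k scores per customer
    return scores, share[:k]


# Simulated data (an assumption): 200 customers, 6 product lines whose
# purchases are driven by 2 underlying buying patterns plus noise.
rng = np.random.default_rng(0)
patterns = rng.poisson(2.0, size=(200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]])
counts = patterns @ loadings + rng.poisson(0.3, size=(200, 6))

scores, share = principal_components(counts)
print(scores.shape, share.round(2))
```

The columns of `scores` are the Principal Component Scores that would be carried into the regression as ordinary continuous variables.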

Obviously it takes some time to transform raw product purchase data into Principal Components that can be used in scoring models, and the scoring procedures will become more difficult and so on. So, the question is, is it worth the extra effort to convert product purchase data into Principal Components?

To help answer this question we’ll look at some recent modeling results and you can decide for yourself. The problem was to predict the lifetime value of different customer groups based on all available customer data, including product purchase data.

As described above, we isolated and modeled just the product purchase data three (3) ways: (1) Using simple Dummy Variables, (2) Using Counts of the number of times each product line was purchased, and (3) Principal Components.

The quick answer is that the model built on just Dummy Variables had an R-Squared of 11% (the model explained 11% of the variation in lifetime value), the Count Approach had an R-Squared of 41% and the Principal Components method produced an R-Squared of 53%.

In addition, looking at a Decile Analysis of the Residual Errors (Table 1 below) produced by each approach argues for the Principal Components method over the Counting method, and most important, the use of simple Dummy Variables is shown not to be very effective in this type of application.

Table 1

Average Error In Each Decile For Three Modeling Techniques
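
For anyone who wants to run the same two checks on their own models, the sketch below computes R-Squared and the average residual error by decile. It assumes deciles are formed by ranking customers on predicted value, which is a common convention but not necessarily how Table 1 was built. The function names and sample numbers are made up.

```python
def r_squared(actual, predicted):
    """Share of the variation in actual values explained by the model."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def decile_errors(actual, predicted):
    """Average (actual - predicted) in each decile of predicted value.

    Decile 1 holds the highest predictions. Needs at least 10 customers.
    """
    pairs = sorted(zip(predicted, actual), reverse=True)
    n = len(pairs)
    averages = []
    for d in range(10):
        chunk = pairs[d * n // 10:(d + 1) * n // 10]
        averages.append(sum(a - p for p, a in chunk) / len(chunk))
    return averages


# Hypothetical lifetime values and a model that over-predicts by 1 each time.
actual = [float(v) for v in range(1, 21)]
predicted = [a + 1.0 for a in actual]
print(round(r_squared(actual, predicted), 3))
print(decile_errors(actual, predicted))
```

A model can have a respectable R-Squared and still be badly off in the top or bottom deciles, which is why both views are worth looking at.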

Working With Tricky Segments

April 5th, 2011 by: DSA


In a previous post I suggested that modelers could improve their results by splitting their datasets according to some critically important variable, such as Tenure (the length of time a customer has been on the file), and then building separate models for each major segment.

The argument being that it is intuitive that the usual set of modeling suspects (Recency, Frequency, Monetary Value, Products Purchased, Source and the whole set of Demographic Variables) will display different relationships with Response or Sales, depending upon the Tenure Segment, and that just adding Tenure as a variable, without taking interactions into account, isn’t sufficient to capture the full effect of this variable.
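
A bare-bones sketch of the idea follows: split the file on Tenure, then fit a separate model to each segment. To keep it short, it fits a one-variable least-squares line (Sales on Frequency) per segment. The field names, the 24-month cut, and the data are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_segment(records, segment_of):
    """Fit one Sales ~ Frequency line per segment."""
    groups = defaultdict(list)
    for rec in records:
        groups[segment_of(rec)].append(rec)
    return {seg: fit_line([r["frequency"] for r in recs],
                          [r["sales"] for r in recs])
            for seg, recs in groups.items()}


def tenure_segment(rec):
    # Hypothetical cut: under 24 months on file counts as "new".
    return "new" if rec["tenure_months"] < 24 else "established"


# Made-up customers whose Frequency-Sales relationship differs by tenure.
records = (
    [{"tenure_months": 6, "frequency": f, "sales": 10 + 5 * f} for f in (1, 2, 3)]
    + [{"tenure_months": 48, "frequency": f, "sales": 20 + 2 * f} for f in (1, 2, 3)]
)
models = fit_by_segment(records, tenure_segment)
print(models)
```

A single pooled model with Tenure added as one more variable would force the same Frequency slope on both groups, which is exactly the interaction the separate models are meant to pick up.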

As if this isn’t complicated enough, I came across an article that questioned fundamental direct marketing beliefs, including the belief that there is a strong positive relationship between customer lifetime and profitability in a non-contractual relationship. In other words, they think that direct marketers think that customers that kind of hang around a long time, buying every once in a while, are profitable and every effort should be made to enhance the relationship between buyer and seller.

Of course, direct marketers who have looked closely at the data know that the costs of servicing infrequent buyers may indeed exceed the margins they yield; and the authors discovered for themselves that the simple relationship between lifetime months on file and lifetime profits is relatively weak (r = about .2 for the two groups studied).

What I did find interesting and potentially actionable was that they could divide a significant number of customers (some 9,000 households were studied over a three-year period, correctly split into two cohort groups, January and February starters) into four meaningful groups:

Segment 1. Those that had relatively Long Active Lives and High Lifetime Revenue

Segment 2. Those that had relatively Long Active Lives and Low Lifetime Revenue

Segment 3. Those that had relatively Short Active Lives and High Lifetime Revenue.

Segment 4. Those that had relatively Short Active Lives and Low Lifetime Revenue.
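
One simple way to reproduce this kind of four-way split on your own file is a median split on each dimension. That is an assumption on my part; the cut points the authors used are not given here. The household data below is invented.

```python
from statistics import median


def life_revenue_segment(active_months, revenue, month_cut, revenue_cut):
    """Return the segment number (1-4) as defined in the list above."""
    long_life = active_months >= month_cut
    high_revenue = revenue >= revenue_cut
    if long_life:
        return 1 if high_revenue else 2
    return 3 if high_revenue else 4


# Hypothetical households: (active months, lifetime revenue).
households = {"h1": (30, 900.0), "h2": (28, 150.0),
              "h3": (8, 850.0), "h4": (6, 100.0)}

# Median split on each dimension (an assumed rule, not the authors').
month_cut = median(m for m, _ in households.values())
revenue_cut = median(r for _, r in households.values())

segments = {h: life_revenue_segment(m, r, month_cut, revenue_cut)
            for h, (m, r) in households.items()}
print(segments)
```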

The graph below indicates that customers in Segments 1 and 3 look alike and behave in a similar fashion over their first 12 months, and then begin to separate over time. No doubt this is true; the operative question is whether this disparity can be predicted, and predicted early enough in a customer’s life that corrective action can be taken.

The argument is that simple RFM analyses will miss this phenomenon, and that database marketers, as a consequence of their not understanding that their database consists of these segments, will overspend on the Short Life-High Revenue segment before traditional RFM analysis depresses mailings to this segment.

So, the key question for marketers is: if this effect is widespread, if there really are customers that come in for a short while, buy a lot and then leave, can they be detected? Will modeling Tenure Segments, as suggested above and in last month’s article, capture this effect? Probably not, at least not by itself. What might work is a Principal Component Analysis of the available purchase behavior data over the last six months.

This approach might discern either a trend in dollars spent, or a trend in the particular products purchased that would indicate that the customer was displaying a pattern associated with customers that buy heavily for a short while and then switch to someone else – for reasons we can only speculate about.
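
Short of a full Principal Component Analysis, a much simpler stand-in for "a trend in dollars spent" is the least-squares slope of each customer's last six months of spending. To be clear, this is a substitute technique used only to illustrate the idea, and the customers and amounts are invented.

```python
def spend_trend(monthly_spend):
    """Least-squares slope of spend against month index (dollars per month)."""
    n = len(monthly_spend)
    mean_t = (n - 1) / 2
    mean_y = sum(monthly_spend) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(monthly_spend))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Hypothetical last-six-months spend, oldest month first.
customers = {
    "steady": [50, 55, 45, 50, 52, 48],
    "fading": [100, 90, 80, 70, 60, 50],
}
for name, spend in customers.items():
    print(name, round(spend_trend(spend), 2))
```

A strongly negative slope on a customer who still looks good on Recency, Frequency and Monetary Value would be one early warning sign worth testing.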

