Two Step Cluster - Customer Segmentation in Telecom

I love Cluster Analysis because unlike a lot of other techniques, I don’t have to make any assumptions about the underlying distribution of the data. Though there are a few assumptions for best performance, it’s perfectly okay to cluster data that may not meet these assumptions. Only the business requirements/goals can determine whether the clusters/segments are useful or the solution is satisfactory.

Customer segmentation is the process of splitting a customer database into distinct, meaningful, and homogeneous groups based on specific parameters or attributes. At a macro level, the main objectives of customer segmentation are to understand the customer base, to monitor and understand changes over time, and to support critical strategies and functions such as CRM, loyalty programs, and product development.

At a micro level, the goal is to support specific campaigns, commercial policies, and cross-selling & up-selling activities, and to analyze and manage churn & loyalty.

SPSS has three different procedures that can be used to cluster data: hierarchical cluster analysis, k-means cluster, and two-step cluster. The two-step cluster is appropriate for large datasets or datasets that have a mixture of continuous and categorical variables. It requires only one pass of data (which is important for very large data files).

The first step – Formation of Preclusters
Preclusters are just clusters of the original cases that are used in place of the raw data to reduce the size of the matrix that contains distances between all possible pairs of cases. When preclustering is complete, all cases in the same precluster are treated as a single entity. The size of the distance matrix is no longer dependent on the number of cases but on the number of preclusters. These preclusters are then used in hierarchical clustering.

The second step – Hierarchical Clustering of Preclusters
In the second step, the standard hierarchical clustering algorithm is used on the preclusters.
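
The two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS's actual implementation: it assumes continuous variables only, plain Euclidean distance, a fixed distance threshold for preclustering, and a number of final clusters chosen by hand. The function names (precluster, merge_preclusters) and the sample cases are made up for this example.

```python
import math

def centroid(c):
    """Mean point of a precluster stored as a count plus per-dimension sums."""
    return [s / c["n"] for s in c["sum"]]

def precluster(points, threshold):
    """Step 1: a single pass over the cases. Each case joins the nearest
    precluster whose centroid lies within `threshold`, else starts a new one."""
    pre = []
    for p in points:
        best, best_d = None, threshold
        for c in pre:
            d = math.dist(p, centroid(c))
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            pre.append({"n": 1, "sum": list(p)})
        else:
            best["n"] += 1
            best["sum"] = [s + x for s, x in zip(best["sum"], p)]
    return pre

def merge_preclusters(pre, k):
    """Step 2: agglomerative clustering on the preclusters only. Repeatedly
    merge the two preclusters with the closest centroids until k remain."""
    clusters = [{"n": c["n"], "sum": list(c["sum"])} for c in pre]
    while len(clusters) > k:
        # The distance matrix is built over preclusters, not original cases.
        pairs = [(math.dist(centroid(a), centroid(b)), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        a, b = clusters[i], clusters[j]
        merged = {"n": a["n"] + b["n"],
                  "sum": [x + y for x, y in zip(a["sum"], b["sum"])]}
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Made-up two-dimensional cases, for illustration only.
cases = [(0, 0), (0.1, 0), (0, 0.2), (5, 5), (5.1, 5), (7, 5), (7.1, 5)]
pre = precluster(cases, threshold=0.5)
final = merge_preclusters(pre, k=2)
print(len(pre), sorted(c["n"] for c in final))
```

Note that the pairwise distances in the second step are computed between preclusters only, which is why the cost of that step depends on the number of preclusters rather than on the number of cases.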

The dataset I am going to use has information on 75 attributes for more than 70,000 customers. Product/service usage variables for all customers in the dataset are averages calculated over a period of four months.

In SPSS Clementine, the Data Audit available under the Output nodes palette gives the basic/descriptive statistics (mean, min, max…) and the quality (outliers, missing values…) of the variables.
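
For readers without Clementine, the same kind of check can be approximated in plain Python. The sketch below is an assumption-laden stand-in, not the Data Audit node itself: the audit function, the 1.5 x IQR outlier rule, and the sample values are all hypothetical choices made for illustration.

```python
import statistics

def audit(values):
    """Summarise one column: basic statistics plus simple quality checks.
    `None` marks a missing value."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = 1.5 * (q3 - q1)  # assumed outlier rule, not Clementine's
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "min": present[0],
        "max": present[-1],
        "outliers": [v for v in present if v < q1 - spread or v > q3 + spread],
    }

# Hypothetical monthly minutes of usage for nine customers.
mou = [120, 135, None, 150, 110, 140, 900, None, 125]
report = audit(mou)
print(report)
```

A variable with many missing values or extreme outliers is a candidate for exclusion or transformation before clustering.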

Out of the 75 variables in the dataset, I used about 15 original variables and 3 new derived variables after considering their quality and business relevance. These selected variables were a combination of demographic, billing, and usage information.

The two-step cluster analysis produced 3 clusters. A very interesting difference was observed between Clusters 1 and 2.

Customers in Cluster 2 display the following characteristics:
– few of them are married
– few of them have children
– few of them have a credit card
– they own the most expensive mobile sets

– maximum # of incoming & outgoing calls
– maximum # of roaming calls
– maximum MOU (minutes of usage)
– maximum # of active subscriptions
– maximum recurring charge (or, subscribes to the most expensive calling plan)
– maximum revenue

– maximum # of calls to customer care
– the largest proportion of customers with a low credit rating

Customers in Cluster 1 display characteristics that are exactly the opposite in ALMOST all of the areas mentioned above. So we have customers who are married with children, possess a credit card, own a cheap mobile set, subscribe to the least expensive calling plan, make the minimum # of calls (incoming, outgoing, roaming & customer care), and have the highest credit ratings.

Customers in Cluster 3 follow the middle path (in almost all the attributes) and offer no interesting or meaningful insights.
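
Comparisons like these come from profiling each cluster, that is, summarising every variable separately per cluster and reading the summaries side by side. A minimal sketch in Python, assuming each customer record already carries a cluster label; the field names (cluster, mou, revenue) and the numbers are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

def profile(records, cluster_key="cluster"):
    """Mean of every numeric field, computed separately for each cluster."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r[cluster_key]].append(r)
    return {
        cluster: {field: mean(r[field] for r in rows)
                  for field in rows[0] if field != cluster_key}
        for cluster, rows in grouped.items()
    }

# Invented records: cluster label, minutes of usage, monthly revenue.
records = [
    {"cluster": 1, "mou": 100, "revenue": 20},
    {"cluster": 1, "mou": 200, "revenue": 30},
    {"cluster": 2, "mou": 900, "revenue": 90},
    {"cluster": 2, "mou": 1100, "revenue": 110},
]
profiles = profile(records)
for cluster, stats in sorted(profiles.items()):
    print(cluster, stats)
```

For categorical attributes such as marital status, the same grouping would use proportions instead of means.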

So what can be the business application of this exercise?
To put it simply, cluster analysis has thrown up two very distinct groups of customers – highly profitable but high-risk customers in Cluster 2, and low-profit, low-risk customers in Cluster 1.

For the highly profitable but high-risk customers, one or more of the following actions can be implemented:
– Enhance credit risk monitoring
– Establish stringent usage thresholds
– Educate customers about alternative payment options, or make a credit card a mandatory payment method
– Migrate to pre-paid plans

For the low-profit, low-risk customers, usage stimulation campaigns can be attempted with or without further segmentation.

This is one of the most basic examples of customer segmentation. If we consider traffic analysis information by taking ratios of certain call/service usage parameters, we can identify customer groups who have increased or decreased their usage. If we consider customer tenure, we can have an understanding of customer loyalty. Accordingly, specific actions can be taken for these groups.
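
As a rough illustration of the ratio idea, the sketch below compares a customer's usage in a recent period with usage in an earlier period and labels the trend. Everything here is a hypothetical choice: the usage_trend function, the 20% tolerance band, and the example figures are assumptions, and a real analysis would pick the periods and thresholds to suit the business question.

```python
def usage_trend(earlier, recent, tolerance=0.2):
    """Label a customer by the ratio of recent usage to earlier usage.
    `tolerance` is an assumed band around 1.0 that counts as no change."""
    if earlier == 0:
        # No baseline to divide by: any recent usage is new usage.
        return "new usage" if recent > 0 else "stable"
    ratio = recent / earlier
    if ratio >= 1 + tolerance:
        return "increased"
    if ratio <= 1 - tolerance:
        return "decreased"
    return "stable"

# Hypothetical minutes of usage: (earlier period, recent period).
examples = [(400, 600), (400, 200), (400, 420), (0, 50)]
for earlier, recent in examples:
    print(earlier, recent, usage_trend(earlier, recent))
```

The resulting labels can then be used as a segmentation variable in their own right, or simply to target customers whose usage is falling.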