Know your customers - The Twitter way

The more I analyze tweets on Twitter, the more interesting I find the whole process. First it was a clustering analysis of specific thoughts expressed by Twitter users, and then it was sentiment mining for Amazon’s Kindle. It was only a matter of time before I felt the urge to analyze tweets from a broader perspective.

So I decided to perform a segmentation of Twitter users: extract common groups of users, this time not around specific thoughts or specific products but on a more generic basis.

I had two goals in this clustering analysis:

1) Cluster the biographies of users
2) Cluster the tweets of the users.

I then decided that the more information I could collect the better, so the first thing I did was write a ‘spider’ program to extract 10,000 Twitter user names. For each Twitter user, the software then visits his/her page and extracts:

a) The user’s bio
b) Number of followers
c) Number of people the user follows
d) Number of updates
e) 20 latest Tweets
f) Number of re-tweets
g) Number of replies to other users (e.g. tweets that contain an @user mention)
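
As a rough sketch of what one collected record might look like, the fields above could be held in a small Python class. The class name, the field names and the two counting rules (a re-tweet starts with "RT @", a reply starts with an @user mention) are assumptions made for illustration, not the actual spider's schema or logic.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TwitterUser:
    """Hypothetical record for one crawled user (names are illustrative)."""
    username: str
    bio: str
    followers: int
    following: int
    updates: int
    tweets: list = field(default_factory=list)  # up to 20 latest tweets

    def retweet_count(self) -> int:
        # Assumption: a re-tweet is a tweet that starts with "RT @".
        return sum(1 for t in self.tweets if t.startswith("RT @"))

    def reply_count(self) -> int:
        # Assumption: a reply is a tweet that starts with an @user mention.
        return sum(1 for t in self.tweets if re.match(r"@\w+", t))
```

Items f) and g) are then derived from the stored tweets rather than scraped separately, which keeps the record small.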

Let’s see now what we could potentially do with such information:

1) Clustering analysis on user bios

2) Clustering analysis on user tweets

3) Classification analysis for identifying the common characteristics of users with many followers

4) Association discovery between products: which products tend to be mentioned together in each user’s tweets?

5) Identification of common keywords per cluster: if we identify a cluster of users that we characterize as the “Parents”, what keywords do “Parents” tend to use more? What about the “Tech junkies” cluster?
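
To make item 4 more concrete, here is a minimal sketch of counting which products are mentioned together by the same user. The function name `product_pairs`, the input layout (a dict from user name to list of tweets) and the plain substring matching are assumptions for illustration; a real analysis would need a proper product list and more careful matching.

```python
from collections import Counter
from itertools import combinations


def product_pairs(user_tweets, products):
    """Count, for every pair of products, how many users mention both.

    user_tweets: dict mapping a user name to a list of tweet strings.
    products: iterable of product names to look for (hypothetical list).
    """
    pair_counts = Counter()
    for tweets in user_tweets.values():
        text = " ".join(tweets).lower()
        # Crude assumption: a product is "mentioned" if its name is a substring.
        mentioned = sorted(p for p in products if p.lower() in text)
        pair_counts.update(combinations(mentioned, 2))
    return pair_counts
```

Calling `product_pairs(user_tweets, products).most_common(10)` would then list the ten product pairs shared by the most users.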

But let’s start with the first analysis: clustering the biographies of Twitterers. The analysis generated 30 clusters of users. Some of them are:

1) The Parents
2) The computer Geeks
3) The students
4) The social media addicts
5) The entrepreneurs
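
For readers who want to try something similar, the sketch below groups bios by text similarity. It is not the method used for the analysis above (the post does not depend on a particular algorithm); it swaps in a simple TF-IDF weighting plus a greedy "leader" clustering, and the function names and the 0.2 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(bios):
    """Turn each bio into a sparse {term: weight} TF-IDF vector."""
    docs = [Counter(tokenize(b)) for b in bios]
    df = Counter(term for d in docs for term in d)  # document frequency
    n = len(docs)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in d.items()}
        for d in docs
    ]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold=0.2):
    """Greedy clustering: join the first cluster whose leader is similar
    enough, otherwise start a new cluster. Returns lists of bio indices."""
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

The threshold controls how many clusters come out: a higher value produces more, tighter groups, so it would need tuning on real bios.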

I looked at the “Parents” cluster more closely and wanted to find the keywords that this cluster is associated with: “Single” and “Jesus” were some of them.

So we immediately identify one of the many customer groups: the parents, a significant percentage of whom are single. The “Parents” cluster also expresses one of its values: Christianity.

By moving on to each generated cluster and finding its associated keywords, I was able to retrieve the values and beliefs of each cluster. Knowledge extraction at its best…
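
One possible way to find the keywords associated with a cluster is to compare how often a term appears in that cluster's bios with how often it appears across all bios. The sketch below does this with a simple ratio; the function name `cluster_keywords`, its parameters and the scoring rule are assumptions for illustration, not necessarily how the keywords above were obtained.

```python
import re
from collections import Counter


def cluster_keywords(bios, labels, cluster, top_n=5, min_users=2):
    """Rank terms that are over-represented in one cluster's bios.

    bios and labels are parallel lists; labels[i] is the cluster of bios[i].
    Score = (share of cluster users using the term) /
            (share of all users using the term).
    """
    overall, inside, size = Counter(), Counter(), 0
    for bio, label in zip(bios, labels):
        terms = set(re.findall(r"[a-z']+", bio.lower()))
        overall.update(terms)
        if label == cluster:
            inside.update(terms)
            size += 1
    total = len(bios)
    scores = {
        term: (count / size) / (overall[term] / total)
        for term, count in inside.items()
        if count >= min_users  # ignore terms used by very few cluster members
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

In practice a stop-word list would be needed as well, since common words such as "and" can otherwise score highly in small clusters.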
