So I decided to perform a segmentation of Twitter users: to extract common groups of users, this time not around specific opinions or specific products but on a more generic basis.
I had two goals in this clustering analysis:
1) Cluster the biographies of users
2) Cluster the tweets of the users.
I then decided that the more information I could collect the better, so the first thing I did was write a ‘spider’ program to extract 10,000 Twitter user names. For each Twitter user, the software then visits his/her page and extracts:
a) The user’s bio
b) Number of followers
c) Number of people following
d) Number of updates
e) 20 latest Tweets
f) Number of re-tweets
g) Number of replies to other users (e.g. tweets that contain an @user mention)
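To make the list above concrete, here is a minimal sketch of what one collected record might look like. The class name, the field names and the two counting rules (a retweet starts with "RT @", a reply is any other tweet with an @user mention) are assumptions for illustration, not the actual spider's code.

```python
import re
from dataclasses import dataclass, field

MENTION = re.compile(r"@\w+")


@dataclass
class TwitterUser:
    """One crawled profile; all names here are hypothetical."""
    username: str
    bio: str
    followers: int
    following: int
    updates: int
    latest_tweets: list = field(default_factory=list)  # up to 20 most recent tweets

    def retweet_count(self):
        # Assumption: a retweet is a tweet that starts with "RT @user".
        return sum(1 for t in self.latest_tweets if t.startswith("RT @"))

    def reply_count(self):
        # Assumption: a reply is any non-retweet tweet containing an @user mention.
        return sum(
            1 for t in self.latest_tweets
            if not t.startswith("RT @") and MENTION.search(t)
        )


user = TwitterUser(
    username="example_user",
    bio="Single mom of two",
    followers=120,
    following=80,
    updates=950,
    latest_tweets=["RT @friend: great post", "@friend thanks!", "Making dinner"],
)
print(user.retweet_count(), user.reply_count())
```

Deriving the two counts from the stored tweets keeps the record small; a real crawler might instead read them from the page directly.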
Let’s see now what we could potentially do with such information:
1) Clustering analysis on user bios
2) Clustering analysis on user tweets
3) Classification analysis to identify the common characteristics of users with many followers
4) Association discovery between products: which products tend to be mentioned together in each user’s tweets?
5) Identification of common keywords per cluster: if we identify a cluster of users that we characterize as the “Parents”, what keywords do “Parents” tend to use more? What about the “Tech junkies” cluster?
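As an illustration of idea 4, the simplest form of association discovery is counting how often two products show up in the same user's tweets. The sketch below assumes a hand-made list of single-word product names and a dictionary of tweets per user; both are hypothetical, and a fuller analysis would use proper association-rule measures such as support and confidence.

```python
import re
from collections import Counter
from itertools import combinations


def product_pairs(tweets_by_user, products):
    """Count, for each pair of products, how many users mention both."""
    wanted = {p.lower() for p in products}
    pair_counts = Counter()
    for tweets in tweets_by_user.values():
        # All words this user has tweeted, lowercased.
        words = set(re.findall(r"[a-z0-9]+", " ".join(tweets).lower()))
        mentioned = sorted(wanted & words)
        # Every unordered pair of products mentioned by the same user.
        pair_counts.update(combinations(mentioned, 2))
    return pair_counts


# Hypothetical sample data.
tweets_by_user = {
    "user_a": ["Love my iPhone and my Kindle"],
    "user_b": ["iphone battery died", "reading on the kindle"],
    "user_c": ["new xbox game"],
}
pairs = product_pairs(tweets_by_user, ["iphone", "kindle", "xbox"])
print(pairs.most_common(3))
```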
But let’s start with the first analysis: clustering the biographies of Twitterers. The analysis generated 30 clusters of users. Some of them are:
1) The Parents
2) The computer geeks
3) The students
4) The social media addicts
5) The entrepreneurs
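The post does not say which algorithm or tool produced these clusters. As one possible approach, the sketch below assumes TF-IDF weighting of the bio words and a bare-bones k-means that uses cosine similarity, with the first k bios as starting centroids. Every function name here is hypothetical, and a real run on 10,000 bios would need stop-word removal and a better initialisation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def normalise(vec):
    """Scale a {term: weight} dict to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (as a dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n = len(docs)
    # Smoothed IDF so that no weight is zero.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [normalise({t: tf * idf[t] for t, tf in c.items()}) for c in counts]


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Assign each vector to one of k clusters; returns a list of labels."""
    centroids = [dict(v) for v in vectors[:k]]  # naive, deterministic start
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[c] = normalise(total)
    return labels


# Hypothetical bios, just to show the shape of the input and output.
bios = [
    "Mom of two boys, proud parent",
    "Software developer, Linux geek, Python programmer",
    "Single mom, parent of three kids",
    "Geek programmer who loves Linux",
]
labels = kmeans(tfidf_vectors(bios), k=2)
print(labels)
```

With k set to 30 and the full set of bios, the returned labels play the role of the 30 clusters described above; naming a cluster "The Parents" is still a manual step done by reading its members.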
I looked at the “Parents” cluster more closely to find the keywords it is associated with: “Single” and “Jesus” were among them.
So we immediately identify one of the many customer groups: the parents, a significant percentage of whom are single. The “Parents” cluster also expresses one of its values: Christianity.
By moving on to each generated cluster and finding its associated keywords, I was able to retrieve the values and beliefs of each cluster. Knowledge Extraction at its best…
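The keyword step can be approximated by asking which words appear in a cluster's bios more often than they do across all users. The sketch below scores each word by that ratio (often called lift); the function name, the `min_count` threshold and the sample data are assumptions for illustration, not the method actually used for the results above.

```python
import re
from collections import Counter


def distinctive_keywords(texts, labels, cluster, top_n=5, min_count=2):
    """Words over-represented in one cluster compared with all texts."""
    in_cluster = Counter()   # number of cluster texts containing each word
    overall = Counter()      # number of all texts containing each word
    n_in = 0
    for text, label in zip(texts, labels):
        words = set(re.findall(r"[a-z']+", text.lower()))
        overall.update(words)
        if label == cluster:
            in_cluster.update(words)
            n_in += 1
    n_all = len(texts)
    scores = {}
    for word, count in in_cluster.items():
        if count >= min_count:  # ignore words seen only once in the cluster
            scores[word] = (count / n_in) / (overall[word] / n_all)
    # Highest lift first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical bios and cluster labels.
texts = [
    "single mom of two",
    "proud dad, jesus follower",
    "single dad who loves jesus",
    "python developer",
    "linux geek and developer",
    "single developer",
]
labels = [0, 0, 0, 1, 1, 1]
print(distinctive_keywords(texts, labels, cluster=0))
```

A word such as "single" scores lower here because it also appears outside the cluster, which is why comparing against the whole population matters more than raw counts.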