First, the data used can be summarized with the following table:
The ranges of the data immediately stand out as a problem, especially for the number of “followers” and “following”. This is to be expected, since the captured users include Jack Dorsey (co-founder of Twitter), Sen. McCain and George Stephanopoulos, all of whom obviously have a huge number of followers.
Before finding which usage behavior attracts many followers, we need to define what exactly a “popular Twitter account” is. Is it just the absolute number of followers? It could be equally important, or at least interesting, to also look at:
1) The followers/following ratio
2) The number of followers per day
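As a rough illustration, the two alternative measures above can be computed per account as in the sketch below. The function name and the returned keys are hypothetical, and returning `None` when a denominator is zero is an assumption made here, not something the analysis specifies.

```python
def popularity_metrics(followers, following, elapsed_days):
    """Return the three candidate popularity measures for one account.

    A measure is None when its denominator is zero (an assumption of
    this sketch, e.g. an account that follows nobody).
    """
    ratio = followers / following if following > 0 else None
    per_day = followers / elapsed_days if elapsed_days > 0 else None
    return {
        "followers": followers,          # absolute number of followers
        "followers_per_following": ratio,
        "followers_per_day": per_day,
    }


example = popularity_metrics(followers=1000, following=250, elapsed_days=500)
print(example)
```

An account with few followers but a high ratio, or a young account gaining followers quickly, would rank differently under each measure, which is why the definition matters before any modelling.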
Some usage patterns that raise the chance of having a successful Twitter account are the following:
- Having a bio is an absolute must: 82.3% of unsuccessful Twitter accounts have no biography information.
- You should provide more than 3 links per 20 tweets and also post more than 0.960 updates per day.
- If you don’t want to provide more than 3 links per 20 tweets, then try to post more than 5.857 updates per day.
- Users who post more than 3 links per 20 tweets but no more than 0.960 updates per day will need more than 222.5 days of usage to get an adequate number of followers.
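The activity thresholds in the list above can be read as a small set of nested rules. The sketch below is one possible reading, not the original model: the function name is hypothetical, the way the conditions are nested is an assumption, and the bio rule is left out because the text gives only a percentage for it, not a threshold.

```python
def matches_favorable_pattern(links_per_20_tweets, updates_per_day, elapsed_days):
    """Check an account against the activity thresholds quoted in the text.

    The nesting of the conditions is an assumption of this sketch.
    """
    if links_per_20_tweets > 3:
        if updates_per_day > 0.960:
            # Many links and at least roughly one update per day.
            return True
        # Many links but few updates: only long-lived accounts qualify.
        return elapsed_days > 222.5
    # Few links: a much higher posting rate is needed.
    return updates_per_day > 5.857


print(matches_favorable_pattern(4, 1.5, 30))    # many links, regular updates
print(matches_favorable_pattern(2, 1.5, 400))   # few links, too few updates
```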
Feature Selection also lets us look at the relative importance of each parameter in achieving many followers. Here are the results of Feature Selection using the ChiSquare, GainRatio and InfoGain attribute evaluators.
=== Attribute selection 10 fold cross-validation (stratified), seed: 1 ===

average merit        average rank    attribute
362.743 +-10.419     1   +- 0        4 numberOfLinks
319.397 +-10.133     2.4 +- 0.49     6 hasBlankProfile?
311.661 +- 8.612     2.6 +- 0.49     7 updatesPerDay
192.525 +- 7.481     4.1 +- 0.3      3 retweetsNumber
178.236 +- 5.963     4.9 +- 0.3      1 elapsedDays
 36.148 +- 3.579     6   +- 0        2 otherUsersTalk
 17.843 +- 4.475     7   +- 0        5 questionsAsked

average merit        average rank    attribute
0.1   +- 0.003       1   +- 0        6 hasBlankProfile?
0.042 +- 0.001       2.4 +- 0.49     4 numberOfLinks
0.039 +- 0.002       3.2 +- 0.6      3 retweetsNumber
0.04  +- 0.004       3.4 +- 0.92     7 updatesPerDay
0.025 +- 0.001       5   +- 0        1 elapsedDays
0.011 +- 0.001       6   +- 0        2 otherUsersTalk
0.005 +- 0.001       7   +- 0        5 questionsAsked

average merit        average rank    attribute
0.082 +- 0.002       1   +- 0        4 numberOfLinks
0.074 +- 0.003       2.1 +- 0.3      6 hasBlankProfile?
0.071 +- 0.002       2.9 +- 0.3      7 updatesPerDay
0.044 +- 0.002       4.1 +- 0.3      3 retweetsNumber
0.041 +- 0.001       4.9 +- 0.3      1 elapsedDays
0.008 +- 0.001       6   +- 0        2 otherUsersTalk
0.004 +- 0.001       7   +- 0        5 questionsAsked
We see that all three attribute evaluators agree that the number of links provided in tweets and whether the user's profile is filled in are the two most important parameters in achieving many followers. Notice also that sending messages to other users (otherUsersTalk) and asking questions (questionsAsked) are not as important as one would expect.
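To give a feel for what the merit scores measure, here is a minimal sketch of information gain and gain ratio for a single nominal attribute, using the textbook definitions. The function names and the toy data are hypothetical and are not taken from the dataset analysed here; the merits reported above come from the attribute evaluators themselves, which may treat numeric attributes differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Reduction in class entropy obtained by splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy data: whether the profile is blank, and a made-up popularity class.
has_blank_profile = [True, True, True, False, False, False]
popularity = ["few", "few", "few", "many", "many", "few"]
print(info_gain(has_blank_profile, popularity))
print(gain_ratio(has_blank_profile, popularity))
```

An attribute that separates the classes perfectly gets an information gain equal to the class entropy, while an attribute unrelated to the class gets a value near zero, which is the sense in which otherUsersTalk and questionsAsked score low above.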
The analysis above gives many insights, but it does not take into account what users say and how this affects the popularity of a Twitter account. Text Mining will try to answer this question and also identify which keywords in Twitter profiles seem to be associated with many followers.