b) Telenor and VIP Mobile are not found as frequently as MT:S in PostPaid package conversations.
c) We see several problems caused by insufficient pre-processing: Kredit and Kredita (= credit) should be merged into one word, and the same applies to telefona – telefon, internet – interneta and mts – mtsa.
Notice that we can perform the same high-level analysis for several Telco topics such as Network, Billing, Customer Care, Promotions, subscriber questions and so on. The next task is to identify the reason(s) why MT:S was found to have more mentions of PostPaid packages. Note that at this point we do not know why this is so: it could be that MT:S prices for prepaid packages are high or very cheap, or something else is happening that needs to be identified.
The Serbian language requires extra work because it is highly inflected: even the endings of brand names change according to usage. Consider the following:
U mts-u (at mts)
Sa mts-om (with mts)
Bez mts-a (without mts)
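To make the effect on the feature space concrete, here is a minimal sketch in Python (used here purely for illustration in place of R). It counts the distinct tokens in the three phrases above before and after stripping the case ending from the hyphenated brand name. The list of endings and the function name normalize_brand are assumptions made for this example; only the three endings shown above are covered.

```python
import re
from collections import Counter

# Assumption: only the three case endings from the examples above.
CASE_ENDINGS = ("om", "u", "a")

# Matches a hyphenated brand mention such as "mts-om" and captures the stem.
BRAND_SUFFIX = re.compile(r"\b(mts)-(?:" + "|".join(CASE_ENDINGS) + r")\b")


def normalize_brand(text: str) -> str:
    """Lowercase the text and reduce 'mts-u', 'mts-om', 'mts-a' to 'mts'."""
    return BRAND_SUFFIX.sub(r"\1", text.lower())


phrases = ["U mts-u", "Sa mts-om", "Bez mts-a"]

# Token counts without any normalization: each inflected form is its own feature.
raw_tokens = Counter(tok for p in phrases for tok in p.lower().split())

# Token counts after normalization: the three brand forms collapse into one.
clean_tokens = Counter(tok for p in phrases for tok in normalize_brand(p).split())

print(len(raw_tokens), "distinct tokens before normalization")
print(len(clean_tokens), "distinct tokens after normalization")
print("mentions of mts after normalization:", clean_tokens["mts"])
```

Without normalization, every inflected form of the brand is counted as a separate feature, so the mentions of one operator are split across several low-frequency tokens.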
It is evident that a highly inflected language explodes our feature space, and this is where R can come to the rescue with some success. We can use R to map several synonyms to one word, remove (Serbian) stop words, remove URLs and perform several other pre-processing steps that are necessary prior to an extensive analysis. More in the next post.
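As a rough preview of the pre-processing steps listed above, the sketch below chains URL removal, variant merging and stop-word removal. It is written in Python rather than R purely for illustration, since the steps themselves are language-agnostic. The stop-word list is a tiny illustrative sample, not a complete Serbian list; the variant map contains only the pairs mentioned earlier in this post; and the function name preprocess and the sample message are made up for this example.

```python
import re

# Matches http(s) links and bare www. links.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Assumption: a tiny illustrative sample of Serbian stop words, far from complete.
STOP_WORDS = {"u", "sa", "bez", "i", "je", "na", "za"}

# Inflected variant -> canonical form, using only the pairs named in this post.
CANONICAL = {
    "kredita": "kredit",
    "telefona": "telefon",
    "interneta": "internet",
    "mtsa": "mts",
}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs, merge known variants, then drop stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    # Keep hyphenated words together; \w is Unicode-aware in Python 3.
    tokens = re.findall(r"\w+(?:-\w+)*", text)
    tokens = [CANONICAL.get(tok, tok) for tok in tokens]
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Made-up message ("without credit and internet" plus a link), not from the data set.
print(preprocess("Bez kredita i interneta http://example.com/ponuda"))
```

A hand-written variant map like this only scales to a handful of words; for a whole vocabulary a stemmer or lemmatizer for Serbian would be the more realistic choice.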