So let us see the steps required :
First Step : The first thing, of course, is to actually find the data. User forums where people talk about mobile phones and mobile companies are the obvious place to look, and there are lots of them. On some forums the volume of messages may be low, but usually there is more than enough material to work with. Special code can be written to extract the text from posts without losing the nature of the posting. As an example, the fact that a post has generated 20 replies is valuable information in itself. The more replies a post attracts, the more sentiment it is likely to carry, and this has to be taken into consideration.
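To make the point about replies concrete, here is a minimal sketch of how an extracted post could keep its reply count next to its text. The `ForumPost` record and the `engagement_weight` function are hypothetical names of mine, and the log scaling is only an assumption about how replies might be weighted, not the method used by the software shown later.

```python
import math
from dataclasses import dataclass


@dataclass
class ForumPost:
    """One extracted post, kept together with its forum context (hypothetical record)."""
    post_id: str
    text: str
    reply_count: int


def engagement_weight(post: ForumPost) -> float:
    """Weight a post by the number of replies it generated.

    Assumption: log scaling, so a post with 20 replies counts for more
    than a post with 2 replies, but not ten times more.
    """
    return 1.0 + math.log1p(post.reply_count)


quiet = ForumPost("p1", "No complaints so far.", reply_count=0)
busy = ForumPost("p2", "Is anyone else losing signal downtown?", reply_count=20)

# The busy thread gets the larger weight when sentiment is aggregated later.
print(engagement_weight(quiet), engagement_weight(busy))
```

Any increasing function of the reply count would do here; the only point is that the count travels with the text instead of being thrown away during extraction.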
Second Step : Deploy information extraction techniques to identify phrases that express good or bad sentiment (and actually many other things) about Telecom keywords such as :
– Signal
– Customer Care
– Billing
– etc.
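In its very simplest form, this step can be sketched as a dictionary lookup: find sentences that mention a Telecom keyword and also contain a word of good or bad characterization. The word lists and the `find_opinions` function below are made up for illustration, and real information extraction needs far more than this (negation, misspellings, sarcasm and so on).

```python
import re

# Hypothetical, tiny lexicons. A real deployment would use much larger ones.
TOPIC_TERMS = {
    "Signal": ["signal", "reception", "coverage"],
    "Customer Care": ["customer care", "support"],
    "Billing": ["billing", "bill", "invoice"],
}
SENTIMENT_TERMS = {
    "flawless": "good",
    "excellent": "good",
    "terrible": "bad",
    "poor": "bad",
}


def find_opinions(text):
    """Return (topic, polarity) pairs for sentences mentioning both."""
    results = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"\w+", sentence))
        polarities = sorted({SENTIMENT_TERMS[w] for w in words if w in SENTIMENT_TERMS})
        if not polarities:
            continue  # no characterization in this sentence
        for topic, terms in TOPIC_TERMS.items():
            # Whole-word match so that "bill" does not fire inside "billing".
            if any(re.search(r"\b" + re.escape(t) + r"\b", sentence) for t in terms):
                for polarity in polarities:
                    results.append((topic, polarity))
    return results


print(find_opinions("The signal is flawless. Billing was terrible!"))
```

Pairing keywords and characterizations at sentence level is a deliberate simplification: it is crude, but it keeps a complaint about billing from being attached to a compliment about the signal in the next sentence.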
The following screen capture shows an example which is in Greek, but I will provide all the necessary explanation. Please also note that this is a simplified version of the process :
Notice that on the right-hand side there are some bars that denote the type of keywords found. The first category is called “Characterization” and, if it is checked (which it is in the above screen capture), the software will highlight only posts that contain some kind of characterization, whether good or bad. Notice also the yellow bar which has the name “Network”. Because it is checked, words that are synonyms of “Network” are highlighted, and indeed this is the case because
Signal = σήμα (in Greek) and
Flawless = άψογο
so the highlighted phrase άψογο σήμα means “flawless signal”, which is a good characterization of the signal of two particular telecom companies. Notice also a line under the “Features” tab which says that between positions 3425 and 3429 there is a mention of signal (“mentionsSignal = true”).
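The idea of a highlighted span with character positions and features can be sketched roughly as follows. This is not the software from the screen capture: the `annotate` function, the dictionary layout and the `polarity` feature are my own illustrative assumptions, and only the `mentionsSignal` feature name comes from the example above.

```python
import re

# Hypothetical one-entry lexicons taken from the example above.
NETWORK_SYNONYMS = ["σήμα"]            # "signal"
CHARACTERIZATIONS = {"άψογο": "good"}  # "flawless"


def annotate(text):
    """Return annotations with character offsets, ordered by start position."""
    annotations = []
    for term in NETWORK_SYNONYMS:
        for match in re.finditer(re.escape(term), text):
            annotations.append({
                "type": "Network",
                "start": match.start(),
                "end": match.end(),  # end offset is exclusive
                "features": {"mentionsSignal": True},
            })
    for term, polarity in CHARACTERIZATIONS.items():
        for match in re.finditer(re.escape(term), text):
            annotations.append({
                "type": "Characterization",
                "start": match.start(),
                "end": match.end(),
                "features": {"polarity": polarity},  # assumed feature name
            })
    return sorted(annotations, key=lambda a: a["start"])


for annotation in annotate("άψογο σήμα"):
    print(annotation)
```

Keeping the offsets rather than only the matched words is what lets a tool highlight the exact phrase in the original post, and lets later steps check which characterization sits next to which keyword.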
Again, I have to point out that this is a simplified version of the process. Text Mining and Information Extraction are actually very hard work, but they are also very rewarding for those who ultimately deploy and use them. In the next post we will see the problems (and there are many of them) but also how this unstructured information is turned into “nuggets of gold”.