The web is a huge source of information. It stores the facts, thoughts, feelings and intentions of people. It also records, indirectly, what people like and what they don't – something we will look at shortly. Some examples of harnessing this information have been shown previously on this blog, such as:
- Extraction of user opinions, beliefs and values from Twitter
- Prediction of popular stories on Digg
- Prediction of popular Tweets
Consider the following snapshot from a BBC webpage:
The table above shows the most popular business stories on the BBC website on the 22nd of December 2009. Even though we do not have specific metrics, we intuitively understand that the order in which the stories are listed also tells us how popular each one is. Notice that the first of the most-read stories is about the British economy, while the last one is about football.
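Since the list gives only an order and no counts, one simple way to turn it into numbers is a reciprocal-rank score (1/rank). This is a minimal sketch under that assumption; reciprocal rank is my choice of proxy, not a metric the BBC publishes, and the story titles below are hypothetical placeholders.

```python
def rank_scores(titles):
    """Map each title in a most-read list to a reciprocal-rank score (1/rank).

    Assumes the list is ordered from most to least read. The score is only a
    relative popularity proxy, not an actual readership count.
    """
    return {title: 1.0 / rank for rank, title in enumerate(titles, start=1)}


# Hypothetical titles, ordered as they would appear in a most-read table.
most_read = ["story on the economy", "story two", "story three", "story on football"]
scores = rank_scores(most_read)
for title, score in scores.items():
    print(f"{score:.2f}  {title}")
```

Reciprocal rank drops off quickly, so the top story dominates; a linear scheme (for example, n - rank + 1) would weight the lower positions more evenly. Which one fits better depends on how steeply real readership falls with position, which the table alone cannot tell us.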
This is knowledge that we can harness. Admittedly, it is a very specific kind of knowledge, because it tells us only what readers of the BBC, mostly British ones, found interesting. In other words, this is knowledge about a specific population. In another country, say France, a title about the UK still being in recession would most likely not be as interesting, but a title about France being in the same situation would be. Subject, Time and Location are all important parameters that need to be captured and taken into account.
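To keep Subject, Time and Location attached to every popularity signal, each observation can be stored as a small record and then aggregated per population. The sketch below is hypothetical: the `Observation` fields, the topic labels and the country codes are my own illustrative choices, and the scores are made-up values, not measurements.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    subject: str   # topic label, e.g. "economy" (hypothetical labels)
    day: date      # day on which the story was ranked
    location: str  # population the ranking reflects, e.g. "UK"
    score: float   # popularity proxy, e.g. a reciprocal-rank score


def interest_by_location(observations, subject, start, end):
    """Total popularity of one subject per location, within [start, end]."""
    totals = defaultdict(float)
    for obs in observations:
        if obs.subject == subject and start <= obs.day <= end:
            totals[obs.location] += obs.score
    return dict(totals)


# Illustrative values only, not measurements.
observations = [
    Observation("economy", date(2009, 12, 22), "UK", 1.0),
    Observation("economy", date(2009, 12, 22), "FR", 0.25),
    Observation("football", date(2009, 12, 22), "UK", 0.25),
    Observation("economy", date(2009, 11, 2), "UK", 0.5),
]
december = interest_by_location(
    observations, "economy", date(2009, 12, 1), date(2009, 12, 31)
)
print(december)  # {'UK': 1.0, 'FR': 0.25}
```

Keeping location as an explicit field, rather than mixing all rankings together, is what lets the same subject carry different weights for different populations.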
Let's consider the idea of creating a Knowledge Hub. It could be built by collecting massive amounts of information from social media, blogs, forum comments and news titles (together with their popularity). Techniques such as Information Extraction with concept annotation, Data Mining and Text Mining could then be used to extract knowledge by combining the facts, incidents, opinions, intentions and emotions found in different sources.
For the past three months I have been monitoring and collecting news and forum posts generated from, or about, a specific country. The collected information is annotated to extract concepts: the text is matched against keywords associated with concepts, facts, incidents and intentions. Over the past month there has been a considerable increase in negative economic sentiment, in crime-related incidents, and in terms that communicate future social instability and uneasiness.
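The post does not describe the annotation pipeline in detail, so the following is only a minimal sketch of keyword-based concept annotation, assuming a small hand-written lexicon. The `CONCEPTS` dictionary, its concept names and its keywords are hypothetical, as are the sample posts. Each post is tagged with every concept whose keywords it contains, and tagged posts are counted per month, which is enough to see whether a concept is becoming more frequent over time.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical concept lexicon; the keywords used in the actual study are not given.
CONCEPTS = {
    "negative_economy": {"recession", "unemployment", "layoffs", "debt"},
    "crime": {"robbery", "theft", "assault", "arrested"},
    "instability": {"protest", "strike", "unrest", "riot"},
}


def annotate(text):
    """Return the set of concepts whose keywords appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {concept for concept, keywords in CONCEPTS.items() if tokens & keywords}


def monthly_counts(posts):
    """Count, per (year, month, concept), how many posts mention that concept.

    posts: iterable of (date, text) pairs. A post is counted at most once per
    concept, however many of that concept's keywords it contains.
    """
    counts = Counter()
    for day, text in posts:
        for concept in annotate(text):
            counts[(day.year, day.month, concept)] += 1
    return counts


# Hypothetical posts, for illustration only.
posts = [
    (date(2009, 11, 3), "Layoffs announced as recession deepens"),
    (date(2009, 12, 1), "Police arrested two men after robbery"),
    (date(2009, 12, 15), "Unemployment rises; unions plan strike"),
]
counts = monthly_counts(posts)
print(counts[(2009, 12, "crime")])  # 1
```

Plain keyword matching is a rough proxy: it ignores negation and context, so a rising count signals more talk about a concept rather than measured sentiment. A real pipeline would likely add stemming, phrase matching or a trained classifier.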
It is a very interesting fact that our behavior is recorded, up to a point, by the web. Again, the key is how we organize this information into logical chunks and then use this representation to find possible insights.
2009 has been a year of big changes. Best wishes for a Happy and Prosperous New Year to everyone.