Over the last five posts I have described how unstructured text from Twitter can be used for Knowledge Extraction. Specific examples included Sentiment Mining for products (Amazon’s Kindle), Segmentation of Twitter users and, finally, cluster analysis of the emotions and thoughts expressed by Twitter users.
So far I have discussed some ways in which text mining can give us more insight into how people think. Now it is time to bring Information Extraction and Ontologies into the equation.
Information Extraction (IE) is the automated extraction of information such as names (first names, city names, country names, etc.), facts or events from unstructured text. An example of IE was given in these posts, where thousands of flat adverts were extracted and data mining analysis was then performed to identify which characteristics matter most for achieving a high rent.
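To make the idea concrete, here is a minimal sketch of rule-based IE applied to flat adverts. It assumes UK-style adverts with prices in pounds; the advert texts, the regular expressions and the `extract_fields` helper are all hypothetical and are not the pipeline used in those posts.

```python
import re

# Hypothetical advert texts, invented for illustration only.
ADVERTS = [
    "Bright 2 bedroom flat in Camden, £1,450 per month, close to the tube.",
    "Studio flat near the park. Rent: £900 pcm. Furnished.",
    "Spacious 3 bed apartment with balcony, £2,100 per month",
]

# Assumed patterns: a rent in pounds and an optional "N bed"/"N bedroom" count.
RENT_RE = re.compile(r"£\s?([\d,]+)")
BEDS_RE = re.compile(r"(\d+)\s*(?:bed|bedroom)s?\b", re.IGNORECASE)

def extract_fields(advert: str) -> dict:
    """Turn one free-text advert into a structured record (hypothetical helper)."""
    rent_match = RENT_RE.search(advert)
    beds_match = BEDS_RE.search(advert)
    # Strip thousands separators before converting the rent to an integer.
    rent = int(rent_match.group(1).replace(",", "")) if rent_match else None
    if beds_match:
        beds = int(beds_match.group(1))
    elif "studio" in advert.lower():
        beds = 0  # treat a studio as having zero separate bedrooms
    else:
        beds = None
    return {"rent": rent, "bedrooms": beds}

records = [extract_fields(a) for a in ADVERTS]
for record in records:
    print(record)
```

Once adverts are turned into records like these, the data mining step can work on ordinary columns (rent, bedrooms) instead of raw text. Hand-written patterns only cover the phrasings you anticipate, which is why IE systems usually combine them with NLP or learned extractors.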
Ontologies are used for knowledge representation and may also be used for structuring the information that exists on the web. To give an example, consider the following product keywords:
- Coke
- Sprite
- Dr Pepper
If someone asks you what they have in common, your brain looks for generalizations and comes up with the following answers:
- They are all Carbonated Drinks
- (Possibly) they all contain sugar, since none of the words “Diet”, “Zero” or “Light” is mentioned.
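A minimal sketch of this kind of generalization, assuming a tiny hand-written mapping from product keywords to a broader concept plus a default rule for sugar. The `IS_A` table, the marker words and `generalize` are illustrative only; a real Ontology Engine would draw on a much richer knowledge base.

```python
# Toy, hand-written "ontology" (an assumption): product keyword -> broader concept.
IS_A = {
    "coke": "carbonated drink",
    "sprite": "carbonated drink",
    "dr pepper": "carbonated drink",
    "diet coke": "carbonated drink",
}

# Words that, in this toy rule, signal a sugar-free variant.
SUGAR_FREE_MARKERS = ("diet", "zero", "light")

def generalize(product: str) -> set:
    """Return the facts this toy ontology lets us infer about a product keyword."""
    name = product.lower()
    facts = set()
    concept = IS_A.get(name)
    if concept is not None:
        facts.add(concept)
    if concept == "carbonated drink":
        # Default assumption: a carbonated drink contains sugar unless its
        # name mentions one of the sugar-free marker words.
        if not any(marker in name.split() for marker in SUGAR_FREE_MARKERS):
            facts.add("contains sugar")
    return facts

products = ["Coke", "Sprite", "Dr Pepper"]
# What the three products have in common = intersection of their inferred facts.
common = set.intersection(*(generalize(p) for p in products))
print(sorted(common))  # ['carbonated drink', 'contains sugar']
```

Note that the sugar rule is a default, not a certainty: adding “Diet Coke” to the list would drop “contains sugar” from the common facts, which mirrors the “(Possibly)” hedge above.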
Now let’s assume we have an Ontology Engine that can do this, inferring automatically that all these products are sugared carbonated drinks. This lets us extract facts more coherently, because it lessens the effect discussed in The Statistics of Everyday Talk: we can capture growing trends, such as people expressing their thoughts about carbonated drinks in general, rather than matching “Coke”, “Sprite” and “Dr Pepper” individually. Without Ontologies such a trend could easily be missed.
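The effect can be illustrated with a small counter. Assuming a hypothetical keyword-to-concept table and naive substring matching, each drink is mentioned only once, so a per-keyword trend detector sees nothing unusual, while the concept-level count shows three mentions of carbonated drinks. The tweets and the `CONCEPTS` table are invented for illustration.

```python
from collections import Counter

# Hypothetical tweets, invented for this example.
TWEETS = [
    "Having a coke with lunch",
    "Sprite is so refreshing today",
    "Nothing beats a cold Dr Pepper",
    "Just bought a new Kindle",
]

# Hypothetical keyword -> concept table standing in for an ontology.
CONCEPTS = {
    "coke": "carbonated drink",
    "sprite": "carbonated drink",
    "dr pepper": "carbonated drink",
    "kindle": "e-reader",
}

def count_mentions(tweets):
    """Count mentions per keyword and per concept (each concept once per tweet)."""
    by_keyword, by_concept = Counter(), Counter()
    for tweet in tweets:
        text = tweet.lower()
        concepts_in_tweet = set()
        for keyword, concept in CONCEPTS.items():
            # Naive substring match; a real system would tokenize properly.
            if keyword in text:
                by_keyword[keyword] += 1
                concepts_in_tweet.add(concept)
        by_concept.update(concepts_in_tweet)
    return by_keyword, by_concept

by_keyword, by_concept = count_mentions(TWEETS)
print(dict(by_keyword))  # every keyword is mentioned only once
print(dict(by_concept))  # "carbonated drink" is mentioned three times
```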
By using Ontologies or taxonomies where applicable, an association discovery algorithm can search at different levels of detail. For example, data miners usually employ taxonomic information (e.g. Sprite, Coke, Pepsi = carbonated drinks) when performing association discovery on supermarket data, and the effort of applying taxonomies almost always pays off in terms of the knowledge extracted about consumer behavior.
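A sketch of how a taxonomy enters association discovery, assuming invented baskets and a toy item-to-category table: each transaction is extended with the categories of its items (the idea behind generalized association rule mining) before pair supports are counted. Only pair support is computed here; a full algorithm such as Apriori would also prune candidates and derive rules with confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets.
BASKETS = [
    {"coke", "crisps"},
    {"sprite", "crisps"},
    {"pepsi", "crisps", "bread"},
    {"milk", "bread"},
]

# Toy taxonomy (an illustrative assumption): item -> category.
TAXONOMY = {
    "coke": "carbonated drinks",
    "sprite": "carbonated drinks",
    "pepsi": "carbonated drinks",
    "crisps": "snacks",
}

def extend(basket):
    """Add each item's category to the basket (a 'generalized' transaction)."""
    return set(basket) | {TAXONOMY[item] for item in basket if item in TAXONOMY}

def pair_support(baskets):
    """Fraction of baskets that contain each (sorted) pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: c / len(baskets) for pair, c in counts.items()}

plain = pair_support(BASKETS)
generalized = pair_support([extend(b) for b in BASKETS])

print(plain[("coke", "crisps")])                     # 0.25: too rare on its own
print(generalized[("carbonated drinks", "crisps")])  # 0.75: a strong pattern
```

The extended baskets also produce trivial pairs such as an item with its own category (“coke” with “carbonated drinks”); a real generalized miner would filter these out.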
I have used Ontologies over the past three years and have seen them in action. The fact that Ontologies can give access to inference and deductive reasoning techniques is of great use. Applying Information Extraction and Natural Language Processing, and then inserting the extracted information into an ontological setting, has many potential applications.
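As a small illustration of deductive reasoning over an ontology, the sketch below applies two RDFS-style rules (transitivity of subClassOf, and inheritance of types through subClassOf) by forward chaining until no new facts appear. The triples, including the `LemonLimeSoda` class, are hypothetical; practical systems would use a dedicated reasoner rather than this naive loop.

```python
# Hypothetical facts, written as (subject, predicate, object) triples.
FACTS = {
    ("Sprite", "type", "LemonLimeSoda"),
    ("LemonLimeSoda", "subClassOf", "CarbonatedDrink"),
    ("CarbonatedDrink", "subClassOf", "SoftDrink"),
    ("SoftDrink", "subClassOf", "Beverage"),
}

def infer(facts):
    """Forward-chain two RDFS-style rules until a fixpoint is reached."""
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                # Both rules need a second triple of the form "b subClassOf d".
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    # Transitivity: a subClassOf b, b subClassOf d => a subClassOf d
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    # Type inheritance: a type b, b subClassOf d => a type d
                    new.add((a, "type", d))
        if new <= facts:  # nothing new was derived, so stop
            return facts
        facts |= new

closure = infer(FACTS)
print(sorted(o for (s, p, o) in closure if s == "Sprite" and p == "type"))
```

From a single extracted fact (“Sprite” is a lemon-lime soda) the reasoner deduces that Sprite is also a carbonated drink, a soft drink and a beverage, which is exactly the kind of generalization that lets trends be tracked at a higher level.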