In the previous post we saw an example of analyzing messages sent by citizens about a new taxation plan. We identified some correlations between keywords and concepts, but there are more ways to extract knowledge from such unstructured information.
Using Cluster Analysis, we can extract groups of similar concepts from thousands of comments written by citizens and also establish an order among them. Let’s assume that Cluster Analysis reveals the following clusters (groups of similar concepts) within the submitted messages:
- battling tax fraud
- requests for a fair tax plan
- requests for less taxation for large families
- various incentives for citizens
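The post does not say which clustering algorithm or tool produced these groups. As a minimal sketch, assuming plain bag-of-words vectors and a k-means-style loop that uses cosine similarity, the grouping step could look like this. The sample messages, the stopword list, the seed indices and the function names (`vectorize`, `cosine`, `centroid`, `kmeans`) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample messages; the real analysis used thousands of citizen comments.
MESSAGES = [
    "stop tax fraud now",
    "tax fraud must be punished",
    "we want a fair tax plan",
    "a fair and simple tax plan please",
    "less tax for large families",
    "large families need lower tax",
]

# Hypothetical stopword list: words ignored when building vectors.
STOPWORDS = {"a", "and", "be", "for", "must", "need", "now", "please", "we", "want"}


def vectorize(text):
    """Bag-of-words term counts for one message, skipping stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average term weights of a group of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """k-means-style clustering: assign each vector to its most similar
    centroid, then recompute centroids; `seeds` picks the initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = centroid(members)
    return labels


vectors = [vectorize(m) for m in MESSAGES]
labels = kmeans(vectors, seeds=[0, 2, 4])
for k in sorted(set(labels)):
    print(k, [m for m, label in zip(MESSAGES, labels) if label == k])
```

With real data the vectors would typically be TF-IDF weighted and the number of clusters chosen by experiment; the seeds are fixed here only to keep the sketch deterministic.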
Our problem is to find the order of importance that people place on the concept categories above: is battling tax fraud considered more important (i.e., discussed more frequently by citizens) than requesting a fair tax plan? What about taxation for large families?
Cluster analysis can reveal the size of each cluster and, as a consequence, how important each cluster is.
We assume that, in the text representation shown above, Cluster 5 (which contains 329 citizen messages) is about requests for a fair tax plan, and Cluster 10 contains messages requesting that tax fraud be minimized. It appears that significantly fewer people are concerned with fighting fraudulent activity; instead, they ask for more immediate benefits through a fair tax plan.
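Ranking clusters by the number of messages they contain, as done above, takes only a few lines once each message has a cluster label. This is a minimal sketch: the cluster names, counts and the `rank_by_size` helper are hypothetical and are not the post's actual results.

```python
from collections import Counter

# Hypothetical cluster labels, one per message (e.g. the output of a clustering step).
labels = ["fair tax plan"] * 5 + ["large families"] * 3 + ["tax fraud"] * 2


def rank_by_size(labels):
    """Return (cluster, size, share) tuples, largest cluster first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(name, size, size / total) for name, size in counts.most_common()]


for name, size, share in rank_by_size(labels):
    print(f"{name:<15} {size:>3} messages ({share:.0%})")
```

Cluster size is only a proxy for importance: it measures how often a topic is raised, not how strongly people feel about it.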
Collecting and analyzing information found in blogs and forum entries is another area of analysis that could prove very interesting. Let’s look at an example based on the political, social and economic situation in Greece: the goal is to identify and extract trends and co-occurrences of key concepts from blog titles and forum posts, such as:
- Names of major political parties
- Names of politicians
- Economy (words/phrases such as “austerity plan”)
- Negative characterizations
- Company names, etc.
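The post does not describe how these concepts were extracted. One simple possibility, sketched here under the assumption of a hand-made keyword dictionary and naive substring matching, is to tag each blog title with the concepts it mentions and count tagged titles per month to obtain a trend. The keywords, sample titles, months and function names (`tag`, `monthly_trends`) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical concept dictionary: concept -> lower-case keywords or phrases.
CONCEPTS = {
    "PASOK": ["pasok"],
    "Giorgos Papandreou": ["papandreou"],
    "Economy": ["austerity plan", "economy"],
    "Negative": ["disaster", "failure"],
}


def tag(title):
    """Return the concepts whose keywords occur in the title (naive substring match)."""
    text = title.lower()
    return {concept for concept, keywords in CONCEPTS.items()
            if any(k in text for k in keywords)}


def monthly_trends(posts):
    """posts: iterable of (month, title) pairs, month as 'YYYY-MM'.
    Returns {concept: Counter(month -> number of titles mentioning it)}."""
    trends = defaultdict(Counter)
    for month, title in posts:
        for concept in tag(title):
            trends[concept][month] += 1
    return trends


# Hypothetical blog titles, not real data.
posts = [
    ("2010-04", "PASOK announces an austerity plan"),
    ("2010-05", "Papandreou and the austerity plan: a disaster"),
    ("2010-05", "What the new austerity plan means for the economy"),
]
for concept, per_month in sorted(monthly_trends(posts).items()):
    print(concept, dict(sorted(per_month.items())))
```

A title is counted once per concept even when several of that concept's keywords appear in it, so the monthly counts are numbers of titles, not numbers of keyword hits.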
For this kind of data, several applications emerge. We can track specific concepts over time and see their trends. We can also identify which concepts are discussed together. For example, we could identify why Giorgos Papandreou (PM of Greece) is characterized negatively in blog posts (i.e., which other concepts appear in blog posts containing the keyword ‘Giorgos Papandreou’ AND negative characterizations?):
(Note: PASOK = the governing political party)
Politics = 120
Economy = 72
Economy, Politics = 40
PASOK = 24
Politics, PASOK, Referendum = 8
Economy, Politics, PASOK, Referendum, Immigrants = 8
Economy, Politics, Society = 8
Society, PASOK = 4
In other words, Giorgos Papandreou is criticized mainly for his political decisions and for the economy, followed by criticism of PASOK. Negative sentiment also stems from the fact that some Greek citizens demand a referendum on the Greek government’s recent decision to grant Greek citizenship to a large proportion of immigrants.
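The query behind the breakdown above can be sketched as a filter followed by a tally: keep the posts that mention every required concept, then count the combinations of the remaining concepts. This minimal sketch assumes the posts are already tagged with concept sets and that each count is the number of posts whose remaining concepts form exactly that combination; the post does not say how its figures were computed, and the sample posts and the `cooccurring` helper are hypothetical.

```python
from collections import Counter

# Hypothetical posts, already tagged with the concepts they mention.
posts = [
    {"Giorgos Papandreou", "Negative", "Politics"},
    {"Giorgos Papandreou", "Negative", "Economy", "Politics"},
    {"Giorgos Papandreou", "Negative", "Politics"},
    {"Giorgos Papandreou", "Economy"},  # no negative characterization
    {"Negative", "PASOK"},              # does not mention the PM
]


def cooccurring(posts, required):
    """Among posts containing every concept in `required` (logical AND),
    count the combinations of the remaining concepts."""
    required = set(required)
    combos = Counter()
    for concepts in posts:
        if required <= concepts:
            others = tuple(sorted(concepts - required))
            if others:
                combos[", ".join(others)] += 1
    return combos


for combo, n in cooccurring(posts, {"Giorgos Papandreou", "Negative"}).most_common():
    print(f"{combo} = {n}")
```

Counting exact combinations keeps each post in a single row; counting every concept separately instead would let one post contribute to several rows.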