Tom’s company, Anderson Analytics, is the leader in NGMR implementations and develops OdinText, a text-analytics software package designed for market researchers. I reached out to him in the course of researching topics for my next Sentiment Analysis Symposium, a business-focused conference that explores technologies and solutions that harvest attitudes, opinions, and emotions from online, social, and enterprise sources. Sentiment analysis, especially as implemented via text analytics, forms part of the NGMR toolkit. Tom plans to attend the May 8, 2013 symposium in New York, but even if you can’t make it, you can still benefit from Tom’s insights on research concerns and methods, as relayed in a Q&A I recently conducted with him, as follows.
Seth Grimes> Tom, what’s your assessment of Big Data as a research concern? Are conventional descriptions of Big Data useful, or is the emphasis skewed or misplaced? Are there types or sources of data we should ignore and others that are essential?
Tom H.C. Anderson> I recently blogged about how “Mid Data” is actually a more realistic and useful term that takes into account ROI. There are IT vendors out there currently selling text analytics solutions with the main value proposition being that they will somehow make your data more valuable by simply combining it with other data. Unfortunately, in reality it doesn’t work that way. You need to stop thinking Big Data and start thinking Smart Data.
If you’ve never gotten much value out of a particular data source, then adding it to other sources is not likely to make it any more useful. Companies need to think first about which of their individual sources of unstructured data is most important and how they can get more value out of that. Then and only then should you consider whether or not marrying it with any other data makes sense. In most cases (and I’m talking about analytics/insights now) it doesn’t.
Seth> What’s your view as a practitioner of the research value of new methods in neuroscience, of facial measurement in images for emotion detection, of speech analytics? If your company applies them, how do you use them?
THCA> I find some of them more interesting than others, but I certainly keep my eyes on things to see whether there is anything we can learn. I’m more interested in speech analytics as a way to economically transcribe audio for use in text analytics. I’m hopeful but haven’t seen anything being broadly adopted in that area yet.
Neuroscience contains so much, a lot of it unproven and more than a bit hokey, in my opinion. That said, I do certainly believe that measuring emotion in text is worthwhile and it’s something we have been working with for several years now. It’s not the most important aspect of text analytics, but a part that has value nonetheless.
Seth> We’ve been talking about triangulation for a number of years, the combination of data from multiple sources, for instance, from attitudinal analysis (surveys and content), behavior tracking, and psychological and demographic profiling. Are we there yet?
THCA> I first advocated the idea of triangulation in individual text analytics projects at a software conference in 2005. But this was more in terms of using it within a single text analytics data source, and triangulating using more than one text analytics approach, and usually also with a human POV. I think our software, as well as our understanding of how best to leverage text analytics, has now gotten better, so that this triangulation approach is less important than it was then.
But the triangulation you are talking about here, basically the other way around, adding multiple data sources together as I mentioned earlier, well I have not found that to be very effective. Usually it’s a bit of a boondoggle and net negative ROI.
Once in a while we’ve seen cases where marrying two related sources of data makes a lot of sense. Usually each has individual value and we have explored and understood each set well first, and then we look at them combined.
Where I’ve seen it make far less sense is in adding three or more data sources with weak connections. Social Media data, which is just Tweets and blog posts (mainly spam) if we’re honest, is one such source.
There’s lots of talk about combining social media data with other sources like consumer survey data and all sorts of other internal metrics. But in actuality the data is typically not worth much more when combined with other data than it is when separate.
Seth> What are the most important research insights that can be derived from attitudinal data — from opinions, emotions, and other forms of sentiment?
THCA> Basically any feedback that you typically get from human communication can benefit from text analytics. The “What” and “Why” questions are certainly important. What customers want, what your competitors are doing, how to improve satisfaction and increase purchasing, these are just a few of the things we regularly answer with OdinText.
Seth> Tom, thanks for participating in this Q&A. Last question, given the focus of my own work: Where should we head next with sentiment analysis? (Please note that this question doesn’t assume an analysis method, which could involve expert analysis, crowd-sourcing, or automated NLP.)
THCA> I’ve never viewed sentiment analysis as separate from text analytics; it’s just one part of it. I prefer to put the client’s business objectives and data ahead of the technique. So where we take sentiment will depend on what is needed. We’re focusing more on how we can best help clients with the current accuracy of sentiment, as I believe it’s really already good enough to do the job, and contrary to what some say, I haven’t seen anyone improve it in any significant way without a lot of data-specific customization.