Text analytics was one of those things I heard about every so often. Like so many terms in this business, the term comes out of a speaker’s mouth or PR person’s press release only to blow away. There’s no story, no context, nothing to chew on.
Then came a press release at BI This Week with a rare combination: surprise and concreteness. It said text analytics would help with food safety. I’m all for food, but I had no idea what text analytics had to do with it.
I emailed UK-based Linguamatics, publisher of the nifty tool they call I2E. What’s this I hear about food? Product manager Phil Hastings, ready to call it a day in Croatia, called to explain the features to me while I was barely post-breakfast and not fully verbal. I2E was indeed a powerful little thing, but I still didn’t get the food angle.
It wasn’t until I got William Hayes on the phone that things started making sense. He’s director of library and literature informatics at pharmaceutical research company Biogen Idec. They don’t do food, but close enough.
If you think the Sunday New York Times is enough for one day, consider what the research community has to bear. Hayes says, “If you’ve got 20 million articles to read, where do you start?”
“The research industry works under a tougher knowledge model than terrorist intelligence gathering,” says Hayes. “Our ability to tap that ocean of literature is like dropping a line into the ocean for fish.”
In general, a scientist can read 150 to 200 full-text journal articles a year, he explains. A curator can review about 100 abstracts a day “for a few days before you start going nuts.” Text mining is the only way to keep up with the volume of literature produced each year.
The food industry fries potatoes, but it also has to keep a lookout on research.
TNO information analyst Fred van de Brug told me the acrylamide story: Most people in the food industry missed the first warning. Scientists had published a discovery in 2000 about a carcinogen known as acrylamide, which can develop in starch-rich foods like potatoes as they are fried. By the time the danger finally hit the public media in 2002, millions of people had been exposed unnecessarily. Text mining would have helped.
I2E is more agile than standard text mining. You can learn to use it in a few hours. Hayes told me, “If you can remember bits of grammar and have some concept of what you’re researching, it’s a piece of cake.”
It’s a story in progress for BI This Week.