Transparent Text Symposium: Day 2 - SmartData Collective

Given how intense yesterday was at the Transparent Text symposium, I couldn’t imagine that today would match it. But it did!

The morning kicked off with a series of 18 lightning talks in 90 minutes – that was 5 minutes apiece, with a ruthless gong for anyone who went overtime. The presentations were consistently intense, and I had the misfortune to follow one of the best talks – a very passionate presentation about crowd-sourced translation by IBM’s Uyi Stewart. Other notable presenters included design ninja Alexis Lloyd from the New York Times R&D Lab, Karrie Karahalios from the University of Illinois talking about the experimental WeMeddle Twitter client, MIT Media Lab professor and Berkman Fellow Judith Donath showing a stunning gallery of “data portraits,” and Dragon Systems co-founder Janet Baker explaining how the brain recognizes speech – with a skull as a prop! The session was incredible, and I hope other conferences adopt this model.

After the coffee break, there was a session on Text Analysis in the Large, featuring Dan Gruhl (IBM), Gary King (Harvard), and David Ferrucci (IBM). Dan Gruhl talked about web-scale text analysis – a topic right up his alley, considering his role in architecting the IBM WebFountain project. Gary King gave a fascinating talk about using ensemble methods to improve on existing clustering methods – the idea is to synthesize a collection of derived clusterings and place them in an explorable metric space. You can read the full paper here. But the winner for this session was definitely David Ferrucci, who described the work IBM Research is doing to develop a machine Jeopardy player. He spent much of the talk building a case for the difficulty of the problem – and then delivered the punchline: in less than three years of research, they’ve developed a machine player whose performance is comparable to that of Jeopardy winners. Hopefully they’ll be competing on live television by next year!

After lunch, there was a session on Investigation, featuring MAPLight Research Director Emily Calhoun, UC Berkeley law professor Kevin Quinn, and Guardian news editor Simon Rogers. Emily Calhoun showed how MAPLight illuminates the connections between money and politics – it was great to see data correlating who supports and opposes bills with the associated campaign contributions from interest groups. Kevin Quinn’s presentation was a bit more technical. His work reminds me a lot of Miles Efron’s work on estimating political orientation in web documents, but Quinn’s work is more general and goes beyond co-citation analysis to analyze the actual language of the documents. Great application of topic modeling! But my favorite presentation in this session was the one from Simon Rogers: he told the story of how the Guardian successfully crowd-sourced a project to investigate the expenses of UK Parliament members.

The final session was a panel discussion about how visualization might elevate or advance the debate over health care policy. The panelists were Ben Fry, Marti Hearst, Gary King, and Simon Rogers; Fernanda Viégas and Martin Wattenberg moderated. Unfortunately, the overwhelming sentiment from the panel was pessimism that anything we could do might actually lead to improved outcomes. Nonetheless, it’s clear that a lot of people are going to try.

Again, I want to thank Fernanda, Martin, Irene Greif, and everyone at IBM for organizing this fantastic event – and for inviting me to attend! I am impressed that anyone could manage to assemble such an impressive set of speakers in one place, and I appreciate the effort that everyone put into making the past two days so worthwhile. I look forward to seeing the videos available online, and I hope those who weren’t able to attend take the opportunity to watch some of them. I also encourage you to check out the live Twitter stream at #tt09 while it’s still available.
