There was a time that when you wanted a quick lunch, you told the cook behind the counter exactly what you wanted. ...'Easy on the onions, and why don't you slice one of those pickles real fine and put it in there, with just a dab of mayonnaise?'...Then we industrialized the process, and the people behind the counter at McDonalds don't even have to know about food or money: They just hit a code for the order on the register.
J. Storrs Hall uses that example in Beyond A.I. to introduce what he calls 'formalist float.' We formalize information for efficiency, whether in communication or logistics, and in the process we lose customized detail. That's the price we pay for the systems we build. We attempt to codify justice into laws, education into curricula, and information into ones and zeros.
In each of these examples, there's a sizable gap between what the system decrees and what the individual wants, needs, or is attempting to communicate.
I've been thinking about this formalist float as I write about computers struggling with human language. The 'real' world, with all of its complexity, cannot be rendered accurately in symbols. That's why we use the symbols in the first place: to generalize. In a sense, each word we use is as imprecise as that key the McDonald's worker punches for the Quarter Pounder. There's something I'm thinking, and it's as unique to me as the sandwich with the pickle and the dab of mayonnaise. But you and I don't share a word for that exact thought. So I come as close as I can with our formal vocabulary, and then we use gestures, voice tone, context, and shared memories to narrow the gap between the formal and the individual case. It's that cultural negotiation that the computer cannot understand.
We need words to be inexact, because if they were too precise we'd each have a unique vocabulary of several billion words, all of them unintelligible to everyone else. (Maybe that's what animals have.) I'd have a unique word for the sip of coffee I just took at 6:59 on this fifth of July, which was flavored with the anxiety that I'd better get out on my bike before the day heats up. (That would be as useless to me as to everyone else. A word has to be used at least twice to have any purpose.)
If you think about it, each word is a lingua franca, a fragment of a clumsy common language we struggle with. Imagine I say that I'm 'weary.' I'm thinking one thing, and you might have a very different idea. Maybe I carried a load a long way in the sun. I may have a troubled child. I may have argued with my editor or spent fruitless hours trying to balance my checkbook. You certainly have different ideas, based on your own experience, about what 'weary' means. In addition to all those different meanings, the word might also send other signals to you. Maybe where you come from, it has a slightly rarefied feel, and you're wondering whether I'm signaling my sophistication. In any case, neither of us knows what the other is thinking. But that single word 'weary' extends a tiny bridge between us.
Now, with that bridge in place, the word shared, we dig deeper to see if we can agree on its meaning. You study my expression and my tone of voice. That communicates a lot. Someone who has won the Boston Marathon might look contentedly weary. Another, in a divorce hearing, looks anything but. I may slack my jaw in an exaggerated way, illustrating the word with a gesture, as if to say, 'Know what I mean?' In this tiny negotiation, we're bridging the formalist float. And closing that gap is the challenge for computers like IBM's Watson, the one I'm writing about.
As computers struggle to bridge the formalist float, millions of humans are making it even more difficult for them. We're distancing ourselves from formal structures. With shorthand and abbreviations in text messages, many of us are creating our own patois. Humans have done this forever. It's how Spanish, Portuguese, Italian and French all grew out of Latin. But technology is speeding it up. The meaning of a single emoticon--;>)--evolves day by day, tribe by tribe.
Verbally, we're making it even harder. I hear conversations all the time in which people bypass the formal vocabulary altogether and rely entirely on sounds, gestures and tone. 'So I'm like uuuun, and she's like hhhmmm?' Characters in Jane Austen's novels would find words for these feelings, perhaps 'befuddled' and 'huffy.' Computers could look those words up and have at least an inkling of what we're talking about. They'll never bridge the formalist float entirely--our complexity cannot be reduced to ones and zeros. But eliminating words from our discourse makes their job even tougher.
Why computers can't figure out words
July 5, 2010 by Stephen Baker
Stephen Baker, author of The Numerati, is a journalist based in the New York area with 20 years of experience at BusinessWeek.
TheodoreOmtzigt said:
I love this article as it gave me a new appreciation of the difficulties to make Web 3.0 a reality. I just came out of a huge deep dive into the intelligence community SNA research and tool chains where I had made up my mind that automatic text comprehension had become better and more quantitative than a human being could ever be. Armed with your interpretation of human language inherent ambiguity I can now better place the semantic web as we have it: clustering a million documents and quantifying the relationships between hundreds of thousands of subjects mentioned in these documents is an act of information layering that can side step the ambiguity trap as repetition can resolve the ambiguity. Extracting the intent, poetry, or savagery of a single subject is much harder.
Wed, 2010-07-07 20:25, Theodore Omtzigt