I’ve used Siri for 3ish weeks now, and this morning, as I was “talking to” my iPhone, I was thinking about the way we interact with computing devices and how much that has changed in only a few years. I clearly remember my reaction to the scene in the 1986 film Star Trek IV: The Voyage Home that’s shown in the pic, with Scotty trying to talk to the mouse on an Apple Mac and, of course, finding the Earth of the past so primitive as to need keyboards. To the crew of the Starship Enterprise, talking to computers was simply how you interacted with them. That was, of course, set very far in the future, and even though I’d seen it on the show for years, it wasn’t really that believable at the time.
That was then and this is now, as they say. Over the last 100 years or so, as the general-purpose computing device has evolved, the user interface was, for the bulk of that time, some form of text, most often captured on punch cards. Over the last 40+ years we have moved through several advances on the journey from text to graphics and the pervasive use of the mouse. The last 10 or so years, though, have seen a tremendous increase in the rate of change in the ways users interact with computers. While you could argue that the technology is a little older than 10 years, the general use of touch, motion, and sound/voice has really blossomed in that short time.
Touch (and more importantly multi-touch) as an interface gained popularity with Apple’s iPhone / iPod Touch / iPad, which only came on the scene a little over 4 years ago. Until then, there were no credible touch-driven consumer devices in wide distribution. Motion is a similar story, with the initial developments in motion technology tied to the video gaming market. The Nintendo Wii was the first generation of this technology, launching for the Christmas buying season of 2006. Microsoft provided the next generation of motion interfaces with its Kinect technology last year, which uses a natural user interface methodology. That technology will shortly move beyond the gaming world: Microsoft released a non-commercial software development kit (SDK) a few months ago and plans to release a commercial SDK soon, which will allow third-party developers to write Kinect apps outside of the gaming space. Moving beyond the Xbox platform, Kinect will support the Windows OS in early 2012. I should add that Kinect also supports some basic voice commands.
Next we move to audio interfaces. Voice recognition itself isn’t new, of course, but only in the last few years has the technology become reasonably reliable. The advances on the recognition side have been matched on the processing side by dramatic improvements in the ability to interpret speech and act on it. IBM’s Watson and the famous Jeopardy showdown earlier this year are a great example of what can happen at the high end of computing. Now, Watson is a great story and shows the potential voice really has, but Watson consists of 90 high-end servers and hundreds of custom algorithms…it makes for excellent TV, but it’s not something I plan on having in my home anytime soon. To me, the real innovation comes when I can carry Watson-like capabilities in my pocket (and at a price I can actually afford). That’s exactly what Siri is proving to be (and I think Scotty would be proud to talk into an iPhone). To be fair, Siri isn’t perfect, but I have to say it’s (or should I say she’s) pretty impressive. Just the other day I safely carried on an almost flawless SMS conversation for over an hour while driving, using Siri and a headset. Siri reads the SMS “out loud,” I respond by voice, and the message is sent; the only keystroke required is holding down the home button. Compare that to my attempts to voice dial with the first-generation iPhone (which was only 2007) and we’ve come a long way in voice recognition (I say attempts, BTW, because I was never able to voice dial successfully…which was very frustrating when the Apple commercial showed a person with a heavy British accent demonstrating the feature…although maybe that was the problem).
The first observation about all of this change in user interface capabilities is about change itself. Technology change tends to be exponential, but as observers we tend to think of and predict change at its current pace, which leads us to assume linear change and underestimate its impact by quite a bit. This UI story is a great example of that: text was the method for probably 80 years, the remaining four advances (graphics and the mouse, touch, motion, and voice) arrived within 20 years, and the bulk of that progress happened in the last 4-5 years. Makes you wonder what might come out of this accelerated rate of change over the next few years.
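To put some rough numbers behind that idea, here’s a quick back-of-the-envelope sketch in Python (the figures are purely made up, just to illustrate the shape of the curve). It compares a linear projection of some notional “capability” with a compounding one; the point isn’t the specific numbers, it’s how quickly the gap opens up when change compounds instead of simply adding.

# Illustrative only: invented "capability" numbers, showing how far a
# linear guess can undershoot when change actually compounds.

def linear_projection(start, yearly_gain, years):
    # Assume capability grows by a fixed amount every year.
    return start + yearly_gain * years

def compounding_projection(start, yearly_rate, years):
    # Assume capability compounds by a fixed percentage every year.
    return start * (1 + yearly_rate) ** years

start, years = 100, 10
linear = linear_projection(start, yearly_gain=10, years=years)              # +10 per year
compounding = compounding_projection(start, yearly_rate=0.4, years=years)   # +40% per year

print(f"Linear guess after {years} years:       {linear:.0f}")        # 200
print(f"Compounding result after {years} years: {compounding:.0f}")   # ~2893, roughly 14x the linear guess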
I think the real advances will come from combining touch, motion, and voice into this new user interface paradigm. Siri starts to get to the point of action and reaction, which opens up all sorts of possibilities. Consider that this all comes in a reasonably affordable mobile package and it’s even more impressive. The opportunity, and most likely the next advance, is to take the back-end ability of the device to interpret what it’s instructed to find or do and then take action, and add in motion and touch (which the iPhone certainly does already). Expanding this from the device to the software and taking it to other form factors could prove interesting in the enterprise as well. I also haven’t yet mentioned organic user interface research, which adds display technology to the human-computer interface and has some promising developments of its own, like displays that become the input device and/or can take on alternate shapes. Okay, I have to finish up now; Siri just reminded me I have a prior appointment.