Back in the late '90s, my uncle proudly pulled out his flip phone at a family reunion to show me (and whoever else would listen) the "future of tech." He proceeded to shout very limited voice commands ("CALL…..BOB!"), which the phone could register but often got wrong ("Calling…Mom").
Fast forward a few years, and I remember getting the RAD 4.0 robot for Christmas. The TV commercials made that toy seem like the perfect robot companion and servant, like Rosey from The Jetsons or Goddard from Jimmy Neutron. RAD could respond to voice commands, move autonomously (or with a remote control), and had robot arms with clamps to pick up your toys, laundry, soda cans, etc. It even came equipped with NERF-style bullet launchers on its chest for security measures! After testing it out around the house, however, I remember being a little underwhelmed with its performance. I wore myself out shouting the same commands over and over until it finally responded, usually with an action that wasn't quite what I had asked for. Below you can see RAD's design and its simplistic "speech tree chart," which outlines all the verbal cues it could (supposedly) respond to.
Even as a 10-year-old kid, I understood that Natural Language Processing technology wasn't yet advanced enough to accurately understand more than a handful of commands. But I was patient, and a few years later I came across the chatbot SmarterChild, which was developed by Colloquis (acquired by Microsoft in 2006) and released on early instant messaging platforms like AIM and MSN Messenger (Gabriel, 2018). While entirely text-based (not voice-activated), SmarterChild could play games, check the weather, look up facts, and converse with users to an extent. One of its more compelling canned responses came if you asked about sleep:
This was about the same time that the movie I, Robot (Proyas, 2004) came out, which contained another (somewhat chilling) quip about robots dreaming and the future of artificial intelligence:
Detective Spooner: Robots don’t feel fear. They don’t feel anything. They don’t get hungry, they don’t sleep-
Sonny the Robot: I do. I have even had dreams.
Spooner: Human beings have dreams. Even dogs have dreams, but not you. You are just a machine; an imitation of life. Can a robot write a symphony? Can a robot turn a… canvas into a beautiful masterpiece?
Sonny: [with genuine interest] Can you?
Over the next decade, AI began to evolve at an unprecedented pace. Nowadays, Google Assistant has a much more complex algorithmic process (see below) for decoding language than my old friend RAD 4.0, and can provide much more natural and sophisticated interaction than SmarterChild.
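However sophisticated the models have become, the core task of "decoding language" still boils down to recognizing an intent and pulling out its parameters. Here's a toy sketch of that intent-plus-slots idea in Python. The patterns and phrases are made up for illustration; this is nothing like Google's actual implementation, just the simplest possible version of the concept:

```python
import re

# Hypothetical intent patterns with named "slots" -- a stand-in for the
# statistical models a real assistant uses, just to show the basic shape.
INTENTS = {
    "call":    re.compile(r"call (?P<contact>\w+)", re.I),
    "weather": re.compile(r"weather in (?P<city>[\w ]+)", re.I),
}

def parse(utterance: str):
    """Return (intent_name, slots) for the first matching pattern, else (None, {})."""
    for name, pattern in INTENTS.items():
        match = pattern.search(utterance)
        if match:
            return name, match.groupdict()
    return None, {}

print(parse("Hey, call Bob"))                  # ('call', {'contact': 'Bob'})
print(parse("What's the weather in Austin"))   # ('weather', {'city': 'Austin'})
```

The distance between this and a modern assistant is exactly where the last decade of machine learning went: replacing brittle hand-written patterns with models that learn intents and slots from data.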
These virtual assistant technologies haven't been without hiccups in their integration, such as when I first got the updated version of the iPhone with Siri included. I remember ordering at a Taco Bell drive-thru while my phone was in the cupholder of the car. My order included a "quesarito" (pronounced "K-HEY-SIRI-TOE"), and when I got home I realized that Siri had "woken up" in the drive-thru and had been running searches on everything said on the car radio during the drive back. It's incidents like these, and many others with far more sensitive or compromising information at stake, that have given people concerns about our virtual assistants always listening. Apple has recognized these concerns and gone to some lengths to address them, with measures like two-pass detection, personalized "Hey Siri" trigger phrases, and cancellation signals for commonly confused phrases such as "Hey, seriously" (Siri Team, 2017).
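The two-pass idea Apple describes is that a tiny, always-on detector screens all incoming audio cheaply, and only audio that clears its low bar gets re-scored by a much larger model before the device actually wakes. A minimal sketch of that gating logic, with made-up scores and thresholds standing in for the real acoustic models:

```python
# Two-pass wake-word gating: a cheap first stage runs constantly; only
# audio that clears a permissive bar is re-scored by a (pretend) larger
# model. All scores/thresholds here are illustrative, not Apple's.

FIRST_PASS_THRESHOLD = 0.3   # permissive: cheap model, must not miss real triggers
SECOND_PASS_THRESHOLD = 0.8  # strict: expensive model confirms before waking

def cheap_score(audio: str) -> float:
    """Stand-in for the small always-on detector."""
    text = audio.lower()
    if "hey siri" in text:
        return 0.9
    if "seri" in text:   # near-miss, e.g. "hey, seriously"
        return 0.5
    return 0.1

def big_score(audio: str) -> float:
    """Stand-in for the larger second-stage checker."""
    return 0.95 if "hey siri" in audio.lower() else 0.2

def should_wake(audio: str) -> bool:
    if cheap_score(audio) < FIRST_PASS_THRESHOLD:
        return False                                    # stage 1 rejects most audio
    return big_score(audio) >= SECOND_PASS_THRESHOLD    # stage 2 confirms

print(should_wake("Hey Siri, set a timer"))   # True
print(should_wake("Hey, seriously?"))         # False (caught by second pass)
print(should_wake("turn up the radio"))       # False (never clears first pass)
```

The asymmetry is the point: the first stage can afford false positives because the second stage cleans them up, which is how a "quesarito" can trip the cheap detector without (ideally) waking the phone.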
Now, building off the popularity of devices like the Echo and its Alexa assistant, Amazon is rolling out programs like Amazon Lex, where the general public can create their own conversational voice and text interfaces for their websites, apps, and other technologies (Barr, 2017). This is a huge step for the integration of AI, machine learning, and deep neural networks into the public sphere, making these tools accessible far beyond the computer scientists of Silicon Valley.
The big question that comes to mind, as always, is: what's next? Even though most of the above evidence is anecdotal, it shows a massive progression in the field of artificial intelligence over the past 20 years. Will the evolution of virtual assistant technologies continue to accelerate alongside rapid progress in fields like machine learning and natural language processing? Where does it end? Will we become too dependent on these technologies? If so, what happens when they fail? Will there eventually be a cultural backlash?
“Hey Siri, what will the future look like?”