By Subbarao Kambhampati, PhD
Professor of computer science at Arizona State University and chief AI officer for AI Foundation, which focuses on the responsible development of AI technologies
Indisputably, computers in their myriad forms helped improve our lives in the last century, and especially in the past decade. Much of our interaction with computers, however, has long been stilted and unnatural.
The means of natural interaction we evolved for human communication generally were not of much use in dealing with computers. We had to enter their “land” to get our work done — be it typing, clicking buttons or editing spreadsheets. While our productivity increased, so did the time we spend in these unnatural modes of interaction. Communicating with computers sometimes is such a soul-draining activity that, over time, we even created special classes of computer data-entry positions.
Thanks to recent strides in artificial intelligence (AI), especially in perceptual intelligence, this is going to change drastically in coming years, with computers entering our “land,” instead of the other way around. They will be able to hear us, to speak back to us, to see us and to show things back to us. In an ironic twist, these "advanced" capabilities finally will allow us to be ourselves, and to have computers deal with us in modes of interaction that are natural to us.
We won't need to type to them or to speak in stilted, halting voices. This will make computer assistants and decision-support systems infinitely more human-friendly — as witnessed by the increasing popularity of "smart speakers." As computers enter the land of humans, we might even reclaim some of our lost arts, such as cursive script, since it will become as easy for computers to recognize handwriting as it is for humans.
Granted, the current recognition technology still has many limitations, but the pace of improvement has been phenomenal. Despite having done an undergraduate thesis on speech recognition, I have scrupulously avoided almost all dictation and transcription technologies. Recently, however, the strides in voice transcription have been quite remarkable, even for someone with my accent. In fact, I used Pixel 4 Recorder to transcribe my thoughts for this article!
Beyond the obvious advantages of easy communication with computer assistants, their entry into our land has other important benefits.
For a long time now, computers have foisted a forced homogenization among the cultures and languages of the world. Whatever your mother tongue, you had to master some pidgin English to enter the land of computers. In the years to come, however, computers can unify us in all our diversity, without forcing us to lose our individuality. We can expect to see a time when two people can speak in their respective mother tongues and understand each other, thanks to real-time AI transcription technology that rivals the mythical Babel Fish from “The Hitchhiker’s Guide to the Galaxy.” Some baby steps towards this goal are already being taken. I have a WeChat account to keep in touch with friends from China; they all communicate in Chinese, and I still get a small percentage of their communications thanks to the "translate" button.
Seeing and hearing the world as we do will allow computers to take part in many other quotidian aspects of our lives beyond human-machine communication. While self-driving cars still may not be here this coming decade, we certainly will have much more intelligent cars that see the road and the obstacles, hear and interpret sounds and directions, the way we do, and thus provide much better assistance to us in driving. Similarly, physicians will have access to intelligent diagnostic technology that can see and hear the way they themselves do, thus making their jobs much easier and less time-consuming (and giving them more time for interaction with patients!).
Of course, to get computers to go beyond recognition and see the world the way we do, we still have some hard AI problems to solve — including giving computers the “common sense” that we humans share, and the ability to model the mental states of those humans who are in the loop. The current pace of progress makes me optimistic that we will make important breakthroughs on these problems within this decade.
There is, of course, a flip side. Until now it was fairly easy for us to figure out whether we were interacting with a person or a computer, given the stilted prose or robotic voice of the latter. As computers enter our “land” with natural interaction modalities, they can have a significant impact on our perception of reality and human relations. As a species, we already are acutely susceptible to the sin of anthropomorphization. Computer scientist and MIT professor Joseph Weizenbaum is said to have shut down his Eliza chatbot when he became concerned that the office secretaries were typing their hearts out to it. Already, modern chatbots such as Woebot are rushing onto the ground where Weizenbaum feared to tread.
Imagine the possibilities when our AI-enabled assistants don't rely on us typing but, instead, can hear, see and talk back to us.
There also are the myriad possibilities of synthetic reality. In order to give us some ability to tell whether we are interacting with a computer or the reality it generated, there are calls to have AI assistants voluntarily identify themselves as such when interacting with humans — ironic, considering all of the technological steps we took to get the computers into our land in the first place.
Thanks to the internet of things (IoT) and 5G communication technologies, computers that hear and see the world the way we do can also be weaponized to provide surveillance at scale. Surveillance in the past required significant human power. With improved perceptual recognition capabilities, computers can provide massive surveillance capabilities — without requiring much human power.
It’s instructive to remember a crucial difference between computers and humans: When we learn a skill, there is no easy way to instantly transfer it to others; we don’t have USB connectors to our brains. In contrast, computers do, and thus when they enter our land, they enter all at once.
Even an innocuous smart speaker in our home can invade our privacy. This alarming trend is already seen in some countries such as China, where the idea of privacy in the public sphere is becoming increasingly quaint. Countering this trend will require significant vigilance and regulatory oversight from civil society.
After a century of toiling in the land of computers, we finally will have them come to our land, on our terms. If language is the soul of a culture, our computers will start having first glimpses of our human culture. The coming decade will be a test of how we will balance the many positive impacts of this capability on productivity and quality of life with its harmful or weaponized aspects.