Our expectations of the future have evolved from the language-interpretation and etiquette services of “C-3PO” in Star Wars to the emotional companionship of “Samantha” in Her. While C-3PO imitated human form, Samantha has no form at all. While C-3PO followed instructions (with some comic banter thrown in), Samantha can perceive her environment, sense emotions, and hold conversations with humans, just like a human being.
Emotional intelligence in computers may be relegated to the realm of science fiction for some time to come, but their ability to interact with humans is here now. Conversational interaction through voice is one area seeing rapid adoption as a way of working with computers. Voice interaction between humans and computers is one of those rare technologies that attracts both Baby Boomers and Millennials: aging Boomers find voice much easier than screen-based interaction, while Millennials have little love for the interaction preferences of their parents.
Several advances have made natural interaction with computers possible. The first is voice recognition. While Google and Apple (with Siri) made voice interaction cool, Amazon Alexa, with its far-field recognition, changed the game in conversational interactions: you are no longer tethered to your computer or phone. Noise reduction, echo cancellation, and the ability to distinguish the relevant voice from ambient sound mean that one can interact with Alexa from across the room. The second advance is the ability to reliably match the captured signal to a word in a language while accounting for accents and dialects. You and I may pronounce the word “orange” differently, but the computer still recognizes it as the same word.
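The “orange” example can be sketched in miniature with a pronunciation lexicon that maps several phoneme sequences to one canonical word. This is a toy illustration only: real recognizers use acoustic and language models, the phoneme labels below are loose ARPAbet-style codes, and the lexicon entries are invented for this example.

```python
# Hypothetical pronunciation lexicon: multiple phoneme sequences
# (different accents) resolve to the same canonical word.
LEXICON = {
    ("ao", "r", "ah", "n", "jh"): "orange",   # roughly "OR-unj"
    ("aa", "r", "ih", "n", "jh"): "orange",   # roughly "AHR-inj"
    ("t", "ah", "m", "ey", "t", "ow"): "tomato",
    ("t", "ah", "m", "aa", "t", "ow"): "tomato",
}

def recognize(phonemes):
    """Look up a phoneme sequence and return the canonical word."""
    return LEXICON.get(tuple(phonemes), "<unknown>")
```

Two speakers producing different phoneme sequences for “orange” both resolve to the same word, which is the property the paragraph above describes.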
Once the voice has been recognized, it has to be interpreted, which is the realm of natural language processing. The system needs to understand the instruction being given, first literally and then contextually. Significant advances have been made recently in this area through deep learning. While constructs such as machine learning and neural networks have been around since the 1960s, the computational power of the cloud and the availability of enough data for systems to draw meaningful correlations and conclusions have exploded over the past few years.
Deep learning makes it possible to train a system to understand the intent of the speaker and respond to it. Systems can now grasp the higher-level intent of a request and fill in implicit information on their own, without needing step-by-step instructions to deconstruct the request. Many platforms mask these complexities and give developers simple, high-level development tools to leverage this power.
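The idea of grasping an intent and filling in implicit information can be sketched with a toy parser. Production systems use trained models rather than regular expressions; the intent patterns, the user profile, and the “home_city” default below are all invented for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical user profile: a source of implicit context
# the speaker never states out loud.
USER_PROFILE = {"home_city": "Seattle"}

INTENT_PATTERNS = {
    "book_flight": re.compile(r"\b(book|get)\b.*\bflight\b", re.I),
    "play_music": re.compile(r"\bplay\b", re.I),
}

def parse(utterance: str) -> dict:
    """Classify the intent, extract explicit slots, and fill implicit ones."""
    intent = next((name for name, pat in INTENT_PATTERNS.items()
                   if pat.search(utterance)), "unknown")
    slots = {}
    if intent == "book_flight":
        m = re.search(r"\bto (\w+)", utterance, re.I)
        if m:
            slots["destination"] = m.group(1)
        # Implicit information: the speaker never said where they are
        # flying from, so the system supplies a sensible default.
        slots.setdefault("origin", USER_PROFILE["home_city"])
        if "tomorrow" in utterance.lower():
            slots["date"] = (date.today() + timedelta(days=1)).isoformat()
    return {"intent": intent, "slots": slots}
```

A request like “Book me a flight to Paris tomorrow” yields the `book_flight` intent with the destination taken from the utterance and the origin filled in silently, which is the slot-filling behavior described above.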
Customer-oriented digital assistants (like Alexa) are taking most of the limelight. But for enterprises to harness the true power of conversational interactions, we have to venture a little toward Skynet territory. No, we don’t want systems to take over and run the planet for us. But to accomplish any meaningful task beyond searching the web or playing music, the conversational interaction paradigm has to be built into applications at the Application Programming Interface (API) layer. Smart applications should be able to interact with one another through APIs: exchanging data, answering queries, executing tasks requested by other authorized applications, reporting results on completion, initiating requests with other applications when needed, and more. They should accomplish this without a person having to interact with multiple systems. The impact on the productivity of workers and customers is significant, as are the savings from reduced learning time, interaction time, computation time, and even licensing costs.
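The API-layer interaction described above can be sketched as two toy applications talking to each other: an assistant fulfils a spoken request by calling another application’s API, checking authorization, executing the task, and returning the result. Everything here is invented for illustration (the service names, the JSON message shape, the demo token); real systems would use networked APIs with proper authentication.

```python
import json

# Hypothetical calendar application exposing a task-execution API.
class CalendarService:
    def __init__(self):
        self.events = []

    def handle(self, request_json: str) -> str:
        """Execute a task requested by another authorized application."""
        req = json.loads(request_json)
        if req.get("token") != "demo-token":      # toy authorization check
            return json.dumps({"status": "denied"})
        if req["action"] == "create_event":
            self.events.append(req["params"])
            return json.dumps({"status": "done",
                               "event_count": len(self.events)})
        return json.dumps({"status": "unsupported"})

# The assistant fulfils the user's request by calling the other
# application's API directly; the user never opens the calendar app.
class Assistant:
    def __init__(self, calendar: CalendarService):
        self.calendar = calendar

    def fulfil(self, utterance: str) -> dict:
        request = json.dumps({
            "token": "demo-token",
            "action": "create_event",
            "params": {"title": utterance},
        })
        return json.loads(self.calendar.handle(request))
```

One spoken request flows through a single interface, the calendar executes the task, and the result comes back to the assistant, so the user interacts with one system instead of two.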