The Quiet Rise of Conversational AI
If there’s one thing that the internet is good for, it’s gathering around and ridiculing major technology snafus.
Whether it’s an incomprehensible GPS or drones that shut down airports, catching technology bumping up against the limitations of its own design and programming is oddly satisfying for humans.
Consider the ever-growing case of “#Alexafails,” where users give commands to Amazon’s digital assistant, only to receive the most bizarre translations and feedback.
These are instances where simple commands like “send mom’s fax” and “feed the baby” were heard as something else entirely.
At some point, product engineers and creators will jump in with a slightly sheepish message letting customers know they’re working on a fix or offering them a link to patch up the issue.
Of course, no one expects technology to be perfect or, indeed, to “get it right” every time. As users, we inherently understand that both software and the hardware interfaces we use to interact with it are always in a state of development.
But while perfection may not be the goal, having technology be human is, especially when it comes to the devices we interact with. The reaction to most of these tech mishaps carries an implicit “Humans wouldn’t do that.” And it’s true: humans with an understanding of the language and good hearing wouldn’t misinterpret commands like “feed the baby” and “send mom’s fax.”
When it comes to voice, our expectations about interaction change. Somehow, the mark of a futuristic, “intelligent” user interface is one that accurately mimics the way a human communicates.
With artificial intelligence (AI) and machine learning (ML) finally making their way to the consumer products sector, voice user interfaces (VUIs) are quickly paving the way for conversational AIs.
What Is a Voice User Interface?
Voice user interfaces already employ artificial intelligence and machine learning to execute commands and interact with users.
These technologies include automatic speech recognition, named entity recognition, and speech synthesis. In other words, VUIs have speech capabilities.
These abilities are then powered by the “brain,” which is a public or private cloud that the VUI relies on to process a user’s voice and speech elements during the interaction.
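To make those three stages concrete, here is a minimal sketch of how an utterance might pass from speech recognition to entity recognition to a spoken reply. Everything in it is an assumption made for illustration: the function names, the tiny keyword vocabularies, and the use of plain text in place of real audio. A production VUI would hand each stage to trained models running in the cloud “brain.”

```python
from dataclasses import dataclass


@dataclass
class Turn:
    transcript: str  # text produced by the speech recognition stage
    entities: dict   # named entities found in the transcript
    reply: str       # text that would be handed to speech synthesis


# Hypothetical vocabularies; a real system would use a trained NER model.
KNOWN_CONTACTS = {"mom", "dad"}
KNOWN_OBJECTS = {"fax", "email"}


def recognize_speech(audio: str) -> str:
    """Stand-in for automatic speech recognition: the 'audio' is already text."""
    return audio.lower().strip()


def extract_entities(transcript: str) -> dict:
    """Toy named entity recognition based on keyword lookup."""
    cleaned = transcript.replace("’s", "").replace("'s", "")
    entities = {}
    for word in cleaned.split():
        if word in KNOWN_CONTACTS:
            entities["contact"] = word
        elif word in KNOWN_OBJECTS:
            entities["object"] = word
    return entities


def synthesize_reply(entities: dict) -> str:
    """Stand-in for speech synthesis: returns the text that would be spoken."""
    if "contact" in entities and "object" in entities:
        return f"Sending the {entities['object']} to {entities['contact']}."
    return "Sorry, I didn't catch that."


def handle_utterance(audio: str) -> Turn:
    """Run one utterance through all three stages in order."""
    transcript = recognize_speech(audio)
    entities = extract_entities(transcript)
    return Turn(transcript, entities, synthesize_reply(entities))
```

Calling `handle_utterance("Send mom's fax")` walks the command through each stage, while a command the toy vocabulary does not cover falls through to the apology.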
When people think of VUIs, they usually think of digital assistants like Alexa and Siri. Indeed, their use has become ubiquitous in a variety of everyday contexts such as while driving, in the home for “smart” devices, and in public spaces to learn about facts or to check the weather.
However, VUIs are not the devices that we use them on. They are the conversation flows that occur when you issue a command. So, devices such as:
- Wearables (like smartwatches or fitness trackers)
- Desktop computers
- IoT-connected devices like thermostats and lights
- Speakers and sound systems
…are only the platforms through which we can interact with voice user interfaces. These devices give us access to the AI-powered user flows and processes that characterize a VUI.
In other words, the interaction is novel because it’s occurring through your voice, not through touch on a screen (or a GUI, which is a graphical user interface).
All you need to do is look at a simple dialog flow to understand that prototyping a VUI is all about anticipating the situations that can follow from an initial voice command.
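A dialog flow of this kind can be sketched as a small state machine. The states, intents, and replies below are hypothetical and chosen only for illustration, and the keyword matching stands in for real intent detection.

```python
from typing import Tuple

# Hypothetical dialog flow: each state maps an intent to (reply, next state).
DIALOG_FLOW = {
    "start": {
        "set_alarm": ("What time should I set the alarm for?", "await_time"),
        "check_weather": ("It looks sunny today.", "start"),
    },
    "await_time": {
        "give_time": ("Okay, your alarm is set.", "start"),
        "cancel": ("Alarm cancelled.", "start"),
    },
}

FALLBACK = "Sorry, I didn't understand. Could you rephrase that?"


def detect_intent(utterance: str) -> str:
    """Toy intent detection based on keywords."""
    text = utterance.lower()
    if "alarm" in text:
        return "set_alarm"
    if "weather" in text:
        return "check_weather"
    if "cancel" in text or "never mind" in text:
        return "cancel"
    if any(ch.isdigit() for ch in text):
        return "give_time"
    return "unknown"


def step(state: str, utterance: str) -> Tuple[str, str]:
    """Return the reply and the next state for one user utterance."""
    intent = detect_intent(utterance)
    # An intent the current state did not anticipate triggers the fallback
    # and leaves the conversation where it was.
    return DIALOG_FLOW[state].get(intent, (FALLBACK, state))
```

The fallback branch is where the design work shows: every intent the flow fails to anticipate ends in a request to rephrase.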
Now, the advantages of dealing with a VUI over a text- or touch-based interface are self-evident.
There’s less distraction, it’s easier and faster to dictate than it is to type, it mimics the way many people search for information these days (through questions), and you can enjoy a hands-free experience.
And that’s where we are with digital voice assistants like Siri or Cortana.
But the scope of the VUI experience so far limits us to simple commands, expected responses, and mostly one-line confirmations or requests for clarifications. Sometimes, developers code in snarky responses for a bit of a chuckle — but VUIs still rely on us.
In other words, we’re doing little more than commanding robots. Which can sometimes feel like herding cats.
VUIs have a unique opportunity for growth in ways few other interfaces do — and that’s because they can directly communicate with us. So the big question is: will VUIs ever be, quite literally, conversation starters?
Voice User Interfaces of the Future: Challenges and Opportunities
The bid for “smart” or “conscious” software has been around since at least the years after the Second World War.
In 1950, a quiet but brilliant English scientist and mathematician, Alan Turing, devised a “game” designed to test a machine’s “believability” as it mimicked (and even passed for) a human.
Even today, we call it the Turing Test, and researchers still use it as a litmus test to work out whether an “intelligent” device or program can reasonably interact with a human.
That standard, established by pitting a human and a machine side by side and judging the machine by how closely it resembles the human, is challenging to meet when it comes to voice user interfaces.
Unlike content personalization, recommendation engines, and even, to a certain extent, image recognition, VUIs face a more complex task because they must do two major things at once:
- Firstly, to be truly supportive, VUIs must include the traditional “learning” function that all ML-powered interfaces have.
- Secondly, they must be able to process natural language: to receive commands with deep variations in syntax, semantics, pragmatics, morphology, and phonology, and translate them into data they can act on.
Consider, for example, the goal of content personalization on social media platforms. YouTube uses an AI-driven algorithm to personalize backgrounds on videos, drives recommendations based on real-time data, and even parses video frames for potentially objectionable content.
And, to top it all off, the neural networks powering the YouTube AI algorithm run hundreds and even thousands of layers deep — so there’s simply no way for a human to track its data, actions, and execution.
In other words, machine learning and AI are already powerful enough to meet a large swath of Turing’s original criteria. But voice user interfaces offer the chance for a more definitive, impactful, and instant standard for computer consciousness.
And that’s because nothing in the world of user interaction and design is as “human” as voice.
This also means that the challenge to produce truly intuitive and responsive VUIs in the era of AI-powered design goes beyond simply eliminating or debugging situations of #Alexafails. It is about producing voice user interfaces that can not only learn like a human but also do so on their own, without inputs from the user.
Right now, deep machine learning, which is a core part of AI’s modus operandi, can already parse for complexities relating to spoken commands. For example, one of the layers of machine learning for VUIs is natural language processing (NLP).
Together with natural language understanding (NLU), we’re already able to employ conversational AI in contexts like virtual assistants and chatbots. But cognitive scientists within artificial intelligence engineering want to be able to create VUIs that don’t need human input to learn and speak.
The true mark of a successful VUI is the ability to hold complex conversations, and personalizing conversations will take VUIs to a place beyond what’s considered human.
While humans don’t have the luxury, data, or capacity to capture and learn about your past searches, interactions, purchases, and more, AI-powered VUIs do. And this means the conversations they’ll have with you, while entirely “natural” and “human,” are also uncomfortably non-human and instantly intimate.
Technically, we’re surpassing the parameters of the Turing Test.
The Best Ways to Measure Conversational AI Success
AI and ML-powered user interfaces are impressive because of the sheer amount of data they can capture.
They are also impressive because of the ways in which they can “learn” from this data. At the end of the day, all consumers really want is an experience.
And that’s why AI and machine learning are game-changers when it comes to voice user interfaces. Think of these five measures of conversational AI success as the next-generation Turing Test.
1) Applying Situational Awareness
Conversational AIs surpass human voice interaction when they maintain awareness of a history of interactions. They’re in a critical position because they’re able to leverage historical interaction data.
Based on the current context and situation, VUIs powered by AI and machine learning should be able to deliver the most relevant options to the user.
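As a rough illustration of what “most relevant” could mean in practice, the sketch below orders options by how often the user picked them in the same context before. The `(context, choice)` history format and the `rank_options` function are assumptions for this example, not a description of how any shipping assistant works.

```python
from collections import Counter


def rank_options(options, history, context):
    """Order options by how often the user chose them in this context.

    history is a list of (context, choice) pairs from past interactions.
    """
    counts = Counter(choice for ctx, choice in history if ctx == context)
    # sorted() is stable, so options with equal counts keep their given order.
    return sorted(options, key=lambda option: -counts[option])
```

With a history in which the user mostly asks for the news in the morning, calling `rank_options` for the “morning” context would put the news first, ahead of options chosen less often or never.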
2) Learning Over Time
VUIs powered by AI need the ability to learn from more than a user’s taps, clicks, choices, or even questions. They should be able to convert factors like tone and response into usable data that they then rely on to offer more intuitive conversations over time.
3) Connecting Across Platforms, Available on One Dashboard
On the “back end,” conversational AI, which is the “next” generation of VUIs, needs to be able to connect with various other software, networks, and systems.
This comprehensive and coordinated approach might not happen right away. However, conversational AI needs to be able to draw data like product recommendations, Google Maps searches, order histories, and more from those platforms.
On the front end, it needs to present these details readily to the user on one main dashboard or device.
4) Memory Capacity and Functionality
Just like great user interfaces don’t tax the user and require them to learn the basics of interaction all over again, conversational AI needs to maintain long- and short-term memory.
Just as true memory keeps the most commonly used elements alive, conversational AI memory should actively cull and maintain a storehouse of information, ranked from most to least relevant. From here, it can offer information, responses, and suggestions based on an individual’s history of actions and behaviors.
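One simple way to picture this is a short-term memory that holds only the last few topics, next to a long-term store that counts how often each topic comes up and drops the least-used one when it runs out of room. The class below is a minimal sketch under those assumptions. Its name, its size limits, and its use of frequency as the measure of relevance are all illustrative choices.

```python
from collections import Counter, deque


class ConversationMemory:
    def __init__(self, short_term_size=3, long_term_size=5):
        # Short-term memory keeps only the most recent topics.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory counts how often each topic has come up.
        self.long_term = Counter()
        self.long_term_size = long_term_size

    def remember(self, topic):
        self.short_term.append(topic)
        self.long_term[topic] += 1
        # Cull: when over capacity, drop the least-used topic other than
        # the one just mentioned, so new topics get a chance to stick.
        if len(self.long_term) > self.long_term_size:
            candidates = [t for t in self.long_term if t != topic]
            least_used = min(candidates, key=lambda t: self.long_term[t])
            del self.long_term[least_used]

    def most_relevant(self, n=3):
        """Return up to n topics, from most to least frequently mentioned."""
        return [topic for topic, _ in self.long_term.most_common(n)]
```

Frequency is only one possible signal. Recency, or how useful the user found an answer, could be weighted in the same structure.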
5) Predictability and Analysis
How well conversational AI is able to predict and anticipate a user’s needs is a significant aspect of its viability and usefulness.
It needs to be able to harness predictive algorithms that offer the best course of action for that situation: the one that, based on past experiences, has the most positive outcome. This is, of course, where both situational awareness and memory capacity come into play.
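In its simplest form, that kind of prediction could look like the sketch below: for the current situation, pick the action whose past outcomes scored best on average. The `(situation, action, score)` record format and the scoring scale are assumptions for illustration. Real predictive algorithms are far more involved.

```python
from collections import defaultdict


def predict_best_action(past_outcomes, situation):
    """Pick the action with the best average past score in this situation.

    past_outcomes is a list of (situation, action, score) records, where a
    higher score means a more positive outcome. Returns None if the
    situation has never been seen.
    """
    totals = defaultdict(lambda: [0.0, 0])  # action -> [score sum, count]
    for past_situation, action, score in past_outcomes:
        if past_situation == situation:
            totals[action][0] += score
            totals[action][1] += 1
    if not totals:
        return None
    return max(totals, key=lambda a: totals[a][0] / totals[a][1])
```

The situation filter plays the role of situational awareness, and the list of past outcomes plays the role of memory.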
There are plenty of recommendations and developer- and designer-focused guidelines for how to “design for voice.” And these are, no doubt, crucial in helping users adapt to and evolve into users who are voice-first.
However, the promise of truly conversational AI is that it won’t need you to design for conversations. It’ll spark conversations instead.