Communication between computers and humans has always been unnatural – typing to get a response is a far less intuitive way of conveying commands and instructions than simply speaking. Speech recognition technology aims to bridge this gap, letting computers make sense of spoken phrases and sentences.

Devices such as Amazon Echo, Apple HomePod, and Google Home have been widely embraced in homes, offices, and beyond, which further solidifies the demand for voice-controlled interfaces. Although we are far from the futuristic A.I. assistants and androids we see in movies, real strides are being made in speech recognition, paving the way for that future.

Without the need to swipe, type, or gesture, a Voice User Interface (VUI) allows visually and motor-impaired individuals to interact with computers as naturally as everyone else. VUIs may also be one way to reduce screen fatigue and the accessibility issues that have long been a cause for concern with visual interfaces. The World Health Organization (WHO) estimates that 2.2 billion people have a vision impairment or blindness – a strong case for accessible design and for reducing our dependence on screens, or at least building better screen technology.

The unrestrictive nature of VUI allows both designers and developers to build much richer experiences around the conversational tone and intent of the user. Speech to Text (STT) was one of the first building blocks of VUI: it let users create documents and issue short commands by voice alone, with somewhat intuitive responses. STT also let developers refine the “hearing” capabilities of computers – capturing audio and transcribing it into text. The next logical step was to make the computer understand those words, phrases, and sentences, which would later arrive in the form of Natural Language Processing (NLP).
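The two steps described above – “hearing” speech into text, then interpreting the text – can be sketched as a tiny pipeline. The `transcribe` function here is a hypothetical stub standing in for a real speech-to-text engine, and the `interpret` step is deliberately simplistic; both names are illustrative assumptions, not any real API.

```python
# Toy sketch of the speech pipeline: audio -> text (the "hearing" step),
# then text -> structured meaning (the NLP step).

def transcribe(audio: bytes) -> str:
    """Stand-in for a real speech-to-text engine.

    A real implementation would run acoustic and language models over
    the audio; here we just return a canned transcription.
    """
    return "create a document called meeting notes"

def interpret(text: str) -> dict:
    """Very small NLP step: pull an action word out of the transcript."""
    words = text.split()
    action = words[0] if words else ""
    return {"action": action, "utterance": text}

command = interpret(transcribe(b"\x00\x01"))
print(command["action"])  # prints "create"
```

The point of the separation is the one the paragraph makes: transcription and understanding are distinct problems, and STT had to mature before NLP could be layered on top of it.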

Many new productivity-focused applications (and some established ones), as well as some email clients, have begun to embrace NLP as a simpler way for users to create tasks, search, and communicate. By recognizing patterns in free-form speech, these systems let apps conform to human language, rather than forcing humans to issue commands in the rigid forms computers traditionally require. Although NLP is largely developer-oriented, making this abstract concept visible to users creates an exciting design challenge.
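The simplest form of the pattern recognition described above is keyword-based intent matching: map a few trigger words to each intent and pick the best overlap. Real products use statistical or neural models, and the intents and keywords below are illustrative assumptions only – but even this sketch shows how free-form phrasing can be reduced to an app action.

```python
# Minimal keyword-overlap intent detection for free-form utterances.
INTENT_KEYWORDS = {
    "create_task": {"remind", "task", "todo", "add"},
    "search": {"find", "search", "look"},
    "send_message": {"email", "message", "send"},
}

def detect_intent(utterance: str) -> str:
    tokens = set(utterance.lower().split())
    # Choose the intent whose keyword set overlaps the utterance the most.
    best = max(INTENT_KEYWORDS, key=lambda i: len(INTENT_KEYWORDS[i] & tokens))
    # If nothing overlaps at all, fall back to "unknown".
    return best if INTENT_KEYWORDS[best] & tokens else "unknown"

print(detect_intent("remind me to call Dana tomorrow"))  # create_task
print(detect_intent("what's the weather like"))          # unknown
```

The design question the paragraph raises follows directly from the `"unknown"` branch: how does an interface show the user what it can and cannot understand?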

VUI shows a lot of promise for the future, but it is plagued by unknowns. Users do not know what these devices can actually understand, so they often guess at phrasings and hope for the right response. Most voice assistants currently on the market require specific invocation phrases for certain commands, which breaks the “natural” flow of conversation with computers. The lack of long-term conversational memory is another hindrance to the progression of VUIs.
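The fixed-phrase problem can be made concrete with a sketch: an assistant that only reacts to exact trigger strings handles the canonical phrasing but fails on a natural rewording of the same request. The trigger phrases below are illustrative assumptions, not any real assistant's grammar.

```python
# Rigid phrase matching: only exact (case-insensitive) triggers are recognized.
TRIGGERS = {
    "turn on the lights": "lights_on",
    "set a timer": "timer_start",
}

def respond(utterance: str) -> str:
    command = TRIGGERS.get(utterance.lower().strip())
    return command if command else "Sorry, I didn't understand that."

print(respond("Turn on the lights"))           # lights_on
print(respond("could you light up the room"))  # Sorry, I didn't understand that.
```

This is exactly the guessing game the paragraph describes: the user must discover the magic phrasing rather than speak naturally.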

Even though the concepts behind VUIs have been around for decades, the reality on the ground is far from what we envision. The discipline is still relatively young and offers exciting challenges to both designers and developers. With four of the biggest technology companies (Apple, Google, Amazon, and Facebook) backing VUI development and laying down guidelines and design/development resources, it won’t be long before we see its true potential. The future of design, and of VUI in particular, seems full of promising opportunities – and of solutions for ironing out the “weird” interaction barrier between us and the systems we use, while making accessibility widely available to all.