February 20, 2017
The use of speech recognition technology is increasing, and your business could benefit. Do you remember the first time you heard an automated voice recite, “press or say one” when you called a service center? How about the first time you used the little microphone icon on your phone to dictate a text? The option to speak instead of pressing or typing was novel and exciting. Today, speech recognition technology has advanced so significantly that the software can “comprehend” complete sentences, not just a single number or word. In this blog post, we review what speech recognition is, the software options available, and its current uses. We wrap up with some ideas about the future of speech recognition.
The Basics of Speech Recognition
Speech recognition, simply put, is the automatic process of converting the audio of human speech into text. Its value typically centers on increased speed and accuracy across a variety of tasks. An email or term paper can be dictated at roughly three times the speed of typing it. For non-native speakers, and for those to whom spelling does not come easily, dictating text instead of typing it can greatly improve spelling accuracy.
Additionally, speech recognition software can eliminate the need to hold a device or use a keyboard, allowing you to go hands-free. This benefits not only professionals but also multitasking parents and people with injuries or disabilities that limit keyboard use.
Leveraging Speech Recognition in Business
Businesses have found many ways to use speech recognition to improve efficiency. Law firms use speech recognition software to speed up the creation of legal briefs and transcripts. In healthcare, clinicians can dictate their reports, and more accurate notes can lead to better patient care and quicker insurance payments.
Choices for speech recognition software abound, and some packages come highly recommended. Cost generally rises with accuracy and speed, as well as with extra features such as the ability to dictate formatting and punctuation. While some circumstances do merit the purchase of specialized software, it’s worth noting that most Mac and Windows machines can be enabled for dictation through their standard settings.
While transcript creation is valuable, the increased accuracy of speech-to-text has made it possible to use voice for much more. Call centers use sophisticated analysis of voice input to route calls. The automotive industry employs speech recognition for hands-free control of activities while driving, such as getting weather information, playing music, or getting directions. Speech recognition is also the foundation of voice control systems like Amazon Echo (Alexa) and Google Home.
Recent advances in IoT (Internet of Things) devices are changing the landscape even further, letting people go beyond simply asking a device like the Echo for information and instead use voice commands to control other devices. Many home appliances, from thermostats to lighting to garage doors, can now be controlled through Amazon’s Echo. While controlling personal devices by voice is exciting, we expect businesses to benefit as well, as factories, offices, and various types of manufacturing equipment incorporate more IoT technologies.
Speech recognition is also making impressive strides in language translation. Some electronic translators, whether apps or standalone devices, combine speech recognition with translation and text-to-speech: you speak a word or phrase, and the device pronounces the translation aloud. This technology allows for faster communication, whether you’re on vacation or in a business meeting.
Beyond Speech Recognition
Converting speech to text is valuable in itself, but it is not the same as voice recognition or contextual understanding, both of which we expect to have a significant impact in the coming years.
Voice recognition takes speech recognition one step further: it analyzes not only the words but also characteristics of the voice itself, such as inflection and tone. Because your way of speaking is unique, voice recognition can be used to authenticate your identity, which opens up many fascinating uses. One such use is in banking; Barclays introduced voice security technology to all its customers last year.
Contextual understanding of speech is complex in a different way, and it is an end goal of some of the work being done in artificial intelligence and machine learning. Current speech recognition technology can give users the impression that they are engaged in a conversation (think bots), but bots are primarily running a behind-the-scenes search based on your verbal query. As a result, bots can answer complex but isolated questions like “How tall is the Empire State Building?”, “Which football team won Super Bowl 50?”, and “What are the ingredients of a chocolate soda?”
However, bots are not currently able to maintain a true conversation, in which each question depends on remembering the previous ones. For example, a bot would be stumped by a series of questions like: “What route should I take to get to Boulder, Colorado?”, “How long will it take?”, “What is the current temperature there?”, “Can you please make me a reservation at 6:00 for 4 people at a good vegan restaurant?” As noted in our blog post “Digital Assistants, The Quest for Artificial Intelligence,” we are hot on the trail of achieving this goal, thanks in part to the tremendous strides made in speech recognition over the past decade.
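To make the idea of carrying context concrete, here is a toy sketch in Python. It assumes a hypothetical `DialogContext` class (not the API of Alexa, Google Home, or any real assistant) that remembers the last destination mentioned and substitutes it for “there” in a follow-up question. Real assistants rely on far more sophisticated language understanding; this only illustrates why remembering the previous question matters.

```python
import re
from typing import Optional


class DialogContext:
    """Hypothetical toy assistant state that remembers the last destination mentioned."""

    def __init__(self) -> None:
        self.last_place: Optional[str] = None

    def resolve(self, utterance: str) -> str:
        # Remember a destination when one is named, e.g. "get to Boulder, Colorado"
        match = re.search(r"get to ([A-Z][\w ]*(?:, [A-Z][\w ]*)?)", utterance)
        if match:
            self.last_place = match.group(1)
        # Rewrite the word "there" using the remembered destination, if we have one
        if self.last_place and re.search(r"\bthere\b", utterance):
            place = self.last_place
            utterance = re.sub(r"\bthere\b", lambda _m: f"in {place}", utterance)
        return utterance


ctx = DialogContext()
print(ctx.resolve("What route should I take to get to Boulder, Colorado?"))
print(ctx.resolve("What is the current temperature there?"))
```

The second call prints “What is the current temperature in Boulder, Colorado?” because the context object kept the destination from the first question. A stateless bot, which treats each query as an isolated search, has nothing to fill in for “there.”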
Love hearing about important technology trends? If so, consider subscribing to our monthly e-mail and/or following us on Facebook, LinkedIn, or Instagram. We’ll do our best to keep you up-to-date on important events and new innovations.