About the clip above, in three words: AI robot standup.
The book Machine Audition defines this branch of artificial intelligence as “the study of algorithms and systems for the automatic analysis and understanding of sound by machine.”
In the laboratory, this means using machines to listen, understand and respond to what humans say, no matter the language, accent or dialect.
Speech recognition—commonly defined as “the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.”
Speaker recognition—the process of automatically recognizing a speaker from his or her voice samples. Per IET Biometrics, speaker recognition divides into two principal tasks:
–Speaker identification identifies a speaker, from a given set of speakers, based on the input speech signal.
–Speaker verification authenticates a person’s claimed identity from his or her voice samples.
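The two tasks can be sketched in a few lines of code. This is a toy illustration, not a real system: production speaker recognition derives fixed-length “voiceprint” embeddings from audio with trained models, while the vectors, names and threshold below are all made up.

```python
# Toy sketch of speaker identification vs. verification over
# hypothetical voiceprint embeddings (all numbers are illustrative).
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical enrolled speakers and their stored voiceprints.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
}

def identify(sample):
    """Identification: which enrolled speaker is closest to the sample?"""
    return max(ENROLLED, key=lambda name: cosine(sample, ENROLLED[name]))

def verify(claimed_name, sample, threshold=0.85):
    """Verification: does the sample match the claimed identity?"""
    return cosine(sample, ENROLLED[claimed_name]) >= threshold

unknown = [0.88, 0.15, 0.28]
print(identify(unknown))         # → alice
print(verify("alice", unknown))  # → True
```

Note the asymmetry: identification always returns the nearest enrolled speaker, while verification answers yes/no against a single claimed identity, which is why the threshold matters only for the latter.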
Speech-generating devices—Aetna defines speech-generating devices (SGDs), also known as voice output communication aids, as “electronic augmentative and alternative communication systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate.”
Machine audition is the business end of Alexa, Siri and Google Home. It turns talking to a piece of plastic into a simple conversation. Machine audition listens and talks back to you on the phone or at a customer service kiosk. Like computer vision, machine audition enables other AI applications and tools. It gives NLP the ability to hear, understand and respond, and it helps machine learning translate languages on the fly.
One of the many concerns some have about AI is the threat of teaching machines to mimic humans. Someone who can use your voice could wreak complete havoc in your life. That someone is now officially enabled by a company called Lyrebird. This two-year-old Montreal-based startup bills itself as creator of “the most realistic artificial voices in the world.” Although the name may not be familiar, here’s some of their work:
Their go-to-market approach was to create a great voice mimic and offer it to the public. Right now, you can go to their site and create a sample of your own voice. They offer a custom artificial voice product as well as an API that allows end users to create their own voices.
For some, Lyrebird means we have ceded control of our voices to machines. A crisis will come when fraudsters use spam phone calls to record and mimic your voice; at the pace we police the internet, expect it within the next ten years. We have handed scammers a better tool for tricking voice-activated sensors and surrendered the first skirmish in the war against tech hegemony.
According to the ALS Association, 75 percent of all people diagnosed with ALS will need communication assistance. ALS affects speech in a number of ways. Decay of the central nervous system impairs control of the oral muscles, producing weakness, incoordination or paralysis of the speech musculature. ALS can disturb the pitch, tone, loudness and quality of the voice. Eventually the damage robs the person of the ability to speak, because they can no longer control their lips and tongue.
Enter Predictable, a voice assistive app from TherapyBox. Predictable lets ALS patients integrate their own voice into the app. Imagine having a clear understanding of how your body will fail you over the next few years, and the pleasure of knowing there’s at least one piece of technology that can help maintain a connection with your current self.
Machine audition is a set of tools that teaches machines how to hear, identify, verify and respond to people, in real time or by analyzing recordings. Machines can respond aloud through SGDs, primarily used by the speech impaired. And as the Amazon Alexa and other devices show, machines are getting significantly better at speaking back to us.
Cost Leadership—The general assumption would be that machine audition paired with a few NLP tools could end the call center as we know it. This assumption, like others about the societal lethality of AI, would be wrong. And old. Chatbots have been gaining ground in call centers for years.
There are drawbacks to an all-robot CSR department. You lose the human touch that a caller sometimes needs. What a human can hear and process in the blink of an eye is very expensive to replicate with AI. Machine audition can hear and speak, but it can’t really react in the call center context. The costs to develop this tech would be prohibitive. Reputationally, you may or may not survive news coverage of being the first adopter to use AI to fire an entire department.
The 2017 Microsoft State of Global Customer Service Report found that “about as many respondents typically begin their customer service interaction over the phone as online, 43% versus 49%, respectively.” Companies may end up shifting people from the phone to online support. Dora Rapcsak, writing in VCC.live, a call center blog, foresees a future where chatbots continue to support call center agents as well as become their personal assistants. Instead of the agent sifting through screens, the chatbot assistant automatically gathers and presents information about the call.
Differentiation—With more training data, speaker verification improves significantly. Think about that. What is a cutthroat, crowded and duplicative vertical where the ability to use a verified voice to unlock a benefit, service, access, etc. could be the new product feature that makes a difference? What could an event PR firm do with that? What if your app allowed multiple users to activate their accounts by voice only?
Focus—Do you, or does a client of your PR firm or internal communications department, have a need for a machine that does nothing but listen to and sift customer service conversations, transcripts and email threads for actionable information? One that can analyze text for syntactics (what they say), semantics (how they feel) and pragmatics (the context in which they say it)?
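Those three layers can be illustrated with a toy analyzer. This sketch uses hand-made keyword and sentiment word lists; real systems use trained NLP models, and every word list, message and rule below is a stand-in for illustration only.

```python
# Toy sketch of the three analysis layers: topics (what they say),
# sentiment (how they feel), and context (where/how they say it).
# All word lists and rules are illustrative stand-ins.
POSITIVE = {"great", "love", "happy", "thanks"}
NEGATIVE = {"broken", "angry", "cancel", "terrible"}
TOPICS = {"refund", "shipping", "billing", "login"}

def analyze(message, channel):
    words = [w.strip(".,!?").lower() for w in message.split()]
    topics = [w for w in words if w in TOPICS]  # syntactics: what they say
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"  # semantics
    # pragmatics: context — an angry phone caller gets escalated
    context = {"channel": channel,
               "escalate": sentiment == "negative" and channel == "phone"}
    return {"topics": topics, "sentiment": sentiment, "context": context}

result = analyze("I am angry, I want a refund for this broken item!", "phone")
print(result["topics"])     # → ['refund']
print(result["sentiment"])  # → negative
```

The point of the sketch is the separation of concerns: the same message yields an actionable topic, a feeling, and a context-dependent decision, which is exactly the sifting job described above.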
There are companies selling this technology today.
Before thinking of using speaker recognition in a media/marketing setting, be sure to understand how your customers feel about privacy, because you will literally be storing their electronic voiceprint on your server. These days, people will ask how that works and may not accept the usual friendly corporate lawyer-speak. How transparent do you feel you need to be about the technology? Can you accomplish the goal without collecting their information? Or can it be held temporarily and securely before being flushed from the system?
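The “hold temporarily, then flush” option can be sketched as a store whose entries expire after a time-to-live. This is a minimal illustration, not a real product API; the class name, TTL value and sample data are all assumptions, and a real deployment would also encrypt the voiceprints at rest.

```python
# Minimal sketch of time-limited voiceprint storage: entries older
# than the TTL are purged on access. Names and values are illustrative.
import time

class EphemeralVoiceprints:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # user_id -> (voiceprint, stored_at)

    def put(self, user_id, voiceprint, now=None):
        """Store a voiceprint with its timestamp."""
        self._store[user_id] = (voiceprint, now if now is not None else time.time())

    def get(self, user_id, now=None):
        """Return the voiceprint, or flush and return None if expired."""
        now = now if now is not None else time.time()
        entry = self._store.get(user_id)
        if entry is None or now - entry[1] > self.ttl:
            self._store.pop(user_id, None)  # flush expired data
            return None
        return entry[0]

store = EphemeralVoiceprints(ttl_seconds=300)
store.put("caller-1", [0.9, 0.1, 0.3], now=0)
print(store.get("caller-1", now=100))  # → [0.9, 0.1, 0.3]
print(store.get("caller-1", now=400))  # → None (expired and flushed)
```

Passing `now` explicitly makes the expiry behavior easy to test; in production you would simply rely on the clock and run a periodic sweep so expired prints don’t linger until the next lookup.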
The advent of 5G will expand the opportunity for marketers, merchandisers and organizations to deploy all kinds of sensors—wearable, static, touch and voice. Many of those sensors will be equipped to hear and speak. There will be opportunities for technology companies and their marketers to build the hardware and software that add all three technologies to signage, wall panels, doorways, building infrastructure and more.
Here’s a look at Alexa Auto, Amazon’s effort to put machine audition in your car. What is a skill related to your client or publication that would provide a unique experience in the car?
April 9, 2021
Artificial Intelligence, Machine Audition