Teams of Amazon employees and contractors review as many as 1,000 disembodied audio clips from Amazon Echo devices a day, seven people who worked on the program told Bloomberg. The clips are transcribed, annotated, and fed back into the company's Alexa AI to train the software to better understand the nuances of human speech. Listeners are stationed in locations as diverse as Boston, Costa Rica, India, and Romania.
Amazon doesn't exactly advertise the human role in its AI assistant. The marketing materials for the Echo claim "Alexa lives in the cloud and is always getting smarter," only hinting in the lengthy Alexa FAQ that "We use your requests to Alexa to train our speech recognition and natural language understanding systems." The speaker's privacy settings include an option to disable the use of voice recordings "for the development of new features," but it's unclear if this spares the user from human eavesdropping entirely.
"We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience," an Amazon spokesman told Bloomberg via email, adding that the process helps tune Alexa's speech recognition and understanding of language. According to the company, "employees do not have direct access to information that can identify the person or account" during the annotation process, but Bloomberg's sources provided screenshots showing that listeners actually receive an account number, device serial number, and first name of the user associated with an audio clip.
Alexa's human enablers confirm one of the common fears about the device: that it doesn't always wait for you to activate it to start recording you. Each listener ends up transcribing up to 100 "accidental" recordings a day, one source told Bloomberg – "accidental" meaning cases in which an Echo device starts recording without the user saying its "wake word." Listeners who hear users discussing sensitive data like bank details or names are told to tick a box marking the file as "critical" and move on – which sounds fine, until one thinks of those "critical" files as red flags for some unscrupulous person looking to prey on Amazon customers.
Perhaps most disturbingly, two sources based in Romania told Bloomberg they believe they overheard a sexual assault. Seeking guidance from Amazon on how to proceed, they say they were told it "wasn't Amazon's job to interfere." The company, however, claims it has a set procedure in place for annotators who come upon something "distressing" in a recording.
And when listeners hear you singing in the shower, they don't just laugh about it to themselves, the sources told Bloomberg – they share amusing recordings among themselves in an internal chat room otherwise used to seek help in deciphering unclear speech.
Alexa is certainly not the only "voice assistant" to use humans in order to better serve humans. Apple's Siri, arguably the AI that started it all, has human helpers who evaluate whether its responses to user commands make sense, but Siri's recordings are stored for six months and linked only to a random identifier, an Apple security white paper explains. Google's voice assistant employs humans with access to a few audio clips, but the company claims those are not linked to any personally identifiable information and the audio itself is distorted.
Amazon insists it has a zero-tolerance policy for "abuse of our system" and claims to use multi-factor authentication and encryption to protect customer recordings during the annotation process. But customers' recordings have certainly ended up in strange places before.