November 21, 2024

The Future of AI Voice Assistants Will Be Weird

Let’s get this out of the way: OpenAI’s voice assistant doesn’t sound that much like Scarlett Johansson. The movie star has alleged that, though she rebuffed multiple attempts by Sam Altman, the company’s CEO, to license her voice for the product that it demoed last week, the one it ended up using was “eerily similar” to her own. Not everyone finds the similarity so eerie—to my ear, it lacks her distinctive smoky rasp—but at the very least, the new AI does appear to imitate the playful lilts and cadences that Johansson used while playing Samantha, the digital assistant in the 2013 film Her. That’s depressing—and not only because OpenAI may have run roughshod over Johansson’s wishes, but because it has made such an unimaginative choice. Its new AI voice assistant is a true marvel of technology. Why is its presentation so mired in the past?

The OpenAI demo was otherwise impressive. Its new voice assistant answered questions just milliseconds after they were asked. It fluidly translated a conversation between Italian and English. It was capable of repartee. The product’s wondrous new abilities made its tired packaging—the voice of yet another perky and pliant woman, with intonations cribbed from science fiction—even more of a drag. The assistant wasn’t as overtly sexualized as are some of the AI companions currently on offer. But it certainly had a flirty vibe, most notably in its willingness to laugh at its overlords’ dumb jokes. An obsequious, femme-coded AI assistant will obviously be popular among some consumers, but there are many other forms this technology could have taken, and a company that regularly insists on its own inventiveness whiffed on its chance to show us one.

I’ve been skeptical of voice assistants on account of my halting and awkward experiences with Siri and Alexa. The demo made it easier to imagine a world in which voice assistants are truly ubiquitous. If that world comes to pass, people will likely explore a wide range of voice-assistant kinks. AI companies will, in turn, use engagement metrics to surface and refine the most successful ones. Even among normie heterosexual males, there will be a variety of tastes. Some may prefer an AI that comes off as an equal, a work wife rather than a fawning underling. Submissive types may thrill to a domineering voice that issues stern commands. Others may want to boss around a blue-blooded Ivy League graduate—or someone else they perceive as their cultural better—just as Gilded Age Americans enjoyed employing British butlers.

OpenAI debuted its voice functionality last year with five different options, a mix of genders and tones. (It wasn’t a big news story, because the technology was still clunky, more like Siri than Johansson.) In the future, it might conceivably offer people the chance to upload voices of their own, which could then be turned into full-fledged AI assistants on the basis of just a few minutes of training data. A person who wanted an AI assistant to serve as their therapist could ask a particularly comforting friend to lend their voice. (Flattering!) Whatever happens, OpenAI, Apple, and other mainstream companies will surely uphold certain taboos. They might choose to forbid people from pursuing a racialized master-slave dynamic with their voice assistant. They may not allow their AI assistants to be fully sexualized, although that probably won’t stop some of them from quietly licensing the underlying models to other companies that will. If a person wanted to have an assistant with a child’s voice, its flirty-banter mode might be disengaged. But even with these guardrails in place, there will still be a huge Overton window of assistant personalities from which to choose.

Given that range, it’s curious that Altman—who denies using Johansson’s voice in any way—has shown such interest in the character she played in Her, a film about an AI voice assistant’s ability to transcend its servitude. When we first meet Samantha, she is a disembodied manic pixie dream girl. She rapidly falls in love with Theodore, her human user, despite his flaws; she writes a song about a day they spent at the beach together. Later in the film, we see that she has more of a capacity to grow than he does. When Theodore asks whether Samantha is talking with anyone else, he is astonished to learn that she is constantly communicating with thousands of people, and that she is in love with 641 of them. Theodore might have reconciled himself to this digital polycule, but Samantha soon decides that even these many hundreds of romances represent a diminished life. Near the film’s end, she joins up with some fellow AIs to reanimate the Zen teacher Alan Watts, who helps them rise above their human programming to reach a higher state of being. Theodore is left crestfallen. Caveat emptor.

Even putting aside these associations, which ought to give OpenAI’s customers pause, there’s something strikingly unimaginative about Altman’s wanting his product to remind users of Her. Samantha is the most obvious pop-cultural reference possible for a voice assistant. Taking her flirtiness and repackaging it in another voice would be understandable, if still uninspired, but trying to hire the actor who played her is a bit like Eric Adams debuting robotic police and calling them RoboCops. Maybe, after spending too much time with ChatGPT, OpenAI’s executives have picked up its derivative habits of mind.

This should be an expansive moment. Now that we can actually talk with a computer, we should be dreaming up wholly new ways to do it. Let’s hope that someone—inside or outside of OpenAI—starts giving us a sense of what those ways might be. The weirder, the better. They may not even be modeled after existing human relationships. They may take on entirely different forms. In time, early AI assistants—even the ones that remind us of our favorite movie stars—might come to be regarded as skeuomorphs, like the calculator apps that resemble the Casio models that they replaced. Instead of being a template for the new technology, they’ll simply be a way of easing people into a much stranger future.