November 22, 2024

The World Through The Eyes of a Chatbot

[Figure: a diagram of the word "four-legged" pointing to a dog and a chair]

This article was originally published by Quanta Magazine.

A picture may be worth a thousand words, but how many numbers is a word worth? The question may sound silly, but it underlies large language models, or LLMs, and through them, many modern applications of artificial intelligence.

Every LLM has its own answer. In the 8-billion-parameter version of Meta’s open-source Llama 3 model, text is split into tokens, and each token is represented by 4,096 numbers; for the largest version of GPT-3, it’s 12,288. Individually, these long numerical lists, known as “embeddings,” are just inscrutable chains of digits. But in concert, they encode mathematical relationships between words that can look surprisingly like meaning.

The basic idea behind word embeddings is decades old. To model language on a computer, start by taking every word in the dictionary and making a list of its essential features—how many is up to you, as long as it’s the same for every word. “You can almost think of it like a 20 Questions game,” says Ellie Pavlick, a computer scientist studying language models at Brown University and Google DeepMind. “Animal, vegetable, object—the features can be anything that people think are useful for distinguishing concepts.” Then assign a numerical value to each feature in the list. The word dog, for example, would score high on “furry” but low on “metallic.” The result will embed each word’s semantic associations, and its relationship to other words, into a unique string of numbers.
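To make the hand-built version concrete, here is a minimal sketch in Python. The four features and every score are invented for illustration; they are not taken from any real system. The only rule carried over from the description above is that every word gets the same number of features.

```python
# Hypothetical features chosen by hand; any list works as long as
# every word is scored on the same features in the same order.
FEATURES = ["furry", "metallic", "four_legged", "alive"]

# Invented scores between 0 (not at all) and 1 (very much so).
HAND_EMBEDDINGS = {
    "dog":   [0.9, 0.0, 1.0, 1.0],
    "cat":   [0.8, 0.0, 1.0, 1.0],
    "chair": [0.0, 0.3, 1.0, 0.0],
}

def describe(word):
    """Pair each feature name with the word's score for that feature."""
    return dict(zip(FEATURES, HAND_EMBEDDINGS[word]))

print(describe("dog"))
```

Note that dog and chair agree on "four_legged" and on little else, which is why a single feature is rarely enough to tell concepts apart.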

Researchers once specified these embeddings by hand, but now they’re generated automatically. For instance, neural networks can be trained to group words (or, technically, fragments of text called “tokens”) according to features that the network defines by itself. “Maybe one feature separates nouns and verbs really nicely, and another separates words that tend to occur after a period from words that don’t occur after a period,” Pavlick says.

The downside of these machine-learned embeddings is that, unlike in a game of 20 Questions, many of the descriptions encoded in each list of numbers are not interpretable by humans. “It seems to be a grab bag of stuff,” Pavlick says. “The neural network can just make up features in any way that will help.”

But when a neural network is trained on a particular task called language modeling—which here involves predicting the next word in a sequence—the embeddings it learns are anything but arbitrary. Like iron filings lining up under a magnetic field, the values become set in such a way that words with similar associations have mathematically similar embeddings. For example, the embeddings for dog and cat will be more similar than those for dog and chair.
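One common way to put a number on "mathematically similar" is cosine similarity, which measures how closely two lists of numbers point in the same direction. The article does not say which measure is meant, so treat that choice as an assumption. The sketch below uses invented four-number vectors, far shorter than any real embedding.

```python
import math

# Invented toy embeddings; real ones are learned and much longer.
EMBEDDINGS = {
    "dog":   [0.9, 0.0, 1.0, 1.0],
    "cat":   [0.8, 0.0, 1.0, 1.0],
    "chair": [0.0, 0.3, 1.0, 0.0],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

DOG_CAT = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["cat"])
DOG_CHAIR = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["chair"])
print(f"dog vs cat:   {DOG_CAT:.3f}")
print(f"dog vs chair: {DOG_CHAIR:.3f}")
```

With these made-up numbers, dog scores closer to cat than to chair, which is the pattern the trained embeddings are described as showing.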

This phenomenon can make embeddings seem mysterious, even magical: a neural network somehow transmuting raw numbers into linguistic meaning, “like spinning straw into gold,” Pavlick says. Famous examples of “word arithmetic”—king minus man plus woman roughly equals queen—have only enhanced the aura around embeddings. They seem to act as a rich, flexible repository of what an LLM “knows.”
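The word arithmetic can be sketched with toy two-number vectors. The coordinates below are invented so that one axis loosely tracks royalty and the other loosely tracks gender; learned embeddings have no such labeled axes. The helper `analogy` is a hypothetical name, and it finds the nearest remaining word by straight-line distance, which is only one of several reasonable choices.

```python
import math

# Invented 2-D vectors: (royalty-ish, gender-ish). Purely illustrative.
VECTORS = {
    "king":  (0.95, 0.9),
    "queen": (1.0, -0.9),
    "man":   (0.05, 1.0),
    "woman": (0.1, -1.0),
    "chair": (0.0, 0.1),
}

def analogy(a, b, c, vectors):
    """Return the word nearest to a - b + c, skipping the three input words."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [word for word in vectors if word not in (a, b, c)]
    return min(candidates, key=lambda word: math.dist(vectors[word], target))

RESULT = analogy("king", "man", "woman", VECTORS)
print(RESULT)
```

The result lands near queen rather than exactly on it, which is why the arithmetic is described as only roughly equal.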

But this supposed knowledge isn’t anything like what we’d find in a dictionary. Instead, it’s more like a map. If you imagine every embedding as a set of coordinates on a high-dimensional map shared by other embeddings, you’ll see patterns emerge. Some words will cluster together, like suburbs hugging a big city. And again, dog and cat will have more similar coordinates than dog and chair.

But unlike points on a map, these coordinates refer only to one another—not to any underlying territory, the way latitude and longitude numbers indicate specific spots on Earth. Instead, the embeddings for dog or cat are more like coordinates in interstellar space: meaningless, except for how close they happen to be to other known points.
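One way to see that the coordinates matter only relative to one another: rotate the whole map, and every individual number changes while every distance stays the same. The sketch below assumes invented two-dimensional points and an arbitrary rotation angle; it illustrates the idea and says nothing about how any particular model stores its embeddings.

```python
import math

# Invented 2-D coordinates; real embeddings have thousands of dimensions.
POINTS = {"dog": (2.0, 1.0), "cat": (2.2, 1.3), "chair": (-1.5, 0.5)}

def rotate(point, degrees):
    """Rotate a 2-D point around the origin by the given angle."""
    angle = math.radians(degrees)
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

# Spin every point by the same arbitrary angle.
ROTATED = {word: rotate(point, 73.0) for word, point in POINTS.items()}

for first, second in [("dog", "cat"), ("dog", "chair")]:
    before = math.dist(POINTS[first], POINTS[second])
    after = math.dist(ROTATED[first], ROTATED[second])
    print(f"{first}-{second}: before {before:.3f}, after {after:.3f}")
```

The coordinates of dog are different after the rotation, yet dog is exactly as close to cat, and as far from chair, as before.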

So why are the embeddings for dog and cat so similar? It’s because they take advantage of something that linguists have known for decades: Words used in similar contexts tend to have similar meanings. In the sequence “I hired a pet sitter to feed my ____,” the next word might be dog or cat, but it’s probably not chair. You don’t need a dictionary to determine this, just statistics.
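The statistics in question can be as simple as counting neighbors. The sketch below uses an invented five-sentence corpus and a two-word window on either side of the target, both assumptions made for illustration. It counts which words appear near dog, cat and chair, then compares the counts with cosine similarity. Real models learn from vastly more text and do not count this directly.

```python
import math
from collections import Counter

# Invented mini-corpus, lowercased and without punctuation.
CORPUS = [
    "i hired a pet sitter to feed my dog",
    "i hired a pet sitter to feed my cat",
    "i need to feed my dog",
    "i sat on my chair",
    "i painted the old chair",
]

def context_counts(target, sentences, window=2):
    """Count the words that appear within `window` positions of the target."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word == target:
                neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
                counts.update(neighbors)
    return counts

def cosine(a, b):
    """Cosine similarity between two Counters; missing words count as zero."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

DOG = context_counts("dog", CORPUS)
CAT = context_counts("cat", CORPUS)
CHAIR = context_counts("chair", CORPUS)
print("dog vs cat:  ", round(cosine(DOG, CAT), 3))
print("dog vs chair:", round(cosine(DOG, CHAIR), 3))
```

No definition of dog or cat appears anywhere in the code. The two words come out similar only because they keep the same company.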

Embeddings—contextual coordinates, based on those statistics—are how an LLM can find a good starting point for making its next-word predictions, without relying on definitions.
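As a rough sketch of how coordinates can turn into a prediction: score each candidate word by how well its embedding lines up with a vector summarizing the context, then convert the scores into probabilities with a softmax. The vectors below are invented, and the single `CONTEXT` vector is a stand-in for everything a real model computes from the preceding words, so this is an illustration under those assumptions rather than a description of any particular LLM.

```python
import math

# Invented embeddings for three candidate next words.
CANDIDATES = {
    "dog":   [0.9, 0.8, 0.1],
    "cat":   [0.8, 0.9, 0.1],
    "chair": [0.1, 0.0, 0.9],
}

# Invented stand-in for the context "I hired a pet sitter to feed my ____".
CONTEXT = [1.0, 1.0, 0.0]

def next_word_probabilities(context, candidates):
    """Score each candidate by dot product, then normalize with a softmax."""
    scores = {word: sum(c * v for c, v in zip(context, vec))
              for word, vec in candidates.items()}
    biggest = max(scores.values())  # subtracted for numerical stability
    exps = {word: math.exp(score - biggest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

PROBS = next_word_probabilities(CONTEXT, CANDIDATES)
for word, prob in sorted(PROBS.items(), key=lambda item: -item[1]):
    print(f"{word}: {prob:.2f}")
```

With these numbers, dog and cat split most of the probability and chair gets little, matching the pet-sitter example.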

Certain words in certain contexts fit together better than others, sometimes so precisely that literally no other words will do. (Imagine finishing the sentence “The current president of France is named ____.”) According to many linguists, a big part of why humans can finely discern this sense of fit is that we don’t just relate words to one another; we actually know what they refer to, like territory on a map. Language models don’t, because embeddings don’t work that way.

Still, as a proxy for semantic meaning, embeddings have proved surprisingly effective. It’s one reason why large language models have rapidly risen to the forefront of AI. When these mathematical objects fit together in a way that coincides with our expectations, it feels like intelligence; when they don’t, we call it a “hallucination.” To the LLM, though, there’s no difference. They’re just lists of numbers, lost in space.