December 25, 2024

The AI Search War Has Begun

6 min read
Illustration of two guns with internet search bars replacing the barrels

Every second of every day, people across the world type tens of thousands of queries into Google, adding up to trillions of searches a year. Google and a few other search engines are the portal through which several billion people navigate the internet. Many of the world’s most powerful tech companies, including Google, Microsoft, and OpenAI, have recently spotted an opportunity to remake that gateway with generative AI, and they are racing to seize it. And as of this week, the generative-AI search wars are in full swing.

The value of an AI-powered search bar is straightforward: Instead of having to open and read multiple links, wouldn’t it be better to type your query into a chatbot and receive an immediate, comprehensive answer? In order for this approach to work, though, AI models have to be able to scrape the web for relevant information. Nearly two years after the arrival of ChatGPT, and with users growing aware that many generative-AI products have effectively been built on stolen information, tech companies are trying to play nice with the media outlets that supply the content these machines need.

This morning, the start-up Perplexity, which offers an AI-powered “answer engine,” announced revenue-sharing deals with Time, Fortune, and several other publishers. Moving forward, these publishers will be compensated when Perplexity earns ad revenue from AI-generated answers that cite partner content. The site does not currently run ads, but will begin doing so in the form of sponsored “related follow-up questions” this fall—a sportswear brand could pay for a follow-up question to appear in response to a query about Babe Ruth, and if the AI used Time in its answer, then Time would get a cut of the ad revenue for every citation. OpenAI has been building its own roster of media partners, including News Corp, Vox Media, and The Atlantic, and last week announced its own AI-search prototype, SearchGPT. (The editorial division of The Atlantic operates independently from the business division, which announced its corporate partnership with OpenAI in May.) Google has purchased the rights to use Reddit content to train future AI models, and currently appears to be the only major search engine that Reddit is permitting to surface its content. The default was once that you would directly consume work by another person; now an AI may chew and regurgitate it first, then determine what you see based on its opaque underlying algorithm. This also means that many of the human readers whom media outlets currently show ads and sell subscriptions to will have less reason to ever visit publishers’ websites.

Tech companies have made deals with journalistic outlets in the past, paying publishers to use products such as Facebook Live and Snapchat Discover, but these AI searchbots are different. Facebook and Snapchat are social products at their core; you log on to them to see what other people are posting, and for many users, news content may be incidental. Perplexity and SearchGPT, by contrast, need high-quality, timely content to answer questions accurately.

Generative-AI models have no internal information beyond their training data, which tend to be months or years old. Without more recent stories, these products would be limited, unable to deliver relevant information about H5N1, the attempted assassination of Donald Trump, the Olympics, and so on. OpenAI’s most advanced model, for instance, was released in May but has no knowledge of events after October 2023. When I first spoke with Dmitry Shevelenko, Perplexity’s chief business officer, in June, he told me, “One of the key ingredients for our long-term success is that we need web publishers to keep creating great journalism that is loaded up with facts, because you can’t answer questions well if you don’t have accurate source material.”

Of course, existing AI products are absolutely filled with media that publishers have received no compensation for. (Shevelenko told me that Perplexity will not stop citing publishers outside its revenue-sharing deal, nor will it show any preference for its paid partners moving forward.) AI companies don’t seem to value human words, human photos, and human videos as works of craft or products of labor; instead they treat the content as strip mines of information. “People don’t come to Perplexity to consume journalism; they come to Perplexity to consume facts,” Shevelenko told me in an interview before today’s announcement. “Journalists’ content is rich in facts, verified knowledge, and that is the utility function it plays to an AI answer engine.” To Shevelenko, that means Perplexity and journalists are not in direct competition—the former answers questions; the latter breaks news or provides compelling prose and ideas. But even he conceded that AI search will send less traffic to media websites than traditional search engines have, because users have less reason to click on any links—the bot is providing the answer.

The growing number of AI-media deals, then, are a shakedown. Sure, Shevelenko told me that Perplexity thinks revenue-sharing is the right thing to do. But AI is scraping publishers’ content whether they want it to or not: Media companies can be chumps or get paid. Still, the nature of these deals also suggests that publishers may have more power than it seems. Perplexity and OpenAI, for instance, are offering fairly different incentives to media partners—meaning the tech start-ups are themselves competing to win over publishers. All of these products have made basic mistakes, such as incorrectly citing sources and fabricating information. Having a searchbot ground itself in human-made “verified knowledge” might help alleviate these issues, especially for recent events the AI model wasn’t trained on. Publishers also have at least some ability to limit AI search engines’ ability to read their websites. They can also refuse to sign or renegotiate deals, or even sue AI companies for copyright infringement, as The New York Times has done. AI firms seem to have their own ways around media companies’ barricades, but that is an ongoing arms race without a clear winner.

Publishers may now have sway over AI companies that need high-quality, human-made content to either answer user queries or train future AI models, like a GPT-5 or GPT-6. Nicholas Thompson, the CEO of The Atlantic, said in an interview with the tech journalist Nilay Patel that The Atlantic’s contract with OpenAI will expire after two years, and is designed to create “more leverage when there’s another moment of negotiation.” Reddit has recently cut off search engines other than Google from crawling its site; if DuckDuckGo, Perplexity, or Bing want to show users new posts from Reddit, they will have to “make enforceable promises regarding their use of Reddit content, including their use for AI,” a Reddit spokesperson told The Verge. (Of course, Reddit has a hard-core user base and isn’t a traditional news organization—media companies are constantly vying for attention and may be less comfortable with closing off potential audiences.)

In other words, whether OpenAI, Perplexity, Google, or someone else wins the AI search war might not depend entirely on their software: Media partners are also an important part of the equation. This could possibly shift. Shevelenko told me he believes that Perplexity’s use of publishers’ content is legal under copyright law, and if he’s proved right by a judge’s ruling, then AI companies may no longer see an incentive to pay publishers. For now, that decision is up in the air, and publishers are taking advantage of a small window of opportunity. Perplexity, for its part, has been accused of plagiarizing content from publishers including Forbes and Condé Nast, which could dissuade other publishers from partnering with the start-up; Shevelenko has told Semafor that Perplexity had to persuade its initial slate of partners to overlook these allegations. The company was supposed to announce its revenue-sharing program roughly when Shevelenko spoke with me in June, but delayed the formal launch amid a wave of criticism. Now, he said, “the ball is in our court to show publishers that we are a good-faith actor taking the right, long-term moves.”

The search war is an attempt to change how people navigate the internet, the system through which the contemporary world organizes and disseminates knowledge. But the underlying terrain has not changed: Knowledge, no matter its organization, remains the sum of writing, art, and thinking from humanity, not from a bot.