Why AI “Deep Research” Products All Suck
A deep dive into the issues with AI-powered research engines.
Every other day, another tech giant or well-funded Silicon Valley startup releases what it claims is an AI product that produces “professional-quality deep research.”
These releases receive great fanfare from the tech media, who are often quite literally bribed by PR firms. Reviewers put the products through incredibly weak, superficial tests such as “tell me about the latest battery technology.” Then they hold out the unimpressive results as confirmation of the company’s claims, even going so far as to say that the products could “replace a team of researchers at an investment bank.”
Invariably, I get suckered into trying all of these products—usually after signing up for a lifetime of spam email from the company, and paying something like $20. Perplexity, Exa, ChatGPT Search, DeepSeek, Gemini Deep Research. You name it. I’ve tried them all.
Without exception, all of these products suck. After each disappointment, I reflect on the fact that even the Version 1.0 research tool I built into LobbyMatic is far more capable than any of these products released by companies that are backed by many millions, and even billions, of dollars. Here are the reasons why:
Sources - As a professional researcher, like the sort who works for the Congressional Research Service or an opposition research firm, when you are given an assignment, it is obviously not even remotely sufficient to “just Google it.” Everyone knows this.
Even when Google was a real search engine, rather than the flaming pile of AI-generated, “sponsored content” slop that we see today—it would fail to produce anything more than a starting point for your research assignment.
As a researcher in the Washington DC sphere, for example, you would likely pore through some mix of:
Mentions within books
USPTO searches of patents and trademarks
Legislative and regulatory databases
PACER and other court record systems
Yearbook databases
Personal information databases like LexisNexis or TLO
SEC filings and corporate disclosures
Government contracting databases
Blockchain records
FEC disclosures
Specialized, paywalled trade publications
Social media records (often with the help of tools like Maltego)
Academic publication databases like PubMed, IEEE Xplore, ArXiv
Hours of video from YouTube, Facebook, Instagram and X
Plus much more
Only then would you be able to produce the sort of research report that you might reasonably expect to be able to pay your rent with.
These AI research tools don’t do anything like this. What do they do?
Well, they essentially all conduct something like a basic Google Search, usually using SerpAPI.
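To make the point concrete, here is a minimal sketch, in Python and purely for illustration, of what that step amounts to: one SerpAPI call whose top snippets are then handed to a model for summarization. The API key and helper name are placeholders; the endpoint and the “organic_results” field are SerpAPI’s documented ones, but nothing here is any particular vendor’s actual pipeline.

```python
# Illustrative only: roughly the depth of sourcing a typical "deep research"
# product relies on. SERPAPI_KEY is a placeholder.
import requests

SERPAPI_KEY = "YOUR_SERPAPI_KEY"

def shallow_research(query: str, num_results: int = 10) -> list[dict]:
    """Fetch the first page of Google results via SerpAPI."""
    resp = requests.get(
        "https://serpapi.com/search.json",
        params={"q": query, "engine": "google", "num": num_results, "api_key": SERPAPI_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    # Only titles, links, and short snippets survive this step; nothing here
    # touches PACER, SEC filings, FEC disclosures, paywalled trade press, etc.
    return [
        {"title": r.get("title"), "link": r.get("link"), "snippet": r.get("snippet")}
        for r in resp.json().get("organic_results", [])
    ]
```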
In the case of the most well-funded products, they might preemptively vacuum in large swaths of web data, cache it, and then carry out some sort of pseudo-proprietary mix of keyword and semantic searches across the data. This is as much to woo investors with claims of novel technical prowess as it is to increase the speed at which results are generated. There are even entire startups, funded with millions of dollars, based entirely on the premise of being good at this “search the cached data” step in the process (see Vespa, Qdrant, Pinecone, et al).
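That “search the cached data” step is, in rough outline, something like the sketch below. The embed() function, the 30/70 weighting, and the scoring are stand-ins I have made up for illustration; each vendor’s actual blend is proprietary.

```python
# Sketch of hybrid retrieval over pre-cached documents: keyword overlap blended
# with cosine similarity over precomputed embeddings. All weights are arbitrary.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: plug in whatever embedding model you actually use.
    raise NotImplementedError

def hybrid_search(query: str, docs: list[str], doc_vecs: np.ndarray, k: int = 5):
    """Rank cached documents by a weighted mix of keyword and semantic scores."""
    q_vec = embed(query)
    q_terms = set(query.lower().split())
    scores = []
    for doc, vec in zip(docs, doc_vecs):
        keyword = len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)
        semantic = float(np.dot(q_vec, vec) / (np.linalg.norm(q_vec) * np.linalg.norm(vec) + 1e-9))
        scores.append(0.3 * keyword + 0.7 * semantic)
    top = np.argsort(scores)[::-1][:k]
    return [(docs[int(i)], scores[int(i)]) for i in top]
```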
I should be clear: It isn’t some sort of a revelation to the engineer-types behind these products that they would achieve a better “deep research” product if they did what we did with LobbyMatic, and plugged into the vast universe of sources incorporated by our AI research tool. And it’s also not, per se, a software engineering problem that is out of their reach. So why don’t they do what we did, and build functional, truly useful AI research products?
“Muh Data” is the New Oil - Every major tech company, from Reddit to Yelp, and yes, even LinkedIn, has convinced itself, and its shareholders, that it is sitting on an untapped wellspring of data that it will somehow monetize in the AI gold rush.
Whether this data is actually so valuable or not is a topic for another day—but the result of this belief is that these companies are now blocking open access to this data, precluding it from use in AI products. All under the working assumption that every company is going to have their own blockbuster AI product, or a meaningful revenue stream from licensing their data (or their users’ data) to someone else’s blockbuster AI product.
For our niche lobbying product, this wasn’t a problem because we didn’t do anything close to the level of traffic necessary for these data-rich companies to even notice or care. We could have an agent that “tool-called” a hyper-performant scraper written in Rust, and bam, we had the data from any webpage we wanted that wasn’t behind a login screen. For companies like OpenAI or Perplexity, this would result in an instant cease-and-desist letter and almost certain litigation (if only for the purpose of promoting one’s lesser-known company through the inevitable news cycle caused by suing Sam Altman). The most ambitious goal of the litigation would be to strike some kind of content-sharing deal with OpenAI, the way that certain news outlets have.
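For what it’s worth, that agent-plus-scraper arrangement is not exotic. A minimal sketch follows; the JSON-schema tool definition follows the common function-calling convention, and the localhost scraper endpoint is hypothetical, standing in for whatever fetcher (Rust or otherwise) you actually run.

```python
# Sketch of exposing a web fetcher to an LLM as a tool. The scraper service URL
# is hypothetical; the tool schema follows the usual function-calling convention.
import requests

FETCH_PAGE_TOOL = {
    "name": "fetch_page",
    "description": "Download and return the readable text of a public web page.",
    "parameters": {
        "type": "object",
        "properties": {"url": {"type": "string", "description": "Page to fetch"}},
        "required": ["url"],
    },
}

def dispatch_tool_call(name: str, args: dict) -> str:
    """Route a model-issued tool call to the scraper and return page text."""
    if name == "fetch_page":
        resp = requests.post("http://localhost:8080/scrape", json=args, timeout=60)
        resp.raise_for_status()
        return resp.json().get("text", "")
    raise ValueError(f"unknown tool: {name}")
```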
Computational Expense - To conduct truly deep research, you are going to need to process millions of tokens. (*For the non-technical: just think of tokens as words for these purposes.)
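Some rough arithmetic shows why that matters, and why a flat $20/month subscription is a poor fit. The per-million-token prices below are assumptions for illustration, not any provider’s actual rate card; substitute current pricing.

```python
# Back-of-the-envelope cost of one genuinely deep research run. All prices are
# assumed placeholders, not published rates.
INPUT_TOKENS = 5_000_000    # source material actually read by the model
OUTPUT_TOKENS = 50_000      # the report itself
PRICE_IN_PER_M = 2.00       # assumed $ per 1M input tokens
PRICE_OUT_PER_M = 8.00      # assumed $ per 1M output tokens

cost = (INPUT_TOKENS / 1e6) * PRICE_IN_PER_M + (OUTPUT_TOKENS / 1e6) * PRICE_OUT_PER_M
print(f"~${cost:.2f} per report")  # ~$10.40 at these assumed rates
```

A handful of runs like that and the $20 subscription is underwater, before you account for retries or multimodal inputs.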
To build a deep research AI product properly, you should avoid the sort of embedding, re-ranking, and semantic-search sorcery that is so common among the community of engineers who build retrieval-augmented generation (RAG) systems.
These techniques were born of an era where large language models were limited to a few thousand input tokens. They were essential to build anything approximating a useful RAG product in the early days. But the techniques, mostly represented as open-source GitHub repos, have always been a cobbled-together mess of barely reliable point-solutions (usually written in low-performance scripting languages).
The good news? That era is over. State-of-the-art models now have incredible context-length capabilities, and for the sake of this article, I’ll take you through what I believe to be the most impressive models in this regard, which are produced by Google DeepMind.
Google’s flagship Gemini models perform excellently across their maximum 2 million tokens of context length. And Google has versions of the models, which are harder to get your hands on, that perform just as well with up to 10 million tokens in the context window. The best part is that these models, through their associated APIs, effortlessly ingest plain text, PDFs, images, audio and even video. The genius engineers at Google DeepMind have taken care of all the mathematically intensive engineering minutiae associated with, for example, designing algorithms for embedding tokens extracted from images. Mechanisms like “ring attention,” among other innovations, are widely credited with enabling this best-in-class long-context performance. They do this all while running exclusively on their own internally designed chips known as Tensor Processing Units (TPUs), rather than the Nvidia H100s that we hear so much about.
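As a concrete illustration of what “just put the sources in the context window” looks like, here is a minimal sketch using the google-generativeai Python package and its File API. The file names are placeholders, and the exact client calls may have changed since (Google has since shipped a newer SDK), so treat it as a sketch rather than gospel.

```python
# Minimal long-context sketch: instead of chunking and embedding, hand the raw
# sources (text, PDFs, audio, video) to the model in one large prompt.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")

# Hypothetical source files uploaded via the File API.
sources = [genai.upload_file(p) for p in ["10-K.pdf", "hearing_audio.mp3", "notes.txt"]]

response = model.generate_content(
    sources + ["Write a sourced research memo covering these documents."]
)
print(response.text)
```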
All of this has worked very well since Gemini 1.5 Pro, which was publicly released all the way back in May of 2024. Since then, the models have only improved.
So with all of this said, how could Google’s own “Gemini Deep Research” product be such a complete piece of garbage?
Firstly, the product managers behind this Gemini Deep Research product decided to opt for the kind of half-baked embedding tricks that their own company’s DeepMind engineers (mostly based in London, rather than in San Francisco with the Deep Research team) had already rendered obsolete. This was presumably done in hopes of minor cost savings: with people paying a fixed $20/month fee for the “Gemini Advanced” plan, where the Deep Research product is parked, the team has to limit the potential for losses. It’s also likely the result of a desire by the team to do engineering for engineering’s own sake, which could be shown off to the higher-ups in internal meetings. (*Nobody in big tech ever got a promotion by saying “Don’t worry boss! The other team of engineers already solved this problem.”)
Danger/Safety/Lawyers - The last problem with developing and releasing a deep research tool to the public is the inevitable faux-outrage that ensues if and when the deep research tool actually works. You can imagine the headlines already from The Verge and TechCrunch: “SHOCK! OpenAI Research Tool Reveals Someone’s Home Address!” – Imagine the horror! An AI tool reveals the exact same thing that a basic Google search might reveal. Shut it down! Investigate!
You see, by producing a deep research tool that actually works, and releasing it to the public, the AI company behind it would guarantee that they would be called before Congress to testify about the absolute atrocity of someone’s address or phone number or some other piece of totally pedestrian information being retrieved by the AI tool. Once again, in the same way that the information might be retrieved by a run-of-the-mill Google search.
AI is a hot topic right now—which means that click-hungry “journalists” and attention-loving politicians, from congressmen to state attorneys general, would be certain to attack whoever released that product. It’s for this same reason that Google and Bing search bars will happily produce adult content results, but their AI-search products will not. One technology is 25 years old, and thus uninteresting to the media and lawmakers, and the other is new, and therefore potentially scandalous.
Perhaps China, mostly unrestrained by these considerations, will release a product like the one I built to the general public? In the meantime, you will have to build it yourself, or, if you are the US government, pay Deloitte or Booz Allen tens of millions of dollars to do so.
TL;DR: Building a great deep research product is not really all that difficult. I’ve done it myself. However, it requires scraping some “proprietary” data from the web, which certain people don’t like. It also requires that one does not over-engineer the product with too-clever-by-half retrieval tricks, which means processing far more tokens, and that means the product needs to cost quite a bit more than $20/month. Lastly, the tool must be willing to process and retrieve certain data, including personal data governed by GLBA and any number of other federal and state regulations, which makes the lawyers jump out of their chairs.