AI infra for a multi-model, agent-driven world
What happens when foundational models are commoditized?
“Vinod Khosla explained the point this way: If you thought existing search technology was 90 percent as good as the best possible version, then pushing performance up to 95 percent was not going to win you customers. But if you thought there was more headroom—that existing search technology represented only 20 percent of the potential—then Google might be three or four times as good as its rivals, in which case its margin of engineering excellence would attract a flood of users.” - The Power Law by Sebastian Mallaby
When Google was first founded, it was so much better than the alternatives that it was able to establish a near-monopoly in consumer search. And from a consumer product standpoint, foundational models look a lot like search engines. ChatGPT, Perplexity, and Gemini from Google are all used as search tools in addition to their generative content capabilities.
But, I think there’s one big difference: the quality gap between today’s foundational models is much smaller than the gap Google had over rival search engines. As a result, I think these models will get commoditized and there will be no Google of LLMs.
So what happens when, as a developer or user, it no longer matters which specific model you pick?
Why vertical-focused models?
I’m super interested in what will happen when businesses with proprietary data are able to fine-tune models that are super focused on a particular domain. Two examples are BloombergGPT, which outperforms similar models on financial NLP tasks, and Devin from Cognition Labs, which is focused on software development.
However, I intend to write a separate article about why we are indeed not at the “end of software”. I do think software will become more vertical and integrated than it already is. Platforms will solve complete workflows with agents, and point solutions may become obsolete. But, for the sake of this post, I’m going to talk solely about the infrastructure we need to build this future where agents navigate multiple domain-specific models.
What do we need to build on top of vertical models?
To make vertical models compelling, they will need to be able to leverage retrieval-augmented generation (RAG), a technique where LLMs generate responses by factoring in data outside of their training set. For RAG to work, LLMs need clean data. Today, ETL pipelines solve that problem for structured data, but large swaths of the data that RAG would leverage are unstructured, like PDFs and PowerPoints. Some companies, like Unstructured.io and Datavolo, have been trying to solve this problem.
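Here’s a rough sketch of what that pipeline could look like. It assumes the OpenAI v1 Python client for embeddings and generation; parse_pdf is a hypothetical placeholder for the kind of parsing layer Unstructured.io and Datavolo are building.

```python
# Minimal RAG sketch: parse unstructured docs into chunks, embed them, retrieve the
# most relevant chunks for a question, and generate an answer grounded in them.
# Assumes the OpenAI v1 Python client; parse_pdf() is a hypothetical placeholder.
from openai import OpenAI
import numpy as np

client = OpenAI()

def parse_pdf(path: str) -> list[str]:
    """Hypothetical stand-in for an unstructured-data parsing layer."""
    raise NotImplementedError

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def answer(question: str, chunks: list[str], top_k: int = 3) -> str:
    # Rank chunks by cosine similarity to the question embedding.
    chunk_vecs = embed(chunks)
    q_vec = embed([question])[0]
    sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-top_k:])
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

# Intended flow: chunks = parse_pdf("annual_report.pdf"); answer("What was Q4 revenue?", chunks)
```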
Next, once businesses can leverage multiple models, they will need to be able to efficiently move between them without complex rip-and-replace projects. Enter model routers, which can switch out the model for every new prompt. Today, companies like Martian, Not Diamond, and Unify allow you to optimize for cost and latency across the major foundational models. As the vertical-specific, multi-model world becomes real, model routers will need to route prompts based on their domain relevance. This is especially important for workflows that require an agent to take a chain of actions. We will also need innovation at the encoding layer to make sure prompts can work across the various models.
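The sketch below is purely hypothetical and is not any of those vendors’ APIs: it classifies a prompt’s domain with a cheap heuristic (in practice you’d use a small classifier model), then picks the cheapest model that matches the domain and fits a latency budget. The model catalog, names, prices, and latencies are made up for illustration.

```python
# Hypothetical domain-aware model router; model names, prices, and latencies are made up.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    domains: set[str]          # domains the model is tuned for
    cost_per_1k_tokens: float  # USD
    p50_latency_ms: int

CATALOG = [
    ModelSpec("general-llm", {"general"}, 0.005, 800),
    ModelSpec("finance-llm", {"finance", "general"}, 0.012, 1200),
    ModelSpec("code-llm", {"software", "general"}, 0.010, 900),
]

def classify_domain(prompt: str) -> str:
    """Stand-in for a small classifier model; keyword matching keeps the sketch simple."""
    lowered = prompt.lower()
    if any(w in lowered for w in ("revenue", "ebitda", "10-k")):
        return "finance"
    if any(w in lowered for w in ("bug", "stack trace", "refactor")):
        return "software"
    return "general"

def route(prompt: str, max_latency_ms: int = 2000) -> ModelSpec:
    domain = classify_domain(prompt)
    candidates = [m for m in CATALOG
                  if domain in m.domains and m.p50_latency_ms <= max_latency_ms]
    # Among domain-relevant models within the latency budget, pick the cheapest.
    return min(candidates or CATALOG, key=lambda m: m.cost_per_1k_tokens)

print(route("Summarize the EBITDA trends in this 10-K filing.").name)  # -> finance-llm
```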
How do we enable agents to solve complete workflows?
First off, we need agents to be able to interact with the rest of the internet the way we would. I am sympathetic to the notion, espoused by many, that the best AI agents will fundamentally act as your right-hand assistant, solving your problems for you. But to do that, agents have to be able to browse the web, search for information, and pay for stuff on your behalf. That’s not easy. Companies have started to solve those problems: Browserbase is building a headless browser for agents, Exa.ai is doing something similar for search, and Elevate is trying to do that for payments. There will be many more areas that need similar solutions.
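To make that tool layer concrete, here is a rough, hypothetical sketch. The BrowserTool, SearchTool, and PaymentTool interfaces are placeholders, not the actual APIs of Browserbase, Exa.ai, or Elevate, and the agent_llm callable is assumed to return structured output.

```python
# Hypothetical tool layer for an agent: browse, search, and pay on the user's behalf.
# These Protocols are illustrative placeholders, not real Browserbase / Exa.ai / Elevate APIs.
from typing import Callable, Protocol

class BrowserTool(Protocol):
    def open(self, url: str) -> str: ...            # returns rendered page text

class SearchTool(Protocol):
    def search(self, query: str) -> list[str]: ...  # returns result URLs

class PaymentTool(Protocol):
    def pay(self, merchant: str, amount_usd: float) -> str: ...  # returns a receipt id

def book_travel(agent_llm: Callable[[str], dict],
                browser: BrowserTool, search: SearchTool, pay: PaymentTool,
                request: str) -> str:
    """Illustrative workflow: search, read results, pick an option, and pay for it."""
    urls = search.search(request)
    pages = [browser.open(u) for u in urls[:3]]
    # agent_llm is assumed to return structured output, e.g. {"merchant": ..., "price": ...}.
    choice = agent_llm(f"Request: {request}\nOptions:\n" + "\n---\n".join(pages))
    return pay.pay(merchant=choice["merchant"], amount_usd=choice["price"])
```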
And lastly, if all of that works and agents can start to solve complete workflows, we will need to be able to monitor their performance and remediate failures. We need a DataDog of agents, and maybe DataDog will be the one to build it. Startups like Haize Labs are trying too.
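As a sketch of what a “DataDog of agents” might record, assume every step of an agent run is traced with its input, output, latency, and whether it succeeded, so failures can be surfaced and remediated. This is illustrative only, not any vendor’s API.

```python
# Illustrative agent observability sketch: trace every step of a run with its input,
# output, latency, and error state so failures can be surfaced and remediated.
import time
from dataclasses import dataclass, field

@dataclass
class StepTrace:
    name: str
    input: str
    output: str = ""
    latency_ms: float = 0.0
    ok: bool = True
    error: str = ""

@dataclass
class AgentRun:
    steps: list[StepTrace] = field(default_factory=list)

    def traced(self, name: str, fn, arg: str) -> str:
        """Run one agent step and record what happened, even if it fails."""
        trace = StepTrace(name=name, input=arg)
        start = time.monotonic()
        try:
            trace.output = fn(arg)
        except Exception as e:
            trace.ok, trace.error = False, str(e)
        trace.latency_ms = (time.monotonic() - start) * 1000
        self.steps.append(trace)
        return trace.output

    def failures(self) -> list[StepTrace]:
        return [s for s in self.steps if not s.ok]

# Usage: run = AgentRun(); run.traced("search", search_tool, "best Q4 SaaS benchmarks")
```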
As always, if you’re building in any of these areas, please reach out!