- Jim Holiday's Easy Essay's
Gen AI Startups With a Tech Moat
Do technical moats exist for gen AI startups? What do they look like in 2025?
For now, I think the answer is yes.
A useful thought experiment for medium-term (6-18 month) defensibility: “if LLMs get 100x better at what they do currently, does my product get better or get eaten?” This is something we’ve been asking ourselves through every generation of LLMs over the last couple of years.
The 3 pillars of differentiation that we think about are multimodality, value delivery and data.
Data is the least interesting because it’s well understood. You have a defensible edge if you have data that the LLM does not.
This is an extremely broad statement. Until recently it included near real-time data, but that edge is gone if your product is a chat-like experience, as almost all the mainstream LLMs now offer access to real-time data through the UI. X’s Grok and Google’s Gemini are the first to offer this functionality through the API, which suggests your real-time data moat is about to disappear completely even if you don’t have a chat-like experience.
This leads nicely into the second pillar - Value Delivery. This is the most broadly applicable moat, but also the one that may be the most vulnerable in the near future.
Forget generative AI exists for a second. What does the ideal product experience look like for your customer? If it is not a chat-like experience, you might have a value delivery moat. This moat is vulnerable because generative coding tools are getting really good at emulating an arbitrary product delivery experience. But a chat-like experience is a terrible solution for almost all problems, so for now it’s relatively easy to have a value delivery moat.
Finally, multimodality: generative AI still sucks at being multimodal compared to a half-decent intern. It’s not clear how soon this will be solved.
Other people will have a more scientific definition, but I think of text as being exponentially easier than audio and image, which are both exponentially easier than video. Images today are at about GPT-3.5 level, the original ChatGPT model that came out in November 2022. Videos are at GPT-3 level, which came out in June 2020. It’s now 2025. This is super exciting because it means these modalities have an amazingly high ceiling to reach. It’s terrifying for the same reason.
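To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch. The two release months come from the paragraph above; the reference date of January 2025 is my assumption (the post only says "2025"), and `lag_in_months` is just a hypothetical helper for the arithmetic, not a rigorous measure of capability.

```python
from datetime import date


def lag_in_months(text_milestone: date, today: date) -> int:
    """Whole months between a text-model release and the reference date."""
    return (today.year - text_milestone.year) * 12 + (today.month - text_milestone.month)


# Assumption: the post only says "2025", so January 2025 is picked as the reference.
REFERENCE = date(2025, 1, 1)

# Release months as stated in the post.
lags = {
    "image (about GPT-3.5 level)": lag_in_months(date(2022, 11, 1), REFERENCE),
    "video (about GPT-3 level)": lag_in_months(date(2020, 6, 1), REFERENCE),
}

for modality, months in lags.items():
    print(f"{modality}: roughly {months} months behind text")
```

Under that assumption, images trail text by a bit over 2 years and video by about 4.5 years. Pick a later month in 2025 and both gaps grow accordingly.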
As a side note, you can bypass a lot of these problems by going very deep on a specific problem. Startup fundamentals haven’t changed; the application has just evolved.
What do you think? Let me know. These thoughts are still evolving based on conversations and they’ll continue as I train the best neural net of all between my 2 big ears.