Evaluating LLM Infrastructure Investments: Our Portfolio Lens

LLM infrastructure has been one of the most heavily funded categories in venture over the past eighteen months, and one of the most difficult to evaluate from a seed investment perspective. The challenge is not identifying genuine technical innovation in the space; there is plenty of it. The challenge is distinguishing the infrastructure layers that will remain independent product categories from those that will be absorbed into either the hyperscaler cloud platforms (AWS, GCP, Azure) or the foundation model providers' own API offerings. Getting this wrong is expensive: investing in infrastructure that gets commoditised within two years of deployment is a common failure mode in enterprise tech, and the LLM infrastructure market is moving fast enough that the commoditisation horizon is compressed relative to previous infrastructure cycles.

The framework we use is a three-question evaluation. First: is the capability technically differentiated at a level that requires specialised engineering to replicate, or is it primarily an integration and tooling layer over existing foundation model APIs? A vector database with optimised ANN search algorithms and production-grade ACID compliance represents genuine engineering investment; a prompt management dashboard that chains API calls and stores templates is a tooling convenience that a competitor could replicate in roughly six months. Second: does the product sit at a layer of the stack where the hyperscalers have a structural distribution advantage that makes independent market entry difficult? Observability and monitoring for production ML systems can exist independently of the model infrastructure; compute provisioning for GPU workloads is a layer where AWS and Azure have structural advantages that are hard to overcome. Third: do the buyers (enterprise software companies and AI-native B2B teams) have a procurement preference for independent specialist vendors over integrated offerings from their existing cloud infrastructure provider?

Applying this framework to our current pipeline: we are most interested in the evaluation and quality assurance layer — specifically, tooling that helps enterprise software teams measure whether their LLM-powered features are working correctly in production, at the granularity of specific task types and input distributions. This is not a capability that foundation model providers are incentivised to build well, because rigorous model quality measurement would surface limitations of their models that they would prefer customers not quantify precisely. It is also not a capability that cloud platforms have absorbed, because it requires deep understanding of ML evaluation methodology that is outside the core hyperscaler competency. And it is a buyer pain that is increasing rapidly: as enterprise software teams ship more LLM-powered features to production, the lack of rigorous quality measurement infrastructure is becoming a risk management problem.
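To make "granularity of specific task types" concrete, here is a minimal sketch in Python of the kind of measurement we have in mind. It is illustrative only: the names EvalRecord, accuracy_by_task_type and failing_task_types are hypothetical, exact-match scoring is an assumed simplification, and the 0.9 threshold is an arbitrary placeholder rather than a recommended value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRecord:
    """One labelled production sample (hypothetical structure)."""
    task_type: str   # e.g. "extraction", "classification", "routing"
    expected: str    # reference answer for the sample
    predicted: str   # what the LLM-powered feature returned


def accuracy_by_task_type(records):
    """Return {task_type: (accuracy, sample_count)} using exact-match scoring."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record.task_type] += 1
        if record.predicted == record.expected:
            correct[record.task_type] += 1
    return {t: (correct[t] / total[t], total[t]) for t in total}


def failing_task_types(records, min_accuracy=0.9):
    """List task types whose measured accuracy is below the assumed threshold."""
    scores = accuracy_by_task_type(records)
    return sorted(t for t, (acc, _) in scores.items() if acc < min_accuracy)
```

A single aggregate accuracy figure would hide the case where one task type is failing while the others are healthy; breaking the score out per task type is what surfaces it. A production tool would presumably also need task-appropriate scoring (exact match is rarely enough for extraction) and sample sizes large enough to trust the numbers.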

The portfolio lens we bring to this evaluation is specific: every AI-native B2B company in our portfolio is grappling with how to know whether its ML components are working correctly after deployment. The answers they have found independently are inconsistent and often expensive: custom evaluation datasets built by hand, manual sampling processes, and ad-hoc statistical monitoring that does not capture the full distribution of production inputs. The companies that can offer this quality assurance infrastructure as a reusable product, specifically calibrated for the kinds of structured output tasks (extraction, classification, decision routing) that characterise enterprise workflow automation, are solving a pain that we can validate directly from portfolio conversations.

We are not saying LLM infrastructure as a category is overinvested. The demand side is genuine and growing. What we are saying is that the infrastructure layers closest to the model providers and the hyperscalers face commoditisation dynamics that make the investment horizon much shorter than the growth in AI adoption would suggest. The infrastructure layers that sit closer to the application layer — evaluation, quality assurance, governance, multi-model orchestration with defined fallback logic — have more defensible positions because they require application-context knowledge that infrastructure-first vendors do not typically develop. The seed investments in this space that we find credible are those where the technical founding team has previously built production AI systems and is solving an infrastructure problem they encountered firsthand, not those building infrastructure products based on market research into what other people might need.
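As an illustration of what "multi-model orchestration with defined fallback logic" can mean in practice, the sketch below tries a list of models in priority order and accepts the first output that passes an application-specific validity check. It is a simplified assumption of the pattern, not a description of any vendor's product: call_with_fallback is a hypothetical name, and each model is represented as a plain callable rather than a real provider client.

```python
from typing import Callable, Optional, Sequence, Tuple

# A model is represented as any callable taking a prompt and returning text
# (an assumption made to keep the sketch provider-agnostic).
ModelFn = Callable[[str], str]


def call_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, ModelFn]],
    is_valid: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, model) in order; return the first valid (name, output).

    Returns (None, None) if every model either raises or produces output
    that fails the validity check.
    """
    for name, model in models:
        try:
            output = model(prompt)
        except Exception:
            # Treat any provider error as a signal to move to the next model.
            continue
        if is_valid(output):
            return name, output
    return None, None
```

The validity check is where the application-context knowledge lives: for a classification task it might be membership in an allowed label set, for extraction it might be schema validation. That dependence on application context is the reason we see this layer as harder for infrastructure-first vendors to absorb.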