May 24, 2026·6 min read

Proprietary data is the only real AI moat in vertical B2B SaaS.

Model access is not a moat. Cobalt Glacier builds AI features on the data behind them — workflow-generated, non-obvious schema, closed feedback loop — not the model in front of them.


// TL;DR

The current generation of AI features in B2B SaaS is being justified, in most build plans, on the wrong asset. Teams ask whether they can ship AI, whether the model is fine-tuned, and which provider sits behind the inference call. None of those questions describe a moat. The moat lives one layer down: in the data the product captures while it runs, the schema that data lands in, and the feedback loop that connects model output back to user behavior. This essay lays out the framework Cobalt Glacier uses, inside our own studio, to separate AI features that compound from AI features that are quietly just a vendor relationship with whichever foundation-model provider is cheapest this quarter.

Key takeaways

Model access is not a moat. Every frontier model is available to every competitor on roughly the same pricing curve, and that curve gets cheaper every quarter.
Proprietary data is the only durable AI moat in vertical B2B SaaS, and only when three conditions hold: the data is workflow-generated, the schema is non-obvious, and the feedback loop is closed inside the product.
Cobalt Glacier greenlights AI features on the data behind them, not the model in front of them. The build question is whether the data asset would survive the model layer being swapped out tomorrow.

We have argued in "AI doesn't lower SaaS prices, it widens margins" that the operator-level economics of AI in B2B SaaS are the opposite of the consensus narrative, and in "Gross margin floors in AI-native B2B SaaS" that we will not build an AI-native feature into a brand's roadmap below a seventy-percent steady-state gross margin floor. This essay builds on both. Once the margin floor is cleared, what actually decides whether an AI feature defends the brand over a twenty-five-year build?

The default mistake

The default mistake in AI product planning is to treat the model as the asset. A team ships a feature powered by a frontier model, the demo is impressive, the pricing page references the feature, and the team concludes that the AI is doing defensible work. The problem is that every competitor in the category can rebuild the same feature in a sprint, on the same model, at a roughly identical inference cost. The feature is real. The moat is not. Within twelve months, the feature is table stakes and the original differentiation has migrated to whoever owns the data the model is reasoning over.

The model is a commodity input. The data the model runs on, and the workflow that data was captured inside, are the only parts of an AI feature that do not get arbitraged to zero.

The three conditions for a real data moat

A proprietary data asset only behaves like a moat when it meets three conditions simultaneously. Miss any one of them and the asset is real but undefended, which on a patient-capital build clock amounts to having no moat at all.

1. The data is workflow-generated

The data has to be a byproduct of the actual work the customer is doing in the product, not something they could plausibly export, license, or recreate elsewhere. Workflow-generated data accumulates passively as long as the product is being used and cannot be reconstructed from the outside without rebuilding the workflow itself. Static datasets that were purchased, scraped, or seeded at launch fail this test — anything a brand pays for, a competitor can pay for, and anything that was scraped once can be scraped again.

2. The schema is non-obvious

The shape of the data matters more than the volume. If the schema is the obvious one (invoices have line items, tickets have statuses, contacts have email addresses), then any competent competitor can stand up an equivalent table in a week, and the only remaining advantage is volume, which a funded competitor can close quickly. Non-obvious schemas come from products that have spent years working out what the workflow actually is and have captured the resulting structure in the data model. The schema is the encoded product thesis, and it is much harder to copy than the rows.

3. The feedback loop is closed inside the product

Every meaningful AI feature produces an output the user accepts, edits, or rejects. If those acceptances and edits flow back into the product as labeled training signal, ideally gated behind a feature flag the operator controls, the model improves with use and the data asset compounds. If the output is consumed outside the product, the feedback loop is broken and the asset stops improving the moment the customer turns their attention elsewhere. The closed loop is what turns a data asset from a snapshot into an engine.
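To make the closed loop concrete, here is a minimal Python sketch of how a single accept/edit/reject event might become a labeled training example. Everything in it (the Outcome enum, FeedbackEvent, LabeledExample, to_labeled_example, and the loop_enabled flag passed in as a plain boolean) is a hypothetical illustration, not a description of our production code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """What the user did with a model suggestion (hypothetical taxonomy)."""
    ACCEPTED = "accepted"
    EDITED = "edited"
    REJECTED = "rejected"


@dataclass(frozen=True)
class FeedbackEvent:
    """One accept/edit/reject event captured inside the product's own workflow."""
    record_id: str                     # the workflow record the suggestion was made on
    model_output: str                  # what the model proposed
    outcome: Outcome                   # what the user did with it
    final_value: Optional[str] = None  # what the user saved instead, for edits


@dataclass(frozen=True)
class LabeledExample:
    """A training example derived from user behavior, not from a purchased dataset."""
    record_id: str
    model_output: str
    target: Optional[str]  # the value the model should have produced; None if unknown
    accepted: bool         # True only when the user kept the output unchanged


def to_labeled_example(event: FeedbackEvent, loop_enabled: bool) -> Optional[LabeledExample]:
    """Convert one feedback event into training signal, gated by an operator-controlled flag."""
    if not loop_enabled:
        return None  # flag off: nothing flows back into training
    if event.outcome is Outcome.ACCEPTED:
        # Acceptance confirms the model output as the target.
        return LabeledExample(event.record_id, event.model_output, event.model_output, True)
    if event.outcome is Outcome.EDITED:
        # An edit is the richest signal: the user supplied the correct answer.
        return LabeledExample(event.record_id, event.model_output, event.final_value, False)
    # A rejection says the output was wrong, but not what would have been right.
    return LabeledExample(event.record_id, event.model_output, None, False)
```

The design choice worth noting is that edits are kept as full before-and-after pairs rather than collapsed into a thumbs-down: the correction itself is the workflow-generated data a competitor cannot reconstruct from outside the product.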

What this rules out

Quite a lot, in practice. The framework rules out AI features layered on top of integrations the brand does not own end-to-end, AI features that summarize data the customer could export to any other vendor tomorrow, and AI features whose primary differentiation is prompt engineering. It also rules out most of the "AI copilots" being shipped onto horizontal B2B SaaS surfaces in 2026: the feature is real, the engagement metrics are real, and the moat is roughly zero. We kill these feature ideas at the AI-moat review even when they clear every other build bar.

What it rules in

Vertical SaaS with deep workflow ownership. The reason vertical SaaS continues to earn pricing power, as we wrote in "Pricing power in vertical SaaS", is the same reason it generates the best AI data assets: the workflow is narrow enough that the schema becomes the product thesis, and the customer cannot easily unbundle the data from the tool. It is also why we pick narrow verticals when we spin up a new brand from zero.
Products with a system of record posture. If the product is where the customer enters the canonical version of a record — not where they sync it from somewhere else — the data is workflow-generated by definition. We design for this posture from the first schema decision, not after the fact.
Products that log user corrections from day one. We build an accept/edit/reject telemetry stream before we ship any AI surface, and we treat it as one of the strongest signals that the feature can compound. It means the team is thinking about the feedback loop before there is a reason to monetize it.

How we decide what to build

The question we actually run before greenlighting an AI feature is whether the data asset behind it would survive the model layer being swapped out tomorrow. We assume the foundation model we are using today is replaced in twelve months by a model that is faster, cheaper, and roughly as capable, available to every competitor on the same terms. The data asset has to be the part of the AI feature that still produces differentiation under that assumption. If the asset collapses when the model changes — if the entire feature is a thin wrapper on a foundation model that anyone can call — we do not greenlight it as a differentiator, even if we still ship it as table-stakes functionality.
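The greenlight question can be sketched as a small checklist. This Python sketch is hypothetical: the AIFeatureProposal fields, the verdict strings, and the reduction of each condition to a yes/no judgment are illustrative assumptions. Only the seventy-percent margin floor, the three data conditions, and the model-swap test come from this essay and the ones it builds on.

```python
from dataclasses import dataclass

MARGIN_FLOOR = 0.70  # steady-state gross margin floor for any AI-native feature


@dataclass(frozen=True)
class AIFeatureProposal:
    name: str
    steady_state_gross_margin: float  # e.g. 0.74 for 74%
    workflow_generated: bool          # data is a byproduct of work done in the product
    non_obvious_schema: bool          # schema encodes the product thesis, not a generic table
    closed_feedback_loop: bool        # accept/edit/reject signal flows back into the product
    survives_model_swap: bool         # differentiation holds if the model is replaced tomorrow


def has_data_moat(p: AIFeatureProposal) -> bool:
    """All three conditions must hold at once; missing any one leaves the asset undefended."""
    return p.workflow_generated and p.non_obvious_schema and p.closed_feedback_loop


def verdict(p: AIFeatureProposal) -> str:
    """Hypothetical three-way outcome mirroring the build question described above."""
    if p.steady_state_gross_margin < MARGIN_FLOOR:
        return "do not build"
    if has_data_moat(p) and p.survives_model_swap:
        return "greenlight as differentiator"
    return "not a differentiator (table stakes at most)"
```

Putting the margin check first mirrors the order the essays describe: the floor is a precondition, and the data test only decides whether a feature that clears it counts as a differentiator.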

The corollary is that we are entirely comfortable shipping an AI feature that is unimpressive in isolation as long as the underlying data asset is strong. The unimpressive feature is usually a function of an early model layer that has not yet caught up; the strong data asset is the part that took years to accrue. We can replace the feature next quarter. We cannot manufacture the data overnight.

The operator implication

Inside every brand in the studio, the framework cashes out in a few concrete behaviors. We invest aggressively in capturing accept/edit/reject telemetry inside every AI surface from the first release, we resist exposing raw data via APIs to integrations that would let customers exfiltrate the workflow signal, and we treat the schema as a first-class product artifact that product managers own and that does not change without a real conversation. None of this is exotic. It is the operating-system-level work that turns an AI feature into a data asset and a data asset into a moat, built into each brand from the day we start it.

The bottom line

The AI moat in vertical B2B SaaS is not the model, the provider, the prompt, or the inference price. It is the data the product captures while the workflow runs, the schema that data lands in, and the feedback loop that turns user corrections into model improvement. We greenlight AI features on the data behind them, not the model in front of them, and we set each feature's roadmap priority by whether its data asset would survive the model layer being swapped out tomorrow. A patient build runs long enough that the model layer will be swapped more than once. The data asset is what is left when the dust settles.

If you are an investor evaluating co-investment in the studio and want to understand how we decide which AI features to build inside each brand, start a conversation with our team. The build plan for any AI feature we ship leads with the data asset, not the demo.