Proprietary data is the only real AI moat in vertical B2B SaaS.
Model access is not a moat. Cobalt Glacier underwrites AI features on the data behind them — workflow-generated, non-obvious schema, closed feedback loop — not the model in front of them.
We have argued in "AI doesn't lower SaaS prices, it widens margins" that the operator-level economics of AI in B2B SaaS are the opposite of the consensus narrative, and in "Gross margin floors in AI-native B2B SaaS" that we will not underwrite an AI-native acquisition below a seventy-percent steady-state gross margin. This essay sits on top of both and asks the next question: once the margin floor is cleared, what actually decides whether the AI feature defends the brand on a twenty-five-year hold?
The default mistake
The default mistake in AI diligence is to treat the model as the asset. A target ships a feature powered by a frontier model, the demo is impressive, the renewal language references the feature, and the buyer concludes that the AI is doing defensible work. The problem is that every competitor in the category can rebuild the same feature in a sprint, on the same model, at a roughly identical inference cost. The feature is real. The moat is not. Within twelve months, the feature is table stakes and the original differentiation has migrated to whoever owns the data the model is reasoning over.
The model is a commodity input. The data the model runs on, and the workflow that data was captured inside, is the only part of an AI feature that does not arbitrage to zero.
The three conditions for a real data moat
A proprietary data asset only behaves like a moat when it meets three conditions simultaneously. Miss any one of them and the asset is real but undefended, which on a permanent-capital clock is the same thing as having no moat at all.
1. The data is workflow-generated
The data has to be a byproduct of the actual work the customer is doing in the product, not something they could plausibly export, license, or recreate elsewhere. Workflow-generated data accumulates passively as long as the product is being used and cannot be reconstructed from the outside without rebuilding the workflow itself. Static datasets that were purchased, scraped, or seeded at launch fail this test — anything a target paid for, a competitor can pay for, and anything that was scraped once can be scraped again.
2. The schema is non-obvious
The shape of the data matters more than the volume. If the schema is the obvious one — invoices have line items, tickets have statuses, contacts have email addresses — then any competent competitor can stand up an equivalent table in a week and the only remaining advantage is volume, which a funded competitor can close quickly. Non-obvious schemas come from products that have spent years opinionating about what the workflow actually is and have captured the resulting structure in the data model. The schema is the encoded product thesis, and it is much harder to copy than the rows.
3. The feedback loop is closed inside the product
Every meaningful AI feature produces an output the user accepts, edits, or rejects. If those acceptances and edits flow back into the product as labeled training signal — preferably inside a feature flag the operator controls — the model improves with use and the data asset compounds. If the output is consumed outside the product, the feedback loop is broken and the asset stops improving the moment the customer turns their attention elsewhere. The closed loop is what turns a data asset from a snapshot into an engine.
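As an illustration of what "labeled training signal" can mean in practice, here is a minimal Python sketch of an accept/edit/reject event and its conversion into a training example. Every name in it (`Outcome`, `FeedbackEvent`, `to_training_example`, the label strings) is hypothetical and chosen for this example; it is not a description of any particular product's telemetry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """How the user reacted to an AI output (hypothetical taxonomy)."""
    ACCEPTED = "accepted"
    EDITED = "edited"
    REJECTED = "rejected"


@dataclass(frozen=True)
class FeedbackEvent:
    """One user reaction to one AI output, captured inside the product."""
    record_id: str                    # the workflow record the output was generated for
    model_output: str                 # what the model proposed
    outcome: Outcome                  # accept, edit, or reject
    final_text: Optional[str] = None  # what the user kept after editing, if anything


def to_training_example(event: FeedbackEvent) -> Optional[dict]:
    """Turn a feedback event into a labeled example, or None if it is unusable."""
    if event.outcome is Outcome.ACCEPTED:
        return {"input_id": event.record_id, "target": event.model_output, "label": "positive"}
    if event.outcome is Outcome.EDITED and event.final_text:
        # The user's correction is the most valuable signal: it becomes the target.
        return {"input_id": event.record_id, "target": event.final_text, "label": "corrected"}
    if event.outcome is Outcome.REJECTED:
        return {"input_id": event.record_id, "target": event.model_output, "label": "negative"}
    # An edit with no captured final text means the loop was broken for this event.
    return None
```

The last branch is the point of the sketch: an edit whose final text was never captured, because the output was consumed outside the product, yields no training signal at all.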
What this rules out
Quite a lot, in practice. The framework rules out AI features layered on top of integrations the target does not own end-to-end, AI features that summarize data the customer could export to any other vendor tomorrow, and AI features whose primary differentiation is prompt engineering or system prompts. It also rules out most of the "AI copilots" being shipped onto horizontal B2B SaaS surfaces in 2026 — the feature is real, the engagement metrics are real, and the moat is roughly zero. We pass on these targets at the AI-moat layer even if they clear every other underwriting bar.
What it rules in
- Vertical SaaS with deep workflow ownership. The reason vertical SaaS continues to earn pricing power, as we wrote in "Pricing power in vertical SaaS", is the same reason it generates the best AI data assets: the workflow is narrow enough that the schema becomes the product thesis, and the customer cannot easily unbundle the data from the tool.
- Products with a system-of-record posture. If the product is where the customer enters the canonical version of a record, rather than where they sync it from somewhere else, the data is workflow-generated by definition.
- Products that already log user corrections. A pre-existing accept/edit/reject telemetry stream is one of the strongest signals we look for. It means the team has been thinking about the feedback loop before they had a reason to monetize it.
How we diligence the data asset
The diligence question we actually run is whether the data asset would survive the model layer being swapped out tomorrow. We assume the foundation model the target is using today is replaced in twelve months by a model that is faster, cheaper, and roughly as capable, available to every competitor on the same terms. The data asset has to be the part of the AI feature that still produces differentiation under that assumption. If the asset collapses when the model changes — if the entire feature was a thin wrapper on a foundation model that anyone can call — there is no moat to underwrite, and we adjust the multiple accordingly.
The corollary is that we are entirely comfortable with targets whose AI feature is unimpressive in isolation but whose underlying data asset is strong. The unimpressive feature is usually a function of an early team that has not yet built the model layer well; the strong data asset is the part that is hard to acquire. We can replace the feature. We cannot replicate the data.
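The three conditions and the model-swap question can be written down as a simple checklist. The Python sketch below is illustrative only: the class name `DataAssetReview`, its field names, and the idea of recording each answer as a boolean are assumptions made for this example, not a description of an actual scoring tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataAssetReview:
    """Hypothetical diligence checklist for one AI feature's data asset."""
    workflow_generated: bool    # data is a byproduct of work done in the product
    non_obvious_schema: bool    # schema encodes a product thesis, not the obvious table
    closed_feedback_loop: bool  # accepts, edits, and rejects flow back as training signal
    survives_model_swap: bool   # still differentiates if the foundation model is replaced

    def failed_checks(self) -> List[str]:
        """Names of the checks this asset fails, in checklist order."""
        checks = {
            "workflow_generated": self.workflow_generated,
            "non_obvious_schema": self.non_obvious_schema,
            "closed_feedback_loop": self.closed_feedback_loop,
            "survives_model_swap": self.survives_model_swap,
        }
        return [name for name, passed in checks.items() if not passed]

    def is_moat(self) -> bool:
        """All conditions must hold simultaneously; missing one is enough to fail."""
        return not self.failed_checks()


# Example: strong data, but corrections are not captured inside the product.
review = DataAssetReview(
    workflow_generated=True,
    non_obvious_schema=True,
    closed_feedback_loop=False,
    survives_model_swap=True,
)
print(review.is_moat(), review.failed_checks())
```

The conjunction is deliberate: the checklist has no partial credit, which mirrors the claim that an asset missing any one condition is real but undefended.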
The operator implication
For operators inside the portfolio, the framework cashes out in a few concrete behaviors. We invest aggressively in capturing accept/edit/reject telemetry inside every AI surface, we resist exposing raw data via APIs to integrations that would let customers exfiltrate the workflow signal, and we treat the schema as a first-class product artifact that product managers own and that does not change without a real conversation. None of this is exotic. It is the operating-system-level work that turns an AI feature into a data asset and a data asset into a moat.
The bottom line
The AI moat in vertical B2B SaaS is not the model, the provider, the prompt, or the inference price. It is the data the product captures while the workflow runs, the schema that data lands in, and the feedback loop that turns user corrections into model improvement. We underwrite AI features on the data behind them, not the model in front of them, and we adjust the multiple to reflect whether the data asset would survive the model layer being swapped out tomorrow. Permanent capital is a long enough hold that the model layer will be swapped, more than once. The data asset is what is left when the dust settles.
If you are an investor evaluating co-investment in permanent-capital SaaS and want to understand how we score AI-native targets, start a conversation with our team. The underwriting memo for any AI-native deal we pursue leads with the data asset, not the demo.