Semantic Contracts: The Governance Bridge Between Data and Meaning
In my last post, I walked through what it looks like when someone follows a thread of inquiry across multiple data sources conversationally, without pre-built reports. The response was encouraging, and so was the pushback. The most common question boiled down to this:
If you’re querying across multiple systems in a conversation, how does the AI know what “revenue” actually means?
It’s the right question. And the answer is the concept I think will matter more than any single technology in this shift: Semantic Contracts.
The Problem Hiding in Plain Sight
Let’s say your organization has three systems connected through MCP: your financial planning platform holds a cube of actuals, your CRM tracks closed opportunity amounts, and your ERP contains invoice line items.
All three have something that could reasonably be called “revenue.” The planning system shows recognized revenue net of returns, in USD at month-end exchange rates, with intercompany eliminated. The CRM shows the dollar value a salesperson attached to a deal when it closed. The ERP shows individual invoice amounts, some of which might include tax, some of which might not, depending on the jurisdiction.
These are three different numbers. They answer three different questions. And they are all sitting behind a field that someone, somewhere, labeled “revenue.”
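To make the divergence concrete, here's a minimal sketch of that situation. Every field name, figure, and system key below is invented for illustration, not taken from any real schema:

```python
# Hypothetical snapshot of "revenue" for the same month, as each system reports it.
# All names and figures here are illustrative, not from any real schema.
sources = {
    "planning": {"field": "revenue", "value": 4_210_000,
                 "meaning": "recognized revenue, net of returns, intercompany eliminated"},
    "crm":      {"field": "revenue", "value": 5_875_000,
                 "meaning": "opportunity amount at close, as entered by sales"},
    "erp":      {"field": "revenue", "value": 4_690_000,
                 "meaning": "invoice line totals, tax included or excluded by jurisdiction"},
}

# Same label, three different questions being answered.
values = {name: s["value"] for name, s in sources.items()}
assert len(set(values.values())) == 3  # no two sources agree
```

Nothing is broken here. Each system is correct about what it measures; the collision only exists at the label.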
In a traditional BI environment, this ambiguity gets resolved invisibly. A data engineer writes an ETL job that pulls from one specific source, applies transformation rules, converts currencies, and loads the result into a warehouse table called FACT_REVENUE. The meaning of “revenue” is now encoded in pipeline logic that almost nobody reads and even fewer people understand.
The business user who opens a dashboard and sees a revenue number has no idea which source it came from, what rules were applied, or what was excluded. They just trust that someone got it right. This has always been the hidden fragility of enterprise BI. The meaning lives inside the plumbing.
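Here's a sketch of what that buried meaning typically looks like. The account range, rates, and table shape are assumptions for illustration; the point is that every business rule lives only inside the transformation code:

```python
# A sketch of the classic pattern: the definition of "revenue" exists only in
# pipeline logic. Account range, FX rates, and row shape are illustrative.
MONTH_END_USD_RATES = {"EUR": 1.08, "GBP": 1.27, "USD": 1.0}

def load_fact_revenue(gl_rows):
    """Builds FACT_REVENUE. Every rule here is invisible to the dashboard
    user: source choice, account range, FX policy, eliminations."""
    total = 0.0
    for row in gl_rows:
        if not (4000 <= row["account"] <= 4999):  # only revenue accounts
            continue
        if row.get("intercompany"):               # eliminate intercompany
            continue
        total += row["amount"] * MONTH_END_USD_RATES[row["currency"]]
    return round(total, 2)

rows = [
    {"account": 4100, "amount": 1000.0, "currency": "USD", "intercompany": False},
    {"account": 4200, "amount": 500.0,  "currency": "EUR", "intercompany": False},
    {"account": 4300, "amount": 900.0,  "currency": "USD", "intercompany": True},   # dropped
    {"account": 6100, "amount": 250.0,  "currency": "USD", "intercompany": False},  # not revenue
]
print(load_fact_revenue(rows))  # 1540.0
```

Four rows in, one number out, and three consequential decisions made silently along the way. That's meaning living in the plumbing.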
The Research That Confirms the Risk
This isn’t a theoretical concern. Researchers have now measured what happens when AI encounters enterprise data without semantic grounding.
In 2023, Juan Sequeda, Dean Allemang, and Bryon Jacob published a benchmark study testing GPT-4’s ability to answer enterprise questions on SQL databases. Using an insurance-domain schema with realistic business queries, they found just 16% accuracy with zero-shot prompts directly against SQL. When the same questions were posed over a knowledge graph that encoded business meaning explicitly, accuracy rose to 54%. A 2024 follow-up pushed that to 72% by adding ontology-based query validation. Their conclusion was direct: formal representations of business meaning are a precondition for accurate AI-powered question answering.
AtScale’s testing reinforces this. A system grounded in a semantic layer achieved 92.5% accuracy, compared to 20% for a control working only with table schemas. On complex queries requiring joins across four or more tables, the ungrounded LLM was wrong roughly 80% of the time. Field names are not definitions, and schema structure is not business logic. Without explicit meaning, AI doesn’t occasionally stumble. It fails systematically.
Let's Define the Semantic Contract
A semantic contract is a precise agreement about what a piece of data means, independent of where it lives or how it gets processed. For our revenue example, a semantic contract would state something like:
“For purposes of financial reporting, Revenue is defined as the sum of GL accounts 4000 through 4999 from the financial actuals, representing recognized revenue net of returns, in USD at month-end exchange rates, with intercompany transactions eliminated.”
No code. No pipeline. Just a clear, readable statement of meaning that answers the questions business users argue about in meetings: Does this include sales tax? Is this recognized or booked? Are intercompany transactions in or out? What currency, and at what rate?
If you’ve ever written documentation that says, “when we report Net Revenue, we mean X and not Y,” you’ve already been wrestling with semantic contracts. The concept isn’t new. What’s new is the idea that these agreements can be formalized in a way that machines, including AI, can read and enforce.
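What might "machine-readable" look like in practice? Here's one minimal sketch of the revenue contract above as a structured artifact. The particular keys and schema are my assumption, not an established standard:

```python
# A minimal sketch of the revenue contract as a machine-readable artifact.
# The schema (these particular keys) is an assumption, not a standard.
revenue_contract = {
    "term": "Revenue",
    "context": "financial reporting",
    "definition": ("Sum of GL accounts 4000-4999 from financial actuals: "
                   "recognized revenue net of returns, in USD at month-end "
                   "exchange rates, intercompany eliminated."),
    "source": {"system": "planning", "accounts": {"from": 4000, "to": 4999}},
    "rules": {
        "recognition": "recognized",       # not booked
        "returns": "net",                  # returns excluded
        "tax": "excluded",
        "currency": "USD @ month-end rate",
        "intercompany": "eliminated",
    },
    "owner": "finance-data-stewards",      # who arbitrates when definitions conflict
    "version": "1.2.0",
}

# Each rule answers one of the questions people argue about in meetings.
assert revenue_contract["rules"]["tax"] == "excluded"
```

Notice that each entry under `rules` maps directly to one of those meeting-room arguments: recognized or booked, tax in or out, which currency, at what rate.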
Terminology Matters: Semantic Contracts and Data Contracts
Readers familiar with the data engineering community will notice an overlap with the concept of data contracts, a term coined by Andrew Jones in 2021 and explored in his O’Reilly book, Driving Data Quality with Data Contracts. Jones defines data contracts as human-readable and machine-readable agreements between data producers and consumers, formalizing expectations around schema, quality standards, ownership, and change management. The pattern has gained significant traction within data mesh and data product architectures.
Semantic contracts share DNA with data contracts but operate at a different layer. Data contracts govern the structural interface: what shape does the data take, who owns it, what quality thresholds apply, and how changes are communicated. A data contract can guarantee that the “revenue” field is always a decimal, always populated, and always delivered on schedule. It cannot, by itself, tell you whether that number represents recognized revenue, booked revenue, or gross bookings before returns.
Semantic contracts address this complementary layer: the agreement about what business terms actually mean across systems. A mature data governance program needs both. Data contracts ensure reliable delivery. Semantic contracts ensure trustworthy interpretation. The distinction matters most when AI enters the picture, because an LLM doesn’t care whether your pipeline ran on time. It cares whether “revenue” means the same thing across every source it can access.
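A side-by-side sketch makes the layering visible. Both structures below are illustrative (the field names are mine, not Jones's or any vendor's), but the division of labor is the point:

```python
# Illustrative only: a data contract guarantees shape and delivery;
# a semantic contract guarantees meaning. Field names are invented.
data_contract = {                 # structural interface (data-contract layer)
    "dataset": "crm.opportunities",
    "schema": {"revenue": "DECIMAL(18,2) NOT NULL"},
    "owner": "sales-engineering",
    "freshness_sla": "daily by 06:00 UTC",
    "quality": {"null_rate_max": 0.0},
}

semantic_contract = {             # meaning layer
    "term": "Revenue",
    "maps_to": "crm.opportunities.revenue",
    "means": "opportunity amount at close; NOT recognized revenue",
    "safe_for": ["pipeline analysis"],
    "not_safe_for": ["financial reporting"],
}

# The data contract can hold perfectly - right type, on time, never null -
# while the number is still the wrong answer to a financial-reporting question.
assert "financial reporting" in semantic_contract["not_safe_for"]
```

The data contract can be fully honored while an AI still pulls the wrong number for a financial-reporting question. Catching that is the semantic layer's job.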
This Changes Everything for Conversational Data
Without semantic contracts, an AI connected to multiple sources is guessing. It might pull the CRM’s opportunity value because the field name matches. It might blend numbers from two sources without realizing they measure different things. The Sequeda benchmark demonstrated this precisely: when the model encountered enterprise-complexity schemas without semantic guidance, it fabricated table joins that didn’t exist and applied calculations that didn’t match the business intent.
With semantic contracts, the guessing stops. The AI knows: “Revenue means this specific calculation, sourced from these specific fields, with these specific business rules applied.” It’s not interpreting. It’s executing against a defined meaning. This is what makes the conversational model from my previous post viable at enterprise scale. Walking through data only works if the system understands not just where data lives, but what it means.
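One way to picture "the guessing stops": before touching any data, the AI layer resolves the business term through a contract registry instead of matching field names. This is a hedged sketch, with a hypothetical registry and contexts, not any product's actual API:

```python
# A sketch of term resolution through a contract registry. Registry contents,
# contexts, and source names are hypothetical.
CONTRACTS = {
    ("revenue", "financial reporting"): {
        "source": "planning.actuals", "accounts": (4000, 4999),
        "rules": ["net of returns", "USD @ month-end", "intercompany eliminated"],
    },
    ("revenue", "sales pipeline"): {
        "source": "crm.opportunities", "field": "amount_at_close", "rules": [],
    },
}

def resolve(term, context):
    """Return the governed definition for a term in context - or refuse,
    rather than guess from a field name."""
    contract = CONTRACTS.get((term.lower(), context))
    if contract is None:
        raise LookupError(f"No semantic contract for {term!r} in {context!r}")
    return contract

print(resolve("Revenue", "financial reporting")["source"])  # planning.actuals
```

The design choice worth noticing is the failure mode: an unknown term raises an error instead of falling through to whatever field looks closest. Refusing to answer is a feature here.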
It’s worth being honest about the current state. The research shows a clear progression: 54% accuracy with a knowledge graph, 72% with ontology-based validation, 92.5% with a full semantic layer and query engine. Each step adds more explicit meaning, and each step gets closer to trustworthy. The gap between defining meaning and reliably enforcing it across every query is still being closed. But the difference between grounded and ungrounded AI is not marginal. It’s the difference between a tool you can trust and one you cannot.
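Closing that enforcement gap looks roughly like this: validate a generated query plan against the contract before it runs. The plan shape and checks below are a sketch under my own assumptions, not how any particular semantic layer implements it:

```python
# A sketch of contract enforcement: reject generated query plans that violate
# the contract instead of silently running them. Plan shape is an assumption.
CONTRACT = {"source": "planning.actuals", "accounts": (4000, 4999),
            "require": {"intercompany": "eliminated"}}

def validate(plan):
    """Return a list of contract violations; empty means the plan may run."""
    errors = []
    if plan["source"] != CONTRACT["source"]:
        errors.append(f"wrong source: {plan['source']}")
    lo, hi = CONTRACT["accounts"]
    if not (plan["accounts"][0] >= lo and plan["accounts"][1] <= hi):
        errors.append("account range outside contract")
    if plan.get("intercompany") != CONTRACT["require"]["intercompany"]:
        errors.append("intercompany eliminations missing")
    return errors

good = {"source": "planning.actuals", "accounts": (4000, 4999),
        "intercompany": "eliminated"}
bad = {"source": "crm.opportunities", "accounts": (4000, 4999)}
print(validate(good))  # []
print(len(validate(bad)))  # 2
```

This is the same move the ontology-based validation research makes: the definition isn't just documentation the model might read, it's a gate the query has to pass.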
Meaning vs. Mechanics
Here’s the shift that matters most, and it’s the one that traditional data architecture has the hardest time seeing.
In the classic data warehouse model, meaning and mechanics are fused together. The definition of “revenue” is embedded in ETL code. If you want to change what revenue means, or add a new interpretation, you’re changing pipelines. That’s a development project. It goes through IT. It takes weeks.
Semantic contracts separate those two things. The meaning exists as its own artifact, portable and readable, independent of any specific pipeline or storage system. The mechanics can change underneath without breaking the agreement about what things mean. And the agreement about meaning can evolve without requiring a re-architecture of the physical infrastructure.
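A toy sketch of that separation: the contract points at a logical source, and a separate binding maps it to whatever physical system engineering is running this quarter. System and table names here are invented for illustration:

```python
# Meaning/mechanics separation: the contract names a logical source; a
# binding maps it to physical storage. All names are invented.
CONTRACT = {"term": "Revenue", "logical_source": "financial_actuals"}

# Migrating the warehouse only touches this binding. The contract, and every
# question asked against it, is unchanged.
BINDINGS_V1 = {"financial_actuals": "legacy_dw.FACT_REVENUE"}
BINDINGS_V2 = {"financial_actuals": "lakehouse.finance.actuals"}

def physical_table(contract, bindings):
    """Resolve the contract's logical source to a physical location."""
    return bindings[contract["logical_source"]]

print(physical_table(CONTRACT, BINDINGS_V1))  # legacy_dw.FACT_REVENUE
print(physical_table(CONTRACT, BINDINGS_V2))  # lakehouse.finance.actuals
```

Swap the binding and the plumbing moves; the agreement about what "Revenue" means never does. That indirection is the whole trick.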
This separation is what makes governance portable rather than embedded. In the old model, governance was tied to the pipeline: you governed data by controlling how it flowed and who built the transformations. In the new model I'm suggesting, governance lives in the contracts themselves. The question stops being "are you allowed to see this data" and starts being "when you ask this question, are you getting a trustworthy answer."
The market is converging on this insight rapidly. dbt Labs open-sourced MetricFlow under the Apache 2.0 license in late 2025, explicitly framing it as critical infrastructure for AI accuracy. Snowflake formalized Open Semantic Interchange as a cross-vendor standard. AtScale, Collibra, Denodo, and others are building semantic layer products targeting this exact problem. The Model Context Protocol itself has emerged as a standardized interface through which AI agents can query semantic definitions from governed models. The infrastructure is arriving. The organizational practice of defining, owning, and maintaining semantic agreements is what most enterprises still lack.
Who Owns the Meaning? Spoiler Alert: The Business
Technology can provide the framework, but the harder question is organizational: who writes semantic contracts, who maintains them, and who arbitrates when definitions conflict?
When the only consumers of data definitions were dashboards and reports, inconsistencies could be managed through tribal knowledge. Senior analysts knew which numbers to trust. But AI doesn’t have tribal knowledge. It can’t call the finance team to ask which revenue figure to use. It takes whatever definitions it finds and applies them at machine speed, confidently and without hesitation.
That means ownership has to be explicit. Someone, whether it’s a data governance team, a business data steward, or a cross-functional working group, needs to be accountable for defining what key business terms mean, documenting those definitions in a machine-readable form, and managing the change process when definitions evolve. This is a governance role, not a technical one. Building the infrastructure to enforce semantic contracts is an engineering problem. Deciding what “revenue” means is a business decision that belongs to the business.
You've Already Done This (You Just Didn't Call It That)
Every organization already has semantic contracts. They’re just informal. They live in the heads of senior analysts, in Confluence pages nobody reads, in the implicit logic of reports that “everyone knows” use a particular definition.
The shift is making those implicit agreements explicit, machine-readable, and enforceable. Not because the concept is new, but because for the first time, there’s a consumer that needs them to be formalized: the AI that’s navigating your data on behalf of your business users.
"Build It and They Will Come" - The Bridge This Builds
Semantic contracts are what make the conversational shift practical rather than theoretical. They’re the governance layer that lets you move analytical initiative from IT to the business without losing accuracy or trust. They’re the reason you can walk through data across multiple sources and get answers that mean what you think they mean.
Without them, talking to data is just a party trick. With them, it’s a paradigm shift.
And if that sounds like a bold claim, the next post in this series will push even further, into why the dimensions we use to analyze data aren’t as fixed as our systems assume them to be. Hope that gets your attention.