AI Governance
June 17, 2026 | 24 min read

Why enterprise AI governance fails and how to build it so it doesn't

How to build enterprise AI governance that enforces itself


Most data science teams have a governance program. Few have governance that actually runs at deployment time. An AI governance framework is the combination of policies, technical controls, and organizational accountability structures that govern how models are built, validated, deployed, and monitored. The distinction that matters is whether those controls are enforced in the workflow, where models actually get built and shipped, or merely documented in a SharePoint folder.

This guide is written for data science leaders and risk officers who need to build or mature governance programs covering both traditional ML models and generative AI systems. Compliance pressure is real in every regulated industry: financial services firms operate under SR 11-7 validation requirements and the 2026 SR 26-2 guidance for AI; life sciences organizations need audit-grade traceability for FDA submissions; and enterprises across sectors now contend with the EU AI Act's risk-based compliance requirements. What follows is a framework you can actually implement, not a conceptual model.

Why most enterprise AI governance frameworks fail before they start

Most governance programs start with a policy document and end there. The document defines roles, establishes review committees, and references NIST AI RMF categories. Then it sits in a SharePoint folder while models get built and deployed outside any formal process.

The core failure mode is treating governance as a compliance exercise rather than an operational capability. Data science leaders who have shipped models at scale will recognize all three of the patterns that follow:

  1. Governance defined at the program level, not enforced at the workflow level. A policy that says "all models must be documented before deployment" does nothing if the deployment pipeline doesn't require that documentation to exist.
  2. Ownership without accountability. Cross-functional governance committees are common. Committees with actual authority to block a model release are rare.
  3. No visibility into what's actually running. An enterprise AI governance program that cannot answer "what models are in production right now, which version of each is live, who owns them, whether the deployed version matches what was validated, and when they were last reviewed" is not governing anything.
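The visibility question in the third pattern can be made concrete with a small inventory check. The following is a minimal sketch, assuming a hypothetical `ProductionModel` record and an illustrative 365-day review window; neither comes from a specific platform or regulation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProductionModel:
    """Hypothetical inventory record for one model running in production."""
    name: str
    owner: str
    live_version: str
    validated_version: str
    last_reviewed: date


def inventory_findings(models, today, max_review_age_days=365):
    """Return findings for models whose live version was never validated
    or whose last review is older than the (assumed) review window."""
    findings = []
    for m in models:
        if m.live_version != m.validated_version:
            findings.append(
                f"{m.name}: live {m.live_version} != validated {m.validated_version}"
            )
        if (today - m.last_reviewed).days > max_review_age_days:
            findings.append(f"{m.name}: last reviewed {m.last_reviewed.isoformat()}")
    return findings


inventory = [
    ProductionModel("credit-score", "risk-team", "2.1.0", "2.1.0", date(2026, 3, 1)),
    ProductionModel("churn", "analytics", "1.4.0", "1.3.0", date(2024, 11, 15)),
]
findings = inventory_findings(inventory, today=date(2026, 6, 17))
for line in findings:
    print(line)
```

If this kind of report cannot be produced from a system of record, the program is relying on memory rather than governance.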

The practical fix: start with the controls you can enforce technically, including audit trails, model registration, and environment reproducibility. Build the policy layer on top of what the platform can actually surface.

What an AI governance framework actually needs to cover

An enterprise AI governance framework has five functional areas: model lifecycle controls, risk tiering, access controls and data lineage, audit trails, and governance review gates. Organizations that try to govern AI without all five end up with gaps that surface during audits or in production failures.

Model lifecycle controls

Model lifecycle controls define the stages a model passes through from development to retirement, and what approval is required at each transition. The stages can include development, validation and testing, staging, production, monitoring, and deprecation. Each transition requires a documented decision: by whom, based on what evidence, and with what sign-off.

The governance mechanism is a model registry. Every model that reaches production should exist as a registered artifact in a system that tracks its version, training data, evaluation metrics, and approval chain. Without a registry, governance questions ("which version of this model is in production?") require manual archaeology through notebooks, email threads, and Slack history. When a validator or auditor asks that question during a review cycle, the absence of a registry means days of reconstruction effort and the real possibility that the answer is wrong.
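To illustrate what a registry has to record to answer "which version is in production?", here is a minimal in-memory sketch. The `ModelVersion` and `ModelRegistry` names, their fields, and the rule that promotion requires at least one approval are assumptions for illustration, not the API of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical registered artifact: version plus the evidence behind it."""
    name: str
    version: str
    training_data_ref: str
    metrics: dict
    approved_by: tuple


class ModelRegistry:
    """Toy registry that tracks versions and which one is live."""

    def __init__(self):
        self._versions = {}
        self._production = {}

    def register(self, mv):
        key = (mv.name, mv.version)
        if key in self._versions:
            raise ValueError(f"{mv.name} {mv.version} is already registered")
        self._versions[key] = mv

    def promote(self, name, version):
        mv = self._versions[(name, version)]  # KeyError if never registered
        if not mv.approved_by:
            raise PermissionError(f"{name} {version} has no recorded approval")
        self._production[name] = version

    def production_version(self, name):
        return self._versions[(name, self._production[name])]


registry = ModelRegistry()
registry.register(ModelVersion("credit-score", "2.0.0", "snapshot-2026-01",
                               {"auc": 0.81}, ("validator-a",)))
registry.register(ModelVersion("credit-score", "2.1.0", "snapshot-2026-04",
                               {"auc": 0.83}, ("validator-a", "risk-officer")))
registry.promote("credit-score", "2.1.0")
live = registry.production_version("credit-score")
print(live.version, live.training_data_ref, live.approved_by)
```

The point of the sketch is that the auditor's question becomes a lookup rather than a reconstruction exercise.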

Risk tiering for ML and GenAI

Risk tiering assigns models to categories that determine the rigor of their validation and monitoring requirements. A credit scoring model at a bank and a customer churn model for internal analytics have different exposure profiles and should not have the same review requirements.

Regulatory frameworks like the EU AI Act offer a useful reference point: they classify systems into risk tiers based on potential harm, with proportional requirements at each tier. In practice, regulatory categories do not map cleanly onto internal model portfolios, and most organizations build their own tiering rubric. The most workable approach uses a two-axis model: business impact (revenue, regulatory exposure, reputational risk) against model criticality (how bad if it fails or drifts).
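A two-axis rubric of this kind can be expressed as a small scoring function. This is a sketch under stated assumptions: the three-level scale, the multiplicative score, and the tier cut-offs are illustrative choices, not values prescribed by the EU AI Act or any other framework.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def risk_tier(business_impact: Level, criticality: Level) -> str:
    """Map the two axes to a tier. Possible scores are 1, 2, 3, 4, 6, 9;
    the cut-offs below are assumed for illustration."""
    score = int(business_impact) * int(criticality)
    if score >= 6:
        return "tier-1"  # most rigorous validation and monitoring
    if score >= 3:
        return "tier-2"
    return "tier-3"


print(risk_tier(Level.HIGH, Level.HIGH))    # credit scoring at a bank
print(risk_tier(Level.LOW, Level.MEDIUM))   # internal churn analytics
```

An organization would calibrate the scale and thresholds against its own portfolio; the useful property is that the rubric is explicit and repeatable.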

GenAI systems introduce a complication. The same foundation model can operate at different risk tiers depending on the use case. A summarization tool used internally is low-risk. The same model architecture deployed to generate patient-facing clinical summaries is high-risk. Risk tiering for GenAI needs to account for the deployment context and the population affected, not just the model architecture. In a model registry, this means that a single foundation model may appear at multiple risk tiers, each tied to a specific deployment and use-case definition. The model version and the deployment context together determine the applicable governance requirements.
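One way to represent this in a registry is to attach the tier to the deployment record rather than to the model. The sketch below assumes a hypothetical `Deployment` record and a deliberately simple rule based on the affected population and the sensitivity of the domain; the model name `fm-x` is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment record: one foundation model, one use case."""
    foundation_model: str
    model_version: str
    use_case: str
    external_population: bool  # outputs reach people outside the organization
    sensitive_domain: bool     # e.g. clinical, legal, or financial decisions


def deployment_tier(d: Deployment) -> str:
    """Assumed rule: the tier follows the deployment context, not the architecture."""
    if d.external_population and d.sensitive_domain:
        return "high"
    if d.external_population or d.sensitive_domain:
        return "medium"
    return "low"


deployments = [
    Deployment("fm-x", "2026-05", "internal meeting summaries", False, False),
    Deployment("fm-x", "2026-05", "patient-facing clinical summaries", True, True),
]
tiers = {d.use_case: deployment_tier(d) for d in deployments}
print(tiers)
```

The same model version appears twice with different tiers, which is exactly the situation the registry has to be able to represent.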

Access controls and data lineage

Access controls determine who can see, modify, or run which models and datasets. This matters for both security and compliance: GDPR and CCPA restrict which datasets can be used for which purposes, and SR 11-7 requires that model validators be organizationally independent from model developers.

Data lineage tracks which datasets were used to train and validate each model version. Without it, the audit questions that matter most cannot be answered: was validation data independent from training data? Did the training set include individuals who have submitted deletion requests? Domino captures data lineage automatically at execution time, linked to the model artifact rather than maintained as a separate process.
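The two audit questions above reduce to set operations once record identifiers are captured per model version. The following is an illustrative sketch only, with a hypothetical `lineage_findings` helper; it does not describe how Domino or any other platform implements lineage.

```python
def lineage_findings(train_ids, validation_ids, deletion_requests):
    """Check validation independence and deletion-request exposure
    from the record identifiers tied to one model version."""
    train = set(train_ids)
    validation = set(validation_ids)
    return {
        "train_validation_overlap": sorted(train & validation),
        "deleted_subjects_in_training": sorted(train & set(deletion_requests)),
    }


report = lineage_findings(
    train_ids=["u1", "u2", "u3", "u4"],
    validation_ids=["u4", "u5"],
    deletion_requests=["u2", "u9"],
)
print(report)
```

Without recorded lineage, neither set exists to intersect, which is why the questions cannot be answered after the fact.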

Audit trails and reproducibility

An audit trail is the recorded history of a model's development and deployment lifecycle. Reproducibility means that any result, whether a training run, evaluation, or deployment configuration, can be recreated exactly from the recorded inputs. These two capabilities are inseparable for regulated enterprise AI.

Reproducibility is the foundation of model validation. If a validator cannot independently recreate a model's training run from the documented inputs, the validation is incomplete. In practice, this fails most often because environment configurations were not pinned at runtime, library versions have since changed, or the training data snapshot was not formally versioned. These are engineering problems, and they require engineering solutions. For FDA submissions in life sciences, this is a requirement. For SR 11-7 compliance in financial services, the same standard applies to model documentation.
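As a rough sketch of the engineering fix, a training job can write a manifest at execution time that pins the three things that most often go missing: code version, data snapshot, and environment. The `run_manifest` helper and its field names are assumptions for illustration; the package lookup uses the standard library's `importlib.metadata`.

```python
import hashlib
import platform
from importlib import metadata


def run_manifest(code_version, data_bytes, packages):
    """Record the inputs a validator would need to recreate a run."""
    pinned = {}
    for name in packages:
        try:
            pinned[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pinned[name] = None  # recorded explicitly rather than silently skipped
    return {
        "code_version": code_version,                            # e.g. a commit hash
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),   # snapshot fingerprint
        "python": platform.python_version(),
        "packages": pinned,
    }


manifest = run_manifest("abc123", b"id,label\n1,0\n2,1\n", ["numpy", "scikit-learn"])
print(manifest["data_sha256"][:12], manifest["packages"])
```

A validator can then compare the manifest of a rerun against the original; any difference in the data hash or package versions explains a mismatch in results.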

Domino's approach is that every experiment automatically captures code version, data version, environment configuration, and results. The audit trail is a byproduct of how work gets done, not a separate process layered on top.

ML governance best practices for cross-functional teams

Defining ownership across data science, legal, and compliance

Ownership of AI governance in a large organization is not always clear. Data science teams own model quality. Legal and compliance teams own regulatory exposure. IT owns infrastructure and security. None of these roles translate cleanly into "owns governance."

One structure that works in practice includes a governance function that is organizationally separate from the data science team but has direct authority over deployment decisions for high-risk models. This function defines the standards, reviews the evidence, and has authority to delay or block a model launch. This model is standard in financial services (model risk management groups operate this way) and is being adopted in life sciences as AI initiatives in drug development scale.

Cross-functional governance requires explicit RACI ownership at each lifecycle stage. Who is responsible for initial risk classification? Who approves the model for staging? Who has final authority to approve production deployment? These cannot be answered generically. They need to be defined at the program level and enforced by the platform.

Governance gates at each lifecycle stage

Governance gates are required approval checkpoints at model lifecycle transitions. They operationalize the RACI by making certain stage transitions impossible without explicit sign-off. This is the mechanism that turns policy into enforcement.

There are four gates every enterprise AI governance program should implement:

  1. Model registration gate: a model cannot be registered in the model registry without completing required documentation, including training data, evaluation metrics, intended use, and risk classification.
  2. Staging gate: a model cannot be promoted to staging without passing automated testing requirements and receiving approval from the model validation function.
  3. Production gate: a model cannot be promoted to production without full governance review and sign-off from the appropriate authority tier based on its risk classification.
  4. Monitoring gate: post-deployment, high-risk models must have model monitoring configured for performance degradation and model drift before production promotion is complete.
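The four gates above can be sketched as a promotion function that refuses a stage transition when the required evidence is missing. The stage names, evidence keys, and `GateError` type are hypothetical; a real platform would back them with actual test results and sign-off records.

```python
STAGES = ["development", "registered", "staging", "production"]

# Evidence each gate requires before a model may enter the stage (assumed keys).
GATE_REQUIREMENTS = {
    "registered": {"training_data", "evaluation_metrics",
                   "intended_use", "risk_classification"},
    "staging": {"automated_tests_passed", "validation_approval"},
    "production": {"governance_signoff"},
}


class GateError(Exception):
    """Raised when a promotion is blocked by a governance gate."""


def promote(model, target):
    """Return a copy of `model` at `target` stage, or raise GateError."""
    if STAGES.index(target) != STAGES.index(model["stage"]) + 1:
        raise GateError(f"cannot move from {model['stage']} to {target}")
    required = set(GATE_REQUIREMENTS[target])
    # Monitoring gate: high-risk models need monitoring before production.
    if target == "production" and model.get("risk_tier") == "high":
        required.add("monitoring_configured")
    missing = sorted(required - model["evidence"])
    if missing:
        raise GateError(f"missing for {target}: {', '.join(missing)}")
    return {**model, "stage": target}


model = {"name": "credit-score", "stage": "development",
         "risk_tier": "high", "evidence": set()}
try:
    promote(model, "registered")
except GateError as err:
    print("blocked:", err)
```

The design choice worth noting is that the gate raises instead of warning: the transition is impossible, not merely discouraged.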

Platforms like Domino implement these gates as workflow controls, not suggestions. To be promoted, models must complete a registration checklist. In Domino, governance is built into the operational process.

A parallel problem is shadow AI: models and GenAI tools running inside the enterprise outside any formal governance process. The governance response is platform centralization. When the governed platform is easier to use than workarounds, most practitioners will use it. Domino's centralized infrastructure model makes shadow AI significantly harder to sustain at scale, because compute usage and model deployments can be audited regularly against the platform's registry.
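An audit for shadow AI can be as simple as comparing what is observed running against what the registry knows about. This sketch assumes two hypothetical inputs, a list of observed deployments (for example, from infrastructure inventory) and a list of registered model versions; the field names and endpoint paths are illustrative.

```python
def unregistered_deployments(observed, registered):
    """Return observed deployments with no matching (name, version) in the registry."""
    known = {(r["name"], r["version"]) for r in registered}
    return [d for d in observed if (d["name"], d["version"]) not in known]


registered = [
    {"name": "credit-score", "version": "2.1.0"},
    {"name": "churn", "version": "1.3.0"},
]
observed = [
    {"name": "credit-score", "version": "2.1.0", "endpoint": "/score"},
    {"name": "churn", "version": "1.4.0", "endpoint": "/churn"},
    {"name": "doc-summarizer", "version": "0.1", "endpoint": "/summarize"},
]
shadow = unregistered_deployments(observed, registered)
print([d["endpoint"] for d in shadow])
```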

Governing generative and agentic AI: what changes and what doesn't

New risk categories in GenAI systems

Generative AI introduces risk categories not included in traditional ML governance models. In traditional ML, model outputs are deterministic (or near-deterministic) and bounded by the training objective. In generative AI, outputs are probabilistic, open-ended, and shaped by prompts that change at runtime. This means governing at the use-case level, not just the model level. The same foundation model can present very different risk profiles depending on who uses it and for what.

The risk categories specific to GenAI systems are:

Prompt injection is particularly consequential in agentic deployments, where the model executes actions rather than returning text. Adversarial inputs can redirect model behavior away from its intended use in ways that are difficult to detect after the fact.

Hallucination risk is highly use-case dependent. Factually incorrect outputs presented with apparent confidence are a manageable nuisance in an internal summarization tool and a serious liability in clinical, legal, or financial decisioning contexts.

Output unpredictability makes deterministic testing difficult. The same model can produce meaningfully different outputs for semantically similar prompts, which means coverage-based validation approaches from traditional ML do not transfer cleanly.

Data leakage via prompts is a controls problem. Sensitive data entered in prompts may be retained in API logs, model context windows, or fine-tuning pipelines without explicit controls in place.

These risks add to traditional ML governance requirements. A GenAI system still requires drift monitoring, access controls, and version tracking.
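For the prompt data-leakage risk in particular, one possible control is to redact obvious identifiers before a prompt is written to logs. The sketch below is deliberately narrow: it assumes only two patterns (email addresses and US-style Social Security numbers) and hypothetical field names, and it is not a substitute for a full data loss prevention control.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(prompt: str) -> str:
    """Replace matched identifiers with placeholders before logging."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    return SSN.sub("[SSN]", prompt)


def prompt_log_record(prompt, model_version, use_case):
    """Build the record that would be stored; the raw prompt is never kept."""
    return {
        "model_version": model_version,
        "use_case": use_case,
        "prompt": redact(prompt),
    }


record = prompt_log_record(
    "Contact jane.doe@example.com, SSN 123-45-6789",
    model_version="fm-x-2026-05",
    use_case="internal summarization",
)
print(record["prompt"])
```

Logging the model version and use case alongside the redacted prompt keeps the record useful for audit while limiting what sensitive data is retained.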

| | Traditional ML | Generative AI | Agentic AI |
|---|---|---|---|
| Output type | Bounded (class, score, value) | Open-ended, prompt-shaped | Actions + outputs, multi-step |
| Primary failure modes | Model drift, data leakage, fairness violations | Hallucination, prompt injection, output variability | Action scope violations, compounding errors |
| Key governance controls | Model registry, validation, drift monitoring | Prompt logging, output monitoring, use-case risk tiering | Action scope definition, execution trace, rollback capability |
| Regulatory frameworks | SR 11-7, 21 CFR Part 11, EU AI Act | EU AI Act (high-risk classification), SR 26-2 | Emerging; no settled standard yet |
| Governance layer | Foundation | Extends ML governance | Extends GenAI governance |

Agentic AI governance considerations

Agentic AI systems are AI models that take sequences of actions, call external tools, and make decisions with limited human intervention. Governing them requires more than standard model lifecycle controls, because the action space is larger and consequences can compound across steps.

Four governance requirements apply specifically to agentic deployments:

  1. Action scope definition: what actions is the agent permitted to take? This should be defined at deployment time and enforced by the platform.
  2. Human oversight triggers: at what decision thresholds does the agent escalate to a human rather than acting autonomously? These need to be specified and logged for each deployment.
  3. Full execution trace: every action the agent takes should be logged with the model version, input context, and output that produced it. This is the audit trail equivalent for agentic workflows.
  4. Rollback capability: if an agent makes a consequential error, the organization needs the ability to identify what happened, why, and how to reverse the effects where possible.
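The first three requirements can be sketched as a thin wrapper that sits between the agent and the actions it requests. The `AgentPolicy` and `GovernedAgent` names, the monetary escalation threshold, and the trace fields are assumptions for illustration; rollback is not implemented here, but the trace is the record a rollback process would start from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical deployment-time policy for one agent."""
    allowed_actions: frozenset
    escalation_threshold: float  # amounts at or above this go to a human


@dataclass
class GovernedAgent:
    policy: AgentPolicy
    model_version: str
    trace: list = field(default_factory=list)

    def request(self, action, amount, context):
        """Decide whether a requested action runs, escalates, or is blocked,
        and log the decision either way."""
        if action not in self.policy.allowed_actions:
            decision = "blocked"
        elif amount >= self.policy.escalation_threshold:
            decision = "escalated"
        else:
            decision = "executed"
        self.trace.append({
            "model_version": self.model_version,
            "action": action,
            "amount": amount,
            "context": context,
            "decision": decision,
        })
        return decision


agent = GovernedAgent(AgentPolicy(frozenset({"refund"}), 500.0), "agent-1.2")
print(agent.request("refund", 50.0, "ticket 101"))
print(agent.request("refund", 900.0, "ticket 102"))
print(agent.request("delete_account", 0.0, "ticket 103"))
```

Blocked and escalated requests are logged as well as executed ones, since an audit needs to see what the agent attempted, not only what it did.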

AI governance framework in regulated industries

Life sciences: FDA, GxP, and model validation

Life sciences organizations face the most technically demanding AI governance requirements of any industry. FDA's 21 CFR Part 11 requires validated software with full audit trails for systems used in regulated processes. GxP compliance requires that the entire data lifecycle, from collection through analysis to reporting, be documented, reproducible, and auditable.

For AI models used in drug development, the governance requirements include documented and reproducible training and validation workflows, change control for any model update (a version change requires revalidation, not just redeployment), data provenance controls, and validation documentation sufficient for regulatory submission.

Domino's audit trail and reproducibility capabilities make validation documentation a reporting exercise. Every training run and evaluation is automatically logged and linked to the model artifact, so the evidence required for regulatory submission exists as a byproduct of how work gets done. Domino's life sciences platform is built around these requirements.

Financial services: model risk management and SR 11-7

Financial services organizations have operated under formal model risk management requirements since the Federal Reserve's SR 11-7 guidance in 2011. SR 11-7 establishes three pillars: model development and implementation, independent model validation, and ongoing model governance. ML models present specific challenges here: conceptual soundness review requires explaining model logic to a validator, ongoing monitoring must include automated drift detection, and high-risk models in credit decisioning require regular audits against documented baselines. For a deeper foundation, Domino's model risk management solution covers the platform capabilities and regulatory requirements in detail.
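As one example of automated drift detection against a documented baseline, the population stability index (PSI) compares the binned distribution of a feature or score at validation time with its current distribution. SR 11-7 does not prescribe this statistic; it is shown here as a common choice, and the bin counts below are made up for illustration.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index over matching bins.
    `eps` guards against empty bins when taking the logarithm."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("baseline and current data must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


baseline = [50, 30, 20]   # score distribution recorded at validation
current = [20, 30, 50]    # score distribution observed in production
print(round(psi(baseline, current), 3))
```

A frequently cited rule of thumb treats values above roughly 0.25 as a significant shift, but the alert threshold is a policy decision that should be documented alongside the baseline.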

The 2026 SR 26-2 guidance extends SR 11-7 to cover AI systems more explicitly, with heightened requirements for model explainability and human oversight in automated decisioning. The article "What changes with SR 26-2" covers the specific implications. Moody's, using Domino's platform, achieved a 4x increase in model validation frequency, a direct outcome of reproducible, well-documented workflows that eliminated the reconstruction effort that had previously consumed validation team bandwidth. When the documentation artifacts are generated at execution time, the validation team spends its time on substantive review. Domino's banking and financial services platform is built for these compliance requirements.

Public sector: NIST AI RMF, security, and mission-scale governance

Public sector and defense organizations face AI governance requirements shaped by federal policy. The NIST AI RMF is the primary voluntary framework for federal agencies, and Executive Order 14110 on AI safety, which was rescinded in January 2025, had established additional requirements for agencies developing or procuring high-impact AI systems. For organizations like the U.S. Department of the Treasury, which uses AI models in financial surveillance, sanctions screening, and economic analysis, governance requirements include strict auditability, model explainability for regulatory decisions, and data security controls that extend to classified and sensitive financial data.

The deployment constraint that defines public sector AI governance is environment: models often run in on-premises, GovCloud, or air-gapped environments. Governance infrastructure has to work across hybrid deployments without creating separate audit processes for each environment. Domino is deployed in DoD IL5 environments and supports hybrid and on-premises deployments, giving agencies a single governed platform regardless of where compute runs. Domino's public sector platform covers the specific deployment and compliance requirements for government agencies.

Why enterprise AI governance requires a platform, not just a policy

An AI governance framework built entirely on policy documents and manual review processes has a structural flaw: it depends on practitioners choosing to follow the process every time, for every model. At scale, with dozens of data scientists and hundreds of models in production, that assumption fails. Model delivery is an engineering problem. Accountability and control are a governance problem. They require the same underlying infrastructure.

The common objection is that disciplined teams can implement governance without a dedicated platform. This is true for a small portfolio of high-visibility models with stable ownership. It becomes operationally unsustainable when the model count reaches the dozens, ownership turns over, and the pressure to ship accelerates. The real question is not whether governance can exist without a platform but whether it stays consistent as the portfolio grows.

Workflow-native governance changes the equation. The platform enforces documentation requirements before a model can be registered, so governance happens at the point of work. Audit trails are generated at execution time. Drift detection runs as a platform service, which means monitoring happens consistently across all deployed models rather than only for the ones someone remembered to instrument.

Organizations with standardized governance workflows get models to production faster, because the review process is efficient and documentation artifacts are generated automatically. Moody's 4x model validation frequency is the outcome of a governed platform. Broader AI adoption becomes sustainable when governance scales with it.

For enterprises expanding into generative AI and agentic systems, the infrastructure question becomes more urgent. AI-powered applications built on GenAI require prompt logging, output monitoring, and action auditing that are impractical to implement manually across dozens of deployments.

That infrastructure has to be embedded at the platform layer before agentic use cases can scale. Domino's AI governance capabilities, including model registry, automated audit trails, environment reproducibility, and integrated monitoring, are infrastructure-enforced controls. They work because they are built into how work gets done. Understanding what governed infrastructure makes possible for AI-powered applications is the right starting point for any organization evaluating its MLOps and governance readiness.

Frequently asked questions

What is an AI governance framework for enterprise ML and GenAI?

An AI governance framework for enterprise ML and GenAI is the combination of policies, technical controls, and organizational accountability structures that govern how AI models are built, validated, deployed, and monitored. For ML systems, this covers the full model lifecycle from development through retirement. For GenAI, it additionally covers prompt management, output monitoring, and the specific risks of probabilistic, open-ended model outputs, including hallucination, prompt injection, and data leakage. An effective framework is enforced at the platform level across all models and teams, regardless of who built them or how they were deployed.

What is the difference between ML governance and GenAI governance?

ML governance and GenAI governance share the same foundational requirements: model registration, audit trails, lifecycle controls, access management, drift monitoring. They differ in the risk categories they address. ML models produce outputs from a defined class or value space; their failure modes, including model drift, data leakage, and fairness violations, are well-understood and largely covered by established frameworks like SR 11-7 and 21 CFR Part 11. GenAI systems produce open-ended outputs shaped by runtime prompts, introducing risks including prompt injection, hallucination, output variability, and data leakage through prompts. GenAI governance adds a layer on top of ML governance. The same traceability and lifecycle controls apply; the risk assessment criteria and monitoring requirements must account for the probabilistic, context-dependent nature of generative outputs.

How does the NIST AI RMF apply to enterprise AI governance?

The NIST AI Risk Management Framework provides a voluntary structure for identifying, assessing, and managing AI risks across four functions: Govern, Map, Measure, and Manage. It is not a compliance requirement in most contexts, unlike SR 11-7 for financial services or 21 CFR Part 11 for life sciences, but it provides a structured vocabulary for building an AI risk management framework and maps well onto the EU AI Act's risk-based classification approach. For enterprises operating across multiple regulatory regimes, it is a useful design reference, particularly for its emphasis on trustworthy AI characteristics: accuracy, reliability, explainability, fairness, privacy, security, and accountability.

Danny W. Stout, Ph.D.

Danny W. Stout, Ph.D., is a seasoned data science and analytics leader with over two decades of experience driving enterprise AI and machine learning initiatives. He has held senior analytics and AI leadership roles across global organizations including Ernst & Young, Takeda, TIBCO, Quest, and Dell, spanning forecasting, pricing, analytics strategy, and data science consulting. His work emphasizes effectiveness over scale, focusing on governance, team alignment, and measurable outcomes as the determinants of successful AI adoption. Based in Charlton, MA, Danny holds a Ph.D. and combines technical leadership with practical insights that help organizations scale data science responsibly and effectively.
