AI governance frameworks built for single models are failing at scale because they cannot define what a model is, what counts as a change, or how a modification in one component propagates through a pipeline.
In January 2012, JPMorgan's Chief Investment Office replaced its Value-at-Risk model with a new one. The new model immediately halved the reported risk on a $50 billion synthetic credit portfolio. No one recorded the swap as a model change. No one measured how much the outputs shifted. No one reviewed why the change was made. The reduced risk figure emboldened traders to triple the portfolio to $157 billion in notional value. Total losses reached $6.2 billion.
Nearly a decade later, Zillow's home-buying algorithm was purchasing roughly 7,000 homes based on automated price predictions. The model depended heavily on housing market conditions that were shifting rapidly as pandemic-era demand cooled. No one was systematically measuring how sensitive predictions were to those environmental shifts. Every pricing error turned into a real financial commitment. Zillow wrote down over $500 million, shut down the Offers business, and laid off 2,000 employees.
Different industries. Different models. Different eras. The root cause was the same: no formal definition of what the model was, what constituted a change, or how to measure the downstream impact. Governance ran on judgment calls. Those judgment calls failed at scale.
The next one of these won't be a history lesson. The question is whether it happens to you, and whether you'll be ready when it does.

SR 11-7 has been the foundational guidance for model risk management in U.S. banking for 15 years. It directs institutions to validate their models, but it never formally defines what a model is as a computational object. It cannot specify what counts as a change, how to measure the magnitude of that change, or how a modification in one component propagates through a pipeline of interconnected models.
The EU AI Act introduces new obligations for high-risk AI systems that take effect August 2, 2026. Providers must complete conformity assessments, finalize technical documentation, and establish continuous monitoring. The Act also requires organizations to assess "substantial modifications" to their AI systems, yet it does not define "substantial" with enough precision to automate the determination. Although August 2026 is a European deadline, it is becoming a global benchmark: the Act sets the most concrete governance standard that exists, and many U.S. firms operating globally are already building to it because no comparable U.S. framework offers that level of specificity. Now is the moment for banks everywhere, not just European ones, to get this right.
Without a formal definition of model identity, institutions cannot distinguish a substantial modification from a routine update except through expert judgment. That is the same failure mode that enabled the London Whale losses in 2012.
The governance surface area has expanded dramatically since SR 11-7 was issued. Modern AI systems are not single models. They are pipelines, ensembles, and agentic architectures where multiple models call each other, share data sources, and make routing decisions dynamically. Third-party model APIs can change without the consuming organization's knowledge. System prompts, data feeds, and configuration thresholds can shift model behavior without any version change on record.
The result is a gap between what banks have known under SR 11-7, what they should know in today's AI environment, and what they soon must know to comply with the EU AI Act deadline. We've condensed that gap into five questions every bank's governance framework should be able to answer, and currently cannot. These are not edge cases. They are the core of how modern AI systems actually operate.
The five moves below answer them. Each move directly addresses one of those unanswerable questions and converts a judgment call into a computation.

The model boundary is the line between what is versioned as part of the model and what belongs to the operating environment: data sources, compute infrastructure, and configuration settings such as thresholds and system prompts.
Ask yourself: what is inside the model, and what is part of the operating environment? The answer is the version index. Everything versioned together under a single version number is the model; everything else, from data sources to compute infrastructure to configuration settings, is the environment. If an organization cannot draw this line consistently, it cannot determine whether something changed inside the model or around it. Most institutions today cannot draw this line.
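To make this concrete, here is a minimal sketch of a manifest that draws the boundary explicitly. The structure and field names are illustrative assumptions, not a standard; the point is that the line between model and environment gets written down rather than implied.

```python
from dataclasses import dataclass

# Everything under ModelManifest is versioned together under one version
# number; everything under EnvironmentManifest is tracked separately.
# (Structure and field names are illustrative, not a standard.)

@dataclass
class ModelManifest:
    version: str              # the single version index
    weights_uri: str          # artifact versioned with the model
    feature_code_ref: str     # e.g. a git commit hash
    training_config: dict     # hyperparameters baked into this version

@dataclass
class EnvironmentManifest:
    data_sources: dict[str, str]   # feed name -> feed version
    compute: str                   # runtime image / infrastructure ref
    config: dict[str, object]      # thresholds, system prompts, etc.

credit_model = ModelManifest(
    version="3.2.0",
    weights_uri="s3://models/credit/3.2.0/weights.bin",
    feature_code_ref="9f2c1ab",
    training_config={"max_depth": 8, "learning_rate": 0.05},
)

credit_env = EnvironmentManifest(
    data_sources={"bureau_feed": "2026-01-15"},
    compute="runtime-image:1.9",
    config={"approval_threshold": 0.62},
)
```

With the boundary recorded, "did the model change?" reduces to comparing version indices instead of polling experts.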
Every component that can change model outputs (including approval thresholds, system prompts, and data feeds) requires a version number and a change log. Model weights are only one of many components that affect what a model produces. The approval threshold on a credit model is a configuration setting that can flip decisions without any retraining. A rewritten system prompt on a customer-facing LLM can change behavior overnight. A refreshed data feed can shift outputs across the portfolio. If a component can change the output, it needs a version number and a log. Otherwise, the organization is in the same position as JPMorgan in 2012: something changed, and no one recorded it.
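A sketch of what that discipline could look like, assuming a simple in-memory registry; any real implementation would live in a governance platform, and the field names here are hypothetical.

```python
import datetime

# Illustrative component registry: every output-affecting component gets a
# version and an append-only change log. (Field names are assumptions.)
registry = {
    "model_weights":      {"version": "3.2.0"},
    "approval_threshold": {"version": "1.4"},
    "system_prompt":      {"version": "2.1"},
    "bureau_feed":        {"version": "2026-01-15"},
}

change_log: list[dict] = []

def update_component(name: str, new_version: str, reason: str) -> None:
    """Record any change to an output-affecting component before it ships."""
    old_version = registry[name]["version"]
    registry[name]["version"] = new_version
    change_log.append({
        "component": name,
        "from": old_version,
        "to": new_version,
        "reason": reason,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })

# A threshold change is logged exactly like a retrain would be.
update_component("approval_threshold", "1.5", "policy review Q3")
```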
Before deploying any model change, measure the magnitude of the output shift across a representative population and set a threshold tied to business materiality. When a model is retrained or a component is updated, compute that shift before deployment; if it exceeds the threshold, trigger revalidation automatically. Not by committee. JPMorgan's VaR model swap produced a 50% output shift overnight. If that number had been measured and compared against a threshold, mandatory review would have been triggered before the portfolio tripled.
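Here is a minimal sketch of that pre-deployment gate. The shift metric and the 10% threshold are placeholders; the right measure and materiality level are business decisions, not givens.

```python
import numpy as np

def output_shift(old_scores: np.ndarray, new_scores: np.ndarray) -> float:
    """Mean relative shift in model outputs across a representative population.
    (One of several reasonable magnitude measures; the choice is a business
    decision, not something this sketch settles.)"""
    denom = np.maximum(np.abs(old_scores), 1e-12)  # guard against zeros
    return float(np.mean(np.abs(new_scores - old_scores) / denom))

MATERIALITY_THRESHOLD = 0.10  # placeholder: set from business materiality

rng = np.random.default_rng(0)
population = rng.uniform(0, 1, size=(10_000, 5))   # stand-in feature sample

old_scores = population @ np.array([0.2, 0.1, 0.3, 0.25, 0.15])
new_scores = old_scores * 0.5                      # a JPMorgan-scale halving

shift = output_shift(old_scores, new_scores)
if shift > MATERIALITY_THRESHOLD:
    # Trigger revalidation automatically; no committee in the loop.
    print(f"Shift of {shift:.0%} exceeds threshold: revalidation required")
```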
A small change at the front of a pipeline can amplify as it moves through downstream models. Institutions that evaluate models in isolation miss the compounding effects that drive the largest losses.
A data feed update that shifts feature values by a fraction of a percent can, after passing through a prediction model and a decision engine, flip real decisions for a meaningful share of the population. Regulators are moving toward enterprise-level model risk assessment that accounts for exactly these compounding effects.
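A toy example, with made-up models, shows how little it takes: a prediction model followed by a hard decision threshold turns a 0.5% upstream feature shift into decision flips for roughly 1% of the population.

```python
import numpy as np

rng = np.random.default_rng(42)
features = rng.normal(loc=1.0, scale=0.2, size=100_000)

def predict(x: np.ndarray) -> np.ndarray:
    """Toy downstream prediction model: a logistic score, illustrative only."""
    return 1.0 / (1.0 + np.exp(-12.0 * (x - 1.0)))

def decide(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Toy decision engine at the end of the pipeline."""
    return scores >= threshold

baseline = decide(predict(features))
shifted  = decide(predict(features * 0.995))   # a 0.5% upstream feature shift

flip_rate = float(np.mean(baseline != shifted))
print(f"{flip_rate:.1%} of decisions flipped from a 0.5% input shift")
```

Evaluating the prediction model alone would report a negligible score change; only the end-to-end view reveals the flipped decisions.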
In agentic AI systems, the routing layer (the component that decides which tool to call or which sub-model to invoke) carries governance risk and requires the same treatment as the models it chooses between.
When an LLM decides which tool to call, which sub-model to invoke, or which branch of a pipeline to execute, the routing decision itself carries risk. An agent that nearly selected a different tool is carrying latent risk that no traditional governance framework accounts for. If an organization is deploying agentic AI, the router needs the same governance treatment as the models it chooses between: versioned, monitored, and subject to sensitivity analysis.
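A sketch of what monitoring the routing layer might look like, assuming the router exposes per-tool scores; the margin-based near-tie flag is one illustrative sensitivity measure, and all names here are hypothetical.

```python
# Illustrative router audit: log every routing decision with the runner-up
# margin, and flag near-ties as latent risk. (Field names are assumptions.)

ROUTER_VERSION = "router-0.7"   # the router is versioned like any model
NEAR_TIE_MARGIN = 0.05          # placeholder sensitivity threshold

def route(tool_scores: dict[str, float]) -> dict:
    """Pick the top-scoring tool and record how close the decision was."""
    ranked = sorted(tool_scores.items(), key=lambda kv: kv[1], reverse=True)
    (chosen, top), (runner_up, second) = ranked[0], ranked[1]
    return {
        "router_version": ROUTER_VERSION,
        "chosen": chosen,
        "runner_up": runner_up,
        "margin": top - second,
        "near_tie": (top - second) < NEAR_TIE_MARGIN,  # latent-risk flag
    }

decision = route({"credit_model": 0.51, "manual_review": 0.49, "decline": 0.12})
if decision["near_tie"]:
    print(f"Near-tie routing: {decision['chosen']} vs {decision['runner_up']}")
```

Audited this way, "the agent nearly did something else" becomes a logged, monitorable event rather than an invisible one.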
These five moves are grounded in a formal framework I developed in a recent paper, "The Model Operator: Formal Foundations for Model Governance". The paper defines a model as a versioned, typed operator with a formally separable runtime environment. From that single definition, the model boundary, version sensitivity, propagation calculus, and risk functional all follow.
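I won't reproduce the paper's formalism here, but the core definition can be rendered as a small sketch: a model is a typed operator carrying its own version, applied against an environment that is held separately. The rendering below is illustrative, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

I = TypeVar("I")  # input type
O = TypeVar("O")  # output type

@dataclass
class Environment:
    """Runtime environment: data sources and configuration, versioned
    apart from the model itself."""
    data_sources: dict[str, str]
    config: dict[str, object]

@dataclass(frozen=True)
class ModelOperator(Generic[I, O]):
    """A model as a versioned, typed operator. Illustrative rendering only;
    see the paper for the formal definition."""
    version: str
    apply: Callable[[I, Environment], O]

# Because the environment is a separate, explicit argument, "what changed"
# is always attributable: either the operator's version moved, or the
# environment did.
```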
None of this is aspirational. It is computable. Institutions that operationalize these five moves now will be ahead of both the August 2026 deadline and the next governance failure at scale.
For organizations looking to put these principles into practice, Domino Data Lab provides the enterprise AI platform that supports governed model development, deployment, and monitoring across the full model lifecycle. Domino's governance capabilities, including model inventory, reproducibility controls, and approval workflows, provide the operational infrastructure these formal principles require.
If you work in model risk, AI governance, or regulatory compliance, I welcome the conversation.
Rev New York on May 19 is the perfect venue to start taking action: keynoted by SR 11-7 architect David Palmer, it will also feature a panel I'll lead on this topic with model risk management leaders who have experience at Capital One, TIAA, and New York Life. Register now.
You can also reach me on LinkedIn or through Domino Data Lab.

Nicholas Goble, Ph.D. leads Solution Architecture for Financial Services & Insurance at Domino Data Lab, bringing more than ten years of experience across quantitative finance, derivatives modeling, and fintech innovation. At Venerable, Nicholas managed Quantitative Research and Development, where he established quant research capabilities from the ground up and guided teams in building sophisticated trading platforms and pricing engines. Before that, he was a Senior Quantitative Researcher at Chatham Financial, focusing on valuation methodologies and bringing machine learning models into live trading environments. Nicholas holds a Ph.D. in Physics from Case Western Reserve University.