Why AI reproducibility is the holy grail of good governance
Leila Nouri | 2024-04-25 | 7 min read
Overview
The scientific method requires that an experiment be reproducible, yielding the same results when repeated, before its findings are accepted. The same standard applies to AI models, and reproducibility forms the foundation of explainable AI.
By definition, reproducibility requires capturing every artifact used to build and operate an AI model, along with the model's purpose. Carnegie Mellon’s ML blog describes reproducibility as “a minimum requirement for a finding to be believable and informative.” Reproducibility gives other data scientists what they need to reproduce a model’s results: the exact versions of the data, code, tools, libraries, programming languages, and operating systems used.
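To make "capturing everything" concrete, the short Python sketch below records the environment side of that list (Python version, operating system, and installed package versions). It is a generic illustration, not Domino's implementation, and the record_environment helper is a name of our own choosing.

```python
# Minimal illustration (not Domino's implementation): snapshot the
# environment half of a reproducibility record -- the exact Python
# version, operating system, and installed package versions.
import json
import platform
import sys
from importlib import metadata

def record_environment() -> dict:
    """Return a JSON-serializable snapshot of the runtime environment."""
    return {
        "python": sys.version,
        "os": platform.platform(),
        "packages": sorted(
            f"{d.metadata['Name']}=={d.version}" for d in metadata.distributions()
        ),
    }

if __name__ == "__main__":
    # Persist the snapshot next to the experiment so others can rebuild the environment.
    with open("environment_snapshot.json", "w") as f:
        json.dump(record_environment(), f, indent=2)
```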
True reproducibility means anyone can return to any point in the AI/ML lifecycle and see how a model was built, what its purpose is, and which KPIs it serves. Yet most AI models are built outside of controlled environments and systems of record. Enterprise AI platforms like Domino solve this by automatically unifying and capturing all model provenance and artifacts across teams, users, tools, and environments, eliminating the manual detective work that so often produces mixed results.
Why is reproducibility a challenge?
Without a single source of truth, even the best AI models operate in a bespoke manner rather than the industrialized manner that complex enterprise settings demand (e.g., thousands of models, AI consumers, and AI builders). Unsurprisingly, manually reproducing AI experiments can become an archaeological expedition that takes weeks or months. Nothing stays constant, and recreating the same conditions by hand (down to the exact versions of code, configurations, third-party packages, infrastructure settings, and data snapshots used) is difficult.
Domino addresses this in three ways. First, Domino Model Sentry includes “model cards,” which automatically record all AI lineage for models, code, experiments, and environments. Next, Domino simplifies reproducibility with “project templates,” which centrally codify best practices so they are used for every model, every time. These templates enforce responsible AI policies and create consistency. Domino Model Sentry also includes a “model registry,” which tracks all models and their lineage, experiments, and artifacts in a single pane of glass across the entire AI lifecycle. The model registry also includes audit trails that simplify compliance documentation.
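As a rough sketch of what a model card or registry entry captures, consider the record below. The ModelCard fields and register function are hypothetical stand-ins for illustration only, not Domino Model Sentry's actual schema or API.

```python
# Illustrative only -- not Domino Model Sentry's schema. A "model card"
# style record captures lineage (code, data, environment, purpose) so a
# registry entry can be audited end to end.
from dataclasses import asdict, dataclass, field
import json

@dataclass
class ModelCard:
    name: str
    version: str
    purpose: str              # business intent / KPI the model serves
    code_commit: str          # exact code revision used for training
    data_snapshot: str        # identifier or hash of the training data
    environment: str          # container image or environment spec
    training_job_id: str      # link back to the run that produced the model
    metrics: dict = field(default_factory=dict)

registry: dict[str, ModelCard] = {}  # stand-in for a real model registry

def register(card: ModelCard) -> None:
    """Record the card under name:version so lineage is never lost."""
    registry[f"{card.name}:{card.version}"] = card

card = ModelCard(
    name="churn-classifier",
    version="1.3.0",
    purpose="Flag accounts likely to churn within 90 days",
    code_commit="a1b2c3d",
    data_snapshot="sha256:<digest of the training data snapshot>",
    environment="python-3.11-sklearn-1.4",
    training_job_id="run-42",
    metrics={"auc": 0.91},
)
register(card)
print(json.dumps(asdict(card), indent=2))
```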
There are four reasons why automating reproducibility is critical to enterprise-grade AI:
Compounded knowledge
Reproducibility helps data scientists build an AI knowledge base across the organization. Past work is effortlessly discoverable and reusable, and less time is wasted reinventing the wheel. With Domino model cards, results (charts, derived data, model files, etc.) can even be reconstructed using the exact training jobs and conditions that generated them. Model cards also allow users to see all past versions of deployed model APIs and apps and access their state at each previous deployment.
Making AI reproducible also enables technical and nontechnical stakeholders to collaborate, sharing and reusing work more quickly. This creates a foundation of AI knowledge and saves time by allowing users to build upon past work and standardize on the best models.
Data science innovation happens when teams can iterate, collaborate, and share their findings instead of duplicating work or repeating experiments already proven fruitless.
When all work is easily reproducible and discoverable by others, it becomes painless to build on prior work and innovate faster. It also means institutional knowledge does not disappear when teams change or employees leave, de-risking AI.
Lastly, Domino Model Sentry logs all communication between teams and actors in one project canvas where actions are tracked, progress is measured, and all work takes place. These logs remove knowledge silos and make it easier to collaborate and iterate faster.
Simpler compliance
Highly regulated industries like banking, pharmaceuticals, and government require AI transparency and must often demonstrate to regulators how models were developed and used. Even documenting which algorithms were used, and why they were selected, is essential for de-risking AI. Having all model lineage readily available lets regulators reconstruct AI models and lets enterprises validate their work.
Easier AI validation
Data science teams should validate all models before deployment to ensure the results are repeatable so the model will achieve its intended purpose. Too often, this is a time-intensive process that requires validation teams to manually piece together model lineage, including the environment, tools, data, and other artifacts used to create the model. This often prevents models from being promoted into production on time, and many models never make it to production. When an organization can instantly reproduce a model, validators can instead focus on ensuring that models are robust, performant, and accurate.
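As a sketch of what instant reproducibility buys a validator, the example below reruns a training job under the recorded seed and data conditions and checks that the reported metric is recovered before promotion. It uses scikit-learn on synthetic data purely for illustration; it is not Domino's validation workflow.

```python
# Minimal sketch of a reproducibility check during validation: retrain
# under the recorded conditions and confirm the reported metric recurs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def train_and_score(seed: int) -> float:
    """Train with every source of randomness pinned to the recorded seed."""
    X, y = make_classification(n_samples=2000, n_features=20, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

reported_auc = train_and_score(seed=42)     # what the model builder reported
revalidated_auc = train_and_score(seed=42)  # the validator's rerun
assert np.isclose(reported_auc, revalidated_auc), "Run is not reproducible"
print(f"AUC reproduced: {revalidated_auc:.4f}")
```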
Built-in version tracking
Version control (tracking changes to artifacts like code, data, labels, models, hyperparameters, dependencies, and environments for training and inference) must be automated. Domino seamlessly integrates version control across the entire AI lifecycle and automatically tracks the active revision of each component when data scientists execute their code. Anyone can restore any part of a project to an earlier state, run current code against an older version of the data, and view and revert file revisions. Domino eliminates this manual work and enables reproducibility at enterprise scale.
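The sketch below shows the kind of per-run record such automation produces, assuming the project lives in a git repository. The record_run helper is illustrative, not Domino's integration: it captures the active code commit, data file hashes, and hyperparameters so an older state can later be restored exactly.

```python
# Hedged sketch of automated version tracking: write a manifest of the
# active code revision, data hashes, and hyperparameters for every run.
import hashlib
import json
import subprocess
import time
from pathlib import Path

def current_code_revision() -> str:
    """Active git commit of the project (assumes the project is a git repo)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def file_revision(path: Path) -> str:
    """Content hash identifying the exact data snapshot used."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_run(data_files: list[str], hyperparameters: dict, out_dir: str = "runs") -> Path:
    manifest = {
        "timestamp": time.time(),
        "code_commit": current_code_revision(),
        "data": {f: file_revision(Path(f)) for f in data_files},
        "hyperparameters": hyperparameters,
    }
    run_path = Path(out_dir) / f"run-{int(manifest['timestamp'])}.json"
    run_path.parent.mkdir(parents=True, exist_ok=True)
    run_path.write_text(json.dumps(manifest, indent=2))
    return run_path  # later: load this manifest to restore code and data exactly

# Usage: record_run(["train.csv"], {"lr": 0.01, "epochs": 20})
```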
Conclusion
Reproducibility is the cornerstone of delivering enterprise-grade AI. It is the key to reusing past work and building a foundation of AI knowledge, simplifying compliance and iteration so the best models get into production faster. Domino guarantees reproducibility for all AI models by automatically capturing all model lineage and artifacts, while embedding customer policies and processes into AI workflows for consistency and compliance. Check out Domino’s reproducibility page to learn more.
Leila Nouri, Director of Product Marketing at Domino Data Lab, is an innovative and data-driven product marketing leader with 15+ years of experience building high-performing teams, go-to-market campaigns, and new revenue streams for startups and Fortune 500 companies.