Mission success: How Domino automates the CDAO AI test and evaluation framework
Domino | 2025-06-26 | 10 min read

In April 2024, the Department of Defense’s Chief Digital and Artificial Intelligence Office (CDAO) released Test and Evaluation of AI Models, a document outlining what trustworthy AI means in practice for mission-critical environments. While not a binding policy, it sets clear expectations for model performance, robustness, and resilience across the AI lifecycle. For Defense teams, the challenge is turning this strategy into an executable, repeatable workflow.
The CDAO’s guidance is structured around six key areas: Thinking about Performance, Thinking about Testing Methods, Thinking about Data, Thinking about AI Models, Thinking about Context, and Thinking about Documentation. This blog post mirrors that structure, demonstrating how Domino Data Lab’s Enterprise AI Platform provides the capabilities to address the CDAO's recommendations to streamline AI delivery — even in secure, air-gapped environments.
1. Performance: Beyond basic correctness
CDAO rightly asserts that “Correctness is just the tip of the iceberg” (CDAO, p. 8). A robust testing and evaluation strategy requires assessing a spectrum of performance dimensions, including bias, robustness, drift, and latency, to ensure true mission readiness (CDAO, pp. 10-17).
How Domino empowers comprehensive performance evaluation
- Lifecycle monitoring and mitigation: Domino enables continuous model monitoring to track diverse performance metrics over time, facilitating early detection of drift and proactive model retraining.
- Ensuring robustness and addressing bias: Domino underpins strong AI governance by enabling systematic testing in version-controlled environments — practices crucial for evaluating and mitigating bias, ensuring models perform reliably and fairly as intended.
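The drift detection described above can be made concrete with a standard statistic such as the Population Stability Index (PSI), which compares live scoring data against the training baseline. The sketch below is a generic, illustrative implementation, not Domino's monitoring API; the common rule-of-thumb thresholds (under 0.1 stable, over 0.25 significant drift) are assumptions from general practice:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI: a common drift score comparing a baseline (training)
    distribution against live production data, binned on the
    baseline's quantiles."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_cnt, _ = np.histogram(expected, edges)
    # Clip live data into the baseline's range so every point is binned
    a_cnt, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)
    e_pct = np.clip(e_cnt / len(expected), 1e-6, None)  # floor avoids log(0)
    a_pct = np.clip(a_cnt / len(actual), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)
stable = rng.normal(0, 1, 10_000)
drifted = rng.normal(0.8, 1, 10_000)  # shifted mean simulates drift

assert population_stability_index(baseline, stable) < 0.1    # no change
assert population_stability_index(baseline, drifted) > 0.25  # significant drift
```

A monitoring pipeline would run a check like this on a schedule and alert, or trigger retraining, when the score crosses the drift threshold.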
Real-world example: U.S. Navy's Project AMMO
The U.S. Navy’s Project AMMO uses Domino to accelerate AI model updates for autonomous underwater vehicles conducting mine countermeasures. Domino helps maintain model robustness by integrating multiple ML tools and providing version control, cutting model update cycles from six months to two weeks and keeping sonar and imagery intelligence accurate at the edge.
2. Testing methods: Ensuring real-world readiness
The CDAO framework advocates for a diverse range of testing methods, from A/B testing to red teaming, to accurately reflect real-world operational conditions (CDAO, pp. 20-28).
How Domino facilitates sophisticated testing strategies
- Flexible and automated testing: Domino’s open platform allows teams to employ custom test harnesses and integrates with CI/CD systems for automated, continuous evaluation.
- Diverse scenario evaluation: The platform supports testing across varied scenarios, including live and simulated environments, for thorough assessment before deployment (CDAO, p. 28).
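The automated, continuous evaluation described above typically takes the form of an evaluation gate that a CI/CD pipeline runs before promoting a candidate model. Here is a minimal sketch; the metric names and thresholds are illustrative, not part of any specific framework:

```python
def evaluation_gate(metrics: dict, thresholds: dict) -> tuple[bool, list]:
    """Compare each required metric to its minimum acceptable value.

    Returns (passed, failures); a missing metric counts as a failure.
    """
    failures = [
        f"{name}: {metrics.get(name)} < {minimum}"
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    ]
    return (not failures, failures)

# Hypothetical candidate model metrics and promotion thresholds
candidate = {"accuracy": 0.93, "recall": 0.88, "latency_budget_met": 1.0}
gate = {"accuracy": 0.90, "recall": 0.90, "latency_budget_met": 1.0}

passed, failures = evaluation_gate(candidate, gate)
# recall misses its 0.90 floor, so the gate blocks promotion
assert not passed
assert failures == ["recall: 0.88 < 0.9"]
```

In a CI system, a failed gate would fail the pipeline stage, so a model that regresses on any required dimension never reaches deployment.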
Real-world example: Lockheed Martin
Lockheed Martin uses Domino to accelerate the Test and Evaluation (T&E) of AI models for defense programs, including improving target recognition. By integrating Domino into secure R&D workflows, teams can simulate mission-representative scenarios and rapidly iterate based on automated experiment tracking and CI/CD integration. This streamlined pipeline delivers over $20M in annual value through faster, more reliable model evaluation.
3. Data: The foundation of trustworthy AI
Data is the fundamental building block of AI, and the CDAO framework details a comprehensive data lifecycle — insisting that data be complete, operationally realistic, and well-documented (CDAO, pp. 30-37).
How Domino ensures data integrity and lifecycle management
- Secure data connections: Domino allows you to create and manage secure connections to common external data sources with a consistent access pattern.
- End-to-end data lifecycle support: Domino provides a unified environment for the entire data workflow, including robust version control for datasets and code essential for reproducible data preparation (CDAO, pp. 32-34).
- Data governance and operational realism: With strong data governance features and flexible data connectors, Domino ensures data handling meets transparency expectations and that training data accurately reflects the operational context.
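The dataset traceability described above hinges on being able to say exactly which data a model saw. A common pattern is to content-hash each prepared dataset and record the hash alongside provenance metadata. This is a generic sketch, not Domino's dataset API, and the mission and pipeline identifiers are hypothetical:

```python
import hashlib
import json

def fingerprint_dataset(rows, metadata):
    """Content-hash a dataset and bundle the digest with provenance
    metadata, so any model can be traced back to the exact data it saw."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the serialization, and thus the hash, stable
        digest.update(json.dumps(row, sort_keys=True).encode())
    return {"sha256": digest.hexdigest(), **metadata}

rows = [{"ping_id": 1, "range_m": 42.0}, {"ping_id": 2, "range_m": 57.5}]
record = fingerprint_dataset(
    rows, {"source": "sortie-0042", "prepared_by": "etl-v1.3"}  # illustrative IDs
)
# Identical rows always yield the identical fingerprint, so a stored
# hash proves (or disproves) that a training set is unchanged
same = fingerprint_dataset(rows, {"source": "sortie-0042", "prepared_by": "etl-v1.3"})
assert record["sha256"] == same["sha256"]
```

Storing this record next to each model version gives auditors a verifiable link from model to mission data.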
Real-world example: U.S. Navy's Project AMMO
Project AMMO uses Domino in AWS GovCloud to securely manage high-volume sonar data from unmanned underwater vehicles. The platform enables full dataset traceability, linking data to specific missions — critical for auditability. This ensures models are continuously calibrated with real-world operational data, meeting CDAO standards.
4. AI models: Managing complexity from development to deployment
Effective T&E requires a deep understanding of the entire AI model lifecycle, from architecture selection and training to deployment and maintenance (CDAO, pp. 42-48).
How Domino streamlines the AI model lifecycle
- Reproducible development and rigorous testing: Domino ensures reproducible research by automatically tracking experiments and versioning all artifacts, facilitating rigorous evaluation against operational requirements (CDAO, p. 45).
- Governed deployment and continuous maintenance: The platform enables governed deployment with built-in model monitoring capabilities to support proactive maintenance (CDAO, pp. 47-48).
Real-world example: U.S. Navy's Project AMMO
The project uses Domino as a centralized MLOps “factory” to manage the full AI model lifecycle across unmanned underwater vehicles (UUVs). Its built-in governance ensures every model, dataset, and experiment is versioned and auditable, supporting secure deployment in classified and edge settings.
5. Context: Ensuring relevance in the real world
A model’s effectiveness is judged within its specific operational context. The CDAO framework stresses understanding the use case (CDAO, p. 56) and accounting for environmental constraints such as network availability, compute resources, and security protocols (CDAO, p. 57).
How Domino adapts to demanding operational contexts
- Deploy anywhere, for any mission: Domino is engineered for diverse public sector environments, supporting deployment from cloud to on-premises and fully air-gapped systems, ensuring T&E aligns with the true fielded context. Containerized models can run in the cloud, on-premises, or at the edge.
- Security and compliance built in: With a focus on security and compliance, Domino supports agency-specific accreditation needs.
Real-world example: U.S. Navy's Project AMMO
Project AMMO relies on Domino to deploy AI models in challenging environments, including air-gapped systems on UUVs. Domino’s containerized deployment enables models to run reliably on constrained hardware with limited connectivity, while its security features support Navy accreditation requirements.
6. Documentation: Building trust and transparency
Comprehensive documentation — including Data Cards, Model Cards, version control, and detailed test reports — is non-negotiable for building trustworthy and transparent AI systems (CDAO, pp. 58-63).
How Domino automates and embeds documentation
- Automated artifact tracking for richer records: Domino’s AI workbench acts as a system-of-record, automatically capturing and versioning all critical artifacts (code, data, results, environments) — providing a detailed, traceable history essential for generating comprehensive Data and Model Cards (CDAO, pp. 60-63).
- Enhanced auditability and governance: All changes are logged, providing complete audit trails that support compliance, facilitate governance, and ensure development processes are transparent and accountable.
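Once artifacts are tracked automatically, generating a Model Card becomes a rendering step over data the system already holds. The sketch below shows that idea in generic Python; the field set loosely follows the documentation artifacts the CDAO framework calls for, and all run names and values are hypothetical:

```python
import json
from datetime import date

def render_model_card(run: dict) -> str:
    """Assemble a minimal Model Card from artifacts a tracking
    system has already captured for a training run."""
    card = {
        "model_name": run["model_name"],
        "version": run["version"],
        "intended_use": run["intended_use"],
        "training_data": run["dataset_ref"],  # links back to a Data Card
        "evaluation": run["metrics"],
        "limitations": run.get("limitations", "not assessed"),
        "generated_on": str(date.today()),
    }
    return json.dumps(card, indent=2)

card = render_model_card({
    "model_name": "sonar-classifier",  # illustrative run metadata
    "version": "1.4.0",
    "intended_use": "mine-like object detection in side-scan sonar",
    "dataset_ref": "sha256:ab12cd34",  # fingerprint from the data pipeline
    "metrics": {"accuracy": 0.93, "recall": 0.88},
})
assert '"model_name": "sonar-classifier"' in card
```

Because the inputs come from the tracking system rather than from hand-maintained documents, the card stays in sync with the model it describes.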
Real-world example: U.S. Navy's Project AMMO and Lockheed Martin
The U.S. Navy’s Project AMMO uses Domino to automatically track all experiments and data for rigorous auditability. Similarly, Lockheed Martin leverages Domino’s version control to maintain transparent workflows, enabling seamless collaboration and traceability. These capabilities directly support the CDAO’s call for detailed documentation.
The bottom line
The CDAO’s T&E guidance is clear: trustworthy AI requires more than high accuracy. It requires deliberate, repeatable, and realistic evaluation throughout a model’s lifecycle. Domino provides the infrastructure to operationalize this vision — helping federal teams evaluate AI systems with confidence, even in the most sensitive environments.
Whether you’re red-teaming a new LLM, evaluating the mission readiness of a sensor fusion model, or preparing documentation for approval, Domino provides the tools to do it securely, scalably, and with full traceability.
Check out Domino’s public sector page to learn more.
Domino Data Lab empowers the largest AI-driven enterprises to build and operate AI at scale. Domino’s Enterprise AI Platform provides an integrated experience encompassing model development, MLOps, collaboration, and governance. With Domino, global enterprises can develop better medicines, grow more productive crops, develop more competitive products, and more. Founded in 2013, Domino is backed by Sequoia Capital, Coatue Management, NVIDIA, Snowflake, and other leading investors.