Deploying agentic AI systems in Domino
Author
Andrea Lowe
Product Marketing Director, Data Science/AI/ML
Article topics
Agentic AI, GenAI production
Intended audience
Data scientists, ML engineers, AI engineers
Overview and goals
The challenge
Moving a working agentic AI system from development to production is where most enterprise AI projects stall. The gap between a promising experiment and a live application typically requires weeks of DevOps coordination, containerization work, and infrastructure provisioning, which slows iteration and creates organizational friction.
Once in production, teams face a second challenge: visibility. Traditional monitoring tracks uptime and latency, but agentic systems require deeper observability, including understanding which agent made which decision, which tool was called, and whether the output met quality standards. Without this, drift goes undetected, failures are hard to root-cause, and continuous improvement becomes guesswork.
Governance and compliance teams also require audit trails that span the full lifecycle, from the experiment configuration that was selected for deployment to every agent decision made in production.
The solution
Domino provides an end-to-end agentic AI platform that quickly moves systems from experiment to production. You can deploy directly from a validated experiment run in the Experiment Manager, selecting the exact configuration, hardware, and auto-scaling policies you want to put live. The same code you developed and evaluated is exactly what goes into production since there is no translation layer and no re-instrumentation.
The same tracing instrumentation used during development continues running in production automatically, collecting traces from every user interaction. Scheduled evaluations run asynchronously against production traffic, aggregating quality metrics over time so teams can spot gradual drift before it becomes a problem. When issues arise, any production configuration can be restored to a development workspace in one click, enabling rapid iteration and troubleshooting with full lineage from experiment to deployed agent.
Considerations when deploying agentic systems
• You have a validated agent configuration: You’ve evaluated your agentic system through experimentation and identified a configuration you want to make available to end users or automated workflows.
• You need continuous quality monitoring: Your agentic system handles real user requests, and you need to track classification accuracy, response quality, and routing decisions over time to detect drift before it causes failures.
• You require auditability in production: Governance or compliance requirements mean you need complete records of which agent made which decision, which model was called, and how outputs were generated, directly tied to the experiment configuration that was deployed.
• You want to close the development-production loop: When production issues arise, you need the ability to restore the exact production configuration into a development workspace, reproduce the problem, iterate, and redeploy, without losing context or lineage.
Setting up agent deployments and monitoring
The following steps demonstrate how to deploy and monitor the TriageFlow multi-agent incident response system built in the GenAI Tracing Blueprint and in this Tech Hour. The same approach applies to any agentic system developed in Domino. Note that you will need either a frontend to interact with your agent (you can use any application framework; see the Domino docs on app publication here) or a service layer such as FastAPI. The linked GenAI Tracing repository includes a simple Streamlit dashboard that we’ll use to interact with our agentic setup. For additional information on building and evaluating agentic systems in Domino, view our documentation.
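For reference, an app entrypoint like app/app.sh typically just launches the UI on the host and port Domino expects. A minimal sketch, assuming a Streamlit dashboard at app/streamlit_app.py (the dashboard filename is an assumption; check the repo for the actual file):

```shell
#!/usr/bin/env bash
# Hypothetical app/app.sh entrypoint. The dashboard filename below is an
# assumption; adjust it to match the file in the GenAI Tracing repo.
# Domino apps are expected to listen on 0.0.0.0:8888.
streamlit run app/streamlit_app.py \
  --server.address 0.0.0.0 \
  --server.port 8888
```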
Step 1. Deploy your Agent
Agentic systems can be deployed either directly from an evaluation run or from the Agents view. To deploy from an experiment run, you must have run the Experiment from a Job. In the tutorial repo, this will be the run_triage.py script.
To deploy from Experiment Manager:

- In your project, click Experiments from the left navigation menu.
- Choose the experiment containing the agent configurations you evaluated.
- Select the run you want to deploy and click “…” on the right to deploy the run.
- A deployment wizard opens, pre-filled with the code commit from the selected run. Specify your entrypoint (app/app.sh for this example), the hardware tier, and the auto-scaling policy, then set access permissions for your target team.

To deploy from the Agents view:

- Select Deployments → Apps & Agents from the left navigation menu.
- Click Publish → Agent in the top right corner.
- On the Code page of the Agent launch modal, you can use the most recent agent deployment file or select the “Experiment run” tab to choose a specific experiment run.
- Specify the same deployment settings as above: entrypoint, hardware tier, auto-scaling policy, and access permissions.
The same code you developed and evaluated is exactly what goes live. Domino handles containerization, environment replication, and infrastructure setup automatically.
Step 2. Confirm production tracing evaluations
No additional instrumentation is required for tracing: the same @add_tracing decorator and DominoRun context used during development automatically collect production traces from real user interactions. Domino's agentic AI tracing supports two types of evaluations, which can be attached to any trace using log_evaluation():
- Metrics (numeric): Used for aggregation, such as mean, median, and sum — for example, a quality score or confidence value.
- Labels (string): Used for filtering and grouping — for example, incident category or blast radius classification.
In the TriageFlow app, the trace_id is captured from inside the traced function and passed to log_evaluation() to attach both types. The app also includes a user feedback mechanism, so human scores and approvals are logged as evaluations when operators submit feedback through the UI. Note that all label values are strings, including feedback like "yes" or "no".
from domino.agents.logging import log_evaluation

# Metrics - numeric values for aggregation (mean, median, sum)
log_evaluation(trace_id=trace_id, name="quality_score", value=4.5)

# Labels - string values for filtering and grouping
log_evaluation(trace_id=trace_id, name="category", value="security_incident")

# When user submits their evaluation
log_evaluation(trace_id=trace_id, name="human_score", value=4.0)
log_evaluation(trace_id=trace_id, name="human_approved", value="yes")

Open the application and interact with the agent to generate a few calls, then navigate to the agent's “Performance” tab in the deployed agent dashboard to confirm that traces appear.

The deployed agent dashboard provides three views:
• Overview: Deployment status and configuration details.
• Performance: Evaluation metrics visualized alongside production traces, showing trends over time to identify patterns in successful versus problematic interactions. Use the dropdowns to view different category counts and metrics.
• Usage: User invocations and interaction tracking.
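To build intuition for why metrics must be numeric and labels string-valued, the Performance tab's aggregations behave roughly like this local sketch (illustrative only, not Domino's implementation): labels group traces, and metrics aggregate within each group.

```python
from collections import defaultdict
from statistics import mean

# Per-trace rows combining a numeric metric with a string label, as they
# might come back from production traces (values here are made up).
evaluations = [
    {"quality_score": 4.5, "category": "security_incident"},
    {"quality_score": 3.0, "category": "security_incident"},
    {"quality_score": 4.0, "category": "outage"},
]

# Labels (strings) group; metrics (numbers) aggregate within each group.
by_label = defaultdict(list)
for ev in evaluations:
    by_label[ev["category"]].append(ev["quality_score"])

summary = {label: round(mean(vals), 2) for label, vals in by_label.items()}
```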
Step 3. Schedule evaluations against production traffic
To keep production responses fast, evaluations can run asynchronously rather than in the live request path. You can schedule Domino Jobs to run your evaluators on a set cadence against production traces. This approach also keeps evaluations versioned and reproducible.
If you're following the tutorial repo, the run_scheduled_evaluation.py script handles this part for you. It evaluates all calls made to the agent in the last 24 hours and logs four post-hoc evaluations to each trace: a review status, a priority score, a high-risk flag for urgent high-impact incidents, and a timestamp of when the evaluation was performed. It also generates a report, saved to the project's artifacts. To set this up:
1. Select the Run Evaluations button in the top right of the agent dashboard. This provides a code template using search_traces() to retrieve production traces.
2. Copy the AGENT_ID into the run_scheduled_evaluation.py script.

3. Save and run the script manually as a one-time Job or schedule it to run automatically.
All evaluations, whether scheduled or live, appear in the same Performance tab.
For custom setups, use log_evaluation() to attach evaluation scores to the traces retrieved by search_traces().
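The post-hoc scoring in a script like run_scheduled_evaluation.py can be approximated by pure functions like these. The scoring rules below are hypothetical (the repo's actual heuristics may differ); the real script would fetch traces with search_traces() and attach results with log_evaluation().

```python
from datetime import datetime, timezone


def priority_score(trace):
    """Hypothetical priority heuristic combining urgency and blast radius."""
    urgency = {"low": 1, "medium": 2, "high": 3}.get(trace.get("urgency"), 1)
    radius = {"single_service": 1, "multi_service": 2, "global": 3}.get(
        trace.get("blast_radius"), 1
    )
    return urgency * radius


def evaluate_trace(trace):
    """Build the four post-hoc evaluations described above for one trace."""
    score = priority_score(trace)
    return {
        "review_status": "needs_review" if score >= 6 else "auto_approved",
        "priority_score": score,  # metric: numeric, so it can be aggregated
        "high_risk": "yes" if score >= 6 else "no",  # label: values are strings
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


# In the real script, each result would be logged back to its trace, e.g.:
# for trace in search_traces(...):                      # Domino SDK
#     for name, value in evaluate_trace(trace).items():
#         log_evaluation(trace_id=trace["id"], name=name, value=value)

result = evaluate_trace({"urgency": "high", "blast_radius": "multi_service"})
```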
Step 4: Iterate and redeploy
When monitoring surfaces an issue, navigate to your project's Experiments page, select the run you want to investigate, and restore it to a development workspace with one click. To do this, select “...” in the top right corner and then “Open in Workspace”. The restored workspace contains the same code, environment, and data connections as those in production.

Make your changes, run them through your evaluation judges using the same workflow, and redeploy directly from the new experiment run. This tight loop between production monitoring and development enables rapid iteration while maintaining complete lineage between production agents and their source experiments.
Check out the GitHub repo

Andrea Lowe
Product Marketing Director, Data Science/AI/ML

Andrea Lowe, PhD, is Product Marketing Director, Data Science/AI/ML at Domino Data Lab where she develops training on topics including overviews of coding in Python, machine learning, Kubernetes, and AWS. She trained over 1000 data scientists and analysts in the last year. She has previously taught courses including Numerical Methods and Data Analytics & Visualization at the University of South Florida and UC Berkeley Extension.