Deploying agentic AI systems in Domino
Author
Andrea Lowe
Product Marketing Director, Data Science/AI/ML
Article topics
Agentic AI, GenAI production
Intended audience
Data scientists, ML engineers, AI engineers
Overview and goals
The challenge
Moving a working agentic AI system from development to production is where most enterprise AI projects stall. The gap between a promising experiment and a live application typically requires weeks of DevOps coordination, containerization work, and infrastructure provisioning, which slows iteration and creates organizational friction.
Once in production, teams face a second challenge: visibility. Traditional monitoring tracks uptime and latency, but agentic systems require deeper observability, including understanding which agent made which decision, which tool was called, and whether the output met quality standards. Without this, drift goes undetected, failures are hard to root-cause, and continuous improvement becomes guesswork.
Governance and compliance teams also require audit trails that span the full lifecycle, from the experiment configuration that was selected for deployment to every agent decision made in production.
The solution
Domino provides an end-to-end agentic AI platform that quickly moves systems from experiment to production. You can deploy directly from a validated experiment run in the Experiment Manager, selecting the exact configuration, hardware, and auto-scaling policies you want to put live. The same code you developed and evaluated is exactly what goes into production since there is no translation layer and no re-instrumentation.
The same tracing instrumentation used during development continues running in production automatically, collecting traces from every user interaction. Scheduled evaluations run asynchronously against production traffic, aggregating quality metrics over time so teams can spot gradual drift before it becomes a problem. When issues arise, any production configuration can be restored to a development workspace in one click, enabling rapid iteration and troubleshooting with full lineage from experiment to deployed agent.
Considerations when deploying agentic systems
• You have a validated agent configuration: You’ve evaluated your agentic system through experimentation and identified a configuration you want to make available to end users or automated workflows.
• You need continuous quality monitoring: Your agentic system handles real user requests, and you need to track classification accuracy, response quality, and routing decisions over time to detect drift before it causes failures.
• You require auditability in production: Governance or compliance requirements mean you need complete records of which agent made which decision, which model was called, and how outputs were generated, directly tied to the experiment configuration that was deployed.
• You want to close the development-production loop: When production issues arise, you need the ability to restore the exact production configuration into a development workspace, reproduce the problem, iterate, and redeploy, without losing context or lineage.
Setting up agent deployments and monitoring
The following steps demonstrate how to deploy and monitor the TriageFlow multi-agent incident response system built in the GenAI Tracing Blueprint and in this Tech Hour. The same approach applies to any agentic system developed in Domino. Note that you will need either a frontend to interact with your agent (you can use any application framework; see the Domino docs on app publication here) or a service layer such as FastAPI. The linked GenAI Tracing repository includes a simple Streamlit dashboard that we’ll use to interact with our agentic setup. For additional information on building and evaluating agentic systems in Domino, view our documentation.
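For reference, an app entrypoint like app/app.sh typically just launches the UI on the host and port Domino expects. A minimal sketch, assuming a Streamlit dashboard at app/streamlit_app.py (the dashboard filename is an assumption; check the repo for the actual file):

```shell
#!/usr/bin/env bash
# Hypothetical app/app.sh entrypoint. The dashboard filename below is an
# assumption; adjust it to match the file in the GenAI Tracing repo.
# Domino apps are expected to listen on 0.0.0.0:8888.
streamlit run app/streamlit_app.py \
  --server.address 0.0.0.0 \
  --server.port 8888
```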
Step 1. Deploy your Agent
Agentic systems can be deployed either directly from an evaluation run or from the Agents view. To deploy from an experiment run, you must have run the Experiment from a Job. In the tutorial repo, this will be the run_triage.py script.
To deploy from Experiment Manager:

- In your project, click Experiments from the left navigation menu.
- Choose the experiment containing the agent configurations you evaluated.
- Select the run you want to deploy and click “…” on the right to deploy the run.
- A deployment wizard opens, pre-filled with the code commit from the selected run. Specify your entrypoint (app/app.sh for this example), the hardware tier, and the auto-scaling policy, then set access permissions for your target team.

To deploy from the Agents view:

- Select Deployments → Apps & Agents from the left navigation menu.
- Click Publish → Agent in the top right corner.
- On the Code page of the Agent launch modal, you can use the most recent agent deployment file or select the “Experiment run” tab to choose a specific experiment run.
- Specify the same deployment settings as above: entrypoint, hardware tier, auto-scaling policy, and access permissions.
The same code you developed and evaluated is exactly what goes live. Domino handles containerization, environment replication, and infrastructure setup automatically.
Step 2. Confirm production tracing evaluations
No additional instrumentation is required for tracing: the same @add_tracing decorator and DominoRun context used during development automatically collect production traces from real user interactions. Domino's agentic AI tracing supports two types of evaluations, which can be attached to any trace using log_evaluation():
- Metrics (numeric): Used for aggregation, such as mean, median, and sum — for example, a quality score or confidence value.
- Labels (string): Used for filtering and grouping — for example, incident category or blast radius classification.
In the TriageFlow app, the trace_id is captured from inside the traced function and passed to log_evaluation() to attach both types. The app also includes a user feedback mechanism, so human scores and approvals are logged as evaluations when operators submit feedback through the UI. Note that all label values are strings, including feedback like "yes" or "no".
from domino.agents.logging import log_evaluation

# Metrics - numeric values for aggregation (mean, median, sum)
log_evaluation(trace_id=trace_id, name="quality_score", value=4.5)

# Labels - string values for filtering and grouping
log_evaluation(trace_id=trace_id, name="category", value="security_incident")

# When user submits their evaluation
log_evaluation(trace_id=trace_id, name="human_score", value=4.0)
log_evaluation(trace_id=trace_id, name="human_approved", value="yes")

Open the application and interact with the agent to generate a few calls, then navigate to the agent's “Performance” tab in the deployed agent dashboard to confirm that traces appear.

The deployed agent dashboard provides three views:
• Overview: Deployment status and configuration details.
• Performance: Evaluation metrics visualized alongside production traces, showing trends over time to identify patterns in successful versus problematic interactions. Use the dropdowns to view different category counts and metrics.
• Usage: User invocations and interaction tracking.
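To build intuition for why metrics must be numeric and labels string-valued, the Performance tab's aggregations behave roughly like this local sketch (illustrative only, not Domino's implementation): labels group traces, and metrics aggregate within each group.

```python
from collections import defaultdict
from statistics import mean

# Per-trace rows combining a numeric metric with a string label, as they
# might come back from production traces (values here are made up).
evaluations = [
    {"quality_score": 4.5, "category": "security_incident"},
    {"quality_score": 3.0, "category": "security_incident"},
    {"quality_score": 4.0, "category": "outage"},
]

# Labels (strings) group; metrics (numbers) aggregate within each group.
by_label = defaultdict(list)
for ev in evaluations:
    by_label[ev["category"]].append(ev["quality_score"])

summary = {label: round(mean(vals), 2) for label, vals in by_label.items()}
```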
Step 3. Schedule evaluations against production traffic
To keep production responses fast, evaluations can run asynchronously rather than in the live request path. You can schedule Domino Jobs to run your evaluators on a set cadence against production traces. This approach also keeps evaluations versioned and reproducible.
If you're following the tutorial repo, the run_scheduled_evaluation.py script handles this part for you. It evaluates all calls made to the agent in the last 24 hours and logs four post-hoc evaluations to each trace: a review status, a priority score, a high-risk flag for urgent high-impact incidents, and a timestamp of when the evaluation was performed. It also generates a report, saved to the project's artifacts. To set this up:
1. Select the Run Evaluations button in the top right of the agent dashboard. This provides a code template using search_traces() to retrieve production traces.
2. Copy the AGENT_ID into the run_scheduled_evaluation.py script.

3. Save and run the script manually as a one-time Job or schedule it to run automatically.
All evaluations, whether scheduled or live, appear in the same Performance tab.
For custom setups, use log_evaluation() to attach evaluation scores to the traces retrieved by search_traces().
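The post-hoc scoring in a script like run_scheduled_evaluation.py can be approximated by pure functions like these. The scoring rules below are hypothetical (the repo's actual heuristics may differ); the real script would fetch traces with search_traces() and attach results with log_evaluation().

```python
from datetime import datetime, timezone


def priority_score(trace):
    """Hypothetical priority heuristic combining urgency and blast radius."""
    urgency = {"low": 1, "medium": 2, "high": 3}.get(trace.get("urgency"), 1)
    radius = {"single_service": 1, "multi_service": 2, "global": 3}.get(
        trace.get("blast_radius"), 1
    )
    return urgency * radius


def evaluate_trace(trace):
    """Build the four post-hoc evaluations described above for one trace."""
    score = priority_score(trace)
    return {
        "review_status": "needs_review" if score >= 6 else "auto_approved",
        "priority_score": score,  # metric: numeric, so it can be aggregated
        "high_risk": "yes" if score >= 6 else "no",  # label: values are strings
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


# In the real script, each result would be logged back to its trace, e.g.:
# for trace in search_traces(...):                      # Domino SDK
#     for name, value in evaluate_trace(trace).items():
#         log_evaluation(trace_id=trace["id"], name=name, value=value)

result = evaluate_trace({"urgency": "high", "blast_radius": "multi_service"})
```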
Step 4: Iterate and redeploy
When monitoring surfaces an issue, navigate to your project's Experiments page, select the run you want to investigate, and restore it to a development workspace with one click. To do this, select “...” in the top right corner and then “Open in Workspace”. The restored workspace contains the same code, environment, and data connections as those in production.

Make your changes, run them through your evaluation judges using the same workflow, and redeploy directly from the new experiment run. This tight loop between production monitoring and development enables rapid iteration while maintaining complete lineage between production agents and their source experiments.
Check out the GitHub repo

Andrea Lowe
Product Marketing Director, Data Science/AI/ML

Andrea Lowe, PhD, is Product Marketing Director, Data Science/AI/ML at Domino Data Lab where she develops training on topics including overviews of coding in Python, machine learning, Kubernetes, and AWS. She trained over 1000 data scientists and analysts in the last year. She has previously taught courses including Numerical Methods and Data Analytics & Visualization at the University of South Florida and UC Berkeley Extension.