The enterprise platform to build, deliver, and govern AI
Watch the 15 minute on-demand demo to get an overview of the Domino Enterprise AI Platform.
One of the primary challenges of any ML/AI project is transitioning it from the hands of data scientists in the develop phase of the data science lifecycle into the hands of engineers in the deploy phase.
Where in the lifecycle does data scientists’ involvement end? Who takes responsibility for the operationalized models? How long should the transition between development and deployment last? And what does a data scientist do, compared to a data engineer or a DevOps engineer?
The answers to these questions are seldom cut and dried, even in a small shop. For an enterprise, the questions can become even more complicated as you add additional team members, each with different roles, into the mix.
The data science lifecycle encompasses four phases that provide a thumbnail sketch of the overall process and indicate where different team members should be focusing.
However, the roles and responsibilities in the typical lifecycle are seldom this clearly delineated.
In smaller data science operations, it’s possible for one person to have more than one role, but in an enterprise each team member should be able to focus on their specialty. There are seven primary roles, although there are usually several others involved. The business manager, for example, would be involved in the ideation and validation stages, while someone on the legal team would oversee the project for compliance before the model is delivered.
Often seen as the central player on any MLOps team, the Data Scientist is responsible for analyzing and processing data. They build and test the ML models and then hand them off to the production team. In some enterprises, they are also responsible for monitoring the performance of models once they are put into production.
The Data Analyst works in coordination with product managers and the business unit to uncover insights from user data. They typically specialize in particular types of tasks, such as marketing analysis, financial analysis, or risk analysis. Many have quantitative skills comparable to those of data scientists, while others are better described as citizen data scientists: they understand what needs to be done but lack the coding skills and statistical background to work independently as data scientists do.
The Data Engineer manages how data is collected, processed, and stored so it can be imported and exported reliably. They may have expertise in specific areas, such as SQL databases, cloud platforms, distributed systems, data structures, or algorithms. They are often vital in operationalizing data science results.
The DevOps Engineer provides data scientists and other roles with access to the specialized tools and infrastructure (e.g., storage, distributed compute, GPUs) they need across the data science lifecycle. They develop the methodologies that balance unique data science requirements with those of the rest of the business, providing integration with existing processes and CI/CD pipelines.
The ML Architect develops the strategies, blueprints, and processes for MLOps, while identifying any risks inherent in the lifecycle. They evaluate and select the best tools and assemble the team of engineers and developers to implement them. Throughout the project lifecycle, they oversee MLOps processes, unifying the work of data scientists, data engineers, and software developers.
The Software Developer works with data engineers and data scientists, focusing on productionizing ML models and the supporting infrastructure. They develop solutions based on the ML Architect's blueprints, selecting and building the necessary tools and implementing risk mitigation strategies.
The Domain Expert/Business Translator has deep knowledge of business domains and processes. They help the technical team understand what is possible and how to frame a business problem as an ML problem, and they help the business team understand the value the models offer and how to use them. They can be instrumental in any phase where a deeper understanding of the data is crucial.
With so many stages in the process and so many people involved in an enterprise operation, gaps in communication and collaboration between teams and across silos can quickly create problems. For instance, teams may not understand which data was used for which model, where the data originated, or how it is being tracked. This creates reliance on the data scientists to provide all the necessary information and manage the transition from one stage to the next, which becomes an issue of data science governance. Problems also arise when changes and progress in the MLOps process are not properly documented, which can lead to inaccurate datasets and general confusion among team members.
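One lightweight way to reduce that reliance on individual memory is to record provenance metadata alongside every trained model. The sketch below is a hypothetical illustration, not part of any specific platform's API: the `save_manifest` helper and its field names are assumptions, but the idea of fingerprinting the dataset and timestamping the run is general.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def dataset_fingerprint(path: str) -> str:
    """Hash the raw dataset file so any later change to it is detectable."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def save_manifest(model_name: str, data_path: str, out_path: str) -> dict:
    """Write a small provenance manifest to accompany a trained model.

    Records which dataset the model was trained on, a content hash of
    that dataset, and when training happened, so the next team in the
    lifecycle does not have to ask the data scientist.
    """
    manifest = {
        "model": model_name,
        "dataset": data_path,
        "dataset_sha256": dataset_fingerprint(data_path),
        "trained_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

A manifest like this travels with the model from development to deployment, so questions about where the data originated can be answered without tracking down the original author.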
A key concern is ensuring that models transition efficiently from one stage to the next, without key information from previous stages being lost. This is accomplished with an Enterprise MLOps platform that simplifies and streamlines the complex data science process. For example, data scientists can easily get access to the tools and compute they need without having to rely on an infrastructure engineer.
One of the most important facets of managing MLOps well is ensuring each member is aware of their role on the team. Making a data scientist, rather than an engineer, responsible for monitoring a deployed project, for example, is relatively easy when they have access to monitoring tools and are pinged automatically by the MLOps platform when an issue with the model arises.
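Automated pinging of this kind usually rests on a simple threshold check over live metrics. A minimal sketch follows; the `notify` callback and the accuracy-based health check are assumptions for illustration, since real platforms supply their own metrics and alerting hooks.

```python
from statistics import mean
from typing import Callable, Sequence


def check_model_health(
    live_accuracy: Sequence[float],
    baseline_accuracy: float,
    notify: Callable[[str], None],
    tolerance: float = 0.05,
) -> bool:
    """Compare recent live accuracy against the validation baseline.

    If the average of recent observations falls more than `tolerance`
    below the baseline, call `notify` (e.g., to ping the responsible
    data scientist) and report the model as unhealthy.
    """
    current = mean(live_accuracy)
    if current < baseline_accuracy - tolerance:
        notify(
            f"Model accuracy dropped to {current:.3f} "
            f"(baseline {baseline_accuracy:.3f}); review needed."
        )
        return False
    return True
```

The point of the design is that the alert routes directly to whoever owns the monitoring role, rather than surfacing in a dashboard nobody is assigned to watch.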
Each specialization should have a lead assigned who is responsible for signing off on each phase of the project. A lead data scientist, for example, would oversee work done during the testing phase and would be responsible for determining when the model is ready for validation by the business unit.
Using Domino’s Enterprise MLOps platform, team members can easily perform their roles across the entire data science lifecycle. It reduces the time and effort spent at key transitions, and integrated workflows provide consistency regardless of who is doing the work. It also provides automatic monitoring tools and automatically generated reports, so checking in on a model's progress takes very little time. Because the needed information is right at their disposal, the additional collaboration doesn't take time or energy away from other tasks at hand.

David Weedmark is a published author who has worked as a project manager, software developer, and network security consultant.