How to Build a Data Science Team
There’s no right way to build a data science team. What works best for you will depend on what stage of growth your business is in, the budget available, and the culture of the business, among other factors.
While there aren’t any strict rules to follow, the key areas you should think about include deciding on a team structure, fitting data science into the organization, allocating resources, scaling over time, and hiring the right people.
Deciding on a Team Structure
Data science teams require a mixed bag of skills. You need people who can code; who understand statistics and data science techniques (from basic to AI/ML), data visualization, and data wrangling and feature creation; and who communicate well and have good business understanding. Not everyone needs every skill, but across your team you should cover them all. Initially, you will have more generalists who cover a wide range of skills. Over time, as the team expands, you may start to create distinct roles. You also want this team to have strong relationships with data engineers, quality assurance/validation teams, software/machine learning engineers, and IT operations and support teams.
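One way to check that the skills listed above are covered across the team, rather than by each person, is a simple set comparison. The following is a minimal sketch in Python; the team members and their skill sets are hypothetical and exist only for illustration.

```python
# Skills taken from the paragraph above.
REQUIRED_SKILLS = {
    "coding",
    "statistics",
    "machine learning",
    "data visualization",
    "data wrangling",
    "communication",
    "business understanding",
}

# Hypothetical team: names and skills are assumptions, not real data.
team = {
    "generalist_a": {"coding", "statistics", "data wrangling", "data visualization"},
    "generalist_b": {"statistics", "machine learning", "communication"},
}


def missing_skills(team: dict[str, set[str]], required: set[str]) -> set[str]:
    """Return the required skills that no team member currently covers."""
    covered = set().union(*team.values())
    return required - covered


gaps = missing_skills(team, REQUIRED_SKILLS)
print(sorted(gaps))  # ['business understanding']
```

In this made-up example, the gap suggests either training an existing generalist or prioritizing business understanding in the next hire.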
Following are some typical roles within a data science team:
Data engineers are responsible for designing and building data management systems. They create data models, design data pipelines, and recommend technologies. As senior members of the team, they also mentor other participants.
Machine learning engineers focus on the deployment and infrastructure surrounding a model. Their work centers on the tools and frameworks for updating models and on creating interfaces that let end users easily see what predictions might mean in real-life scenarios, all while working closely with the data scientists who use these tools across the organization.
Data scientists are integral to a data science team. While there isn’t a consensus definition of a data scientist, they typically use statistics, mathematics, engineering, and technology to help businesses make better decisions. They have a deep understanding of how data works but also know when it’s not worth analyzing, using their intuition and expertise to guide teams on how best to invest time and resources.
Product managers understand customer needs and can identify AI and ML use cases to build solutions to their problems. They drive product development from inception to launch with the aim of sticking to budgets and timelines.
Data science managers hire the data scientists, perform appraisals, prioritize and distribute workloads, define processes and standards, communicate with business stakeholders, and ultimately have ownership of the entire data science program.
Every data science team is different. What might be done by an ML engineer in one organization might be done by a data engineer in another. While these typical roles often work together on data science projects, they may or may not all be part of your data science team structure.
Fitting Data Science in the Organization
The success of your data science team relies on how well it can use data to make a real impact on the organization. Can you use data to increase sales, reduce costs, or better satisfy your customers? Does your team understand the different business products, services, and processes well enough to communicate seamlessly with cross-functional stakeholder teams?
To help answer these questions, you need to choose which type of data science team structure will work best within your organization. Below, you’ll learn about two of the most common team models seen in businesses, decentralized and centralized, along with the center of excellence, a variation on the centralized model. Note that one is not necessarily better than the other, and organizations often move between decentralized and centralized structures, incorporating the best of both models.
Decentralized Model
In a decentralized model, data science resources are spread throughout different teams in the organization. This model is often found in companies where individual lines of business have recognized the power of data science and have gone on to hire or train their own staff to fill the role. In this model, there is no central data science team.
Decentralized models excel because data scientists are fully integrated into the line-of-business team and therefore understand the products and processes they support. They can use data science to solve problems, like cutting costs or automating manual processes, as well as to recognize opportunities, like using customer data to target new demographics in order to increase sales.
However, decentralized structures often create silos. When data scientists work separately across the organization, duplicated work, a lack of standardization, and decentralized reporting are common. Career progression and mentoring can be a challenge if the leadership isn’t proficient in data science. Additionally, the broader organization may not reap the full benefits of data science because efforts are prioritized and focused at the line of business rather than the enterprise.
Centralized Model
In a centralized model, data scientists operate as a team of their own, providing data solutions to other teams in the organization. Centralized models are often found in companies that are serious about using data for decision-making, analysis, and research, and are happy to allocate adequate funding and resources.
A centralized model encourages mentoring between experienced and junior staff, improved standards, and a centralized approach to business problems. This often leads to efficient workflows, especially when the right tools are deployed for managing workloads and access.
However, part of a data scientist’s role is to investigate the data requirements of different business units and make feasible solution recommendations. A team with a poor understanding of the business and domain can’t produce accurate recommendations. Centralized teams need to ensure they fully understand different areas of the business in order to make informed decisions. You can overcome this challenge by fostering communication channels between your team and the rest of the business (e.g., lunch-and-learns, practice area demonstrations, and demo sessions).
Center of Excellence Model
Centers of excellence (CoEs) are centralized analytics teams that provide advanced skills and services to business units and analyst groups. There are several benefits to this model, especially for large organizations. CoEs allow for ease of peer-to-peer learning, the exchange of best practices, and the sharing of experiences. In addition, this model helps to ensure that all analysts have access to the same resources and expertise. As a result, CoE models are very effective in larger organizations. However, they can also be beneficial for smaller organizations that have limited resources. By centralizing analytics functions, smaller organizations can still enjoy the benefits of economies of scale.
A CoE has to operate more through relationships and influence than through direct assignment or demand in order to prioritize data science needs. Cross-functional and enterprise analytics opportunities may go unaddressed unless the CoE has a charter to pursue them. The CoE should take charge and be the leading voice on data science and machine learning within the company so that business opportunities aren’t missed. By prioritizing analytics, the CoE can ensure that the company is making data-driven decisions that will help it stay ahead of the competition.
Allocating Resources
In addition to showing off the capabilities of your data science team, you can prove their worth by tracking your ROI. For instance, data scientists at UPS were able to use on-truck telematics and advanced algorithms to predict vehicle maintenance, optimize routes, and cut engine idle time. Their ROI analysis showed they saved over 39 million gallons of fuel and avoided driving 364 million miles.
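ROI is commonly computed as net benefit divided by cost. The sketch below shows that arithmetic in Python. The dollar figures are hypothetical placeholders, not numbers from the UPS example, and the `roi` helper is an illustrative name rather than part of any library.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures for illustration only.
annual_savings = 1_200_000.0  # savings attributed to the project
annual_cost = 400_000.0       # salaries, tooling, and compute

project_roi = roi(annual_savings, annual_cost)
print(f"ROI: {project_roi:.0%}")  # ROI: 200%
```

The hard part in practice is usually agreeing with the business on which savings or revenue can be attributed to the data science work; the calculation itself is simple once those inputs are settled.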
It’s easy for data science teams to get pulled in different directions. That’s why you must define priorities with the business and keep your team allocated to the priority projects. Adopt an agile project management approach to ensure projects are delivered on time and within budget. You also need to define KPIs and other metrics to measure success. Don’t forget to allow some time for experimentation, however. That is often when true breakthroughs occur.
Scaling Over Time
To ensure a manageable workload and deliver more projects, you need to scale your team and processes. For example, in the beginning, your data scientists may be able to work on three projects simultaneously. But what happens when there are twice as many projects and requests to deliver with the same headcount? Can you develop processes or tools to automate repetitive work such as testing? Are you hiring forward-thinking data scientists who understand the importance of automation?
Hiring Process
One of the top tips for leading a successful data science team is hiring for a diverse skill set. But that doesn’t mean filling each role with a distinct professional. Instead, leverage the strengths of your existing team members to see who can take on more responsibility. For example, can you invest in training your ML engineers to do more data science tasks? Can your database engineers step up to become data engineers?
If using resources from within the company isn’t enough, creating a robust and repeatable hiring process makes scaling easier. The last thing you want is a high turnover rate. You should define a number of processes and steps for onboarding successful candidates into your team. These can include introducing them to the team and the business, conducting initial training sessions, familiarizing them with different systems and business processes, granting role-based access to IT systems and data warehouses, setting up workspaces, and providing necessary software and hardware.
Domino’s Enterprise MLOps Platform can reduce onboarding time for new team members by 75% by providing an environment as friendly as their laptop, provisioned with the tools, data, and compute they need. Its central repository makes it easy to search, reproduce, and reuse all data science–related work. This makes it easier for new staff to look at past or current projects for learning purposes and start reusing the artifacts for new projects.
Conclusion
In this article, you learned why having a plan to build your data science team is essential. You also learned which important roles you need to hire for, how to fit the data science team into your overall business, how to allocate resources efficiently, and how to scale your team over time.
Domino Data Lab offers an enterprise MLOps platform that can help you build and scale your data science team. It provides a self-service infrastructure portal for data scientists to quickly spin up development environments, a model factory to quickly test data science models, and a system of record that centralizes all artifacts from previous projects. Data scientists can find, reuse, reproduce, and build upon the saved components from past work. Version tracking helps avoid conflicts when reusing work, and as technology changes, you can easily add emerging tools or remove outdated ones.