Building data science teams
In 2012, the Harvard Business Review called data scientist the sexiest job of the twenty-first century. Yet many companies still struggle to find and retain top data science talent: one study found that fewer than 2 percent of data scientists stay in the job for more than five years, and that the average tenure of a data scientist is only 1.5 years.
Team leads in companies that have only recently adopted data science can find it challenging to scale their teams. Without a defined plan, it's often difficult to decide which use cases to work on and which direction to take the team. It's also unclear which teams you should collaborate with, how to track success, and how to hire the right people.
If you're trying to build or scale your data science team, this guide is for you. Here, you'll learn how to structure your team, how to align its capabilities with your organization's broader goals, and how to allocate resources to scale the team over time.
Achieving data science success with a small (yet mighty) team
There is no one-size-fits-all answer to building a data science team. However, some common themes emerge among successful data science leaders. Before diving into the guide, hear from data science leaders about how they built their teams from the ground up.
Why you need a team-building plan
Data scientists sometimes struggle to see the big picture of their organization's business goals, especially when working remotely or in small data science groups scattered across different departments. This can lead to overly complex processes, slow onboarding of new staff, inefficient workflows, and unnecessarily duplicated efforts.
To lead a productive team, data science leads need to know and prioritize their team's current workload, choose the right use cases and tool sets, and give team members sufficient system access. For example, does the team have access to all the data it needs? Do team members have accounts set up on databases and data warehouses? Do they have access to the compute and tools they need? A plan helps you address these concerns, allocate resources efficiently, and encourage collaboration.
Your team-building plan should cover the following:
- What your team structure will look like (i.e., centralized or decentralized)
- How you plan to integrate your team into the organization as a whole
- How you will allocate resources
- How you will scale over time
- How you plan to hire, engage, and retain top talent
In addition, such plans shouldn't be rigid. The data science industry is evolving rapidly, and your planning process should adapt to these changes. Don't be afraid to deviate from the plan, especially when there are large external changes to your industry or your organization. Make sure your plan is flexible enough to accommodate both small changes (trying a new tool or framework) and larger ones (developing a center of excellence, or CoE).
Data science is a field where roles are still evolving and skills can differ significantly between companies. A team-building plan can give you a competitive advantage by helping you keep top talent from leaving for competitors. Your team-building strategy should define roles and responsibilities, onboarding processes, continuing education opportunities, and collaboration practices that keep your staff motivated and enjoying their work.
Moreover, the success of your data science team depends partly on how well it collaborates with the rest of the organization. Set up regular meetings, workshops, or even team-building events with teams such as engineering, business development, or sales to grow relationships. These partnerships can prevent duplicated work, encourage knowledge-sharing, build skills, promote the reuse of artifacts from past projects, and distribute workloads evenly. At the same time, such collaboration helps teams view the overall business goals from a common perspective. Your plan should include the tools you will use to promote collaboration and a sense of belonging.
It can often take years to build a data science team with different specialties. Your plan should identify gaps in technical skills and how you can train new or existing staff to fill them. In this way, you build and retain knowledge and ensure the team stays self-sufficient even when key people are absent or leave.
Data scientists are usually motivated individuals who look for exciting, cutting-edge technologies and projects. Your team's morale will inevitably suffer if it is always doing repetitive, low-skill tasks, getting pulled into different projects with competing priorities, or failing to complete projects because of organizational or external factors. Over time, this can drive key team members to leave the organization. You should therefore plan how to retain team members by keeping them motivated with exciting projects, a clear definition of work, work-life balance, performance appraisals, and recognition.
Hiring and onboarding plan
This hiring and onboarding plan template walks data science leaders through key questions to help them find and train new data scientists for their team. It covers attracting top talent, the hiring process, onboarding, retention, and more.
How to Build a Data Science Team
There's no single right way to build a data science team. What works best for you will depend on your business's stage of growth, the available budget, and the culture of the business, among other factors.
While there aren’t any strict rules to follow, the key areas you should think about include deciding on a team structure, fitting data science into the organization, allocating resources, scaling over time, and hiring the right people.
Deciding on a Team Structure
Data science teams require a mixed bag of skills. Across the team, you need coding, statistics and data science techniques (from basic methods to AI/ML), data visualization, data wrangling and feature creation, communication, and a good understanding of the business. Not everyone needs every skill, but your team as a whole should cover them all. Initially, you will have more generalists who cover a wide range of skills. Over time, as the team expands, you may start to create distinct roles. You also want this team to have strong relationships with data engineers, quality assurance/validation teams, software and machine learning engineers, and IT operations and support teams.
Following are some typical roles within a data science team:
- Data engineers are responsible for designing and building data management systems. They create data models, design data pipelines, and recommend technologies. Senior data engineers may also mentor other team members.
- Machine learning engineers focus on the deployment and infrastructure surrounding a model. They build and maintain the tools and frameworks for updating models, and they create interfaces that let end users easily see what a model's predictions mean in real-life scenarios, all while working closely with the data scientists who deploy these tools across the organization.
- Data scientists are integral to a data science team. While there is no consensus definition of a data scientist, they typically use statistics, mathematics, engineering, and technology to help businesses make better decisions. They have a deep understanding of how data works but also know when data isn't worth analyzing, using their intuition and expertise to guide teams on how best to invest time and resources.
- Product managers understand customer needs and can identify AI and ML use cases that solve customers' problems. They drive product development from inception to launch while aiming to stay within budget and on schedule.
- Data science managers hire data scientists, conduct performance appraisals, prioritize and distribute workloads, define processes and standards, communicate with business stakeholders, and ultimately own the entire data science program.
Every data science team is different. Work done by an ML engineer in one organization might be done by a data engineer in another. And while these roles typically work together with a data science team, they may or may not be formally part of your data science team structure.
Fitting Data Science in the Organization
The success of your data science team depends on how well it can use data to make a real impact on the organization. Can you use data to increase sales, reduce costs, or improve customer satisfaction? Does your team understand the business's different products, services, and processes well enough to communicate seamlessly with cross-functional stakeholder teams?
To help answer these questions, you need to choose the data science team structure that will work best within your organization. Below, you'll learn about two of the most common team models: decentralized and centralized. Note that neither is necessarily better than the other, and organizations often move between the two or adopt hybrid structures that combine the best of both.
Decentralized Model
In a decentralized model, data science resources are spread throughout different teams in the organization. This model is often found in companies where individual lines of business have recognized the power of data science and have gone on to hire or train their own staff to fill the role. In this model, there is no central data science team.
Decentralized models excel because data scientists are fully integrated into the line-of-business team, so they understand that team's products and processes. They can use data science to solve problems, like cutting costs or automating manual processes, and to recognize opportunities, like using customer data to target new demographics and increase sales.
However, decentralized structures often create silos. When data scientists work separately across the organization, duplicated work, a lack of standardization, and fragmented reporting are common. Career progression and mentoring can be a challenge if the leadership isn't proficient in data science. Additionally, the broader organization may not reap the full benefits of data science because efforts are prioritized and focused at the line-of-business level rather than across the enterprise.
Centralized Model
In a centralized model, data scientists operate as a team of their own, providing data solutions to other teams in the organization. Centralized models are often found in companies that are serious about using data for decision-making, analysis, and research, and that are willing to allocate adequate funding and resources.
A centralized model encourages mentoring between experienced and junior staff, raises standards, and promotes a consistent approach to business problems. This often leads to efficient workflows, especially when the right tools are deployed for managing workloads and access.
However, part of a data scientist's role is to investigate the data requirements of different business units and recommend feasible solutions. A team with a poor understanding of the business and its domain can't produce accurate recommendations. Centralized teams need to fully understand different areas of the business in order to make informed decisions. You can overcome this challenge by fostering communication channels between your team and the rest of the business (e.g., lunch-and-learns, practice area demonstrations, and demo sessions).
Center of Excellence Model
A center of excellence (CoE) is a centralized analytics team that provides advanced skills and services to business units and analyst groups. This model has several benefits, especially for large organizations. CoEs make peer-to-peer learning, the exchange of best practices, and the sharing of experiences easy. In addition, this model helps ensure that all analysts have access to the same resources and expertise. As a result, CoE models are very effective in larger organizations. However, they can also benefit smaller organizations with limited resources: by centralizing analytics functions, smaller organizations can still enjoy economies of scale.
To get data science needs prioritized, a CoE has to operate more through relationships and influence than through direct assignment or mandate. Without an explicit charter, cross-functional and enterprise-wide analytics opportunities may go unaddressed. The CoE should therefore take charge and be the leading voice on data science and machine learning within the company so that business opportunities aren't missed. By prioritizing analytics, the CoE can help ensure that the company makes data-driven decisions that keep it ahead of the competition.
Allocating Resources
Beyond showcasing the capabilities of your data science team, you can prove its worth by tracking your ROI. For instance, data scientists at UPS used on-truck telematics and advanced algorithms to predict vehicle maintenance, optimize routes, and cut engine idle time. Their ROI analysis showed they saved over 39 million gallons of fuel and avoided driving 364 million miles.
It's easy for data science teams to get pulled in different directions. That's why you must define priorities with the business and keep your team allocated to the priority projects. Adopt an agile project management approach to help deliver projects on time and within budget. You also need to define KPIs and other metrics to measure success. However, don't forget to leave some time for experimentation; that is often where true breakthroughs happen.
Scaling Over Time
To keep workloads manageable and deliver more projects, you need to scale both your team and your processes. For example, in the beginning, each data scientist may be able to work on three projects simultaneously. But what happens when there are twice as many projects and requests to deliver with the same head count? Can you develop processes or tools to automate repetitive work such as testing? Are you hiring forward-thinking data scientists who understand the importance of automation?
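To make the head-count question concrete, here is a minimal Python sketch of a capacity check. It assumes, purely for illustration, that capacity can be expressed as a fixed number of concurrent projects per person; the function name `required_headcount` and the starting team of four data scientists are hypothetical.

```python
import math


def required_headcount(open_projects: int, projects_per_person: int) -> int:
    """Smallest team size that keeps everyone at or below their project limit.

    Hypothetical helper: assumes every project needs the same effort.
    """
    if projects_per_person <= 0:
        raise ValueError("projects_per_person must be positive")
    return math.ceil(open_projects / projects_per_person)


# Hypothetical starting point: 4 data scientists, each handling 3 projects.
team_size = 4
capacity = 3
current_projects = team_size * capacity   # 12 projects
doubled_projects = 2 * current_projects   # 24 projects

# Head count needed at today's per-person capacity.
print(required_headcount(current_projects, capacity))  # 4
print(required_headcount(doubled_projects, capacity))  # 8

# With the same head count, each person would carry this many projects.
print(doubled_projects / team_size)  # 6.0
```

Doubling the workload either doubles the required head count or doubles each person's load. Automation is one way to raise the per-person capacity instead of growing the team.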
Hiring Process
One of the top tips for leading a successful data science team is building a diverse skill set. But that doesn't mean filling each role with a distinct professional. Instead, assess the strengths of your existing team members to see who can take on more responsibility. For example, can you invest in training your ML engineers to do more data science tasks? Can your database engineers step up to become data engineers?
If drawing on resources from within the company isn't enough, a robust and repeatable hiring process makes scaling easier. The last thing you want is a high turnover rate, so define clear processes and steps for onboarding successful candidates into your team. These can include introducing new hires to the team and the business, conducting initial training sessions, familiarizing them with different systems and business processes, granting role-based access to IT systems and data warehouses, setting up workspaces, and providing the necessary software and hardware.
Domino's Enterprise MLOps Platform can reduce onboarding time for new team members by 75% by providing an environment as familiar as their own laptop, provisioned with the tools, data, and compute they need. Its central repository makes it easy to search, reproduce, and reuse all data science work. This makes it easier for new staff to study past or current projects and to start reusing their artifacts in new projects.
Conclusion
In this article, you learned why having a plan to build your data science team is essential. You also learned which important roles you need to hire for, how to fit the data science team into your overall business, how to allocate resources efficiently, and how to scale your team over time.
Domino Data Lab offers an enterprise MLOps platform that can help you build and scale your data science team. It provides a self-service infrastructure portal for data scientists to quickly spin up development environments, a model factory to quickly test data science models, and a system of record that centralizes all artifacts from previous projects. Data scientists can find, reuse, reproduce, and build upon saved components from past work. Version tracking helps avoid conflicts when reusing work, and as technology changes, you can easily add or remove emerging tools.