Data Science != Software Engineering
By Domino, 2017-06-29
Why understanding key differences between data science and engineering matters
As data science matures within an organization, engineering leaders are often pulled into leading, enabling, and collaborating with data science team members. While there are similarities between data science and software development (e.g., both involve writing code), well-intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows, which those same leaders are then tasked with resolving. Unlike software development, data science is closer to research and has distinctive computing demands, and data science teams often work closely with business stakeholders with whom engineering teams don't typically engage.
Data science is more like research than engineering
Engineering typically involves building something whose requirements are understood ahead of time. This allows engineering teams to track, monitor, predict, and control the engineering process. Data science projects, by contrast, are often centered on answering a question, and the answer may turn into an insight or a model. This focus on answering open questions is what makes data science an exploratory, experimental research process. It also means data science needs more flexible and agile infrastructure and tooling than engineering does.
Variable computing demands
Engineering teams build software that may run on high-performance architecture. The engineering team uses infrastructure for testing and QA, and those infrastructure needs are static and predictable. Individual engineers often work on a single machine with 16 to 32 GB of RAM and four to eight cores. In contrast, the compute capacity a data science project needs is neither predictable nor constant. Data science work involves computationally intensive experiments, and memory and CPU can become bottlenecks. For example, it could take 30 minutes to write the code for an experiment and then eight hours to run that experiment on a laptop. To avoid this kind of bottleneck, a data scientist may use larger machines to parallelize work across many cores or to load more data into memory.
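To make the idea of parallelizing work across cores more concrete, here is a minimal Python sketch. It assumes the experiments are independent configurations (for example, a small hyperparameter grid) that can each run on their own core. The `experiment` function, the `CONFIGS` grid, and the `run_sweep` and `best_config` helpers are hypothetical placeholders, not part of any Domino product; the only real APIs used are from the Python standard library (`concurrent.futures`, `itertools`, `math`).

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product
import math


def experiment(config):
    """Hypothetical CPU-bound experiment that returns (config, score).

    A real experiment would train and evaluate a model; this placeholder
    does deterministic arithmetic so results are reproducible.
    """
    learning_rate, depth = config
    score = 0.0
    for i in range(1, 20_000 * depth + 1):
        score += math.sin(i * learning_rate) / i
    return config, score


def run_sweep(configs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Evaluate every config concurrently and return (config, score) pairs.

    With ProcessPoolExecutor each config runs in a separate worker process,
    so CPU-bound Python code can use several cores at once (threads would be
    limited by the GIL). max_workers=None lets the executor pick a default
    based on the number of available processors.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with configs.
        return list(pool.map(experiment, configs))


def best_config(results):
    """Return the config whose score is highest."""
    return max(results, key=lambda pair: pair[1])[0]


# Hypothetical grid: 3 learning rates x 3 depths = 9 independent experiments.
CONFIGS = list(product([0.01, 0.1, 1.0], [1, 2, 3]))
```

In a script, `run_sweep(CONFIGS)` would normally be called under an `if __name__ == "__main__":` guard, which `ProcessPoolExecutor` requires on platforms that do not fork worker processes by default, such as Windows and macOS. Passing `executor_cls=ThreadPoolExecutor` runs the same sweep with threads; that is handy for quick checks but will not speed up CPU-bound pure-Python work.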
Integration with other parts of the organization
While engineering is aligned with the organization's overall priorities, engineering teams often work independently, and their work does not require close integration with finance, marketing, or HR teams. Data science projects, by contrast, are often focused on answering a question for a business stakeholder. For example, a data science team building employee retention models would work very closely with the HR team.
In this post, we discussed how data science resembles research, its variable computing demands, and how data scientists often work closely with business stakeholders with whom engineering teams do not typically engage. If you are interested in reading more about how to enable data science within your organization, please see The Practical Guide to Managing Data Science at Scale.
Domino powers model-driven businesses with its leading Enterprise MLOps platform that accelerates the development and deployment of data science work while increasing collaboration and governance. More than 20 percent of the Fortune 100 count on Domino to help scale data science, turning it into a competitive advantage. Founded in 2013, Domino is backed by Sequoia Capital and other leading investors.