How Data Scientists Can Avoid Three Common Collaboration Challenges

Domino Data Lab | 2018-11-29 | 6 min read


For the vast majority of data science teams, math and coding prowess alone aren’t enough. Unless you’re working on an esoteric academic project, your skills will go to waste if you fail to cooperate with the colleagues who will end up using your products. The biggest challenges often come at the beginning and end of the data science workflow: understanding the problems you’re solving and making sure the results are put to use.

At its first annual Rev Summit earlier this year, Domino asked three data science leaders to share their advice on improving collaboration to ensure your team’s hard work has an impact, whether at a tiny startup or a large corporation. Here’s their advice for avoiding three of the most common collaboration challenges by upgrading your tools, processes and organizational structure:

You reinvented the wheel

Perhaps you had to start from scratch when a key team member left, leaving no record of how they stored data or built their models. Maybe you spent weeks on a project, only to learn a colleague was working on something similar. Or maybe you’ve found yourself rewriting the same functionalities over and over. Creating institutional memory is one of the most common challenges for data science teams, especially when they’re moving fast or facing high turnover.

Having the right tools in place is key to ensuring knowledge doesn’t get lost, says Sivan Aldor-Noiman, vice president and head of data science at Wellio, a startup that applies AI (Artificial Intelligence) to nutrition. Even small companies should use a wiki or Python notebooks to preserve memory, she says. At a larger organization, tools like Domino Data Lab are critical to ensure models are reproducible without added hassle, says Aldor-Noiman. “The process by which we share our information and document and reproduce our information should become easier,” she says. “If your organization doesn’t feel that way, then your organization is not using the latest technology.”

Another way to save time and avoid duplication is to create internal libraries for common functionalities, suggests Patrick Harrison, associate director of data science at S&P Global. “Spend a little extra time and effort the first or second time you’re working on something to build your own little software library,” he says. “It takes some discipline, but it’s worth it.”
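To make Harrison’s suggestion concrete, here is a minimal sketch of what a first “little software library” might look like. The module name `team_utils` and both function names are hypothetical, chosen only for illustration. The idea is to take a chore that tends to get rewritten in every notebook (here, cleaning up column headers) and give it one documented, shared home.

```python
"""team_utils.py: a hypothetical shared module for helpers a team keeps rewriting."""
import re
from typing import Iterable, List


def normalize_column_name(name: str) -> str:
    """Convert a raw column header to snake_case, e.g. ' Order ID ' -> 'order_id'."""
    # Replace every run of non-alphanumeric characters with a single underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores and lowercase the result.
    return cleaned.strip("_").lower()


def normalize_columns(names: Iterable[str]) -> List[str]:
    """Normalize every header and fail loudly if two headers collide."""
    result = [normalize_column_name(n) for n in names]
    if len(set(result)) != len(result):
        raise ValueError(f"column names collide after normalization: {result}")
    return result
```

A colleague could then write `from team_utils import normalize_columns` instead of re-implementing the same cleanup, and the docstrings double as the kind of lightweight documentation discussed here.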

Tools alone aren’t enough to enforce record-keeping; leaders also need to make clear that it’s a priority. “I’ve seen people who are pretty anti-documentation… They’re like, ‘It takes so much time and is bureaucratic,’” says Elena Grewal, head of data science at Airbnb. “We put it in our performance reviews.”

You came up with a great answer…to the wrong question

Nothing is more disheartening to a data science team than discovering a brilliant solution, only to find out it fails to solve the problem at hand. One technique to help stakeholders get on the same page is the “Five Whys.” “You start with a statement, [such as] ‘Airbnb users need to be able to split payments,’ and you ask ‘Why?’ again and again to really understand the root of what you’re trying to solve,” says Grewal.

At a large organization, Harrison suggests using a “matrix approach” in which data scientists report to a single lead (at S&P Global, that’s him) but form groups with domain experts and business stakeholders around a project or product. Even with no active project, he encourages his team to cold-call people in other departments to learn about their work and data.

At a smaller company like Wellio, Aldor-Noiman suggests that leaders emphasize business priorities in weekly meetings and pay attention to prospective hires’ communication skills during interviews. Another idea is to build a “model strategy team” whose role is to understand whether a given problem is worth solving before putting data scientists on the project.

You and other stakeholders are speaking different languages

The comprehension gap has a flip side: Other departments often have trouble understanding what data scientists do, making it difficult to figure out what projects to tackle together. At S&P Global, Harrison and his team urged colleagues across the company to take an open online course on data science and led guided discussion sessions; 100 people signed up within two weeks. The company also created an internal course on AI and Machine Learning (ML), which 10,000 employees have taken. “That really planted the seed for a lot of folks so they now have context for what this is about and know it’s a strategic priority for the organization,” he says.

Airbnb has gone so far as to create its own Data University, which it promoted internally with marketing videos, posters and swag. More than 1,000 people across the company have taken courses on everything from experimentation to ML.

Formal education aside, just giving people the chance to explore data and visualizations on their own terms is effective, Aldor-Noiman says. “If you have the right tools and they make it simple, people stop fearing data science,” she says. “You’d be surprised how much people will be curious to understand.”


