On Collaboration Between Data Science, Product, and Engineering Teams

Ann Spencer | 2019-02-20 | 35 min read


Eugene Mandel, Head of Product at Superconductive Health, recently dropped by Domino HQ to candidly discuss cross-team collaboration within data science. Mandel’s previous leadership roles within data engineering, product, and data science teams at multiple companies provide him with a unique perspective when identifying and addressing potential tension points.

Consider Being Product-Minded

Eugene Mandel, Head of Product at Superconductive Health, recently dropped by Domino HQ to discuss cross-team collaboration within data science. While Mandel’s background is in engineering, he has held leadership roles within data engineering, data science, and product teams at multiple companies. Mandel’s practical experience within these roles has provided him with a unique perspective about the state of collaboration within data science as well as “why” some potential cross-team tension points arise. During the candid discussion he points out how the probabilistic nature of data science work differs from “normal software engineering.” He also points out that this difference may provide some challenges for data science, product, and engineering teams as they work toward building data products. Yet, Mandel advocates for building trust through understanding potential tension points from multiple perspectives as well as being “product-minded.” For example, parties can be “product-minded” by taking a step back and considering how the target users will experience their end product. Mandel also speculates that one of the potential trends we will see for the industry is that “data science will become more like data product development and be a core part of the product organization.”

This blog post provides some excerpts from the discussion, the audio recording, as well as a full written transcript of the conversation. This post is a part of an ongoing series where I sit down with various members of the industry and capture their different perspectives regarding the current state of collaboration within data science, collaboration tension points, and how to address them. The intention of this series is to contribute to public discourse about cross-team collaboration within data science in order to accelerate data science work.

Applying Product Management Principles to Data Science

Prior to our discussion at Domino HQ, I was already aware that Mandel has spoken and written about applying product management principles to data science. I asked him to unpack his journey of how he reached that perspective. Mandel mentioned that he started noticing how data science teams were making a transition from being advisors to building data products, such as a recommendation engine:

“When you see how data science teams made this transition from a purely advisory role, to product building role, you see how product management principles apply. Because, at first, people assume that what we're building is software, so probably, it's just normal software. Then we started realizing that there are a lot of similarities, but that there are a lot of differences too. And, of course, another thing is that unlike in, well, quote unquote "normal software engineering", the background of people that work in data science is much more diverse. I've worked with physicists. I've worked with psychologists. I've worked with sociologists. I've worked with normal software engineers, people that come from a business background. If you're working for a regular software company, you can't assume that most people have this shared culture of software product management. When going to data science teams, or companies that build data products, it's best not to assume that, and be much more explicit.”

“Normal Software Development” vs Data Product Development

Mandel also later relayed a key difference between “normal software development” and developing a data product:

“The last company I worked for was a very interesting story as well, because I joined the company that had a very good established culture of software development...with tests, with CI/CD, experienced engineers, experienced engineering management, and product management. But not much experience in building data products. When I started identifying what data products can be built and how can they be built, we did go through several iterations of, first of all, gaining trust, and then agreeing on what is the process for developing the product. And that is a major point, because, let's say in “normal software development”, a lot of companies will use Scrum or one of the variations of Scrum, they have a lot of conversation about deploying, story points, and stories, right? When you develop data products you can't exactly adopt the same process, just because the level of uncertainty is much higher. Your process should be focused on iteratively reducing uncertainty and prototyping. So bridging this kind of gap of understanding in engineering and data science was a challenge. Once we were past that, we talked about how the products essentially...(products is plural because there was a core product, and then the data science products under development)...how would they talk? Right? And what I've found, that, well, first of all, after you gain trust, which is crucial, it really helps to agree on KPIs that are very concrete, but broad enough between the main non-data product and data products. Then it allows us to iterate data products much faster without creating a lot of risk to the core product. And that actually was a model that we've followed and that was quite successful."

Not Involving Data in Your Prototype: A Common Mistake to Avoid

When asked to generalize an example and unpack a particularly thorny issue for product managers to consider while working with data scientists, Mandel relayed:

“So how do product managers work with data scientists? When a product manager prototypes a regular software product, coming back to process, the process is slightly different from data products. Why? Because when you prototype a regular software product, we've probably talked about user stories, use cases, you've started doing some kind of prototype of UI, bringing it to people and getting feedback. All extremely valuable.

When you are prototyping a data product, you have to do all of that, but also, you have to actually get access to the data that will be participating in this data product and understand it, analyze it, and actually plug it into the prototype. Why? Because without knowing the data, what I've seen product managers do is make a mistake on what the product can and cannot do.

Sometimes you just assume that the data you have access to can tell you much more than it actually can. You are creating a prototype based on this assumption. Of course, to the users it all looks magical, because it gives you a recommendation, it knows just who you are, what you like, and what you do, right? But then when it transitions from prototyping stage to actual product, you see that the data set that you do have is much less reliable, much more sparse, and when recommendations created from this data are shown to users, well, they're not exactly magical. Right? So, that's an example. Unless you involve data in your prototypes, that's a problem. And product managers that don't have experience in data product management tend to make this mistake.”
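Mandel's advice to plug real data into the prototype can be sketched as a quick sparsity check run before committing to a design. Everything below is hypothetical: the interaction log, the column names, and the values exist only to show the shape of the exercise.

```python
import pandas as pd

# Hypothetical interaction log standing in for "the data you actually have";
# column names and values are illustrative only.
interactions = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "item_id": [10, 11, 10, 12, 13, 14],
    "rating":  [5, None, 4, None, None, None],
})

total = len(interactions)
rated = int(interactions["rating"].notna().sum())
rated_users = interactions.dropna(subset=["rating"])["user_id"].nunique()
all_users = interactions["user_id"].nunique()

# If most events carry no rating, the "magical" recommendations in the
# mockup will not survive contact with the real data set.
print(f"rated events: {rated}/{total}")
print(f"users with at least one rating: {rated_users}/{all_users}")
```

Here only two of six events carry a rating, which is exactly the kind of gap between the assumed data and the actual data that a UI-only prototype never surfaces.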

Data Products vs “Normal” Software Engineering: Tension with JIRA Tickets and Potential Risk

During the discussion, Mandel also reinforced how data science work is probabilistic in nature and how this could lead to a tension point, particularly with engineering:

“Now about collaboration between engineers and data scientists... what I've seen is, in regular software engineering, and this is probably an over-generalization, but things tend to be "true" or "not true." When you look at unit tests, integration tests, for software products, and all that sort of thing...there are things like: the value is five, and this is true, and this is false, and the length of this list is 573. When dealing with data products, you move from a deterministic world to a probabilistic world, which means that what you expect are ranges and then you have to judge what makes sense, what doesn't make sense.

So, your tests start looking different, which means that when you.... and of course, you know, work with engineers, you always talk about tests. So, test frameworks and you could use probably the same frameworks, but how you structure the tests is different. What is considered mistake, or bad, even that, that's a very fundamental thing.... but even that is up to discussion.

In one of the teams that I've worked with, we had this extreme ....that I still consider to be very funny ....where there was this classifier, and every time the classifier produced a wrong prediction, at first, the team wanted to file it in JIRA, every individual case. Now, if you think about it from the software engineering point of view, actually that's not such an insane idea. Well, it's a bug. Every wrong prediction is a bug. Bugs go in JIRA. They are managed. They are prioritized. It almost sounds logical. But if you come from the data science side, it doesn't exactly make sense. Because knowing whether it's [the classifier] working is some kind of different process.”
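The shift Mandel describes, from asserting exact values to asserting ranges, can be sketched in a few lines. The classifier below is a toy stand-in (it predicts the positive class with probability 0.8) and the numbers are illustrative, but the structure of the test is the point: a deterministic test would assert an exact accuracy and fail constantly, while a data-product test asserts an acceptable range.

```python
import random

def predict(rng):
    # Toy stand-in for a trained classifier: predicts the positive
    # class with probability 0.8. Purely illustrative.
    return 1 if rng.random() < 0.8 else 0

def measured_accuracy(n=1000, seed=42):
    rng = random.Random(seed)  # fixed seed keeps the test repeatable
    labels = [1] * n
    preds = [predict(rng) for _ in labels]
    return sum(p == y for p, y in zip(preds, labels)) / n

# A deterministic test would be `assert acc == 0.8`.
# A probabilistic test asserts a range instead:
acc = measured_accuracy()
assert 0.75 <= acc <= 0.85, f"accuracy {acc} outside expected range"
```

The same test frameworks still apply, as Mandel notes; what changes is that "passing" means landing inside a judged-reasonable band rather than matching a single value.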

Mandel also relayed insight regarding risk:

“…when talking about the difference between normal software engineering and data products software engineering is [also] talking about risk. In normal software engineering, all the risk lies in code. Bad code. Undocumented code. Changing code. Normal software engineering organizations have really good culture and good processes for dealing with code. Any engineer who joins your team will probably not be surprised. Right? You know. Unit tests, integration tests, CI/CD, documentation. Right?

In data products, all of this exists. But on top of that, there is a risk that comes not from code, but from the data. Because every time data goes into a data product, and we don't control this data, a change in your data can wreak havoc downstream in your product. Even without one piece of the source code changing. And that's actually something that just doesn't exist in normal software engineering.

One of my favorite things is every time I talk to a data product person, a data scientist, I'm collecting scary data stories. Stories of how some innocent change of data somewhere far upstream from your product starts wreaking havoc downstream. And if you're lucky, it just breaks your product. But if you're unlucky, it just silently starts behaving differently. Which means that in normal software engineering code is, I guess, the first class citizen. In data products, there are two players: the code itself and the data. And this kind of, you know, what we're trying to do is making data sets first class objects, first class citizens of the data world that can be testable, describable, that you can talk about.”
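Mandel's idea of making data sets testable, first-class objects can be sketched as explicit checks run against each incoming batch. The feed, columns, and rules below are hypothetical; the point is that assumptions about upstream data become assertions that fail loudly instead of silently changing product behavior.

```python
import pandas as pd

# Illustrative upstream feed; in practice this would be loaded from a
# source outside your team's control.
feed = pd.DataFrame({
    "patient_id": ["a1", "a2", "a3"],
    "age":        [34, 51, 29],
    "visit_date": ["2019-01-03", "2019-01-07", "2019-01-09"],
})

def check_feed(df):
    """Hand-rolled expectations on an incoming data set. Each check
    encodes an assumption that, if violated upstream, would otherwise
    surface only as strange product behavior downstream."""
    errors = []
    if df["patient_id"].isna().any():
        errors.append("patient_id contains nulls")
    if not df["age"].between(0, 120).all():
        errors.append("age outside plausible range")
    if pd.to_datetime(df["visit_date"], errors="coerce").isna().any():
        errors.append("visit_date contains unparseable dates")
    return errors

assert check_feed(feed) == []  # fail loudly, not silently, on data drift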

For More Insight

These excerpts are just a few insights pulled from the recent discussion at Domino HQ regarding cross-team collaboration within data science. If you are interested in additional insights, the audio recording as well as a full transcript of the discussion are provided. The written transcript has been edited for readability.

Audio Recording

Full Transcript

The following transcript has been edited for readability.

Ann Spencer, Head of Content, Domino Data Lab: Thank you so much for agreeing to do this. I very much appreciate it. I know that we've known each other for about a year or so, yet it would be great if you could provide some insight to what you're up to these days.

Eugene Mandel, Head of Product, Superconductive Health: Alright. Just a little background. I've been doing software products in different industries for different use cases for over 15 years, ranging from voice-over IP to marketing services software. But the thread that was always going through those products is data. Specifically, how to use data to make products better, even in cases when data was not the product itself.

My last job was working for a company called Directly where we helped other companies use their customers as resources to pitch in with customer support. I joined the company to help figure out how data can make the product better. We ended up building models and pipelines that essentially identified frequent questions and provided reasonable responses.

In the past half year plus, I joined another company, Superconductive Health. We work with clients in healthcare and life sciences. We do all kinds of data projects for them and build an external-facing product for bigger things. Not only for life sciences, but pretty much for any company that deals with data sets and data pipelines.

Ann Spencer: For the past couple of years, you've both written and presented about applying product management principles on top of data science. Can you unpack a bit about what led you to this perspective and your journey to do that?

Eugene Mandel: Yes. So, my background is engineering. Which probably informs all of this. When data science started getting the label "data science", probably seven years ago, I started realizing that's a big part of what I've been doing. And that really triggered me thinking about, well, "what's different?" I think what's interesting is that many data science teams in companies, many data science projects, and probably the many careers of data scientists, started from being in an advisory role.

When we get access to data, we understand it, we do something extremely cool, and the final product of our work is, maybe, it's a model. Maybe it's some kind of report. Maybe it's recommendations. But it's not a running product, in itself.

With my engineering background...well, that's not the most natural thing for me. So, I always gravitate to things where you build something that runs and that users touch. And for me, I started seeing data science teams making this transition, from the work being some kind of advice, to the work product becoming an actual product.

Let's say if we're talking about some kind of marketing use case... data science can produce a recommendation that addresses "what's the best way to talk to a particular group of users?" And that's great. But then, there was the absolute last mile....okay, so let's talk to those users this way and it usually involves building a product, some kind of recommendation engine, classifier, or something.

When you see how data science teams made this transition from a purely advisory role, to product building role, you see how product management principles apply. Because, at first, people assume that what we're building is software, so probably, it's just normal software. Then we started realizing that there are a lot of similarities, but that there are a lot of differences too. And, of course, another thing is that unlike in, well, quote unquote "normal software engineering", the background of people that work in data science is much more diverse. I've worked with physicists. I've worked with psychologists. I've worked with sociologists. I've worked with normal software engineers, people that come from a business background. If you're working for a regular software company, you can't assume that most people have this shared culture of software product management. When going to data science teams, or companies that build data products, it's best not to assume that, and be much more explicit.

Ann Spencer: When you were talking about... how you were noticing how data scientists were moving to an advisory kind of role and then the advice becomes your product as well as how the data scientists are coming from all these different backgrounds, physicists, psychologists, whatnot.... do you think that contributes to collaboration? What do you think is the current state of collaboration in data science?

Eugene Mandel: The thing that I think I know more about is collaboration between data science and engineering and product teams. In every company I worked for...it was a really interesting story of evolution....for example, at Jawbone, the data science team started as producing data stories and interesting insights and eventually...well, eventually moved to owning a piece of product.

In this case, the piece of product was in Jawbone's app, which tracks your steps and diet, and gives you some recommendations. Then there were negotiations between the data science team and the product/software engineering team. It ended up being that we owned a piece of the app, where the UI was just a surface displaying formatted insights and recommendations. But the data science team would own what is being displayed. Data science would implement the pipelines that produce these insights and recommendations.

But this collaboration wasn't smooth at first. Because, especially if the data science team doesn't have engineers on the team, there are a lot of challenges. The first challenge is even just gaining trust from the engineering team. Because if you're not an engineer and you are telling an engineer, "I'm just gonna push stuff into your well-guarded product...you have testing, you have CI/CD, I don't," then, you know, you can't really expect a similar acceptance. Right? So, that was the Jawbone example.

The last company I worked for was a very interesting story as well, because I joined the company that had a very good established culture of software development...with tests, with CI/CD, experienced engineers, experienced engineering management, and product management. But not much experience in building data products. When I started identifying what data products can be built and how can they be built, we did go through several iterations of, first of all, gaining trust, and then agreeing on what is the process for developing the product.

And that is a major point, because, let's say in normal software development, a lot of companies will use Scrum or one of the variations of Scrum, they have a lot of conversation about deploying, story points, and stories, right?

When you develop data products you can't exactly adopt the same process, just because the level of uncertainty is much higher. Your process should be focused on iteratively reducing uncertainty and prototyping. It is kind of like agile but not exactly agile, not like Scrum in software development.

So bridging this kind of gap of understanding in engineering and data science was a challenge.
Once we were past that, we talked about how the products essentially...(products is plural because there was a core product, and then the data science products under development)...how would they talk?

Right? And what I've found, that, well, first of all, after you gain trust, which is crucial, it really helps to agree on KPIs that are very concrete, but broad enough between the main non-data product and data products. Then it allows us to iterate data products much faster without creating a lot of risk to the core product. And that actually was a model that we've followed and that was quite successful.

Ann Spencer: In both of your examples, you referred to gaining trust and working on the process. Are there any other, what you would think of as thorniest, or most common issues for a data scientist when they're collaborating with engineering? Or, product, when they're collaborating with data scientists. I don't know if those are the most common, or thorniest issues, but I just wanted to put that out there so that you could unpack what you think the thorniest, or most common issues are in collaboration.

Eugene Mandel: A lot of, well, a lot of issues can be packaged into process because it suggests how we work, don't work, how we talk about work, and how we collaborate, right? But to be more specific?

Okay, so how do product managers work with data scientists? When a product manager prototypes a regular software product, coming back to process, the process is slightly different from data products. Why? Because when you prototype a regular software product, we've probably talked about user stories, use cases, you've started doing some kind of prototype of UI, bringing it to people and getting feedback. All extremely valuable.

When you are prototyping a data product, you have to do all of that, but also, you have to actually get access to the data that will be participating in this data product and understand it, analyze it, and actually plug it into the prototype. Why? Because without knowing the data, what I've seen product managers do is make a mistake on what the product can and cannot do.

Sometimes you just assume that the data you have access to can tell you much more than it actually can. You are creating a prototype based on this assumption. Of course, to the users it all looks magical, because it gives you a recommendation, it knows just who you are, what you like, and what you do, right? But then when it transitions from prototyping stage to actual product, you see that the data set that you do have is much less reliable, much more sparse, and when recommendations created from this data are shown to users, well, they're not exactly magical. Right? So, that's an example. Unless you involve data in your prototypes, that's a problem. And product managers that don't have experience in data product management tend to make this mistake.

Now about collaboration between engineers and data scientists... what I've seen is, in regular software engineering, and this is probably an over-generalization, but things tend to be "true" or "not true." When you look at unit tests, integration tests, for software products, and all that sort of thing...there are things like: the value is five, and this is true, and this is false, and the length of this list is 573. When dealing with data products, you move from a deterministic world to a probabilistic world, which means that what you expect are ranges and then you have to judge what makes sense, what doesn't make sense.

So, your tests start looking different, which means that when you.... and of course, you know, work with engineers, you always talk about tests. So, test frameworks and you could use probably the same frameworks, but how you structure the tests is different. What is considered mistake, or bad, even that, that's a very fundamental thing.... but even that is up to discussion.

In one of the teams that I've worked with, we had this extreme ....that I still consider to be very funny ....where there was this classifier, and every time the classifier produced a wrong prediction, at first, the team wanted to file it in JIRA, every individual case. Now, if you think about it from the software engineering point of view, actually that's not such an insane idea. Well, it's a bug. Every wrong prediction is a bug. Bugs go in JIRA. They are managed. They are prioritized. It almost sounds logical. But if you come from the data science side, it doesn't exactly make sense. Because knowing whether it's [the classifier] working is some kind of different process.

Right? So, that's not...so testing, how you treat bugs, prototyping...those are probably the first examples of potential [collaboration] problems that come to mind, but those are just a few of many.

Ann Spencer: Absolutely. What about some practical oriented advice to share for people? Because you've experienced this at multiple companies and have worn many hats...in product, engineering, or data science. I think you have a very unique perspective because you've worn all of the hats. What kind of practical advice would you have for people in terms of addressing some of the tension points that might come up or some of the ways to gain trust or build process?

Eugene Mandel: So, to gain trust and to build process, and, again, this is purely an opinion, of course, right? So, you know, different companies structure their data science teams differently. I believe that it's very important for data science teams to be as close as possible to full stack product groups. Which means that on the team you either have data scientists that can be engineers and are product minded, or you have data science teams with a mix of pure data scientists and engineers and product managers. So, having engineers helps build trust with other engineers because you talk the same language. When you talk about continuous integration, continuous deployment [CI/CD], about testing, about quality, about even style of code, right? Which, if you were a data scientist, would not necessarily be your first concern.

It's important, you have to have product people on the team or be product minded. Because, you can talk about classifiers and models and any kind of output of data products, but you have to take it one step further. You have to think about how users experience those products. Do they find them to be believable? Do they find them confusing? How can the products be pared down?

I think data science teams that are as close to product teams as possible....to being full stack product teams... have a high chance of success in gaining the trust of other engineers, of engineering organizations and product organizations, and actually getting stuff done.

Ann Spencer: So, you provide a lot of insights, in terms of the different companies you've been at, the different examples, some of the thorniest and common issues that you've seen as well as practical advice. What do you think about future state of data science? Or what would you like the future state to be?

Eugene Mandel: Well, this is, of course, purely speculative, right? Because-

Ann Spencer: Yes, exactly. Exactly.

Eugene Mandel: Because nobody knows the future, right?

Ann Spencer: It's a blue sky-

Eugene Mandel: It's like, you know, humorous to assume that we do know the future, right? But, okay, here's what I think I see. Which is, again, purely opinion and speculation, right? So, companies that didn't have data science started doing data science and companies that started doing data science began moving into data products, being part of their core products.

I think this is, to me, that's the main trend that is happening. When I worked with Directly, it actually was a kind of pretty good example of this. Because, it's, you know, it's a startup but it's a startup that was getting mature... real product, real customers, real team, real processes, but no experience with data, right? So, first stage was just understanding that, oh, the data that we have is extremely valuable. Organizing it. Just building pipelines for assembling it. Documenting it. Right? So, kind of, you know, what Monica Rogati has on her pyramid of data science needs. You know, the equivalent of the Maslow pyramid, but for data science.

Understanding that you have the data. Organizing the data. Building infrastructure. Then reasoning about what can be trusted and what cannot be trusted. Then you go up to thinking, well, what can I do with this? So, insights. Right? But once you got the insights, you start, you almost necessarily start thinking about, well, it's a report. A report is produced maybe once a month. How can I get those insights? And build, and change the logic of my core product using those insights. Right? And that requires building data pipelines. That requires building actual models, classifiers, or whatever...and of course once you introduce data science into your product, then it becomes a data product, well, then you have to test. You have to product manage. You have to document. You have to create user interface. Right? So, the main trend that I see is data science becoming more like data product development... and a core part of the product organization.

Ann Spencer: Well, thank you so much for all of the insight. I have the feeling it will be very, very useful to our readers. Is there anything else you'd like to add?

Eugene Mandel: Well, I think, just maybe one thing, which is related to what I do now. When talking about the difference between normal software engineering and data products software engineering is [also] talking about risk. In normal software engineering, all the risk lies in code. Bad code. Undocumented code. Changing code. Normal software engineering organizations have really good culture and good processes for dealing with code. Any engineer who joins your team will probably not be surprised. Right? You know. Unit tests, integration tests, CI/CD, documentation.

In data products, all of this exists. But on top of that, there is a risk that comes not from code, but from the data. Because every time data goes into a data product, and we don't control this data, a change in your data can wreak havoc downstream in your product. Even without one piece of the source code changing. And that's actually something that just doesn't exist in normal software engineering.

One of my favorite things is every time I talk to a data product person, a data scientist, I'm collecting scary data stories. Stories of how some innocent change of data somewhere far upstream from your product starts wreaking havoc downstream. And if you're lucky, it just breaks your product. But if you're unlucky, it just silently starts behaving differently. Which means that in normal software engineering code is, I guess, the first class citizen. In data products, there are two players: the code itself and the data. And this kind of, you know, what we're trying to do is making data sets first class objects, first class citizens of the data world that can be testable, describable, that you can talk about.

Ann Spencer: And that goes back to what you hope for.... where you see the most effective outcome is being so close together. The data scientists, product, and the engineers.

Eugene Mandel: Yeah.

Ann Spencer: That seems to resonate.

Eugene Mandel: Yes. One thing to add to that...it's an interesting pattern that when collaborating on something, like every time we work together on something, there is something that we stare at. If we're talking about, you know, the business plan for the great office about to open up, well, I guess we will be talking and staring at a document, right? If we're talking about changing the code, we would probably be staring at the pull request interface on GitHub. We could be talking about tests, something else. Talking about a product spec, probably, you know, staring at either a Word document or one of the PM-ing tools, right? However, today, when you talk about a data set, what it should look like, what it really looks like, what it does, there is actually no one thing that you stare at when you talk about it. That's a problem because that just shows you that data sets...they didn't quite gain the first class citizen role in this development.

Ann Spencer: That makes a lot of sense. Particularly with trying to build a process around that.

Eugene Mandel: Yes. Exactly. Because any process involves collaboration. And unlike with code...you can freeze the code. Great. But you can't really freeze data. Well, maybe with one exception. You can freeze data if you're using internal data that doesn't depend on the external world. But a lot of the most valuable data products are processing data streams that come from outside of your team, or maybe even outside of your department. And if it's not internal, you can't control it, which means it can change.

Ann Spencer is the former Head of Content for Domino, where she provided a high degree of value, density, and analytical rigor that sparked respectful, candid public discourse from multiple perspectives, discourse anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly, focusing on data science and data engineering.