Data Science vs Engineering: Tension Points
Ann Spencer | 2018-12-16 | 99 min read
This blog post provides highlights and a full written transcript from the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. Topics discussed include the current state of collaboration around building and deploying models, tension points that potentially arise, as well as practical advice on how to address these tension points.
Introduction
Recently, I had the opportunity to moderate the panel, “Data Science Versus Engineering: Does It Really Have To Be This Way?” with Amy Heineike, Paco Nathan, and Pete Warden at Domino HQ. As Domino's Head of Content, it is my responsibility to ensure that our content provides a high degree of value, density, and analytical rigor, and that it sparks respectful, candid public discourse from multiple perspectives. Discourse that directly addresses challenges, including unsolved problems with high stakes. Discourse that is also anchored in the intention of helping accelerate data science work.
Collaboration between data science and engineering is just one of the challenges that I have heard stories about during my tenure at Domino, as well as previously, in my former role as the Data Editor at O'Reilly Media focused on data science and data engineering. It was in that previous role where I first met the illustrious panelists, who kindly set aside a recent Thursday evening to discuss differing perspectives about collaboration when building and deploying models. Just a few of the topics candidly discussed during the panel included potential tension points that arise, problem-solving to address those tension points, and hopeful reflections on the potential future state. This blog post covers highlights from the panel as well as a full written transcript. Additional content on this topic is available, and there will be more forthcoming content from other industry experts.
Perspectives: The Current State of Collaboration Around Building and Deploying Models
Each of the panelists has extensive industry experience within engineering, data science, machine learning, and research. We kicked off the discussion by diving into what they are seeing right now, in their own current state, to help provide a baseline for the evening's discussion. This also helped each of us become aware of key differences in perspective so that all of us invested in data science can learn from each other.
Amy Heineike relayed that at her ~55 person startup, Primer,
“we've ended up bringing in people who could bridge data science and engineering. We've called the team “product engineering”; it includes people who know how to build machine learning models, know how to do data science, have a bit of product intuition, and know how to put things into production.”
Pete Warden echoed this perspective and added that
“When me and Amy were chatting earlier about this, [we discussed] the idea of having full-stack machine learning. Because what I've seen, as well, is that if you don't have ownership of what the end user experience is, you just end up building academic models that are great for getting metrics in something like ImageNet but you don't learn all of the things you need to learn to turn that into a really effective model for end-users. Having that combination of product and research skills; it's a superpower. You're actually able to get products out to people in a way that a lot of teams who don’t theoretically know machine learning, may struggle with if they don't have that kind of experience. Trying to encourage people to actually take ownership of the whole process has been really important.”
Paco Nathan, who sees a wide breadth of model development workflows based on his advising companies and his work with the O’Reilly Strata and AI conferences, indicated
“I'll take a perspective from maybe the other side because like you said, I think these are very good examples. I'll take a perspective from a lot of the companies that I talk with and case studies that are introduced where, within the enterprise, there is quite a division between data science and data engineering teams. They are almost siloed. It seems like a year ago I would have heard the argument between, say, people wanting to do Python or R on the data science team and the data engineering team wanting to recode everything in Java. There's been that going on for a long time. I believe that's changed and we're seeing a big uptick on, for instance, Jupyter being used in fairly stodgy environments and bridging the multi-language gaps, and I think Arrow and others are probably contributing to that as well…I hope the silos are coming down.”
Despite some of the nuanced differences that impact collaboration around developing and deploying models, a sense of ownership was, interestingly enough, a common theme, regardless of whether model development happened at a startup or a larger organization. Yet that sense of ownership of the work and the outcomes may lead to either a collaborative cross-functional team or silos. Organizations rally around collaboration differently, and the sheer amount of work related to building and deploying models may result in potential tension points that teams will need to problem-solve for.
Tension Points
After discussing the current state of collaboration, the discussion transitioned into candidly identifying potential tension points that arise when working towards developing and deploying models. Identifying specific tension points, as well as the potential “why?” behind them, allows all of us within data science to iterate on and problem-solve ways that may address them. Potential tension points that arose during the discussion included the “sheer amount of work” around model development, organizational requirements for different techniques, unaligned expectations, and the lack of reproducibility.
Heineike pointed out that there is a
“huge amount of work to build a data-driven product… a small slice of that is building machine learning models. The rest of that whole pipeline is broad.... It includes “how can we define the problem we're actually trying to solve?”, “what's the raw data that we're going to be running on?”, “how do I clean and prepare that?”, “what labeled data can I use for training the algorithm? Do I have to bring in a team of people to do the labeling, and what tools do I need to build to enable them?” All of that stuff. Then, there are all these questions about “how does that model fit into my overall application and codebase?” and then, about the outputs of it, “where does it fall into the user experience? and what things do I need to show so that people understand the outputs of the model to make it interpretable?” There's this whole universe of stuff that isn't building the model but it's very related to building the model. It is technical and challenging to figure out. I think it's useful to think about who's responsible for these different pieces. I think if you have teams that have a very narrow view and that are only excited about a slice of that pie and think that it's somebody else's job to do the rest of it…or, if you're in a situation where they can't touch the rest of it because it's somebody else's job and they're not allowed to get into that, maybe there's a different codebase that they can't touch, that seems like it's going to be the place where the tension comes in.”
Nathan provided an alternative perspective to the amount of work, requirements, and the “why?” behind potential silos and tension points that have arisen within enterprise companies
“These kinds of cross-teams are multi-disciplinary. This is the right way to go. What I see driving the other direction a lot in the enterprise is there may be an entrenched incumbent team that does business intelligence and has done this for years and years and they own their data warehouse. And some team trying to do different techniques, obviously what they're going to do is create a different team, give them a different role, different names. Then you have competing teams inside the same organization. Now we're introducing machine learning and exciting things that are beyond the scope of what the data scientists in the enterprise are doing. So, now they're introducing a new role for machine learning engineer, having a new team, and it will bifurcate. I think that's one of the kinds of tensions that happens in the enterprise, but on the flip side of it, there are reasons why organizations make a choice to silo it, because of compliance issues: in some places in finance, we're seeing where there may be a team that works with the data and develops the models that they have to hand off to another team to audit. Literally different people look at it before it goes into production, and they firewall the teams so they can't work together. It kind of destroys that interdisciplinary thing but it's how we're trying to grapple with compliance.”
During the discussion, I mentioned that in my previous role as the Data Editor at O'Reilly, I had heard many stories from recently graduated data scientists about the tension points they experienced when comparing what they worked on within academia versus what they were expected to work on in industry. I then asked the panel whether they were still seeing instances of unaligned expectations:
Warden: “I think, we talked about Andrej Karpathy, he's got a great slide. He's the head of AI at Tesla. He has a great slide from when he was doing his PhD; he has this pie chart which shows ... 90 percent of his time, he was thinking about models and 10 percent of the time, he was thinking about data. He has this chart and now he is in industry, he spends 80 percent of his time thinking about the data, purely because in the academic setting, the data set is a given. You're trying to compare your results against other people so you have to all agree on a data set and it has to be stable. Whereas most of the time in industry, the way you can actually make improvements is by updating, improving, changing your data set, changing the labels, and that can be a real shock to the system because it's a whole different skill that is very, very hard to pick up in academia. I see that as one of the biggest shocks.”
Heineike: “Yes, that's right. I think it comes back to this question of the really broad range of work that needs to be done to pull off a data-driven product. I think, as we've talked about before, there are a couple of people out there who've managed to pull off a role where they're really spending the whole time thinking about model architectures. There are only a few people who can manage it, and I think there are only a few problems out there that have people super dedicated to making these models, but I think most of the work that's being done requires this kind of breadth of engagement with sub-problems. I think they're super interesting but they're just different from the ones that people are expecting.”
This led the discussion to reproducibility, or the lack thereof, a tension point which appears in academia, in science as a whole, and within industry:
Nathan: “Because I'm sure that a lot of you have seen it, but there's a big dialogue now about reproducibility in science. Some of the tooling that we've developed out of this process of data and analytics is now feeding back into more traditional scientific research to help make research reports more reproducible. There is a dialogue about how can we put this not only in science but also in enterprise and industry to make results across an organization reproducible so that you're not arguing over the results you're seeing.”
Warden: “And just to riff on the reproducibility side....that's also a big challenge with a lot of the machine learning papers that are out there. If you have ever picked up a machine learning paper and tried, from just reading the paper, to actually reproduce the results, it can often be really, really tricky. It's really tricky even for people inside Google sometimes. And we have our own reproducibility problems; on the machine learning side we need to really improve the tooling and improve the expectations. And part of the challenge is that for some of these papers, the data set is very hard to make available. So, that really sort of stymies a lot of our attempts.”
While it was entirely possible to spend the entire hour-long panel discussion identifying potential tension points, we moved the discussion to iterating on and problem-solving ways to address them.
Practical Advice for Addressing Tension Points
Many themes arose when iterating on and problem-solving ways to address tension points related to collaboration around building and deploying models. One theme was a shift in mindset: having a sense of ownership, tapping into “being curious” about sub-problems, and inhabiting an “always learning” perspective. Another theme was building or hardening communication lines between different functional roles through pairing, product prototyping (i.e., “Wizard of Ozzing”), tech talks/weekly seminars, and organizational cultural expectations. Working towards interdisciplinary understanding through cross-functional work or cross-training was another theme that arose.
Ownership
In regard to a sense of ownership, Heineike indicated
“If you have a very expansive view of what you're responsible for, if you're curious about a lot of that puzzle, and, to your point, if you have ownership over a lot of that puzzle, ...so if your end goal is “I want to build and ship something interesting”, that takes away some of the tension.”
Warden also indicated
"When you have a team that is unified around a set of requirements and everybody's immersed in all of those requirements, you get these really amazing results out of it. That's the opposite of tensions but it's a good counterpoint to the teams where everything is set.”
Curiosity and a Learning Mindset
Full transparency: all of the panelists met prior to the panel itself to have initial discussions about collaboration and problem-solving for tension points. “Being curious”, or “being curious about the problem space”, was initially brought up by Heineike, and all of the panelists expressed excitement at that time. This led me to ask Heineike to unpack it a bit at the panel:
Heineike: “One thing that I've really enjoyed is realizing how many different ways there are of looking at some of these problems, from an engineering lens, a mathematical or machine learning lens, or a UX lens, and getting to work on them at Primer. We started four years ago. Four of us in a room, two broken computers, somehow there's 55 of us now. I look around the room and see all these people with deep skill sets, all these different areas, and actually, very eclectic, different kinds of backgrounds. Just seeing that all come together and realizing that as you look back, it took the astrophysicist to optimize this and it took the engineer who actually was on Broadway for a year before she went back to do her master's to think about the language generation. It takes all these kinds of eclectic people that come together to do it.” ….“I think as we get into the problems, if you can really embrace the fact that there are sub-problems, some great sub-problems, and if you can delight in that, then I think you've got a very great, happy career in front of you. I think it comes down to that. There's this eclectic piece here ....we've actually got quite a few computational scientists on the Primer team, Anna did a Ph.D. in computational chemistry [points to audience member], people who became interested in algorithms because they wanted to help crack a problem. I think if you look at the user problems as this interesting thing to crack, you find yourself following all these different paths: for a while you're immersed in UX, and you're like, "Oh, how do users think about machine learning outputs?" Maybe they don't understand them the way I understand them, having geeked out on the model for a long time, and you bridge this. Or, you suddenly find yourself getting interested in how to deploy the code more easily or scalably. The more curious you get about more of these pieces... I think that's kind of fun. You can kind of bring in different people, put resources together, work together, and have fun.”
Nathan related how a sense of curiosity was a quality that he looked for when hiring for data science teams:
“Just kind of a riff off of it.... hiring for data science teams, that was always what worked better when we were staffing.... was to go out to especially physical sciences, or physical engineering, hiring up people who would have been on some track, say Aero/Astro or I hired people out of Physics a lot. Because they had a lot of natural curiosity. There's also a lot of great ways now that the hard sciences.... and what we're seeing in data science....are helping to inform each other...”
Tapping into a learning mindset is also related to “being curious”, and Heineike related
“I think within the industry where everything is changing all the time, and the one constant is feeling like you don't know what you're doing. And you have to keep learning stuff. I feel like every six months I kind of take stock and realize there's some enormous new thing that I have to learn that I have no idea how to do. And there's normally like a key moment where you're like, "okay. I got this." I think if you can embrace that, "okay I'm always going to be learning and I'm going to be doing something new and that's okay." And also realizing everyone else is in that boat. There's going to be some things that you know, and there's going to be a bunch of things that you don't know. And if you can be pretty humble about saying, "Actually I don't know what you're talking about." when they tell you something they're very excited about, and you're waiting for the explanation so that you can learn from each other quickly. I think that's truly key.“
Communication
Building and hardening communication lines are important when addressing collaboration-oriented tension points. Just a few practical ways to execute on this include pairing, product prototyping (i.e., “Wizard of Ozzing”), tech talks/weekly seminars, and organizational cultural expectations.
Heineike suggested pairing as a way to address tension points:
“One thing that we've found has worked well at Primer is having product teams where you have the data scientists, the engineers, the product manager, all in a team together. They're going to be interacting with each other every day and getting to talk directly. I think there's probably going to be an organizational thing that could help you....So you're sitting next to each other, and it's a good thing. You've just got to talk a lot more actually. You want the data scientists to not be intimidated to go read the code for the application, and then to start asking questions about it, and getting to explain what it is that they're stuck on. And then the engineers to be asking questions about the data science pieces and what it is that they're stuck on. And then getting that dialogue going.”
Warden suggested being actively open to partnering
"....I like putting together something that's like "hey, it's my simple speech recognition thing" and have all of the infrastructure is like "hey, here's a data set I've created, here's some metrics I've created" and just kind of put it out there. With the knowledge that people are much more focused on the model side than me, their eyes will light up, and they will be "ah, we can do way better than this." They don't have to worry about all the painful stuff with data cleaning. data gathering, and all this other stuff. That's been a good way of motivating.“
Warden also suggested an exercise in product prototyping, or “Wizard of Ozzing”, to help build and harden communication lines:
“this is a bit different, but one of my favorite ways, especially when we were a small team, of actually product prototyping was something called Wizard of Ozzing, where you actually have a man behind the curtain instead of a model. Someone on the other end of, like, a chat, or, you know, who's getting a screencast but doesn't actually get to be seen. So that person... basically has to pretend to be the model, given what they're actually seeing on screen. It's kind of like a really fun party game. But really, it's about the iterations you get on the actual product. Because oftentimes, the product team, or the product person if it's a start-up, won't really understand all that they have to think through and mentally model, like what a machine learning solution is actually able to do. If you can show them that even a person with 25 years of being alive in the world can't solve this problem with the information they are given, then we're probably not going to be able to train a model to do that.”
Warden also recommended clearly articulating the return and impact of the work to researchers:
“One of the things that I've seen as most encouraging to researchers is people being very clear that they need what the researchers are working on, actually giving them direction and saying, "Look, if you could solve this problem, this would be a really big deal." One example is, I'm doing a lot of work on microcontrollers, trying to do things on really tiny, low-footprint devices. In order to do any image recognition on that, you only get like 250 kilobytes of RAM to play with...you don't have much space. I actually really care about, "okay, how good can you get an image network that is that small?" Some of the researchers internally are like, "Oh, yeah, I did think about playing around with something like this, but I didn't know if anyone would actually use it so I was not going to publish it. But here's my results. Hey, do you want to sort of collaborate and stuff?" That's the flip side of the full stack. It doesn't have to be this whole thing that's imposed on researchers. Maybe it's just the researchers that I interact with and talk to, they're actually willing to talk to me. They're actually really excited to hear about ways their research can be used and to get ideas on the directions, the important unsolved problems they should be thinking about. That really makes their day, when you're like, "Oh, we actually took this and we thought it was interesting and we unpacked it and we used it in this way. Your work actually made a difference. You aren't just a PDF creator.”
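Warden's 250-kilobyte figure is a concrete constraint that can be sanity-checked with tooling. The snippet below is not his workflow, just a minimal sketch of that kind of budget check, assuming TensorFlow Lite post-training quantization and a hypothetical scaled-down MobileNet; the width multiplier, input size, and class count are illustrative only.

```python
# Minimal sketch (illustrative, not Warden's actual setup): build a small
# image classifier, quantize it with TensorFlow Lite, and check whether the
# serialized model fits a ~250 KB budget.
import tensorflow as tf

# Hypothetical tiny model: MobileNet with a reduced width multiplier and input size.
model = tf.keras.applications.MobileNet(
    alpha=0.25,                # shrinks the number of filters in every layer
    input_shape=(96, 96, 3),
    weights=None,              # structure only; no pretrained ImageNet weights
    classes=10,
)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # 8-bit weight quantization
tflite_model = converter.convert()

budget_bytes = 250 * 1024
print(f"quantized model: {len(tflite_model)} bytes, "
      f"fits the budget: {len(tflite_model) <= budget_bytes}")
```

Teams actually targeting microcontrollers would typically go further, with full integer quantization and TensorFlow Lite for Microcontrollers, but even this rough size check captures the trade-off between accuracy and footprint that he describes.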
Nathan provided insight into how weekly seminars, or tech talks, could be a potential solution to address collaboration-oriented tension points
“At a larger company, like 170 people, what we would do is assign the different data scientists to different product teams. But we wanted to get more feedback amongst the people who wouldn't get to talk to each other as much or see each other's projects. So we did a weekly seminar, and we would invite stakeholders. We would invite the head of finance to come in and just kind of listen. We would ask the people not to be too aggressive with a lot of probing questions, but sort of, you could engage, but don't try to put them on the spot, because we're really trying to share here. It was more like a graduate seminar that they were having us run. And it worked generally well. There was a lot of great feedback between teams that way.”
Nathan also pointed out how organizational cultural expectations have helped companies like Netflix
“Another really good example that I like is what's happening with the data infrastructure at Netflix. Michelle Ufford is one of the leaders on that. I think that we had a debate about that at JupyterCon. What really struck me was that Michelle was really highlighting how important culture is for solving these kinds of problems. I think it's the broad spectrum of what we've been talking about across the panel here... there are examples of teams that have come up with a certain cultural way to approach a problem and get a lot of people working together. Trying to find that through roles, or through a checklist? I think you really have to spread that culturally.”
Nathan also indicated that regulated environments often have cultural expectations regarding collaboration
“I think that there's an interesting trend in the industry where it's the highly regulated environments that have so many requirements, but they also have so much need to try and get something done, that it happens precisely there. We see it in the intelligence community, in finance, in healthcare, in places where you have these strict controls over privacy. But we're seeing a lot more interesting evolution in open source there. A year ago I would never have thought of that. But it's happening.”
Cross-Training
While earlier in the conversation about the current state both Heineike and Warden mentioned being part of, or currently seeing, multi-disciplinary cross-functional teams, Nathan referenced how deliberate cross-training within the enterprise can be an effective way to address collaboration-oriented tension points:
“One suggestion that I've seen along these lines was to do a lot more cross-training. A good example that I can reference ... Richa Khandelwal, she's one of the managers for data engineering at Nike in Portland. She gave a really great talk on this at OSCON about how they've taken the people on the data engineering team and put them through data science boot camps so that they can get a flavor for what it means to be working with data scientists, and vice versa. They're taking the people from one team and cross-training them into another. I think that that's a really good approach to break down some of the walls and also level-set expectations and hopefully get people who are more full-stack.”
These are just a few of the suggestions to address tension points. If interested, there are additional insights within the full written transcript section of this blog post.
What Would You Like the Future State to Be?
After spending the bulk of the discussion identifying and problem-solving for collaboration-oriented tension points, we wrapped up the panel by discussing the potential future state….or what they would like the future state to look like. Unsurprisingly, despite inhabiting different roles, there were similarities regarding what each of the panelists hopes for the future of data science. Each of the panelists cited curiosity, while also providing additional insights.
Heineike: “….data science was kind of becoming a thing when the term first got coined. There was a very wide, eclectic group of people who ended up identifying with the term. People from all kinds of backgrounds, who wandered in because they were curious about some data set. I think as the field matures, it's kind of interesting because there are these formal programs you can go to, there are a lot of people going through machine learning programs and computer science departments. I'm kind of curious to see how this plays out. We're seeing a maturing of the field and some clearer paths that people can follow. That means a kind of narrowing, in some sense, of the kinds of backgrounds people have as they come in. And on the other side you've got all these tools, and models, and methods that are available that make it actually possible for a really broad set of people to come in and engage with these and bring different kinds of perspectives. I really hope that we can keep some of this variety and leverage the fact that there are these tools, and not end up going down the path where everyone has to come through the same programs and through the same companies and have a single path, so we can keep some of that creativity.”
Nathan: “I love the curiosity part. If you think about going to law school and becoming a practicing attorney, your stock in trade is to be skeptical. We need another profession that's supposed to be curious. And they can counterbalance each other. I like surveys, we do a lot of surveys, and I've seen some interesting talks about people doing stories about hiring data scientists, and analyzing data of what it takes. TapRecruit has been doing a lot of that. And an interesting thing they found is that looking for a “senior data scientist” role is actually not as good as just trying to get somebody in a “data scientist” role. So you'd get people who are incredible environmental scientists, or astrophysicists, or whatnot, applying for the latter instead. And then, the requirements for junior data scientists are really interesting because they cut to the basic kinds of traits, like curiosity, like being willing to spend 80% of your time cleaning data. I think that I would like to look less at the higher level and more at what's the incoming path and what are the traits there.”
Warden: “I really like that idea of curiosity as well. One of the reasons is that I spent a bunch of time, when I was at Jetpac, trying to find great travel photos, trying to go through billions of Instagram photos, Facebook photos, and Flickr photos to find perfect photos for a particular place. And we did things like looking for where hipsters hung out by trying to identify mustaches on people's faces, and things like that. But what I realized was that we had people on our team who were way better than the programmers at figuring out what the guidance should be. But they had to come through us to give us the requirements and have us build the models. One of the things I'm proudest of... is I was able to set up a system that did transfer learning on top of a pre-trained image classifier, where our marketing people could just give it 100 photos in 10 different categories, and it would actually learn to do recognition of, like, "oh, you want to recognize dogs in photos? Just give it a bunch of photos with dogs in and a bunch of photos without dogs." And without any programmer intervention, they'd have this model they could deploy in the app and create their own. And really what I'm hoping is that I'm kind of helping to put myself out of a job as a programmer. We have this very weird structure where we have this priesthood of people who understand how to write rules in this very arcane language. We have to memorize all this trivia to be able to speak to computers. If we can change things so that anybody who can give a bunch of examples can select the right kind of model, then just give it a bunch of examples, get something half-decent, and then iterate really fast, that's what I'm really excited about in the future. That's what I'm thinking about. I'm hoping that it becomes something that diffuses throughout the whole organization, rather than being very siloed and compartmentalized. I'm kind of hoping it's like web servers bootstrapping up in the enterprise in the early 2000s. People would just start a web server, and other people would go to it inside the intranet, and the IT department was not involved at all. And they [IT] got really annoyed, and then they had to eventually give up and just let people do it. I'm kind of hoping that machine learning becomes something that sales, marketing, support, and everybody else just finds easy enough to pick up and solve their problems."
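The transfer-learning setup Warden describes, retraining only a small classification head on top of a frozen, pre-trained image model, is straightforward to sketch. The example below is not the Jetpac system; it is a minimal illustration assuming TensorFlow/Keras, a hypothetical folder of labeled example photos laid out as examples/&lt;category&gt;/&lt;image&gt;.jpg, and MobileNetV2 standing in for whatever base classifier was actually used.

```python
# Minimal transfer-learning sketch (illustrative only): non-programmers supply
# folders of example photos; only a new classification head is trained.
import tensorflow as tf

IMG_SIZE = (224, 224)

# Hypothetical directory of labeled examples: examples/<category>/<image>.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "examples/", image_size=IMG_SIZE, batch_size=32)
num_classes = len(train_ds.class_names)

# Reuse a pre-trained image classifier as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # the new head
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)  # only the Dense head's weights are updated
```

Because only the final Dense layer is trained, a hundred or so labeled photos per category can be enough to get something half-decent to iterate on, which is roughly the workflow he describes handing over to non-programmers.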
Conclusion
Collaboration between data science and engineering is a known challenge. This challenge has the potential to stymie innovation and hobble the acceleration of data science work. Should we, in data science, just shrug our shoulders and say “That is just the way it is,” or “This is too hard of a problem to solve. I'd rather solve something else”? Yet, isn't data science grounded in the idea of solving previously unsolvable problems? I have heard so many stories, from brilliant data scientists and exceptional engineers, of their frustrations regarding collaboration around developing and deploying models. This is not an insurmountable problem. For example, the panelists have kindly provided different insights to help address it. From my perspective, there has been a lack of in-depth, analytically rigorous public discourse from multiple perspectives on this topic. Specifically, analytically rigorous discourse that is anchored in the intention to help each other and build off of each other's ideas. The intention of hosting this panel and providing other insights from multiple industry experts is to actively contribute to the public discourse and work towards addressing collaboration-oriented tension points. If you are interested in contributing to this public discourse, please feel free to contact me at content(at)dominodatalab(dot)com.
Full Transcript
This section provides the full transcript of the panel discussion. The text has been edited for readability.
Ann Spencer, Domino: Hello. Thank you so much for spending your evening with us. I appreciate the time and especially I want to thank our panelists as well. My name is Ann Spencer and I am the head of content here at Domino. Domino is a data science platform that focuses on accelerating data science work. It is my responsibility to ensure that our content reflects one of our core values: to seek truth and speak the truth. It's extremely important to us that our content is infused with this value and that it's woven throughout. Because our intention for our content is to provide a high level of value as well as to spark conversations, really honest, candid, respectful conversations about how to move data science forward. In a previous life, I was the data editor at O'Reilly Media, focusing on data science and data engineering, which is how I know the panelists. In that role, I collaborated with them, and I am very thankful that in my current role at Domino I can continue to collaborate with them as well as be inspired by them. I would love it if each of you could take a moment to introduce yourselves to the audience.
Paco Nathan, Derwen: Hi, my name is Paco. Paco Nathan. I'm with Derwen, based in Sebastopol. I also used to work for O'Reilly up until fairly recently. My background is in machine learning and I led data teams in industry for a number of years. Then I went off to become an open-source evangelist for Apache Spark for a while. Mainly I've been working with the Strata Data Conference and the AI Conference; with conferences worldwide. I really like to try to get a birds-eye view of what's going on in the industry, talk to a lot of different teams, and find out what they are up to.
Amy Heineike, Primer: Hi, my name's Amy Heineike, VP of Product Engineering at Primer. My background is mathematics and I was a data scientist originally. I was reflecting on this earlier: my background as a mathematician led me to Quid, then to Primer, which started four years ago. We're building machines that can read and write text, taking in large corpuses of unstructured data and summarizing insights into helpful reports.
Pete Warden, Google: I'm Pete Warden. Tech lead of the mobile and embedded side of TensorFlow. I had a startup called Jetpac that was acquired by Google about four years ago. I really enjoy that with TensorFlow, I get to work with loads of Google teams and also loads of external teams who have all sorts of really interesting data problems. I get to learn all sorts of stuff and occasionally help some of them.
Ann Spencer: Thank you very much for introducing yourselves. So, tonight's panel topic is about collaboration. Specifically, collaboration on developing and deploying models. The structure of our conversation this evening: we're going to kick it off with the current state of collaboration around model development and deployment. Then we're going to talk a little bit about the potential future state, or what people hope will be the future state. Then we'll open it up to some questions from the audience. Just a housekeeping thing, I do ask that if you decide to put forth a question, it would be great if your question is delivered within a single breath, as well as ends with a question. I feel like that's not going to be a problem for this audience but I think we've all been there. Also, we will close out the donations for La Cocina and the raffle tomorrow to give everyone an opportunity to donate. I definitely appreciate that we've raised hundreds of dollars for La Cocina so far; if you would like to participate in the raffle to win a bronze-level pass to Strata, please feel free to do so after the event. Let's kick it off with a discussion about the current state of collaboration around model development and model deployment. Based on everyone on the panel's extensive experience in data science and engineering, what are some of your thoughts about the current state?
Amy Heineike: One of the things that we've done at Primer, we kind of talked about this a bunch [gesturing to Pete Warden], is we've ended up bringing in people who could bridge data science and engineering. We've called the team “product engineering”; it includes people who know how to build machine learning models, know how to do data science, have a bit of product intuition, and know how to put things into production. I think for us, what we've really thought about is how we can make it so that it's actually a quite easy process for a team (an individual, or a small team of individuals who share their skills) to be able to solve an interesting data problem and then actually ship the feature. That's been really fun and actually quite liberating, that we can have the ability to contribute. I should say, there are a couple of people from Primer here. We're hiring like crazy at the moment and they are super smart and really fun to talk to; you should chat with them afterward if you're curious about anything.
Pete Warden: I think you mentioned full-stack machine learning earlier.
Amy Heineike: I did say that, yes.
Pete Warden: That's a phrase I really liked when me and Amy were chatting earlier about this: the idea of having full-stack machine learning. Because what I've seen, as well, is that if you don't have ownership of what the end-user experience is, you just end up building academic models that are great for getting metrics in something like ImageNet but you don't learn all of the things you need to learn to turn that into a really effective model for end-users. Having that combination of product and research skills; it's a superpower. You're actually able to get products out to people in a way that a lot of teams who don't theoretically know machine learning may struggle with if they don't have that kind of experience. Trying to encourage people to actually take ownership of the whole process has been really important.
Paco Nathan: I'll take a perspective from maybe the other side because like you said, I think these are very good examples. I'll take a perspective from a lot of the companies that I talk with and case studies that are introduced where, within the enterprise, there is quite a division between data science and data engineering teams. They are almost siloed. It seems like a year ago I would have heard the argument between, say, people wanting to do Python or R on the data science team and the data engineering team wanting to recode everything in Java. There's been that going on for a long time. I believe that's changed and we're seeing a big uptick on, for instance, Jupyter being used in fairly stodgy environments and bridging the multi-language gaps, and I think Arrow and others are probably contributing to that as well. I hope the silos are coming down.
Ann Spencer: Speaking of silos, it sounds like with silos there are definitely tension points that can potentially arise from collaboration. Would you like to speak a little about that?
Amy Heineike: Sure. One of the things that we've really learned is if you think about building a data product….and that's assuming something interesting pops up in the data….it's a huge amount of work to build a data-driven product… a small slice of that is building machine learning models. The rest of that whole pipeline is broad.... It includes “how can we define the problem we're actually trying to solve?”, “what's the raw data that we're going to be running on?”, “how do I clean and prepare that?”, “what labeled data can I use for training the algorithm? Do I have to bring in a team of people to do the labeling, and what tools do I need to build to enable them?” All of that stuff. Then, there are all these questions about “how does that model fit into my overall application and codebase?” and then, about the outputs of it, “where does it fall into the user experience? and what things do I need to show so that people understand the outputs of the model to make it interpretable?” There's this whole universe of stuff that isn't building the model but it's very related to building the model. It is technical and challenging to figure out. I think it's useful to think about who's responsible for these different pieces. I think if you have teams that have a very narrow view and that are only excited about a slice of that pie and think that it's somebody else's job to do the rest of it…or, if you're in a situation where they can't touch the rest of it because it's somebody else's job and they're not allowed to get into that, maybe there's a different codebase that they can't touch, that seems like it's going to be the place where the tension comes in. I think it's interesting just reflecting on them. Reflecting on this broad set of questions and skills that are required to pull something together. It's interesting. One thing that I've really enjoyed is realizing how many different ways there are of looking at some of these problems, from an engineering lens, a mathematical or machine learning lens, or a UX lens, and getting to work on them at Primer. We started four years ago. Four of us in a room, two broken computers, somehow there's 55 of us now. I look around the room and see all these people with deep skill sets, all these different areas, and actually, very eclectic, different kinds of backgrounds. Just seeing that all come together and realizing that as you look back, it took the astrophysicist to optimize this and it took the engineer who actually was on Broadway for a year before she went back to do her master's to think about the language generation. It takes all these kinds of eclectic people that come together to do it.
Pete Warden: What this actually made me think about was one of the successful teams I know within Google, which is the Mobile Vision team, and what they've done; they're the team that produced MobileNet, which is one of the industry-leading models for actually doing image recognition and image classification in a really, really small footprint so it fits well within a mobile app. But what's really interesting about their approach is that they are this ... again, coming back to the full-stack idea, they have Andrew Howard who is the person who came up with this really groundbreaking architecture, driven by the requirements of, "Hey, we need something as accurate as possible that's as small and fast as possible as well". So, relative engineering trade-offs. But he's actually sitting and working with the team who actually collect novel kinds of data. They're actually trying to figure out better ways of labeling data, coming up with ways of getting rid of incorrect labels, getting more data, and figuring out when they need more data. In the exact same team, he's working with Benoit Jacob and some other people who are doing these low-level assembly optimizations to make this stuff run really, really fast on a phone. They're doing all of this ARM NEON work and that's actually feeding back into the design of the model because they're like, "Hey, if you have things that are multiples of eight in this layer, that means that my assembly routine can actually run a bit faster". And the other side of it is, if they can actually make the model more accurate by improving the training data or filling out the labeling, then that means they can keep the accuracy at the same level, shrink the model, and let the accuracy degrade from a higher level, so they can actually improve latency by improving the labeling of the data set. When you have a team that is unified around a set of requirements and everybody's immersed in all of those requirements, you get these really amazing results out of it. That's the opposite of tensions but it's a good counterpoint to the teams where everything is set.
Paco Nathan: These kinds of cross-teams are multi-disciplinary. This is the right way to go. What I see driving the other direction a lot in the enterprise is there may be an entrenched incumbent team that does business intelligence and has done this for years and years and they own their data warehouse. And some team trying to do different techniques, obviously what they're going to do is create a different team, give them a different role, different names. Then you have competing teams inside the same organization. Now we're introducing machine learning and exciting things that are beyond the scope of what the data scientists in the enterprise are doing. So, now they're introducing a new role for machine learning engineer, having a new team, and it will bifurcate. I think that's one of the kinds of tensions that happens in the enterprise, but on the flip side of it, there are reasons why organizations make a choice to silo it, because of compliance issues: in some places in finance, we're seeing where there may be a team that works with the data and develops the models that they have to hand off to another team to audit. Literally different people look at it before it goes into production, and they firewall the teams so they can't work together. It kind of destroys that interdisciplinary thing but it's how we're trying to grapple with compliance.
Ann Spencer: Just to unpack the tension thing a bit, I remember when I was the Data Editor at O'Reilly years ago, I remember there were many data scientists that were coming to me at the time that had indicated that they just got their Ph.D., they were just starting out and there was a moment of tension of misplaced expectation. Right? "I thought I was going to work on this, then on the start date, that's not what I'm working on". At the time, I remember there were programs like the Insight Fellows that came up to help with that. Are you seeing anything similar currently at conferences or anything about expectations for people who are actually entering the field versus what they're experiencing once they start?
Pete Warden: One of the things I really like at Google is that there is a very good culture and expectation that we will be doing engineering. On the research side, there's very much an engineering culture which isn't about, "Hey, you can't do research" but it's about thinking about all the trade-offs to get towards an eventual goal. I've seen it be a really supportive environment where you can ... I was talking about Andrew Howard and he's publishing stuff at NIPS and things like that. He's doing these groundbreaking architectures but he's also spending a bunch of his time listening to client teams and talking to the people who do the NEON assembler and understanding, what are the goals? what are the options I've got here to trade things off? and that can be very hard to explain to somebody who's coming directly from an academic background. And I think, we talked about Andrej Karpathy, he's got a great slide. He's the head of AI at Tesla. He has a great slide from when he was doing his Ph.D.; he has this pie chart which shows ... 90 percent of his time, he was thinking about models and 10 percent of the time, he was thinking about data. He has this chart and now he is in industry, he spends 80 percent of his time thinking about the data, purely because in the academic setting, the data set is a given. You're trying to compare your results against other people so you have to all agree on a data set and it has to be stable. Whereas most of the time in industry, the way you can actually make improvements is by updating, improving, changing your data set, changing the labels, and that can be a real shock to the system because it's a whole different skill that is very, very hard to pick up in academia. I see that as one of the biggest shocks.
Amy Heineike: Yes, that's right. I think it comes back to this question of the really broad range of work that needs to be done to pull off a data-driven product. I think, as we've talked about before, there are a couple of people out there who've managed to pull off a role where they're really spending the whole time thinking about model architectures. There are only a few people who can manage it, and I think there are only a few problems out there that have people super dedicated to making these models, but I think most of the work that's being done requires this kind of breadth of engagement with sub-problems. I think they're super interesting but they're just different from the ones that people are expecting.
Paco Nathan: One suggestion that I've seen along these lines was to do a lot more cross-training. A good example that I can reference ... Richa Khandelwal and she's one of the managers for data engineering at Nike in Portland. She gave a really great talk on this at OSCON about how they've taken the people on the data engineering team and put them through data science boot camps so that they can get a flavor for what does it mean to be working with data scientists and vice versa. They're taking the people from one team and cross-training them into another. I think that that's a really good approach to break down some of the walls and also level-set expectations and hopefully get people who are more full-stack.
Ann Spencer: I remember some of our other conversations as we've ... it sounds like we're definitely problem-solving to address some of the tensions, right...and I remember Amy had this great framing of "being curious" or "being more curious about the problem space". Would you care to unpack that a little bit?
Amy Heineike: Yes, I think as we get into the problems, if you can really embrace the fact that there are sub-problems, some great sub-problems, and if you can delight in that, then I think you've got a very great, happy career in front of you. I think it comes down to that. There's this eclectic piece here ....we've actually got quite a few computational scientists on the Primer team, Anna did a Ph.D. in computational chemistry [points to audience member], people who became interested in algorithms because they wanted to help crack a problem. I think if you look at the user problems as this interesting thing to crack, you find yourself following all these different paths: for a while you're immersed in UX, and you're like, "Oh, how do users think about machine learning outputs?" Maybe they don't understand them the way I understand them, having geeked out on the model for a long time, and you bridge this. Or, you suddenly find yourself getting interested in how to deploy the code more easily or scalably. The more curious you get about more of these pieces... I think that's kind of fun. You can kind of bring in different people, put resources together, work together, and have fun.
Paco Nathan: Just kind of a riff off of it.... hiring for data science teams, that was always what worked better when we were staffing.... was to go out to especially physical sciences, or physical engineering, hiring up people who would have been on some track, say Aero/Astro or I hired people out of Physics a lot. Because they had a lot of natural curiosity. There's also a lot of great ways now that the hard sciences.... and what we're seeing in data science....are helping to inform each other. Because I'm sure that a lot of you have seen it, but there's a big dialogue now about reproducibility in science. Some of the tooling that we've developed out of this process of data and analytics is now feeding back into more traditional scientific research to help make research reports more reproducible. There is a dialogue about how can we put this not only in science but also in enterprise and industry to make results across an organization reproducible so that you're not arguing over the results you're seeing.
Pete Warden: And just to riff on the reproducibility side....that's also a big challenge with a lot of the machine learning papers that are out there. If you have ever picked up a machine learning paper and tried, from just reading the paper, to actually reproduce the results, it can often be really, really tricky. It's really tricky even for people inside Google sometimes. And we have our own reproducibility problems; on the machine learning side we need to really improve the tooling and improve the expectations. And part of the challenge is that for some of these papers, the data set is very hard to make available. So, that really sort of stymies a lot of our attempts.
Paco Nathan: Do you have requirements then? If people are publishing something inside of Google? Do they need to attach their code? Or have a URL for some repo? How does that work?
Pete Warden: I always like them to use TensorFlow [delivered with a grin and chuckles abounded] I mean seriously, it's highly encouraged that there be some sort of code artifacts. I don't think it's necessarily even written down. It's just understood that to be useful to the world, even to other people in Google, that kind of needs to happen.
Paco Nathan: More and more now you're seeing research papers using Zenodo [https://guides.github.com/activities/citable-code/] and other ways of referencing a GitHub repo in a paper, or even links that can be updated after the paper is published. I know that Fernando Perez published about that recently, sort of tips for keeping a report sustainable.
Amy Heineike: I would like to take a slightly different take on the paper question. Because one of the things that we've found is that there's sometimes a really cool paper that comes out, and when you unpack it you realize there's a ton of complexity in there that you don't need, that you kind of throw away. The key to reading the paper is trying to figure out what was the nugget of insight that they had? And then you riff off that. I feel like a lot of ideas happen because of that. In a way, with reproducibility, it doesn't matter if you reproduce it exactly. What we wish we had is a better way to know whether we're going to be super disappointed by a paper or whether it turns out it was really cool. There's nothing cool about going into the publication and then, when you strip it down, finding there's nothing there for you to go on.
Ann Spencer: In the vein of addressing some of the tension points, specifically about papers, I remember prior conversations people were talking about trade-offs....about a support network for getting published and access to researchers and things like that. Would you be willing to unpack that a little bit?
Pete Warden: One of the things that I've seen as most encouraging to researchers is people being very clear that they need what the researchers are working on, actually giving them direction and saying, "Look, if you could solve this problem, this would be a really big deal." One example is, I'm doing a lot of work on microcontrollers, like trying to do things on really tiny, low-footprint devices. In order to do any image recognition on that, you only get like 250 kilobytes of RAM to play with...you don't have much space. I actually really care about, "okay, how good can you get an image network that is that small?" Some of the researchers internally are like, "Oh, yeah, I did think about playing around with something like this, but I didn't know if anyone would actually use it so I was not going to publish it. But here's my results. Hey, do you want to sort of collaborate and stuff?" That's the flip side of the full stack. It doesn't have to be this whole thing that's imposed on researchers. Maybe it's just the researchers that I interact with and talk to, they're actually willing to talk to me. They're actually really excited to hear about ways their research can be used and to get ideas on the directions, the important unsolved problems they should be thinking about. That really makes their day, when you're like, "Oh, we actually took this and we thought it was interesting and we unpacked it and we used it in this way. Your work actually made a difference. You aren't just a PDF creator."
Paco Nathan: There's a really interesting case study that we came across through a Jupyter conference. It's from the DoD. I want to bring it up because it's a different kind of sharing. It's similar in the sense that people are researching things that they would like to share. Only, within the intelligence community, you can't just walk over to a person in the next cube and share results, because you could go to prison. They had to find ways to keep that cross-collaboration going without violating really strict data privacy controls. There's this product called nbgallery [https://nbgallery.github.io/] that came out of In-Q-Tel's Lab 41. Dave Stewart, from DoD, has been doing talks about this. They were a little sheepish at first when they approached us, because they were like, “Do you think anyone outside of the DoD would be interested?” Frankly, any bank should be interested in this. They came up with ways to have a problem statement and a solution represented in a notebook, and then automatically be able to tear out any data in the notebook, such that other teams could look at it and use it. Ostensibly they would put other data in and still have the security compliance. What's striking about this is that now you have the intelligence community committing code back on GitHub, developing things for Jupyter that are being put back into the platform, and then companies like Capital One and others are going in and adopting them. I think there's an interesting trend in the industry where it's the highly regulated environments that have so many requirements, but also so much need to get something done, that it's precisely there, in the intelligence community, in finance, in healthcare, in places with strict controls over privacy, that we're seeing a lot more interesting evolution in open source. A year ago I would never have thought of that. But it's happening.
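To give a concrete flavor of the general idea Paco describes, stripping the data out of a notebook so that only the problem statement and code travel with it, here is a minimal sketch using the standard nbformat library. It is purely illustrative, not nbgallery's actual mechanism, and the file names are made up.

```python
import nbformat


def strip_notebook_data(src_path: str, dst_path: str) -> None:
    """Clear outputs and run state so a notebook can be shared without
    carrying any of the data it was executed against (illustrative only;
    nbgallery's real mechanism may differ)."""
    nb = nbformat.read(src_path, as_version=4)
    for cell in nb.cells:
        if cell.cell_type == "code":
            cell.outputs = []            # drop rendered results and data
            cell.execution_count = None  # drop execution state
    nbformat.write(nb, dst_path)


# Hypothetical file names, for illustration:
# strip_notebook_data("analysis.ipynb", "analysis_shareable.ipynb")
```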
Amy Heineike: Yeah, for us, one of the design principles we have for the product is that, since we're taking lots of text and summarizing it, everything we show should have the ability to click through and see where it came from. And that turns out to be a really interesting problem... where did it come from? Maybe it's a fact that came up in lots of documents. One of the problems we've thought about is, okay, if it came up in different places, what are the different ways it showed up, and which of those ways is representative? There are a lot of interesting sub-problems that come up as you try to solve things.
Pete Warden: Just one more thing about motivation; I was thinking about this when you were talking about that. One of the things I actually like to do is... I am not great at building models... I like putting together something that's like, "Hey, here's my simple speech recognition thing," with all of the infrastructure, like, "here's a data set I've created, here's some metrics I've created," and just kind of put it out there. With the knowledge that for people who are much more focused on the model side than me, their eyes will light up, and they'll say, "Ah, we can do way better than this." They don't have to worry about all the painful stuff with data cleaning, data gathering, and all the rest. That's been a good way of motivating. That leads on to things like Kaggle. It's a data competition site where you can compete to see how well you can solve machine learning problems against thousands of people across the world. You get these ready-made problems with metrics that all these companies have put up. I found that to be a really great way to get into machine learning and learn the real practicalities of it.
Amy Heineike: Pete's blog [www.petewarden.com] has some of the best little tutorials. I think one of my favorites was the one where you get a cat detector. [laughter] A few lines. Just a few lines.
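For anyone curious what "a few lines" can look like in practice, here is a minimal sketch of a cat detector built on an off-the-shelf ImageNet classifier in Keras. It is our own illustration rather than the code from Pete's tutorial, and the label set is just a handful of ImageNet cat classes.

```python
import numpy as np
import tensorflow as tf

# An off-the-shelf ImageNet classifier; no training required.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# A handful of ImageNet cat classes (not exhaustive).
CAT_LABELS = {"tabby", "tiger_cat", "Persian_cat", "Siamese_cat", "Egyptian_cat"}


def looks_like_a_cat(image_path: str) -> bool:
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))
    preds = tf.keras.applications.mobilenet_v2.decode_predictions(
        model.predict(x), top=5)[0]
    return any(label in CAT_LABELS for _, label, _ in preds)
```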
Ann Spencer: On that note, since we seem to be providing practical advice, is there anything else you would like to share in terms of best practices, for people just starting out or trying to problem-solve some of the collaboration issues they're seeing?
Paco Nathan: Another really good example that I like is what's happening with data infrastructure at Netflix. Michelle Ufford [https://twitter.com/MichelleUfford] is one of the leaders on that. I think we had a debate about that at JupyterCon. What really struck me was how much Michelle highlighted the importance of culture for solving these kinds of problems. I think that's the broad thread of what we've been talking about across the panel here: there are examples of teams that have come up with a certain cultural way to approach a problem and get a lot of people working together. Trying to find that through roles, or a checklist? I think you really have to spread it culturally.
Amy Heineike: We're in an industry where everything is changing all the time, and the one constant is feeling like you don't know what you're doing, and you have to keep learning stuff. I feel like every six months I take stock and realize there's some enormous new thing I have to learn that I have no idea how to do. And there's normally a key moment where you're like, "Okay, I got this." I think you have to embrace that: "I'm always going to be learning, I'm going to be doing something new, and that's okay." And also realize everyone else is in that boat. There are going to be some things that you know and a bunch of things that you don't know. If you can be pretty humble about saying, "Actually, I don't know what you're talking about," when someone tells you something they're very excited about, and then wait for the explanation, you can learn from each other quickly. I think that's truly key.
Pete Warden: On the practical advice, to echo that: if you can find the right material, getting to the point where you can do something useful with machine learning can be surprisingly quick. The traditional academic path, the Ph.D. and things like that, is super valuable if you want to really dive deep into the research and create new models from scratch. But for getting started and actually using this stuff, there's loads of great material out there that you can just pick up. I think Fast.ai has great free courses which are very engineering-focused and very practical. There are all sorts of great resources out there. Don't be put off by thinking, "Oh, I have to learn all of this jargon." There are lots of ways you can dive in, start playing with stuff, and learn by doing.
Paco Nathan: Can I say something about Agile? That's one thing I encounter a lot, especially with enterprises, because if you mention anything about engineering or data you can expect to hear the word "agile" in response. It's this echo-response thing, and there's a generation of management that's grown up on it, right? I'll reference David Talby of Pacific AI, whose talks I really like. He's done a lot of work with NLP in healthcare and in hospitals, and he comes in and shows failure cases where deployed models do the opposite of what was intended right after deployment. When you're thinking about an agile team, whether you're using agile, kanban, or whatever methodology variant, and you're building a web app, you put your really senior people on the problem up front. Your architects and your team leads get involved very early in the process, and then as the product matures you get more and more people involved who aren't as senior, building unit tests and smaller features. There's a kind of maturity curve plotted against the level of the talent involved. One of the things David Talby brought up is that with machine learning, with the data science parts being put into production, it's basically the opposite. If you just have a data set and you want to create a model, that's like a homework assignment; you can do that off the shelf. Once you get models deployed in situ, in production, and you start seeing all the weird edge cases that nobody ever anticipated and have to troubleshoot them, that's where the really senior people have to come in, the ones who have that expertise. You can start out small and bring in the big guns later on. From my perspective, what is data science? There are components of machine learning, data munging, and stats that can get really weird, the kind of stuff I had to deal with a lot when I was an undergrad. It's that troubleshooting of machine learning models in production where real depth in statistics is what keeps you out of trouble as a company, and not a lot of people have that competence. So I'll say it's not agile, it's the opposite. I don't know what "agile" spelled backwards is, but it beats me. [laughter]
Amy Heineike: I think one of the interesting things that seems to be happening is... back when data science was first becoming a thing, when the term was first coined, there was a very wide, eclectic set of people who ended up identifying with it. People from all kinds of backgrounds wandered in because they were curious about some data set. As the field matures, it's interesting, because now there are formal programs you can go through; a lot of people are coming out of machine learning programs and computer science departments. I'm curious to see how this plays out. We're seeing a maturing of the field and some clearer paths people can follow, which means a kind of narrowing, in some sense, of the backgrounds people have as they come in. On the other side, you've got all these tools, models, and methods available that make it possible for a really broad set of people to come in, engage with them, and bring different kinds of perspectives. I really hope we can keep some of that variety and leverage the fact that these tools exist, and not end up going down a path where everyone has to come through the same programs and the same companies along a single track, so we can keep some of that creativity.
Paco Nathan: I'm curious, would you have a chief data scientist at your company? Would you go that route, at an executive level? This is something we've been asking a lot of companies. Or would you rather have that sort of leadership distributed across teams?
Amy Heineike: At Primer, all our products are data-driven, so the majority of the engineering team has some kind of computational algorithms experience. It feels like a question for a more mature company.
Pete Warden: I don't know. My first instinct is that it sounds like having a Chief Programming Officer, you know? [laughter] It's a skill that should be horizontal, one you hope people in management actually understand. I can't say whether they do.
Ann Spencer: It sounds like we're slowly moving into the future state, with Amy discussing what teams could potentially look like, right? And Paco riffing on the potential of a chief data scientist role. [laughter] So, based on that, let's go to the future state. Amy touched a little on what she would like to see. What about everyone else, in terms of either what you think is going to happen or what you would like to see?
Paco Nathan: I love the curiosity part. If you go to law school and become a practicing attorney, your stock in trade is to be skeptical. We need another profession that's supposed to be curious, and they can counterbalance each other. I like surveys, we do a lot of surveys, and I've seen some interesting talks from people analyzing the data on hiring data scientists and what it takes. TapRecruit has been doing a lot of that. An interesting thing they found is that advertising a "senior data scientist" role is actually not as good as just trying to get somebody into a "data scientist" role, because you get people who are incredible environmental scientists, or astrophysicists, or whatnot, applying for the latter instead. And looking at the requirements for junior data scientists is really interesting, because they cut to the basic traits: curiosity, being willing to spend 80% of your time cleaning data. I would like to look less at the higher level and more at what the incoming path is and what the traits are there.
Pete Warden: I really like that idea of curiosity as well. When I was at Jetpac, I spent a bunch of time trying to find great travel photos, going through billions of Instagram, Facebook, and Flickr photos to find the perfect photos for a particular place. We did things like looking for where hipsters hung out by trying to identify mustaches on people's faces. But what I realized was that we had people on our team who were way better than the programmers at figuring out what the guidance should be, and they had to come through us to give us the requirements and have us build the models. One of the things I'm proudest of is that I was able to set up a system that did transfer learning on top of a pre-trained image classifier, where our marketing people could just give it 100 photos in 10 different categories and it would learn to do the recognition. "Oh, you want to recognize dogs in photos? Just give it a bunch of photos with dogs in them and a bunch of photos without dogs." Without any programmer intervention, they'd have a model they could deploy in the app and create their own features. Really, what I'm hoping is that I'm helping to put myself out of a job as a programmer. We have this very weird structure where we have a priesthood of people who understand how to write rules in a very arcane language; we have to memorize all this trivia to be able to speak to computers. If we can change things so that anybody who can give a bunch of examples can select the right kind of model, give it those examples, get something half-decent, and then iterate really fast, that's what I'm really excited about for the future. That's what I'm thinking about. I'm hoping it becomes something that diffuses throughout the whole organization, rather than being very siloed and compartmentalized. I'm kind of hoping it's like web servers bootstrapping up in the enterprise in the early 2000s. People would just start a web server, other people inside the intranet would go to it, and the IT department was not involved at all. IT got really annoyed, and eventually they had to give up and just let people do it. I'm hoping machine learning becomes something that sales, marketing, support, and everybody else just find easy enough to pick up and use to solve their problems.
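The workflow Pete describes maps onto what is now a fairly standard transfer-learning recipe. Below is a minimal Keras sketch of the idea, assuming example photos sorted into one folder per category; it is our illustration of the technique, not Jetpac's actual system, and the folder name is made up.

```python
import tensorflow as tf

IMG_SIZE = (160, 160)

# Anyone who can sort example photos into folders (one folder per category,
# e.g. photos/dogs/ and photos/not_dogs/) can supply the training data.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "photos", image_size=IMG_SIZE, batch_size=32)
num_classes = len(train_ds.class_names)

# Reuse a network pre-trained on ImageNet as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False

# Only the small classification head on top is trained on the new examples.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```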
Amy Heineike: There's another part, which is about adoption. One reason it's been interesting working with all kinds of clients is realizing how hard it sometimes is for somebody looking at the results from a machine learning algorithm to understand how to think about them. How to understand precision and recall, for example. It turns out precision and recall are not that easy to explain to someone who just wants a system that's going to spit out results for them; it's actually conceptually quite hard. It will be interesting as models and systems built on data get used by people in normal situations... they've got their Google Assistant at home, they've got Siri, and they're starting to think, "How do I understand this?" There are a lot of questions we'll have to grapple with when the algorithms go wrong. While it's fine that a Google search doesn't immediately find you the thing you wanted, there are plenty of systems, including banking compliance and a lot of other serious stuff, where we need to be careful. So part of this is about getting more people involved in what we can build, and there's probably another part, which is getting more people involved so they can bring their expertise and help us think about the implications of what we're building and what that means for the users of our algorithms.
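For reference, precision and recall are simple ratios, even if the trade-off between them is hard to internalize. A tiny illustrative sketch:

```python
def precision_recall(y_true, y_pred):
    """Precision: of the items the system flagged, how many were right?
    Recall: of the items it should have flagged, how many did it find?"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# A system that flags everything gets perfect recall but poor precision,
# and vice versa; that trade-off is exactly what is hard to explain.
print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # roughly (0.67, 0.67)
```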
Ann Spencer: Let me just do a quick time check. I think we're about ready to go to questions. But before I do that, does anyone have any final thoughts about collaboration around model development and deployment, before we open up the floor?
Paco Nathan: Okay, so one trend out of the O'Reilly conferences recently: deep learning has done so well, and there's so much at these conferences about models and machine learning, but now we're starting to see it branch out, where it's not just about the learning part. There are a lot of other aspects to intelligence and cognition, like understanding context and being able to do things like scheduling. There are a lot of other areas that could be really interesting, so I'm really hopeful that we get a lot of people working on the learning parts, but then also, as a computer science discipline, go after the harder problems. When we were doing "AI research" in the early 80s, AI had a much broader scope, and the learning part was just a fraction of it. So I'm hoping we'll solve that part and move on to harder things.
Ann Spencer: Let's open up for questions. Let's see if they can get the mic situation going. Since you're holding it, do you want to ask a question?
Southard Jones: Sure. I'm Southard. I run marketing here at Domino. As a marketer, the question I had is: I hear a lot of conversation about the collaboration between data engineering, data science, folks who are deploying models, folks who are developing models. What about a guy like me in marketing who doesn't understand anything? I mean, seriously. What about the line of business? Where do ....[mic went out]
Ann Spencer: How do you see them fitting in? How do you see the business stakeholder fitting into this whole thing?
Paco Nathan: Mainly, can we start to define what the metrics are on the risk side and on the performance side, the KRIs and KPIs, and modify the business rules that way? The answer is, in a lot of ways, yes, but probably not entirely. How does it really fit? How can you get more transparency into what the machine learning products are doing? I think we'll have to grapple with that, and it's a really hard problem for the exec layer and the board layer right now. People who make decisions are having machines join them in making those decisions, and how do we trade that off?
Amy Heineike: I think we have a duty, as the people who build the algorithms, to figure out how to explain them and talk about some of these core issues. Labeling is one interesting example, where the model is reflecting back information from patterns that were in the labels. That's a surprising way to think about it, and it's not obvious, if you're new to it, what this data stuff is. But yeah, I think something we should take very seriously is figuring out how to explain this to the people engaged with it.
Pete Warden: I think a really important role is yours: you know, the people in marketing, and the people on phone support as well, are the ones who actually know what our customers need, way more than anybody who's sitting at their desk doing model creation, or coding, or anything like that. The more chance you're given to just sit down with the people who are labeling the data, building the models, and writing the metrics to evaluate how well the models are doing, and say, "Look, I know you haven't been thinking about this, but we have this massive problem everyone is complaining about, whatever it is, and your model isn't helping," then, a lot of the time, people will go, "Oh, oh, okay." It shouldn't really have to be on your shoulders to do that, but unfortunately it often is.
Paco Nathan: There was an interesting talk at Velocity, where one of the machine learning people at GitHub, Omoju Miller, gave a keynote [https://www.oreilly.com/ideas/a-new-vision-for-the-global-brain-deep-learning-with-people-instead-of-machines]. She talked about deep learning models and the kind of architecture you see there: hierarchies, different layers. Then she took the org chart, rotated it 90 degrees, did the same thing, and said, "We're used to having organizations of people with a similar kind of structure. In machine learning models we look at it flowing one way, but with people organizations we look at it flowing the other way. They should be similar." What that says is that there are a lot of feedback loops. Looking at how we deploy machine learning into organizations for customers, you have to think about your people and their domain expertise, you think about what your models are doing, and you think about your customers, and all three of those have two-way feedback between them. Your people are helping to train the models and should be learning from them; you should be aggregating what your organization knows into models, to some degree. The feedback from the models should reflect that, and if it doesn't, there's a problem. Likewise, the models are interacting with the customers through a product, and if that doesn't work, you get a support call, right? And hopefully you've also got professional services, salespeople, and whatnot who are talking with your customers. You've got these two-way dialogues going on at each touchpoint. If you look at the thing not as a linear system, with some input, a black box, and some output, but as almost an organism, with a lot of feedback loops and a lot of dynamics, then you're in a much better place for the people who know how to run the business to actually be interacting with it in a really healthy, dynamic way.
Audience Member B: I was going to flip that question on its head and ask: if you're sitting in the data science or engineering box (I'm a data scientist), how can you foster that kind of two-way feedback between you and the engineers, or vice versa, or between yourself and product, or marketing, or something like that? Given what we've been talking about today, like being a full-stack data scientist, or something else.
Ann Spencer: Who would like to answer that first?
Amy Heineike: One thing we've found works well at Primer is having product teams where the data scientists, the engineers, and the product manager are all in a team together. They're going to be interacting with each other every day and getting to talk directly. So I think there's probably an organizational thing that could help you....
Audience Member B: We're a startup, so we are tiny.
Amy Heineike: … So you're sitting next to each other, and that's a good thing. You've just got to talk a lot more, actually. You want the data scientists to not be intimidated to go read the code for the application, and then to start asking questions about it and explaining what they're stuck on. And then the engineers should be asking questions about the data science pieces and what they're stuck on. And then you get that dialogue going.
Paco Nathan: At a larger company, around 170 people, what we would do is assign the different data scientists to different product teams. But we wanted more feedback among the people who wouldn't otherwise get to talk to each other much or see each other's projects. So we ran a weekly seminar, and we would invite stakeholders. We would invite the head of finance to come in and just listen. We would ask people not to be too aggressive with probing questions; you could engage, but don't try to put anyone on the spot, because we're really trying to share here. It was run more like a graduate seminar, and it worked generally well. There was a lot of great feedback between teams that way.
Amy Heineike: We run tech talks within Primer. Every other week we do ones where the machine learning team gets to present, and then we have company-wide ones. Everybody's been really excited about them; for the tech talks especially, the sales guys would show up, everyone would show up, because a lot of people are at the company because they're excited about the fact that we can pursue models that are state of the art and address these super interesting problems. I think sometimes that kind of forum is a nice thing to have, to present, or maybe talk about interesting challenges we've run into, and things like that.
Pete Warden: This is a bit different, but one of my favorite ways of doing product prototyping, especially when we were a small team, was something called Wizard of Oz-ing, where you actually have a man behind the curtain instead of a model. Somebody on the other end of, say, a chat, who's watching a screencast but isn't visible, basically has to pretend to be the model, given what they're actually seeing on screen. It's kind of a really fun party game, but it really speeds up the iterations you get on the actual product. Because oftentimes the product team, or the product person if it's a start-up, won't really understand everything they have to think through and mentally model about what a machine learning solution is actually able to do. If you can show them that even a person, with 25 years of being alive in the world, can't solve this problem with the information they're given, then we're probably not going to be able to train a model to do it. That gets them thinking about, "Okay, what do we actually need to produce?" So that's one of my favorite ice breakers for that, if you want to try it.
Southard: We have it [microphone] working now.
Ann Spencer: Oh, lovely. There was a question on this side, and I don't know exactly where it was.
John: Hi, my name's John. A question about that troubleshooting and tuning cycle right after production. Any guidance on how long that lasts? Does it converge? Any sort of variance across production targets, like APIs, web apps, or embedded targets? And as you do updates, what's the expected behavior down the road?
Paco Nathan: A lot of those things are really important. The first part's easy: it's ongoing. I mean, if something is operational in your business, it should be ongoing; you should be monitoring it. That feeds into the second part of the question. Once you deploy these kinds of products into something automated, or something automated facing your customers, they're going to change the interaction with your customers, and that in turn changes the data that's coming in. You should expect that it's going to have to be tuned and calibrated. But I don't think it stops, and I think that should be the perspective on the operations of the business, at the C-level: this is part and parcel of what you're delivering.
I'd like to think about any kind of problem we're approaching now, and I think I can speak generally, because pretty much every company is a technology company now, and every company has to deal with data. We should look at these problems as ones we're going to solve with a team of people plus automation, and for any given problem there will be a trade-off. Some problems will have fewer people at the start; some will have less automation at the start. But I think that should be the baseline expectation: the models we deploy are going to need ongoing tuning and calibration. We're just barely able to understand the math now for what we call fairness, and that's going to be a very long dialogue, with a lot of compliance around it. That's another area where this is going to be ongoing. Does that-
John: Yeah.
Amy Heineike: One point on that is that the data always changes too, because the world is always changing. Even if the model gets built well at one point, it's not necessarily going to carry on being a good model.
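One common lightweight way to watch for the kind of change Amy describes is to compare the distribution of a model input in production against what it looked like at training time, for example with a population stability index. The sketch below is purely illustrative; the synthetic data and the rule-of-thumb threshold are assumptions, not any particular production setup.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """Compare a feature's distribution at training time ('expected') with
    its distribution in production ('actual'). Larger values mean more
    drift; a common rule of thumb flags PSI above roughly 0.2 for review."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    act_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip away empty bins to avoid division by zero and log(0).
    exp_frac = np.clip(exp_frac, 1e-6, None)
    act_frac = np.clip(act_frac, 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))


# Synthetic example: the live data has shifted relative to the training data.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.5, 1.2, 10_000)
print(population_stability_index(train_feature, live_feature))
```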
Pete Warden: And yeah, at least when I was at a start-up, we used to joke that the ultimate metric that we really care about in models was the app store rating. So...
John: And what about the targets, right? I mean there's sort of a difference between something that's deployed in a vision system in a car and a global app, right? Any sort of wisdom or experience there?
Paco Nathan: Well, speaking from my background, some of which was in ad tech: at a large ad network, the executives were always asking what the lift was, and you had to prove it. They were kind enough to put big LCD monitors on top of every team's area so you could report your hour-by-hour metrics, and they could walk around the building, watch, and point at the people who weren't performing quite as well that hour. No pressure.... But yeah, I think ultimately you have to get past some sort of initial deployment phase, to where you're showing what kind of lift, or what kind of bearing this has on mitigating risks, or whatever your intentions and objectives are. And hopefully you get to some point where that looks positive and you get some return on investment. If you don't reach that, then there's probably a bigger problem. Or at least that's how I look at it. Does that answer it?
John: Mm-hmm.
Ann Spencer: I think we're actually over time, but does anyone have a final question that they want to squeeze in? Oh wait, has that... Do you need it closer, or are you good? Okay. Let's see if the mic works.
Audience Member C: Yeah, when you think of the latest and upcoming wave of automated, user-friendly machine learning services like Google Cloud AutoML, do you think they'll mature in a couple of years, and would you say data scientists' jobs are in trouble?
Paco Nathan: Well, I'll just throw in one data point. We just did a big survey at O'Reilly: over 11,000 respondents worldwide on the adoption of machine learning, mostly in the enterprise. That was one of the questions we were looking at, and adoption of AutoML was, I think, at about 2%. There's a long way to go before data scientists are outdated.
Amy Heineike: I do think the framing of what problem you're trying to solve matters: making sure you're actually defining a problem worth solving, and that the data going in and the labels coming out actually reflect the problem you want to solve. That's not trivial, so even if we get to the point where the model piece is trivial, there's still all of this work going on, and it's huge. Because, like, Kaggle has been around: if you want to get out there and have thousands of smart Ph.D. students kill your problem with an amazing model, just stick it on Kaggle. The trouble is that the prep for a Kaggle contest is very hard, and that's why we didn't solve data science just by having Kaggle.
Amy Heineike: Do you like AutoML?
Pete Warden: What you said.
Ann Spencer: Okay, thank you so much for your time this evening. We're not actually shutting down for another 15 minutes or so. When I attend these things, I am not the person who asks questions in front of everyone; I'm usually the person who asks questions after this part is over. So feel free to eat. I think there's some candy and sweets and things like that. We won't be shutting down until after 8:30.
Paco Nathan: Is there more pizza?
Ann Spencer: I think there's more pizza. I see pizza from here, so feel free to come up and ask questions. And thank you very much.
This transcript has been edited for readability.
Ann Spencer is the former Head of Content for Domino, where she provided a high degree of value, density, and analytical rigor that sparked respectful, candid public discourse from multiple perspectives, discourse anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O'Reilly, focusing on data science and data engineering.