IT and data science can collaborate better. Here's how.
Josh Poduska | 2022-07-12 | 11 min read
There’s a lot of momentum right now with machine learning (ML) and artificial intelligence (AI), and we have an opportunity to do something exceptional: build products and solutions that make a real difference in our industries and our world.
But data science and IT teams have many obstacles to clear before they can bring data science solutions to fruition, not the least of which is working together effectively.
This was the topic of a panel discussion I moderated with Forrester Principal Analyst Dr. Kjell Carlsson, one of our guest speakers. We sat down (via Zoom, of course) with easyJet’s Director of Data and Analytics Ben Dias along with Chief Information Officer Cesar Goulart (with a healthcare digital solutions company) and Data Scientist Peter Shen (with a leading pharmaceutical company) to discuss best practices for strong partnerships between data science and IT.
Up front, I should say that we invited this group not only for their expertise, but also because together they can describe how such partnerships advance three fundamental goals for data science today:
- Innovating in the marketplace. For example, Peter Shen shared how his company uses computational data science research across immunology, compositional chemistry, and biology to bring life-saving and life-changing medicines to patients.
- Productizing data. Cesar Goulart shared how data science teams at his healthcare digital solutions company use machine learning to create decision support tools that help healthcare organizations improve clinical, financial, and operational performance.
- Improving operational efficiency. Ben Dias discussed how the easyJet data science team responded to the operational challenges presented during the COVID-19 pandemic by expanding their use of automation to support increasing workloads such as more frequent schedule updates. They also began applying advanced AI techniques, like reinforcement learning, along with external data to detect and respond to market shifts much earlier.
Setting the stage for the discussion
To start, Kjell shared recent research on the importance of data science and why closing the communications gap between IT and data science is vital. Some key trends from Forrester’s 2020 survey on data science include:
- The largest increase in AI use Forrester has seen from year to year. Close to 70% of firms Forrester surveyed in 2020 reported that they are implementing, have implemented, or are expanding AI, machine learning, and deep learning solutions compared to 54% in 2019.
- Increasing importance of AI to businesses with over 70% of high-growth companies saying AI will be the most important factor determining competitiveness in the next three years. Close to 40% of these high-growth firms say AI is already the most important factor.
- Significant investment in AI with 59% of firms using AI and machine learning saying it’s one of their larger or largest investment areas. (21% of high-growth companies say AI is their largest investment area.)
- Evidence of the value of AI with a majority (75%) of those who have implemented AI reporting a positive impact from their investments.
Best practices for building better partnerships
Given the importance of data science, some might think building seamless partnerships between data science and IT would be easy. After all, we’re working toward the same goal. But as the panelists shared, some fundamental challenges get in the way. Below I’ve summarized a few of the challenges they shared in their discussion, followed by the best practices they’ve applied to address them.
Challenge #1: Data science and IT teams speak “different languages”
In a previous role with Royal Mail, easyJet’s Ben Dias had responsibility for both data science (which is where he started his career) and data engineering, which helped him see just how differently the two groups work and think. For example, as he worked with his data engineers standing up a new Hadoop cluster, he saw firsthand the importance of governance and controls in ensuring the process went smoothly, and the potential for disaster without these processes. “The why is a big learning for me,” he said. “[Data engineers] want to control it…so if something goes wrong they can fix it quickly.”
What you can do: Ben advises that when making requests, both IT and data science must outline not just what they need but why they need it. This enables teams to understand and actively engage on where controls and governance make sense and where they may not.
Challenge #2: Everyone expects collaboration and alignment to “magically happen”
As Cesar Goulart emphasized in the discussion, “It doesn’t work just to place a team of new highly skilled people somewhere and expect that things are just going to sort themselves out.”
What you can do: Cesar shared how the vice president of analytics works closely with the company’s vice president of product management and IT’s lead solutions architect to ensure everyone is aligned on shared objectives and to create venues for collaboration. This active, ongoing engagement has helped smooth the path on everything from gaining access to subject matter experts to solving potential deployment problems before they derail a proof of concept demonstration.
Challenge #3: Organizations don’t separate the research and production phases
This touches on one of the most common areas of contention mentioned earlier: balancing the flexibility data scientists need to explore new ideas during the research phase with the controls and governance that IT needs to ensure that everything operates smoothly.
What you can do: Data Scientist Peter Shen adapts traditional software development processes, using the Domino Enterprise MLOps platform to bake reproducibility into the research lifecycle early. For example, he uses Domino to create snapshots of the data and methods data scientists use to test different ideas. This ensures they can reproduce results as they move to the next phase of research and build on different branches of thinking over time, while documenting all artifacts as models inch closer to production. At easyJet, Ben Dias uses a “Lean Startup” approach that gives data scientists the freedom to explore different ideas and tooling in a somewhat controlled environment. They use the same templates, languages, and tooling where possible, but they can step outside the lines when needed in the early development stages.
That said, Cesar Goulart emphasized that for companies delivering AI-driven products or exposing models via APIs, there’s less flexibility in what technologies data scientists use. “We have to have a standard stack,” he said. “I need to have all of the handshakes between the technology on the product side and the technology on the analytic side to harmonize.” For example, keeping all the source code in one place for quality assurance provides a structure for dealing with any performance issues.
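To make the snapshot idea above a little more concrete, here is a minimal sketch in plain Python. It is not Domino's API or Peter Shen's actual workflow; the function name `snapshot_experiment`, the manifest fields, and the sample rows are all hypothetical, chosen only to show how fingerprinting the data, parameters, and code version together lets a team tell whether a result can be reproduced.

```python
import hashlib
import json


def snapshot_experiment(data_rows, params, code_version):
    """Return a manifest pinning the data, parameters, and code version of one experiment.

    Hypothetical helper for illustration only; not part of any platform's API.
    """
    # Serialize deterministically so identical inputs always give the same fingerprint.
    data_bytes = json.dumps(data_rows, sort_keys=True).encode("utf-8")
    manifest = {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "params": params,
        "code_version": code_version,
    }
    # The snapshot ID hashes the whole manifest, so any change yields a new ID.
    manifest_bytes = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["snapshot_id"] = hashlib.sha256(manifest_bytes).hexdigest()[:12]
    return manifest


# Made-up rows and settings, purely to exercise the helper.
rows = [{"dose_mg": 5, "response": 0.42}, {"dose_mg": 10, "response": 0.61}]
baseline = snapshot_experiment(rows, {"model": "ridge", "alpha": 0.1}, "abc123")
rerun = snapshot_experiment(rows, {"model": "ridge", "alpha": 0.1}, "abc123")
variant = snapshot_experiment(rows, {"model": "ridge", "alpha": 1.0}, "abc123")

# Same inputs give the same ID; a changed parameter gives a different one.
print(baseline["snapshot_id"] == rerun["snapshot_id"])
print(baseline["snapshot_id"] == variant["snapshot_id"])
```

Storing a manifest like this next to each result is one simple way to keep separate branches of research distinguishable while still letting data scientists experiment freely.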
Challenge #4: Conversely, they don’t consider the production phase early enough
Throughout the discussion, panelists highlighted the need to have processes that help address Enterprise MLOps (including model monitoring) and bring IT into the loop early. As Cesar Goulart highlighted often, data science teams aren’t comfortable sharing their ideas before they’re fully baked. They usually wait until they’ve figured out the details before they start talking to IT, which slows things down.
What you can do: While in many companies it’s not realistic to put everyone on the same team, Cesar emphasized the value of a coordinating body (for example, a Center of Excellence) to harmonize research activities with technology proofs of concept, infrastructure deployments, and product management roadmaps. As part of Ben Dias’ Lean Startup approach, everyone agrees at the outset on the processes and gates required as model development progresses, so they know from the start what to do and when to harden the solution for running successfully in a production environment. Where possible, he reuses existing processes, such as current IT and operational guidelines for service design, service transition, and governance, when bringing new models into production rather than creating an entirely new process. But there will be areas that current software deployments typically don’t cover, model monitoring being the most significant, and teams will need to build out new processes for them (these can come from either side of the aisle).
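As an illustration of what a model monitoring process might check, here is a minimal sketch of one common drift measure, the Population Stability Index (PSI), in plain Python. The panelists did not describe a specific method, so this is an assumption on my part; the function name, bin count, floor value, and sample data are all hypothetical.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one numeric feature; larger values mean a larger shift."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge buckets.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values: training data versus shifted live traffic.
train = [i / 100 for i in range(100)]
live_shifted = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(train, train)
psi_shifted = population_stability_index(train, live_shifted)
print(psi_same, psi_shifted)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as worth investigating, but the right threshold, and who owns the alert, is exactly the kind of process data science and IT would need to agree on together.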
If you haven’t already, I encourage you to listen in on the full discussion to hear more from Ben, Cesar, and Peter on their work to build a bridge between data science and IT. One of the key takeaways all panelists emphasized was that ultimately data scientists will need to embrace some of IT’s governance principles, and IT will need to embrace data science’s need for flexibility. This is one of the reasons why an Enterprise MLOps platform such as Domino is so valuable: it fosters collaboration among data scientists, data engineers, IT, and the business so they can achieve that elusive balance between the control and flexibility that Ben, Cesar, and Peter talked about.
And as Kjell pointed out, these are good problems to have. They’re an indication that data science is rapidly maturing to make a real difference in our world.
Learn more
Watch the Webinar, “Reaching Across the Aisle,” for ways data science and IT can partner to build a better enterprise.
Read the report, “Organizing Enterprise Data Science,” to learn more about the best practices data science leaders use to build an enterprise data science strategy.
Read The Forrester Report, “The Total Economic Impact of the Domino Enterprise MLOps Platform,” to learn more about the value Domino delivers.
Josh Poduska is the Chief Field Data Scientist at Domino Data Lab and has 20+ years of experience in analytics. Josh has built data science solutions across domains including manufacturing, public sector, and retail. Josh has also managed teams and led data science strategy at multiple companies, and he currently manages Domino’s Field Data Science team. Josh has a Master’s in Applied Statistics from Cornell University. You can connect with Josh at https://www.linkedin.com/in/joshpoduska/