The enterprise platform to build, deliver, and govern AI
Watch the 15 minute on-demand demo to get an overview of the Domino Enterprise AI Platform.
One of the biggest challenges in data science today is finding the right tool to get the job done. The rapid change in best-in-class options makes this especially challenging - just look at how quickly R has fallen out of favor while new languages pop up. If data science is to advance as rapidly as possible in the enterprise, scientists need the tools to run multiple experiments quickly, discard approaches that aren’t working, and iterate on the best remaining options. Data scientists need a workspace where they can easily experiment, fail quickly, and determine the best data solution before they run a model through certification and deployment.
One thing that amazes me is how huge the data science ecosystem is. There are about 13,000 packages in R, and hundreds of thousands in Python. But if you look at the Kaggle survey, over four years the percentage of data scientists using R dropped more than 40 percentage points - from 64% to 23%. At the same time, the fraction of data and business analysts using Python increased dramatically from 61% to 87%. Last year, Python jumped to the top of programming languages, passing Java and C on the TIOBE index, while Swift moved a little, and Julia and Dart lost ground. With so much change, it’s hard to know what approach to take. An effective data scientist shouldn’t have to choose - they should be able to try multiple approaches.
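The survey figures above are given in percentage points, which is easy to confuse with relative change. As a small worked example, the sketch below computes both for the two shifts quoted in the text. The helper name `share_change` is hypothetical, and the inputs are simply the percentages as cited here, not re-derived from the survey data.

```python
def share_change(before_pct, after_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = after_pct - before_pct
    relative = (after_pct - before_pct) / before_pct * 100
    return points, relative


# Figures as quoted in the article (assumed accurate, not re-checked here).
r_points, r_relative = share_change(64, 23)
py_points, py_relative = share_change(61, 87)

print(f"R:      {r_points:+d} points, {r_relative:+.1f}% relative")
print(f"Python: {py_points:+d} points, {py_relative:+.1f}% relative")
```

A drop from 64% to 23% is 41 percentage points, and it also means that roughly two thirds of the original share is gone.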
One thing that doesn’t surprise me is how long it can take an IT department to roll out a new update. I’ll give them credit, they have a big task with validation, checking licensing and usage rights, scheduling server time, getting clusters deployed, etc. But I’ve heard many customers say that it takes them six months to get a new Python package deployed by the IT team, so some just sneak their personal laptop into the office to run their work. People are creative, and they will find ways, but a data scientist shouldn’t have to go around their IT department to get their work done.
The last big challenge here is that if data scientists are limited in the tools they can use by infrastructure or IT requirements, then they’re going to frame their research, experiments and results to fit within that software framework - which adds an artificial limit to the type of creative thinking that can lead to the biggest advancements. As they say, if all you have is a hammer, everything looks like a nail.
The solution is to expand your tool set and build a sandbox where you can quickly try multiple approaches, so that you only need to ask IT for help with the final deployment. If it takes too long to provision new tools, the IT roadblock will limit results, stifle creativity, and eliminate options that might provide a better solution. Without better access to software tools and new frameworks, data scientists have to choose expediency over insights.
When researchers have a well-defined sandbox (or MLOps platform), they can minimize sunk costs, because trying a new approach becomes cheaper and more efficient than spending time on minor refinements. Instead of spending a month tuning hyperparameters for a 0.25 percent improvement, they can try four different approaches, and one of those may yield a more dramatic boost in performance.
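To make the idea of trying several approaches concrete, here is a minimal sketch that scores four simple models on the same holdout split and reports the best one. Everything in it is an illustrative assumption: the synthetic dataset, the four candidate models, and the function names are hypothetical stand-ins chosen so the example runs with only the Python standard library. A real project would more likely use a library such as scikit-learn and its own data.

```python
import math
import random


def make_data(n=200, seed=0):
    """Synthetic regression data, y = sin(x) + noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """Average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def fit_binned(xs, ys, width=1.0):
    """Piecewise-constant model: mean target per fixed-width bin."""
    bins = {}
    for x, y in zip(xs, ys):
        total, count = bins.get(int(x // width), (0.0, 0))
        bins[int(x // width)] = (total + y, count + 1)
    overall = sum(ys) / len(ys)

    def predict(x):
        # Fall back to the overall mean for bins unseen in training.
        total, count = bins.get(int(x // width), (overall, 1))
        return total / count

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare_approaches(seed=0):
    """Fit each candidate on 75% of the data and score it on the rest."""
    xs, ys = make_data(seed=seed)
    split = int(0.75 * len(xs))
    train_x, train_y = xs[:split], ys[:split]
    test_x, test_y = xs[split:], ys[split:]
    candidates = {
        "mean_baseline": fit_mean,
        "linear": fit_linear,
        "knn": fit_knn,
        "binned_mean": fit_binned,
    }
    return {
        name: mse(fit(train_x, train_y), test_x, test_y)
        for name, fit in candidates.items()
    }


results = compare_approaches()
best = min(results, key=results.get)
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} holdout MSE = {score:.4f}")
print("best:", best)
```

The design choice worth noting is that every candidate shares one fit-and-score interface, so adding a fifth approach is a one-line change to `candidates` rather than a new pipeline.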
Agile development, iterative programming, minimum viable product - this is how software is made today, and how smart businesses work. It’s all about rapid prototyping, iterating and failing fast. But most data scientists are more worried about IT confiscating their laptop than about how they can do more with better tools.
With the right platform, researchers will have a better option than their laptop: a way to spin up clusters, deploy multiple models, and capitalize on GPU acceleration. And IT will know that developers have their own sandbox that’s safe, secure, and governed. I can’t emphasize the need enough - in Kaggle's State of Machine Learning and Data Science 2021 report, 55% of data scientists said they have no enterprise machine learning tools. Without more structure, we’re flying by the seat of our pants.
Why don’t more companies take a more structured approach? There are several good reasons, mostly focused on legacy and governance concerns.
By utilizing a workbench that can support rapid experimentation, data science teams can deliver better results faster, because they are able to fail faster and find a better path.
Some of the potential benefits of an MLOps platform include:
It’s really exciting to be in data science today - we’re seeing the value of this discipline recognized by companies around the world, and data science teams are more important to the bottom line than ever before. If companies can create a data science infrastructure that supports the team’s efforts to fail faster and secure better results, they can develop a world-class data science organization. Their teams will deliver the most relevant results. Their data scientists will be able to use leading-edge technologies. And companies can encourage innovation while still maintaining governance.
I think these are all important benefits, and they represent the next step in advancing data science, one that can be achieved by failing faster.
* This article was originally written for and published by TDS.

Nikolay Manchev is a former Principal Data Scientist for EMEA at Domino Data Lab. In this role, Nikolay helped clients from a wide range of industries tackle challenging machine learning use-cases and successfully integrate predictive analytics in their domain-specific workflows. He holds an MSc in Software Technologies, an MSc in Data Science, and is currently undertaking postgraduate research at King's College London. His area of expertise is Machine Learning and Data Science, and his research interests are in neural networks and computational neurobiology.