Python is the New Excel

Kjell Carlsson | 2023-02-16 | 6 min read


It's becoming clear that the traditional “citizen data scientist” approach, with its focus on no-code tools, is an evolutionary dead end. Organizations that have pursued this route have little to show beyond PoCs and one-off successes, despite years of investment in training and in proprietary tools that remain underutilized. The best that can be said is that these efforts have been a costly way of democratizing data prep and business intelligence. In reality, they have been a step in the wrong direction for analytics and data science maturity.

By contrast, code-first strategies that emphasize open-source languages like Python and R, and frameworks like Ray, Spark, and Dask, are delivering sustained results. And it is not just data scientists who are using code. When analysts and subject matter experts are upskilled on code-first analytics and data science tools, everyone ‘speaks the same language’. They become bilingual in code and the language of the business. This increases productivity, raises the rate at which projects are put into production, and even improves the ability to hire and retain talent.

How code won the war

In hindsight, we should have seen it coming. Code-based approaches to analytics and data science have become the dominant paradigm for delivering business outcomes, just as they did in application development two decades earlier. In the late ’80s and the ’90s, Apple (with HyperCard), Microsoft (with Visual Basic), and others promoted visual, no-code tools as a way to democratize software development. These efforts were consigned to the waste bin of history (in the case of HyperCard) or pivoted towards democratizing the use of code (in the case of Visual Basic).

Code has the same advantages over no-code in analytics and data science as it has in software development. Code is portable, meaning it can be deployed on platforms far beyond the ones it was developed on. It can be shared and governed more effectively with source code management tools. Code is scriptable and inherently easy to iterate on, which makes it easier to innovate and experiment. And there is always a broader community of users for industry-standard coding languages than for visual tools, meaning it is easier to hire talent and find support. On all of these points nearly every visual tool, even the open-source ones, has proven to be fatally deficient.
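To make the point about iteration concrete, here is a minimal sketch in Python using only the standard library. The sales figures are made up for illustration, and the function names (moving_average_forecast, mean_absolute_error) are hypothetical helpers written for this example rather than part of any particular tool. It shows how trying a different parameter is a one-line change in code, where a visual tool would typically require reconfiguring a step by hand.

```python
from statistics import mean

# Illustrative monthly sales figures (made up for this sketch).
sales = [120, 135, 128, 150, 162, 158, 171, 180, 176, 190, 205, 198]


def moving_average_forecast(series, window):
    """Forecast each point as the mean of the previous `window` points."""
    return [mean(series[i - window:i]) for i in range(window, len(series))]


def mean_absolute_error(series, window):
    """Average absolute gap between the forecasts and the actual values."""
    forecasts = moving_average_forecast(series, window)
    actuals = series[window:]
    return mean(abs(a - f) for a, f in zip(actuals, forecasts))


# Experimenting is a one-line change: try several window sizes in a loop.
errors = {window: mean_absolute_error(sales, window) for window in (2, 3, 4, 6)}
best_window = min(errors, key=errors.get)

for window, error in errors.items():
    print(f"window={window}: mean absolute error = {error:.1f}")
print(f"best window: {best_window}")
```

Because the whole experiment is a short script, it can be checked into source control, reviewed by a colleague, and rerun unchanged when new data arrives.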

Of course, code does have its challenges. It requires an understanding of what you are doing and meticulous attention to detail. However, since both are critical to delivering business value, this is as much a strength as a weakness. There is also the steep learning curve and the manual effort required to write and debug code. These are harder to overcome, and traditionally the only remedy has been putting in the time. But recent advances in automatic code generation are flattening the learning curve and speeding up the writing and debugging of code.

Why analytics and data science are better with code

 

Impact
  Code: Easy to deploy projects everywhere thanks to wide support for open coding languages
  No-code: Closed platforms make it hard to deploy projects anywhere

Productivity
  Code: Easy to search for, share, branch, and manage code for reuse, consistency, or expert input
  No-code: Limited access to and familiarity with no-code platforms prevent reuse

Innovation
  Code: Easy to experiment and iterate by tweaking parameters, running batch scripts, and importing libraries with new innovations
  No-code: Tedious to modify steps and adjust parameters, constrained functionality, and restricted or delayed ability to import external libraries

Employment
  Code: Easy to hire from a wider pool of talent trained with code and wanting to use code to further their professional development
  No-code: Restricted to a narrow pool of talent trained on no-code tools, or to inexperienced, part-time users

3 key elements of every code-first analytics & data science strategy

  1. Hire and train analysts who are eager to learn code. Analysts (and domain experts) who want to learn how to code, and even better those who are already coding, want to make the investment in learning how to build analytics applications and apply data science methods. They are indicating that they want to get into the details and undertake the kind of sustained learning and problem-solving necessary to execute impactful projects. They are more likely to see projects through to completion because doing so matters just as much for their own professional development.
  2. Accelerate skills and time to value with code generation and code repositories. To help analysts overcome the steep learning curve in syntax for the many steps in every analytics project, provide them with easy-to-use tools that help generate that code, and with code repositories from which they can copy the best examples of it.
  3. Provide end-to-end platforms for code-first analytics and data science. To deliver impact, analysts need more than the ability to code. They need secure and governed platforms where they can access the languages, tools, and infrastructure required to conduct their analysis, share it, train and deploy models, and host analytics applications. Sharing the same platform as data scientists means they can easily collaborate, their work is part of the same governance processes, and they participate in the same analytics and data science workflows.
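As an illustration of the second element, a shared code repository might hold small, well-documented helpers that analysts can copy and adapt. The sketch below is one such hypothetical helper, pivot_sum, which mimics a spreadsheet pivot table using only the Python standard library. The function name, its parameters, and the sample records are all assumptions made for this example, not code from any specific platform or library.

```python
from collections import defaultdict


def pivot_sum(rows, index, columns, values):
    """Sum `values` for each (index, columns) pair, like a spreadsheet pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    # Convert to plain dicts so the result is easy to print or compare.
    return {key: dict(inner) for key, inner in table.items()}


# Illustrative records (made up for this sketch).
orders = [
    {"region": "North", "quarter": "Q1", "revenue": 100.0},
    {"region": "North", "quarter": "Q2", "revenue": 150.0},
    {"region": "South", "quarter": "Q1", "revenue": 80.0},
    {"region": "North", "quarter": "Q1", "revenue": 20.0},
]

summary = pivot_sum(orders, index="region", columns="quarter", values="revenue")
print(summary)
# → {'North': {'Q1': 120.0, 'Q2': 150.0}, 'South': {'Q1': 80.0}}
```

An analyst who already builds pivot tables in a spreadsheet can read this helper, reuse it on a different dataset by changing the three column names, and later swap it for a library equivalent once they are comfortable with the idea.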

Python is the new Excel

Leaders are under unprecedented pressure to deliver business impact from data. The current climate of economic uncertainty demands both the agility to detect and respond to market changes faster and the productivity to do more with less. The use of data is the key to both. Data and analytics leaders are under additional pressure because they also need to justify the vast sums of money, time, and effort that have been poured into data platforms and new teams during the last few years.

To deliver this value quickly, organizations need to rethink the way they scale analytics and data science talent. They need to reverse their traditional thinking around “citizen data science” (broad 101 training using no-code tools) and focus instead on getting analysts and data-savvy domain experts to use the code-first tools that their data scientists are already using.

Kjell Carlsson is the head of AI strategy at Domino Data Lab where he advises organizations on scaling impact with AI technologies. Previously, he covered AI, ML, and data science as a Principal Analyst at Forrester Research. He has written dozens of reports on AI topics ranging from computer vision, MLOps, AutoML, and conversation intelligence to augmented intelligence, next-generation AI technologies, and data science best practices. He has spoken in countless keynotes, panels, and webinars, and is frequently quoted in the media. Dr. Carlsson is also the host of the Data Science Leaders podcast and received his Ph.D. from Harvard University.