Shattering the Myth of the Citizen Data Scientist

Kjell Carlsson | 2022-04-13 | 6 min read


It is finally time to kill the expensive, dangerous myth of the “Citizen Data Scientist” and shift attention to a neglected, real-life person who is already critical to your model-driven business today, and who will be even more important in the future – the “honorary” or part-time data scientist.

The “citizen data scientist” (CDS) concept was invented with good intentions - to promote data, machine learning and AI literacy - but it has done more harm than good. The pipe dream of an enterprise consisting of CDSes inevitably leads to expensive initiatives that, at best, produce one-off insights that in most cases have nothing to do with data science. Many companies that take this approach end up with nothing to show for it.

Worse yet, attempts to grow CDSes can actively hurt your data science teams. They divert attention and funding from your actual data science initiatives towards analytics 101 courses and business intelligence platforms or data prep tools that have primitive data science capabilities. In the worst cases, data scientists are expected to use toy tools, leading either to shadow IT or to an exodus of your best data scientists.

Models Drive Outcomes At Scale, Lone Insights Don’t

What’s wrong with having more data-literate employees and giving them visual data preparation tools, with some easy-to-use advanced analytics methods? Nothing, except that they do not usually drive transformative business value.

Even when CDS programs are done right, these individuals can only extract ad-hoc insights that deliver one-off opportunities for your business. That’s because transformative results only come from developing a growing portfolio of machine learning models that go into production and continuously drive decisions, actions, and applications. In other words, you need data-driven insights as part of the model-building process, but insights are a necessary-but-not-sufficient step towards developing production-grade models.

These models are at the heart of nearly every one of the fastest-growing companies today. They power the recommendation engines for streaming video services, the results for search engines, the autonomous driving features in cars, and the targeted discounts from every top retailer.

These essential, mission-critical – often regulated – models cannot, and should not, be created by anyone other than professional data scientists for the same reason that a hospital should not be staffed with “citizen surgeons,” airlines should not rely on “citizen pilots,” towers should not be built by “citizen architects,” and your C-suite should not consist of “citizen managers.”

Expecting citizen data scientists to take the place of your professional data scientists is at best futile, and at worst leads to badly performing models that risk your organization’s reputation, regulatory compliance, or bottom line. (And if you’re really interested in creating ethical, law-abiding AI models, catch our Track 2 panels at Rev 3, where we’ll explain emerging AI standards and how to meet them.)

The Citizen Data Scientist is Dead, Long Live the Honorary Data Scientist

It is time to dismantle the flawed notion of a CDS and pivot to the real-world persona – in addition to the professional data scientist – that actually contributes substantial business value. This is the “honorary” or part-time data scientist. This individual has a day job separate from being a data scientist, and is often a data analyst or a data-savvy line-of-business professional. They may work hand in hand with a professional data scientist, create descriptive analytics applications, or conduct investigations that extend into the realm of advanced analytics.

“Honorary data scientists” can deliver value on their own, but they are also vital to every data science project, because they bring understanding of the business and its data. In addition, they can often shoulder data prep and other tasks to free up your data scientists. In their existing roles, they might not have the training, experience, focus, or incentives to be professional data scientists, but with the right support, they drive data science outcomes and can be an important pipeline of future data scientist talent. (Catch sessions at Rev 3 for insights on how to win the data science talent war and create an analytics-driven workforce.)

Empower Your Honorary Data Scientists

Citizen data scientists have been both mythical and optional for enterprises. The same is not true of your honorary data scientists. If you are doing anything with data science, you already have honorary data scientists but may not realize it, and it is almost certain that you need more of them and to make them more effective.

They also need to be supported. They need training, processes that facilitate how they work with data science teams, and tools. Since they are either working closely with data scientists, learning how to be data scientists, or already doing some of the work of data scientists, they need access to the same tools and platforms that your professional data scientists are using. They probably also need more guardrails around sensitive data and limits on how much infrastructure they can consume.

Even better, give your honorary data scientists platforms (such as Domino Data Lab’s Enterprise MLOps platform) that provide integrated support for a wide range of professional data science tools. Whatever you do, don’t limit them to “toy” tools that are not meant for data science, tools that are incompatible with the tools your data scientists are using, or tools that don’t help them practice professional data science. Doing so only hurts them, your data scientists, and ultimately your business.

Kjell Carlsson is the head of AI strategy at Domino Data Lab where he advises organizations on scaling impact with AI technologies. Previously, he covered AI, ML, and data science as a Principal Analyst at Forrester Research. He has written dozens of reports on AI topics ranging from computer vision, MLOps, AutoML, and conversation intelligence to augmented intelligence, next-generation AI technologies, and data science best practices. He has spoken in countless keynotes, panels, and webinars, and is frequently quoted in the media. Dr. Carlsson is also the host of the Data Science Leaders podcast and received his Ph.D. from Harvard University.
