The enterprise platform to build, deliver, and govern AI
Watch the 15 minute on-demand demo to get an overview of the Domino Enterprise AI Platform.
It's the sixteenth anniversary of Joel Spolsky's "Joel Test," which he described as a "highly irresponsible, sloppy test to rate the quality of a software team."
Back then (the late 1990s), software development was:
We think data science is going through a similar phase of evolution and maturation, so we thought it would be helpful to write something like the Joel Test for assessing the maturity of your data science program. It's our "highly irresponsible, sloppy test to rate the quality of a data science team."
Here's our first draft. Let us know what you think:
These are not the only factors that will determine the success of your data science program. For example, the questions above don't cover anything related to the connection between data science work and business drivers ("do all your data science projects have a clear business goal and engaged business stakeholders?"). And you still need great people on your team.
However, if you answer "yes" to all or most of the questions above, then you're working in a way that makes good outcomes much more likely.
We've seen organizations where it takes over a month for a new data scientist to even begin contributing. Onboarding can be delayed because new hires spend time getting the right software installed on their computer; finding and getting access to the right versions of internal resources (code, data sets) to use; and learning how to follow internal processes.
There is a flourishing ecosystem of open-source tools for data science. No single tool will be a panacea—rather, organizations will be most effective when they are agile enough to experiment with new tools and techniques. To that end, trying a new package should be possible at the speed of your natural research process, rather than becoming a bureaucratic IT approval process.
As data volumes grow and data science algorithms become more computationally intensive, it's critical to have access to scalable compute resources. As with the point about packages above, research will progress faster if IT or DevOps processes aren't a bottleneck for data scientists.
The first question of the original Joel Test is "do you use source control?" In our experience, source control is necessary but insufficient for robust data science, because source code alone is not enough to reproduce past work. Instead, we think it's important to keep a record of experiments, including the results, the parameters, the data sources, and the code used to produce them. The most mature organizations will also be able to re-instantiate the underlying software environment (e.g., which versions of the language and packages were used) to reproduce a past result.
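As a rough illustration of what such an experiment record could contain, here is a minimal Python sketch. It is not how any particular platform does this. The function names (`build_experiment_record`, `append_record`), the record fields, and the JSON-lines log format are all hypothetical choices made for the example. The code version is assumed to be passed in as a string, such as a commit hash from your source control system.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_experiment_record(params, results, data_paths, code_version, packages=()):
    """Collect the pieces needed to trace a result back to its inputs.

    All field names here are illustrative assumptions, not a standard schema.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,  # e.g. a commit hash supplied by the caller
        "parameters": params,
        "results": results,
        # Hash each data file so a later change to the data is detectable.
        "data_sources": {path: file_sha256(path) for path in data_paths},
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "packages": package_versions(packages),
        },
    }


def append_record(record, log_path):
    """Append one record as a single JSON line to a shared log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
```

A run might end with `append_record(build_experiment_record(params, results, ["train.csv"], commit_hash, packages=["numpy"]), "experiments.jsonl")`, where the file names and `commit_hash` are placeholders. Recording package versions only documents the environment. Actually re-instantiating it needs additional tooling, such as pinned requirements or container images.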
Data science is a team sport. During the course of a project, you'll likely get feedback both from technical colleagues and non-technical stakeholders. How are you sharing results and recording feedback and conversations? If it's happening over email, there's a good chance that those conversations and the organizational knowledge you're accumulating will be lost. It won't be available to new people who look back at the work later; it will be lost if the project members leave the organization; it's not searchable or discoverable later.
A good data science collaboration platform will keep work and discussion centralized, make it searchable, etc. There are plenty of ways to do it, and email is a convenient way to get work into such a platform, but email should not be the primary way that collaboration happens.
If engineers must be involved to integrate data science output into business processes, you are delaying your time-to-market and thus reducing the value of your data science work. Infrastructure and platforms can empower data scientists to quickly "productionize" their work without an extra (and sometimes very long) step.
Many data scientists believe they make their biggest impact when they answer a question, produce a model, or create a report. In fact, the longer-lasting, more leveraged impact comes when their work adds to the organization's collective knowledge in a way that others can build on in the future. So research should be persisted as it progresses, in a form that can be discovered and reused later. The other side of that coin is that people need an easy way to find and reuse that past work.
Searching across dozens of network folders, SharePoint sites, and repositories is not an effective way to preserve organizational knowledge. There should be a single system of record, even if that yields results that link out to auxiliary systems.
We took this one straight from Joel's list. Data scientists are expensive, value-adding people—equipping them well is a great investment.
Banner image titled "Graffiti & Street Art At Portobello (Dublin)" by William Murphy. Licensed under CC BY-SA 2.0

Nick Elprin is the CEO and co-founder of Domino Data Lab, provider of the open data science platform that powers model-driven enterprises such as Allstate, Bristol Myers Squibb, Dell and Lockheed Martin. Before starting Domino, Nick built tools for quantitative researchers at Bridgewater, one of the world's largest hedge funds. He has over a decade of experience working with data scientists at advanced enterprises. He holds a BA and MS in computer science from Harvard.