The Ethical Data Scientist: Three Leaders Share Their Vision for Data Responsibility
By Domino Data Lab on December 12, 2018
High-profile data scandals have made waves in recent years - from Cambridge Analytica accessing Facebook’s user information to influence voter opinion to Equifax exposing the personal information of 148 million Americans. Consumers are now more concerned than ever about how companies are collecting, storing and using their data. That also means data scientists are reckoning with what it means to be ethical data stewards in a field that is still a bit of a Wild West.
At Rev, Domino’s annual summit for data science leaders, three experts shared their vision of what data responsibility can and should look like.
Defining Data Responsibility
Natalie Evans Harris, COO at BrightHive, a data platform for social service providers, says data responsibility involves three components:
- Compliance: Following laws and regulations, such as rules around privacy and the European Union’s General Data Protection Regulation. “Often, the compliance arm is treated as a ceiling, and really it should be treated as a floor,” she says.
- Culture: Building an organization that encourages conversations about ethics throughout a product’s life cycle, including data collection, analysis, deployment and retirement. “How do we build the capacity to think not just about the intended solution, but also those unintended consequences? Not just what can we do with data, but what should we do with data?” she asks.
- Social responsibility: Promoting transparency and communicating with customers and the public in a way that builds trust.
Why Data Ethics Are Important
Beyond the inherent value of being ethical, companies and nonprofits are responding to other drivers in moving toward data responsibility. For many, the motivation is to follow the law and avoid punishment or scandal, while for others it’s a response to more customers asking about how their data is used, Harris says.
Others see data responsibility as a way to stand out from the competition. Chad Wilsey, Director of Conservation Science at the National Audubon Society, says transparency around the group’s quantitative impact was a way to boost funding and public support.
For Margit Zwemer, VP of Systematic Active Equities at BlackRock, it comes down to risk: “If you’re not protecting your data, you’re at risk of being hacked or having a data breach.”
How Teams Can and Should Be More Responsible
Until an industry-wide shift takes hold, it’s up to individuals, teams, and organizations to cultivate data responsibility. But no roadmap exists, and figuring out how to do this can be challenging and expensive. “Asking every data scientist to be a philosopher and InfoSec expert is putting a lot of burden on people who shouldn’t necessarily have to be showing leadership in that space…because those cultures aren’t yet in place,” Zwemer says.
To address the gap, Harris founded the Community-driven Principles for Ethical Data Sharing, an effort involving more than 800 data scientists to crowdsource a code of ethics. The code enshrines principles like informed consent, security, transparency and preventing unfair bias. “There’s nothing groundbreaking in these principles, but they’ve served as a launching pad for people to create their own individual ethos and approaches for the way they build products,” she says.
Making Data Responsibility Stick
An optional code of ethics isn’t a substitute for real industry standards. What is it going to take for companies to truly follow ethical data practices en masse? Harris expects that regulations along the lines of the GDPR will expand and that universities will increasingly incorporate ethics discussions into data science curricula. But she thinks the cachet of being an ethical company will be even more effective. “We’re going to get to this place where it’s going to be cool to be responsible and ethical with the way that you use people’s information,” she says.
Zwemer thinks technology companies will eventually face training and reportability requirements around data responsibility, akin to the regulatory obligations in industries like finance. But she says companies will only take data ethics seriously if it affects the bottom line, with customers or shareholders punishing players that don’t comply. Given the fast pace of technological innovation, she predicts a lot of missteps will occur before that happens. “We’re creating new problems as fast as we’re solving them,” she says. “It’s going to take a lot of falling down and getting back up.”
Domino powers model-driven businesses with its leading Enterprise MLOps platform that accelerates the development and deployment of data science work while increasing collaboration and governance. More than 20 percent of the Fortune 100 count on Domino to help scale data science, turning it into a competitive advantage. Founded in 2013, Domino is backed by Sequoia Capital and other leading investors.