Tagged with "research data"

Zimmer Discusses OkCupid Data Release and Ethics of Big Data Research

May 31, 2016

A group of Danish researchers, led by Aarhus University graduate student Emil O. W. Kirkegaard, recently publicly released a dataset of nearly 70,000 users of the online dating site OkCupid, including usernames, age, gender, location, what kind of relationship (or sex) they’re interested in, personality traits, and answers to thousands of profiling questions used by the site.

When asked whether the researchers attempted to anonymize the dataset, Kirkegaard replied bluntly: “No. Data is already public.” This sentiment is repeated in the accompanying draft paper, “The OKCupid dataset: A very large public dataset of dating site users,” posted to the online peer-review forums of Open Differential Psychology, an open-access online journal also run by Kirkegaard:

Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.

To those concerned about privacy, research ethics, and the growing practice of publicly releasing large data sets, this logic of “but the data is already public” is an all-too-familiar refrain used to gloss over thorny ethical concerns.

In response to this problematic data release, CIPR Director Michael Zimmer published an editorial in Wired: “OkCupid Study Reveals the Perils of Big-Data Science” (Wired, May 14, 2016). He states, in part:

The OkCupid data release reminds us that the ethical, research, and regulatory communities must work together to find consensus and minimize harm. We must address the conceptual muddles present in big data research. We must reframe the inherent ethical dilemmas in these projects. We must expand educational and outreach efforts. And we must continue to develop policy guidance focused on the unique challenges of big data studies. That is the only way we can ensure innovative research, like the kind Kirkegaard hopes to pursue, can take place while protecting the rights of people and the ethical integrity of research broadly.

Zimmer also appeared on the WUWM Milwaukee Public Radio show Lake Effect to discuss “Big Data Research Creates Ethical Concerns”, noting that:

“So when a researcher like this says, ‘Well this stuff was already public,’ what he kind of really means is like, ‘This stuff was visible to other users who happen to also create a profile,’ and those aren’t the same thing,” says Zimmer. “Psychologically I think it’s important for users when they sign up for this thing to have this assumption, or these set of expectations, that I know this data is kind of public but it’s meant for this community… Doing this kind of research sometimes violates that assumption.”

Zimmer joins NISO-RDA Working Group on Privacy Implications of Research Data Sets

Feb 11, 2016

CIPR Director Michael Zimmer has been named a co-chair of the RDA/NISO Privacy Implications of Research Data Sets Working Group, a joint NISO and Research Data Alliance project, focusing on the privacy implications of shared research data. Zimmer will be joining Todd Carpenter (NISO) and Bonnie Tijerina as co-chairs leading this effort.

The working group will explore issues related to scientific research data sets that contain human subject information, as well as related datasets that have the potential to be combined in a way that can expose private information. The goal of the group is to develop a framework for how researchers and repositories should appropriately manage human-subject datasets, to develop a metadata set to describe the privacy-related aspects of research datasets, and to build awareness of the privacy implications of research data sharing. While privacy is only one part of the broader ethical, legal, and data publishing issues surrounding data management, this working group is focused specifically on privacy-related concerns.

The group will focus on worldwide legal frameworks and the impacts these frameworks have on data sharing, especially with human-subject data. After gathering these legal strictures and comparing their differences and similarities, the group will begin crafting a set of principles that will provide guidance to the researcher and repository communities on how to manage these data when they are received. Building on these principles, the group will craft a set of use cases showing how the principles will be applied. After these elements are completed, the group will develop and carry out an effort to advance the principles through promotion and community outreach.

The group has released a case statement that is open for comment.