Ethics Clinic: Failure to Produce Data

The accessibility of research data is a complex issue that much of the scholarly community is trying to unravel and respond to. As this occurs, many journals’ policies around data availability, sharing, and retention are evolving. One component of this issue is the number of recent research misconduct cases involving allegations of data fabrication or falsification in which it later emerged that many of the coauthors never saw the data and that the data are no longer available. This need for original data to confirm the integrity of a published article raises questions about the obligation of coauthors to review supporting data, even when they did not generate the data themselves, and to produce the data when questions arise. During the Failure to Produce Data session, we were separated into two groups, given two such cases, and asked to consider our expectations for data availability, our understanding of coauthors’ responsibility to confirm and retain data, and our opinion on what a journal’s response should be in these situations.

In the first scenario, an editor of a journal received allegations of misconduct related to a published article. When the editor contacted the author group, it became clear that the coauthors had never reviewed the raw data, and the author responsible for the data would not share the data because, he claimed, they had been obtained from a confidential source. Upon further inquiry, the editor discovered that the affiliated university had found research misconduct in two other articles coauthored by the same person and had determined that his body of work was suspect, based on a pattern of conduct and a lack of evidence that data existed in multiple instances. When coauthors of some of the older articles, some published 15 years earlier, were asked to produce anything that would support the existence of the original data or the collaboration, they protested, citing the length of time since publication.

In parsing through this scenario, members of my group said that they would want to assess the journal’s current policy on data sharing and accessibility, as well as the policy in place at the time the article was published, to determine whether the author was in breach of either. We found that the journals represented in the group had varying policies on data sharing, but everyone felt that if there were allegations of misconduct or fraudulent findings, the authors should be expected to produce the research data used. Debra Parrish, the moderator, shared with us that legally, the statute of limitations on research misconduct is six years from the point of last use. Beyond that, the majority of my discussion group agreed that they would follow the institution’s lead in this case and likely publish an Expression of Concern (EOC) to identify the possible misconduct and ongoing inquiry. We also discussed the Committee on Publication Ethics (COPE) expectation that an EOC will result in either a retraction of the referenced article(s) or a retraction of the EOC itself once the inquiry has been completed. What we heard from the experience of the group, however, was that many have seen institutions drag out inquiries for years and never reach decisive findings.

The second scenario involved the author group of an article published one year earlier in a journal whose policy requires authors to share data if requested by another researcher. A researcher contacted the journal office to report that she had not received the data she had requested. The author group responded that they were planning to provide the data, but that their current workload was keeping them from doing so quickly. The question we were asked to discuss for this scenario was “What steps, if any, would you take as an editor?”

As we discussed this scenario and the question put before us, we came up with more questions than decisive answers. For example, can or should journals act as mediators, requiring authors to supply data when requested by a third party? Aren’t some raw data difficult to produce in a usable or easily read format, and if so, isn’t there time and cost involved in converting them into such a format? Should the journal policy be more specific, identifying the expected timeframe for sharing requested data, and if so, what punitive measures should be in place? Beyond these questions, my group also discussed how some journals collect statements from authors on whether they would be willing to provide raw data or statistical code; perhaps this type of policy reduces the number of authors who don’t comply with sharing requests, because they have already volunteered to do so. And, perhaps, a scenario such as this argues for open data, requiring authors to publish raw data (whether through a repository or other means) with their article, to take journals out of the mediator role.