Big data is big business these days in scientific, technical, and medical publishing. At least that's the idea. It's increasingly referred to and built upon in the articles we publish. More and more journals are setting policies that require authors to make their data available. Some require this from the beginning of the peer review process; others require authors to make the data publicly available before publication.
How are publishers dealing with data availability issues on a day-to-day basis? This session explored what some publishers have learned about author compliance with their journals' policies and what editorial and production offices must do to ensure that compliance. The speakers all agreed that the demand for transparency is growing. They addressed topics ranging from changes in progress and current data-sharing policies to guiding principles and recommendations for the future.
Cathy Stack, of the Annals of Internal Medicine, pointed out both the benefits and the risks of open data. Although we stand to gain improved transparency, reproducibility, the advancement of research, and better patient care, we also risk compromised patient privacy, misuse of data, and misinterpretation of data. She cited findings of the Committee on Strategies for Responsible Sharing of Clinical Trial Data, which the Institute of Medicine formed in fall 2013. The Institute published the committee's recommendations for sharing clinical trial data in January 2015 (iom.nationalacademies.org/Reports/2015/Sharing-Clinical-Trial-Data.aspx). How we ensure compliance with data availability policies can depend on whether researchers and authors make data available at publication or before it. The lists of pros and cons are long. Publishers' tools for encouraging sharing include not only editorial policy but also incentives (credits, acknowledgments) and venues such as databases and repositories.
According to Rebecca Barr, of Nature Research Journals, many issues surface as late as the copyediting stage. These range from data records that have not yet been released to a failure to deposit needed data at all. Would it help to obtain the data at the peer review stage? In a mandatory data-deposition environment, we will eventually drown in data. She noted that "implementation matters—is your data worthy of deposition?"
Helen Atkins, of the Public Library of Science, pointed to her organization's policy, revised in March 2014, that now requires authors to make all data available before publication. There are some exceptions, mostly to protect the privacy of research participants. Authors must provide a "data availability statement." Before the revision, the data underlying research papers were hard to find. The data might lack descriptive metadata, or authors simply did not provide it. Data that were not stored centrally posed another big challenge. Atkins reported that enforcement of the new policy initially took considerable staff time, but over the past year, author compliance and understanding have improved.
What to Expect in the Future
These speakers agreed that content cannot be repurposed if data sources are not shared. Beyond policy changes, solutions will also come from industry standards, public data repositories, data descriptors, formal credits, accession codes, and community support. Stack pointed again to the Institute of Medicine and its January 2015 recommendations: "Biomedical journals have an important role to play in advancing the creation of an environment in which sharing of clinical trial data is a standard and an expectation for publication in the scientific literature."