Research data sharing: Developing a stakeholder‐driven model for journal policies

Conclusions of research articles depend on bodies of data that cannot be included in the articles themselves. Sharing these data is important for reasons of both transparency and reuse. Science, Technology, and Medicine journals have a role in facilitating sharing, but the mechanism by which they should do so is not yet clear. The Journal Research Data (JoRD) Project was a JISC (Joint Information Systems Committee)‐funded feasibility study on the potential for a central service on journal research data policies. The objectives of the study included identifying the current state of journal data sharing policies and investigating stakeholders’ views and practices. The project confirmed that a large percentage of journals have no data sharing policy and that there are inconsistencies among those that are traceable. This state of affairs leaves authors unsure of whether they should share article-related data and of where and how to deposit those data. In the absence of a consolidated infrastructure for sharing data easily, a model journal data sharing policy was developed by comparing quantitative information from analyzing existing journal data policies with qualitative data collected from stakeholders. This article outlines the process by which the model was developed and presents the model journal data sharing policy.


Introduction
Research data are presently a publicly funded resource that passes into private hands without explicit permission or remuneration to the public purse. The overwhelming volume of UK research across disciplines is funded by government via research councils and institutions of higher education and by nonprofit-making institutions set up for the public good. These organizations wish to maximize the value of their investment, and there is growing belief among funders that access to data is part of that value. The Organisation for Economic Co-Operation and Development (OECD, 2007) has published guidelines on access to publicly funded data, in which it is stated that "sharing and open access to publicly funded research data not only helps to maximise the research potential of new digital technologies and networks, but provides greater returns from the public investment in research" (p. 10). However, after the creation of research outputs, the data on which these outputs depend have, in the first place, tended to be left in the possession of the researchers, who may use or neglect them as they see fit. This is not easy to justify, but it seems even harder to rectify. More recently, publishers have identified data as a resource and recognized that facilitating access to it could produce further revenue streams, but this apparent solution to the problem promises to exacerbate the public/private dilemma. Therefore, it is important that the strength of the case in principle for sharing research data, both for reasons of transparency and for the potential of reusing it in new research, receives formal recognition from international and national research bodies, research funders, learned societies, and the researchers themselves. These are the key stakeholders in research, and ultimately it is their interests that should drive the research data sharing process.
The data with which these stakeholders are concerned are in fact a more complex set of resources than it might at first seem. Our starting point was a Royal Society (2012) definition: "Qualitative or quantitative statements or numbers that are (or are assumed to be) factual. Data may be raw or primary data (e.g., direct from measurement), or derivative of primary data, but they are not yet the product of analysis or interpretation other than calculation." We found that what tended to be discussed or listed in data sharing policies ranged through software, video, geodata, geological maps, ontologies, web content, data models, and a great deal more. Although we sought to confine our attention to research-generated data as such, we found it impossible to ignore entirely the supplemental material deposited alongside the data that actually lay behind the research results reported in the articles. On supplemental materials, the National Information Standards Organization together with the National Federation of Abstracting and Information Services (NISO, 2013) has recently issued a set of recommended practices to address the lack of guidance on selection, delivery, aids to discovery, and preservation plans. These are intended to assist publishers and editors in guiding authors and peer reviewers in dealing with supplemental materials. As such, the recommended practices feed directly into a journal policy of the kind we model later in this article.
Firm statements on data sharing calling for openness and freely available access to publicly funded research data have been made by the International Council for Science (ICSU, 2004) and the U.K. Royal Society (Royal Society, 2012) in addition to the OECD statement (OECD, 2007). Similarly, funding bodies are requesting data management plans from researchers as part of their funding applications; these plans include making the data openly accessible. For example, the Arts and Humanities Research Council (AHRC) funding guidelines "expect" digital outputs to be "freely available" to the research community. In the United States, the responsibility of authors to share data has been clearly set out by the National Academy of Sciences (2003), in a statement that also identifies the need for journals to specify data sharing policies for the benefit of authors. Furthermore, the Opportunities for Data Exchange project (ODE) underscores the need for publications and their supporting data to retain their essential integration (Reilly, Schallier, Schrimpf, Smit, & Wilkinson, 2011). The Brussels Declaration (STM, 2007) is a statement from the publishing industry supporting the principle of free availability of access to research data, while reflecting some of the unease about open deposit of accepted manuscripts in rights-protected archives. However, despite all this weight of positive comment, the mechanisms by which sharing might be effectively implemented remain topics for discussion rather than functioning aspects of the research world.
This article explores what can be regarded as the pivotal aspect of any general mechanism for data sharing, the role of research journals and, in particular, the data sharing policies they present to their authors. This is an essentially pragmatic approach, recognizing that the most effective policies are those that present themselves to researchers at a point in the research process at which there is an immediate incentive for compliance and the opportunity to do so. The approach recognizes that, although both funders and employing research institutions may have policies that apply to the researcher, awareness of and compliance with such policies can remain very low. Such policies are not typically presented to the researcher at the point at which the data become available to be archived, nor do they offer an immediate incentive for compliance. However, when a data archiving policy is presented during the process of publication, with the research complete and the data processed, then the incentive exists for the researcher to comply in order to publish the results of their research.
We believe that publishers and publisher policies have a key role to play in the wider adoption of data archiving, and the development of model policies may assist in this. This article reports the findings of the Journal Research Data (JoRD) project at the Centre for Research Communications (CRC) at the University of Nottingham, which was funded by the Joint Information Systems Committee (JISC, www.jisc.ac.uk), and draws attention to the strong indications in these findings for the shape of model data sharing policies for adoption by journals. It seems almost indisputable that the policies best capable of delivering transparency and reuse opportunities mandate deposit of data, provide guidance on structures and metadata, and direct authors to suitable web-linked repositories. Such policies not only benefit the researchers themselves and fellow researchers in the same and related fields but also stimulate archiving and linked data activities that complement the basic act of deposit. Examining large numbers of existing policies, as we did, provides a view of what a model policy might say based on current practice. However, there is an alternative, that of a model policy that goes back to direct consideration of stakeholder concerns. We used both approaches in this study, analysis of existing policies and identifying stakeholder concerns through qualitative research.

Literature Review
The literature reveals that, until quite recently, publications concerning what would now be framed as data sharing issues frequently discussed them in terms of data withholding. Campbell et al. (2002) identified the pattern of data withholding in genetics, based on the evidence of a substantial survey. Blumenthal et al. (2006) and Vogeli et al. (2006) also set out the issues in a context of data withholding, but by the end of the 2000s Hodson (2009) could claim that the data culture had changed to one in which research collaboration, facilitated by the Internet, had led researchers generally to acknowledge the need to share data. It is, of course, open to question how deeply felt the commitment of researchers is, and whether there is symmetry in attitudes toward others sharing data with a researcher and that researcher sharing data with others. What is more, these concerns clearly vary across the spectrum of disciplines. Hrynaszkiewicz and Altman (2009) discuss the issue in terms of raw clinical data, and Pienta et al. (2010) show that there is a sharing culture in the social sciences despite lack of structure in the available resources. Intellectual property issues are common to all disciplines, because, by establishing the intellectual rights of synthesized ideas and the data from which the syntheses are derived, researchers can seek to consolidate their claims to research topics, innovations, and conceptual direction. Reichman and Uhlir (2003) pursue the legal aspect of this intellectual property-based approach, but the bulk of the current literature concentrates primarily on the value of sharing rather than on defining obstacles (e.g., Neylon, 2009). Arguably, the effectiveness of deposit procedures is the crucial issue. Data that are notionally open and sharable may be in practice nothing of the kind, because they are insufficiently structured, lack metadata, or have not been deposited in a repository that offers the capacity to fully realize external access.
Authors who have considered the most appropriate method of disseminating the underlying data to the research community during their writing process generally look toward the concept of linked data. Kauppinen and Espindola (2011) identify what they call the four silver bullets of linked data, but Bechhofer et al. (2011) adopt a more nuanced view. Delivering data fit for linking from the accumulations of notes, measures, mentions, readings, and statistics that arise in the course of research requires a substantial organizational input on the part of the researchers. This is a message that goes well beyond the requirement simply to agree that the data should be made available for sharing. It is a message that cannot easily be given the necessary detailed specificity in high-level declarations of principle from governments, international bodies, and learned societies. The policies of funding institutions should set it out clearly and explicitly so that structured data gathering can be built into the research process and so make data capable of being structured readily available at the point of deposition, most likely at the time of contact between the researchers and the journals in which they hope to publish their findings.
Such policies from funders, and from the research institutions that employ the productive researchers, are of course "upstream" of publisher policies in the research process and so will produce data with deposit requirements already attached. Therefore, journal policies should be able to accommodate pre-existing conditions and choices for deposit that have already been made about the data, with some process for resolution of any potential conflict between different policies that might arise. In spite of the primacy of the funders' and institutional policies, for the pragmatic reasons noted earlier, it is journal policies that are thus central to the wider adoption of the whole data sharing enterprise, and the literature is beginning to reflect this. In the mid-1990s McCain (1995) surveyed 850 journals, discovering that only 132 had identifiable policies. The important, though unremarkable, conclusion was drawn that the best policies set out strong compliance sanctions. A smaller survey of medical journals by Schriger, Aroa, and Altman (2006) found contradictory approaches and little strong guidance. Since then, there has been a series of important papers by Piwowar, usually with Chapman (including Piwowar & Chapman, 2008b, 2010a, 2010b; and Piwowar, 2010). Perhaps the most valuable to the JoRD project (Piwowar & Chapman, 2008a) builds on McCain's work, using the data on gene expression microarrays to explore policies in depth. The article classifies policies according to their strength (strong, weak, nonexistent), the relationship of policy strength to the journal's impact rating, and the number of instances of data submission that can be identified.
The authors conclude that there is a wide variation in policies; some evidence that, when there is a policy, instances of data sharing increase; no real suggestion that a strong policy discourages authors from submitting their articles to a journal; and some evidence on the factors that make data sharing difficult for authors. The Permanent Access to the Records of Science in Europe project (PARSE.insight; Kuipers & van der Hoeven, 2009) has produced helpful data on attitudes to data sharing and a strong viewpoint on what must be done (Smit, 2011; Smit & Gruttemeier, 2011). The work of Stodden et al. (2013) is based on research of a type broadly similar to ours, and was conducted more or less contemporaneously, but concentrates on the sharing of code that will enable computational results to be replicated.

Survey of Journals
We chose 400 international and national journals to represent the 200 most cited journals (high-impact journals) and the 200 least cited (low-impact journals), equally shared between science and social science, based on the Thomson Reuters (2011) Journal Citation Reports. There was some duplication between the two indices, and in those cases one instance of the journal was removed. This left a total of 371 journals. We did not top up the total, to avoid disrupting the impact factor ranges analyzed. Thirty-six subject areas were covered across both broad disciplinary areas. The selection of journals that we analyzed originated from a mix of large commercial publishers, academic presses, and independent publishers.
We sought data policies on each journal's web page. Typically, we found policies in the notes for authors or statements of editorial policy. Once we had located a data policy, we broke it down into categories such as what, when, and where to deposit; accessibility of data; types of data; monitoring data compliance; consequences of noncompliance; and policy strength based on Piwowar and Chapman's (2008a) definition of strong and weak journal policies. These were then entered onto a matrix for comparison. When no policy was found on a journal's website, this fact was indicated on the matrix. In the first stage of analysis, we looked at a series of individual policies in considerable detail and continued adding to the number of policies looked at in this way until we ceased to discover fresh features. This exercise provided a set of criteria that could be used for the analysis of all the remaining policies. Our results were based on the use of these criteria.

Stakeholder Consultation
To complement the survey of journal policies, we sought to establish the views of key stakeholders, using qualitative methods based on the sampling and analysis techniques of grounded theory. This structured approach allowed us to focus on stakeholder perceptions within a short time frame, and iterative data selection with comparative analysis ensured that gaps in knowledge were filled. Views of individuals working for the publishing industry in the United Kingdom were elicited on the principles underlying data sharing, the drivers for change, and the challenges faced in effecting change. We selected the individual respondents by purposive sampling for their expertise. Twelve came from a range of publishing backgrounds, from large to small, subscription to open-access enterprises, together with four representatives from funding agencies (two of whom were interviewed jointly), one data service manager, one representative of research administrators and managers, and two academics. Thirteen structured interviews were conducted for the project, each lasting 1 hour. Six written responses to the interview questions were also obtained. Later in the project, interviews with four representatives of the academic library world were added.
At this stage we suspected that the data collected from the interviews were biased toward the point of view of journal editors and publishers and did not sufficiently reveal the opinions of researchers and authors. Therefore a focus group of U.K. researchers was organized. Participants were selected by snowball sampling, initially through a contact from a scientific debate forum. They represented a range of arts and sciences backgrounds. We used the results from the focus group discussions and indications from the literature review to formulate questions for an open survey of researchers, which was posted online for 1 month via the project blog (convenience sampling). Seventy researchers worldwide, covering 36 different scientific areas, responded. After each stage of data collection, we open coded the data and identified patterns in response that formed categories, which allowed the comparison of views across the range of stakeholders.

The Survey of Journals
We found at the time of analysis that the overall landscape of journal data sharing policies was patchy and inconsistent. Such a situation appeared inadequate in an environment in which rhetoric and policy advise and encourage data sharing. For example, some journals had multiple policies (two or three), whereas 50% of the journals examined had no data sharing policy at all. Among the 230 journal policies found, 76% were by Piwowar and Chapman's (2008a) definition weak, with the remaining 24% being strong. Significantly, the journals with high impact factors tended to have the strongest policies. Not only did fewer low-impact journals have any data sharing policy at all, but the policies they did have were less likely to mandate data sharing. In general, they merely suggested that authors might wish to share their data. We examined the policies we identified to discover whether they included any stipulation on which data might be linked to an article, where the data should be deposited, and when in the publishing process they should be made available. Table 1 shows a summary of the main points that we discovered. As can be seen, some policies did specify types of data to be deposited. For example, data sets, multimedia, and specimens, samples, or material were the most commonly mentioned types of data. Structures, protein or DNA sequencing, and program code or software were referred to, but less frequently. Many policies were not at all specific, using the terms supporting information, unspecified data, and other data. Other policies made a distinction between data that were integral to the article and supplemental data. Supplemental data might enhance the article but were not essential to support its argument, and a small percentage of policies (7%) asked for the quantity of supplemental data to be limited or for it to be included only after discussion.
What is even more important is that few of the policies specified where the data should be deposited. A few talked of deposit but were vague regarding where. Others referred to the use of a repository but were not explicit with regard to which repository. Only 15% named a specific repository. Statements on expectations for access were notably lacking, with only 12% of policies commenting on this. Accessibility options that were mentioned ranged from low cost to closed access, with only a low number of policies suggesting free or open access (see Table 1). Perhaps most damning of all, only one policy discussed the inclusion of metadata with deposits. On the question of when the data should be deposited (either before publication or when publication occurred), there was again a lack of consistency and direction. Just over half of the policies that were specific about this broadly mentioned depositing data along with the submission of the article, with roughly one-quarter indicating that the data should be available for the peer-review process and slightly more than one-quarter remarking that deposit at some later stage, typically on publication, was acceptable (Table 1). In summary, we found low numbers of policies (for barely half of the journals surveyed), with the overwhelming majority of them weak and confusing. The weakness can be illustrated by the fact that only 10% contained mention of sanctions in the event of noncompliance.

Stakeholder Consultation
There were low levels of mutual understanding among the stakeholder groups that were sampled in the interviews, focus groups, and online enquiries. Stakeholders made assumptions about each other's views and actions and had obviously made little attempt to investigate the broader landscape. Although all stakeholders purported to be in favor of shared data and were willing to list the benefits of data sharing, they all raised caveats and concerns and identified barriers to the sharing of data. For instance, it was clear from researchers' comments during the focus group and from the online survey that they understood the expectation that data will be shared. At the same time, the online survey demonstrated a less positive reality. About 40% of the respondents admitted that they did not allow others access to their data, and the rest shared mainly only with collaborators and colleagues. Researchers are not yet sharers by instinct; this underscores the importance of policy clarity in changing behavior, and of awareness and advocacy of policy from funders, institutions, and publishers. As noted, it is at the point of publication that policy should be set out in the most specific terms for it to be effective. The publishers, who should present policy to authors on their websites and in the pages of their journals, in fact revealed anxieties over the capacity of the current digital infrastructure to link data reliably to articles if the data were distributed among a variety of databases and other repositories. Some of them were also not confident that their own databases would be viable alternative places of deposit, because of the increasing file sizes of research data deposits and the requirement for greater storage capacity. This implies that research institutions and funders have the opportunity to take the archiving issue in hand, and they should do so through clear, enforceable policies and clear, easy-to-use deposit venues and processes.
A series of other anxieties emerged from the consultation. Both researchers and publishers considered that it would be difficult to deposit and link data in the original state in which they were gathered. There was a need for data to undergo a certain basic level of refinement before they might be shared. Raw qualitative data, for instance, might well be recorded in ways only truly understood by the data gatherer. This difficulty in the sharing and interpretation of purely raw data has been corroborated by the findings of Work Package One of the Policy Recommendations for Open Access to Research Data in Europe (RECODE) project (http://recodeproject.eu/). Similarly, large collections of quantitative data would require the correction of statistical errors before being fit to share. The context of the data gathering was also a factor; the data might have been gathered with a promise of confidentiality or might have been gathered in order to complete a study (a report or PhD thesis) for which there is a commitment that it remain undisclosed for a specified amount of time. The currency of data was also an issue, with the danger that some data might be too dated by the time of publication to be of value for subsequent research. This difficulty relates to a wider requirement, identified by the publishers, that linked data in a journal article should be "fit for use" and "replicable." Data have been saved unstructured, without sufficient metadata, and in formats from which they have subsequently become impossible to retrieve.

Developing a Model Policy
The initial assumption that many of the problems of data sharing could be addressed in the publication process through the presentation by journals of strong, clear policies on the issue was not contradicted by the research. The goal of identifying a model policy that could be recommended to journals therefore became a consistent focus of our activities. As we began to accumulate information about a large number of journal policies, it seemed for a time that a model policy would emerge from analysis of this material. At this stage, we assembled a draft policy based on relevant and useful aspects of existing policies. This took the following 16-clause form.
• A general statement outlining the benefits of data sharing
• A clear statement of whether it is the policy of the journal, the publisher, or a professional association
• The type of data to be included in the article or linked to the article
• The format of the data, covering any disciplinary guidelines
• Instructions related to the data, such as data citation and other metadata
• Whether data are required or requested to be shared, and any limit to the quantity of data
• Where the data are to be held, according to the data type
• Where to state what data are available and how to access the data
• When during the publication process data should be made available
• Whether embargo periods are allowed and for what length of time
• Whether the data should be made openly accessible, free, or of low cost or be subject to other restrictions
• Any terms or conditions for the reuse of data that should be stated by the author
• Whether exceptions to the data policy are allowable
• The method by which author compliance with the policy will be monitored
• A statement of the consequences to the author of noncompliance with the policy
• A statement of the journal procedure for dealing with complaints from other researchers should their requests for data not be met

However, we gradually became convinced that this was not an adequate basis for a model policy. The accumulated features of existing policies tended to reflect the confusion, amounting at times to contradiction, in what publishers and editorial committees had so far set out. It became clear that an effective process required us to focus our attention on the views of the various stakeholders in the data sharing process. The first lessons this emphasis offered were that the current digital infrastructure is in a state of flux, with variation between publishers, repositories, and systems such that no powerful encouragement to share data emerges.
We were clear that
• Publishers vary widely in their approach to sharing the data on which articles are based.
• Guidelines to authors concerning what type of data are acceptable, where the data should be deposited, and when in the publication process the data should be deposited are mainly vague.
• Researchers in all disciplines are generally in favor of sharing data but perceive barriers that they do not know how to overcome.
• Researchers considered that they would benefit from clear publisher and journal policies on data format and place of deposit.
• Publishers also perceive barriers to linking and embedding data.
To find a way through the difficulties this presented, we brought the distinction made by Piwowar and Chapman (2008a) between strong and weak policies to the center of the process. They identified the following characteristics of a strong policy.
• A motivating statement for the benefits of data sharing to the scientific community
• A general statement implying support for data sharing
• Types of data that can be included in articles
• Whether the data should be available for peer review
• The wording of data sharing instructions and whether data deposit is a condition of publication
• An instruction for the location of data archiving, for example, a web page or publicly accessible repository
• The format of data
• The completeness of data sets
• The timing of when data will be made openly available
• Possible consequences of noncompliance with the journal data policy

Consideration of these points assisted us in the process of identifying key findings from the qualitative research. A major finding of our study was that it would often be impractical to include all of the data that supported the results reported in a journal article. Data formats and file sizes vary across a wide spectrum, very often depending on the set of methods used in the research. Qualitative research generates data in the form of documents and text, such as excavation and field observation notes, or transcripts of interviews or reports. Quantitative methods produce numerical data, which typically are held in spreadsheets. Many types of data might be generated from one piece of research, so an article might have to include extra text, numerical data sets, and digital images, which would increase its file size. In particular, the publishers showed concern about the ultimate file sizes required should large data sets be integrated into each and every article. Certain publishers are indeed attempting to produce online journal articles that have the capacity to include many kinds of data, for example, Elsevier's Article of the Future (http://www.elsevier.com/about/mission/innovative-tools/article-of-the-future). However, such a capacity is unlikely to be available for every journal.
This creates a requirement that a journal policy should clearly state the extent to which data can or cannot be included as an integral part of an article.
Linking crucial data to a journal article from a specific institutional repository is a reasonable alternative to overloading a publisher's server, although this transfers the associated long-term cost to the host institution. Funders currently do not include such longer term costs as part of a research grant, and institutions may be reluctant to see these included within current overheads. Publishers also indicated a number of concerns about linking data from repositories. First, hyperlinks should be permanent. A broken URL would not reflect well on the publisher or the author. Second, publishers queried whether there is a procedure for data citation, because there are currently few standard data citation schemes. Both authors and publishers are concerned about intellectual property rights, and at present the potentially divisive implications of this are not made fully obvious in existing policies. There is also the concern of continued data preservation should a repository close. It is also fair to say that similar concerns could be expressed in return by institutions should publishers host the material.
It is possible that the concerns expressed by the publishers can be allayed through the current development of data repositories that have the remit of securely storing data with reliable and easy linkages. For example, the Dryad Digital Repository collaborates with partner journals and data citation systems and uses permanent URLs (Queens University, 2013). Similarly, the Australian National Data Service (ANDS) is a national repository for research data generated by Australian institutions (ANDS, 2014) that also incorporates data citation systems with digital object identifiers (DOI, 2013). The concept of data citation is currently being explored by researchers, particularly with the rise of data journals and the continuing development of DataCite, which is a worldwide organization that works with data centers and publishers by providing persistent identifiers for data sets and other digital items (DataCite, 2013). Although digital repositories are a recent phenomenon, and their longevity has not been tested, responsible repository managers have policies that would come into play should a repository close. For example, the policy of Dryad states that "in the event that Dryad can no longer maintain the Repository as an active service, all Dryad-registered DOIs will be updated to resolve to the copy at the CLOCKSS archive, which will continue to provide free access to the Content under the same licensing terms" (Dryad, 2013). CLOCKSS stands for Controlled Lots of Copies Keep Stuff Safe.
A consistent message from the research was that a major barrier to the open sharing of data was not the reluctance of researchers but their inadequate knowledge of where to upload the data. Many were not aware of data repositories, and those who were showed concern about their general infrastructure. The obvious implication was that a journal data policy should state whether the data should be deposited in a named repository with a trusted content policy, whether a permanent uniform resource locator (URL) should be used, and whether any data citation style is necessary. The timing of the release of data raises an interesting point; researchers were concerned not about the point in the publication process but about the point in their research at which the data should be made openly accessible. Articles are written not only at the conclusion of studies but at intervals during the research process. It may or may not be appropriate to release the data at the same point, depending on such things as the established PhD premise that the research must be unique, the possible sensitivity of some forms of data, and ethical constraints that should protect human subjects.
While the JoRD project was looking at social science and science journals in a global sense, the European Data Watch Extended (EDaWaX) project was examining the policies of economics journals from the point of view of German economists. EDaWaX researchers started from a perception that economics journals had to mandate data sharing policies in order to ensure that economics research data would become available for replication and validation. The requirements for data availability policies that EDaWaX suggest (Vlaeminck, 2013) are summarized as follows.
1. A journal data policy should stipulate that sharing data is mandatory.
2. The original data, with any necessary instructions for computation, must be made available.
3. The data files must be given to journal editors before an article is published.
4. All the submitted files must be publicly available, unless they contain sensitive data.
5. The journal data policy should contain a procedure for the method by which sensitive data sets could be used to replicate research.
6. The journal should contain a replication section, which would include results of failed replications. This would encourage authors to provide high-quality, well-documented data.
7. Data should be submitted in open formats, preferably ASCII, to allow preservation and interoperability.
8. The version of the operating system and software used for analyzing the data should be supplied.
The terseness of these recommendations is commendable, but they are not universal in their application. For instance, numbers 5 and 6 on replication are probably not relevant to a general research data policy. They are also quite categorical that sharing should be mandatory. A model journal research data policy, intended to cover many disciplines, might reasonably allow a journal to express whether the deposit of data is recommended or mandatory. More universal is the recommendation that data should be made openly accessible. EDaWaX considered the issue of the sensitivity of some data (for reasons that may be personal, commercial, or related to national security). We also encountered these concerns. A model policy might respond by including exemptions, procedures for closed access, or embargo periods for sensitive data.
Our initial model policy draft from the JoRD project covered the three questions of where, what, and when: where data should be deposited; what type of data should be deposited, and in which format; and at what point during the publication process data should be deposited, along with the possibility of embargoes to release data at the correct time during the research process. The handling of sensitive data was not specifically addressed. The initial policy briefly mentioned data referencing under other instructions regarding data, but a full and clear statement about data citation and metadata in general is required by stakeholders. Similarly, many stakeholder concerns about intellectual property rights (IPR) for data should be allayed by the inclusion of recommendations about metadata associated with authors, such as DOIs and Open Researcher and Contributor ID (ORCID) identifiers. ORCID identifiers are small pieces of unique code that identify academic authors entered in the ORCID registry, which can be found at http://orcid.org/content/initiative. Other IPR issues, particularly regarding funders' IPR, can be addressed by authors supplying clear statements on the IPR status of the data and any reuse rights or restrictions. The quality issues of URLs and linked data should also be mentioned, with guidelines on the choice of permanent URLs or uniform resource identifiers (URIs). Some researchers were under the impression that depositing data would automatically preserve or "future-proof" it. To respond to this misapprehension, we thought that a policy should include a statement on the need for appropriate formatting and metadata as key contributions to the preservation process.
The following model framework for a journal research data policy was developed from the insights we have outlined. We stress that it is not a policy in its own right but that it is capable of being used as a kind of "policy engine" from which journal policies could be developed. We envisage a process whereby such policies are developed cooperatively between funders and research institutions on the one hand and publishers on the other. In the event of difficulties, a resolution process is needed, one that recognizes as a prerequisite the ultimate right of the funders to mandate the fate of the data generated by research for which they, or more often the public, have paid.

Journal Research Data Policy Model Framework
1. Policy statement on the benefits of data sharing, for example:
• XYZ Publishing believes that the data used to draw conclusions from articles should be made widely available to the research community in order to facilitate collaboration, enable validation, and encourage replication and reuse of the data. XYZ Publishing considers that such transparency benefits authors through greater exposure of their work and increased citation, and improves the quality of science.

2. Designation of the policy owners, for example, one of the following statements:
• This research data policy is the policy of the Society of XYZ.
• This research data policy is the policy of the editorial board of the Journal of XYZ.
• This research data policy is the policy of XYZ Publishing.

3. A request that authors provide a statement identifying the original funder(s) of the research that produced the data, or different parts of the data, for example:
• Authors are required to name the funder that sponsored the research and collection of data on which an article is based.

4. A clear statement of whether depositing data is mandatory for publication or is a recommendation, for example, one of the following statements:
• It is a mandatory requirement of publication of the submitted article that all data on which the article conclusions are based be deposited by the author or authors in a location that is freely and openly accessible.
• It is recommended that all data on which the article conclusions are based be deposited by the author or authors in a location that is freely and openly accessible.
• It is not necessary to make data associated with this article openly accessible.

5. A clear statement on whether the data can or cannot be included as an integral part of an article, whether hyperlinks should be included in the article, or whether appendices that lead to the data can be saved on a server different from that on which the article is held, for example:
• Data will be embedded in the published article or appendices.
• Data will not be embedded in the published article or appendices.
• Data will be accessible through hyperlinks in the article that lead to another server that is/is not controlled by XYZ Publishing.
• Arrangements should be made for interested researchers to have access to the data.

6. A statement on whether the data should be deposited in a specifically named repository or a location of the author's choice, for example:
• Data are required to be deposited in the data repository Dryad (http://datadryad.org/), with which the Journal of XYZ is an integrated journal.
• Data should be deposited in a repository that is accredited by the Society of XYZ.
• Data may be deposited in a repository that has the XYZ Data Seal of Approval.
• Data may be deposited in the lead author's institutional repository.
• Data may be deposited in a trusted repository at the discretion of the author or authors.
• Data may be obtained by arrangement with the author(s).

7. A clear statement on the form of URL that should be used should the data be linked to the article from another server, for example:
• URLs used to link to the data must be permalinks.
• URLs used to link to the data must be digital object identifiers.
• Authors should/may use uniform resource identifiers (URIs) to link the data to the article.
• Authors should/may use persistent uniform resource locators to link the data to the article.

8. A clear statement on the type of data that would be accepted, bearing in mind the distinction between essential and supplemental data, for example:
• Acceptable forms of data that can be linked to or embedded in articles are video images/audio files/software/spreadsheets/text-based files/DNA sequences.
• Unacceptable forms of data to be linked or embedded in articles are video images/audio files/software/spreadsheets/text-based files/DNA sequences.

9. Guidance on the selection of data from larger data sets that would be the most relevant to the published article, for example:
• If the published article is based on a limited quantity of data taken from a larger data set, only the data necessary for the article need be deposited.
• If the published article is based on a limited quantity of data taken from a larger data set, we require that the entire data set be made publicly accessible.
• If the published article is based on a limited quantity of data taken from a larger data set, the author may choose to deposit some or all of the data set.

10. A clear indication of the format of data accepted, with an explanation of the expectations for data preservation, for example:
• Data will be accepted in any format.
• Data will be accepted only in ASCII format in order to aid data preservation and interoperability.
• Data will be accepted in open formats in order to aid data preservation and interoperability.
• Data that require access to code so that findings can be replicated must be deposited with that code.

11. Guidance on data citation style if data citation is required, for example:
• It is not necessary to reference the data.
• Authors may choose to reference the data.
• Data should be referenced using the following method (example given from Dryad).
• Data may be deposited when the article is published with an embargo.

14. A statement indicating that ethical concerns with the publication of data from human subjects can be addressed, for example:
• Prior to deposit, identifiers should be removed from human subject data, such as names, addresses, dates of birth, social security or national health numbers, telephone numbers, and so on.
• Human subject and other sensitive data may be allowed an embargo before release.
• Special arrangements may be made by authors for individual researchers to obtain human subject and other sensitive data.
• In special cases, human subject and other sensitive data may be allowed an exemption.

15. Guidelines to authors on procedures for granting individual researchers access to sensitive data, for example:
• In the case of sensitive data that should not be made public, authors should make arrangements with individual researchers to pass on data sets.
• In the case of sensitive data that should not be made public, authors should make arrangements with individual researchers for how to replicate the study.
• In the case of sensitive data, the contact details of the author will be supplied to interested parties.

16. A clear statement on the criteria for exemption, in the event of the policy allowing exemptions for certain types of data, for example:
• The editorial board of the Journal of XYZ will consider exemptions to the research data policy should the author(s) be able to prove that publication of the data they gathered would:
  • Be seriously detrimental to the life or lives of persons, or their families, who were participants in the research
  • Provoke serious consequences for an established industry
  • Create serious consequences for national security.

17. A requirement that authors provide a statement concerning the IPR status of the data or different parts of the data (where reuse will be allowed, a clear statement on the reuse rights allowed, for example, using the Creative Commons licences [http://creativecommons.org/licenses] as the clearest and most widely understood reuse rights specifications) and a statement on accommodating pre-existing IPR and/or reuse requirements arising from applicable funder or institutional policies, including embargo periods and treatment of sensitive data, for example:
• These data are the result of funding from XYZ Funders, with shared IPR among the authors, their institutions, and the funders in line with relevant policies. The data are released under an Attribution Non-Commercial Share Alike (CC BY-NC-SA) license after a 6-month embargo from the time of publication, in line with the funding policy.

18. Guidance on whether the method of data analysis should be declared, for example:
• The method of data analysis should be made clear in the related article.
• A detailed method of data analysis should be provided to allow replication of the study.
• The author(s) may choose to outline the data analysis.

19. Information on metadata and author identifiers, for example:
• Data sets are required to be given an overall digital object identifier (DOI).
• Each item of data is required to be given a DOI.
• Data should be submitted with a README file that describes coding and software, abbreviations and terms used, units of measurement, and details of any other associated data.

20. A prominent and clear statement on policy compliance expectations, including any reasonable time limits allowed between publication and data deposit, for example:
• XYZ Publishing expects that all authors will comply with the research data policy.
• XYZ Publishing will not publish an article until notification is received from repository X that the data have been duly deposited.
• XYZ Publishing will allow authors one calendar month from the date of publication for the deposit of data.

21. Finally, a prominent listing of the consequences of noncompliance with the journal research data policy and of the methods for monitoring noncompliance, for example:
• Should the Journal of XYZ receive complaints from other researchers who cannot access data associated with a published article, the authors will be approached and evidence of data deposit must be produced.
• Should an author not comply with the policy of the Society of XYZ, membership in the organization will be revoked.
• Should data not be deposited within the given time limit, XYZ Publishing will no longer publish papers written by the author(s) of the associated article.

Conclusions
A model policy is no more than the term implies: a suggestion. The JoRD project was in a position both to accumulate the content of existing policies and to design a policy on the basis of qualitative research. The model outlined here is confidently offered to publishers, editors, editorial boards, and organizations such as scholarly societies and research institutes. They will nevertheless have to examine it closely to assess its fit with their specific needs and adapt it as necessary. What is utterly essential, in our view, is that journals should offer a policy, and the best policy that they can devise. This model is intended to facilitate and strengthen that process.