Journals and repositories: an evolving relationship?
Stephen Pinfield

It is now widely accepted that there are two routes to open access (OA): OA repositories and OA journals. It is often assumed these are distinct alternative parallel tracks. However, it has recently become clear that there is potential for repositories and journals to interact with each other on an ongoing basis and between them to form a coherent OA scholarly communication system. This paper puts forward three possible models of interaction between repositories and journals; services such as arXiv and PubMed Central, and the work carried out by the RIOJA project, are working exemplars and pilot implementations of these models. The key issues associated with the widespread adoption of these models include repository infrastructure development; changing ideas of the ‘journal’, ‘article’, and ‘publication’; version management; quality assurance; business and funding models; developing value‐added features; content preservation; policy frameworks; and changing roles and cultures within the research community.


Introduction
The Budapest Open Access Initiative (BOAI) statement, drafted in 2001, identified two routes to open access (OA): 'open electronic archives' (now commonly referred to as 'repositories') and OA journals. 1 The idea of the two routes, sometimes referred to as the 'green' and 'gold' routes, respectively, is now widely accepted. It is often assumed that these two routes are distinct parallel tracks, alternatives rather than complements. Many advocates of OA have tended to favour one route rather than the other, and this has been reflected in the literature and in professional discussions.
However, it has recently become apparent that there is potential for repositories and journals to interact with each other on an ongoing basis and between them to form a coherent OA scholarly communication system. A number of working exemplars have emerged which have involved an interaction between repositories and journals; other models of interaction are currently being investigated and piloted.

Background
In order to understand the models, it may be useful to begin by providing some working definitions of the key concepts: repositories, journals, and open access.
A repository may be defined as a set of systems and services which facilitates the ingest, storage, management, retrieval, display, and reuse of digital objects. Repositories may be set up by institutions, subject communities, research funders, or other groups. They may provide access to a variety of digital objects, including peer-reviewed journal articles, book chapters, theses, datasets, learning objects, or rich media files. This paper concentrates on the scholarly literature and provision of access to so-called 'eprints', electronic copies of research articles or similar outputs. An eprint may take the form of a 'preprint', a version of a paper prior to peer review, recently designated by NISO as the 'author's original' or 'submitted manuscript under review'. 2, 3 Alternatively, it may be a 'postprint', a version of a paper in which changes have been made in response to peer-reviewers' comments, either in the form of the 'accepted manuscript' produced by the author or the 'version of record' produced by the publisher (using the NISO terminology). Such material may be deposited in the repository in a variety of ways, but one common characteristic of repositories is a workflow that allows authors to deposit their content themselves (known as 'self-archiving').
A journal may be defined as a type of publication containing a cumulative collection of quality-assured articles, normally within a particular subject area, added to at regular or irregular intervals under a single ongoing title. Within the research community, the quality assurance component of this definition is particularly important. Quality assurance, typically through peer review, is an essential feature of scholarly communication which is valued by researchers as a means of improving the research outputs of authors and also as a filtering and timesaving mechanism for readers. The frequency of publication of journals varies, with some electronic journals now making articles available as soon as they are ready (whether or not they are retrospectively grouped into an issue). But, however often it is published, the fact that a journal has a single ongoing title is important, not least because it allows the journal to act as a brand. Researchers within particular subject communities recognize and trust certain journal brands. Publishing in a certain journal might be, for example, an important 'esteem indicator' in a given subject community.
Both repositories and journals may or may not be OA. OA has been defined many times in the literature, but the working definition that will be used here is where the full content is freely, immediately, and permanently available and can be accessed and reused in an unrestricted way. The fact that the content is free at the point of access does not, of course, mean that no costs have been incurred in generating it; rather that the costs are not met as part of the access process itself, but at other stages of the content production process. The immediacy of availability in the definition is also important. One of the major rationales for OA is to improve the timeliness of scholarly communication; anything that undermines this undermines OA itself. Similarly, permanence is important. Content must be available to be accessed or cited, in order for it to play a proper role in the scholarly communication process; that means that there must be a commitment to its ongoing availability, in terms of both the persistence of access paths and preservation of the content itself. Provision for reuse is just as important as access itself. The BOAI emphasizes this in its definition of 'open access': 'permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself'. 1 Many of the advantages of OA can only be fully realized, as Clifford Lynch has explained, when such reuse opportunities are fully exploited. 4
These definitions are the starting point of the analysis in this paper. However, it is clear that, as practices change, assumptions of what journals and repositories are, and the roles that they perform in the scholarly communication process, may also begin to change.

Models
This paper presents three models of ongoing interaction between journals and repositories in an OA context. The models illustrate processes and also highlight the roles of different agents in these processes. In each model, different stages of the process are grouped under the headings of 'journal' or 'repository', indicating the main locus of activity. The identification of these models and the creation of associated workflow diagrams are designed to help clarify thinking on the issues and to identify areas for possible future investigation.

Model 1: 'Repository to Journal'
Model 1 is in many respects built around the 'conventional' journal publication process, represented under the 'journal' column in Figure 1. Here the author writes a paper with the intention of placing it in a peer-reviewed journal, and submits it to the journal for refereeing. The editor and referees between them carry out the peer-review process and a decision is made whether to accept or reject the paper. Assuming the paper is accepted, changes will normally be required in order to improve the paper for publication. When these changes have been made, the author submits the final version of the paper, and, following copyediting and formatting by the publisher, the paper is published in the journal.
The repository is involved in this model at two points, as illustrated in Figure 1. Firstly, the paper can be deposited (self-archived) in the repository in its pre-refereed form (preprint or 'submitted manuscript under review') by the author at the same time it is first submitted to the journal. The repository then makes the paper publicly available. Secondly, the author can deposit a version of the paper following peer review (postprint, normally in the form of the 'accepted manuscript'). The repository will then make that version of the paper available at this stage. The first of these (preprint self-archiving) is not a necessary part of the model, and may in practice only be carried out in certain subject communities that have a preprint culture. Preprint cultures developed some time ago in certain disciplines, primarily to facilitate the rapid dissemination of research results and to enable authors to assert priority. 5 However, even in the case of postprint deposit, it is likely that the paper will be made available in the repository before formal publication in the journal. The final version of the paper deposited is normally expected to be the author-produced final version (the 'accepted manuscript'). This may differ (at least in format and often in detail) from the publisher-produced, formally published version (the 'version of record'). 6,7
This model is already working in a number of subject areas, notably high-energy physics (a discipline with a well-established preprint culture) using arXiv. It is described by Henneken et al., 8 who argue that usage of the journal and usage of the repository are in fact complementary. They present data from a set of high-energy physics journals and the arXiv and ADS repositories which show that, following publication in the journal, usage switches in a marked way from the repository to the journal. This leads them to characterize the relationship between repositories and journals as a 'productive coexistence'. There is therefore no need in this model for any changes to existing business models or publication practices associated with subscription journals or repositories. They can continue to coexist in a complementary way for the foreseeable future. 9

Model 2: 'Journal to Repository'
Model 2 also involves the conventional journal-publishing process of paper submission, peer review, paper revision, copy-editing and formatting, and formal publication (as illustrated in Figure 2). There is, however, one crucial difference in the journal-centred stage of the process compared with Model 1. The journal has to be an OA or 'hybrid' publication, allowing the content to be made OA on publication (usually via a business model that allows payment of an author-side OA fee before publication). In Model 2, the repository comes into play only after the completion of the journal publication process. There is no pre-publication archiving of either the preprint ('author's original') or the postprint ('accepted manuscript'). However, once the paper has appeared in a journal, the role of the repository is considerably enhanced compared with Model 1. Following formal publication in the journal, the author or the publisher deposits the 'version of record' of the paper in the repository. The paper is then processed further by the repository, often involving moving it into a new file format and restructuring or tagging the content. The repository will also carry out any remaining necessary preservation actions on the digital object. It then makes the paper openly available for access, sometimes following a delay or embargo period, and also takes ongoing responsibility for the paper's long-term preservation (something that is not directly addressed in Model 1).
This model has also already been implemented. Robert Terry has described the model in relation to UK PubMed Central (UKPMC), which has been operational since 2007. 10 Terry provides an account of the issues from the perspective of the Wellcome Trust, which devoted considerable effort in 2005 and 2006 to negotiating with publishers to achieve agreements allowing the deposit of papers in UKPMC (if necessary, after an embargo period). The preferred process with UKPMC is for the journal publisher to deposit an XML document into the repository, without any author intervention being necessary at this point. UKPMC has created facilities to format the submissions to allow for reuse and analysis of the content, enabling processes such as content analysis through data mining. In addition, Wellcome has put funds into the development of UKPMC itself. Further upstream in the process, it also funds grant holders to pay journal OA fees, something that is important to ensure the system as a whole can work. 11
Model 2 involves the repository making content available only after formal publication in the journal. The processes involved do, however, include some areas of potential duplication. Both the journal publisher and the repository process the content and prepare papers for dissemination. Both also manage a technical infrastructure which makes the content available. A possible variation on Model 2 (Model 2a) might be proposed which partially addresses this duplication. 12 In this model (illustrated in Figure 3) the conventional journal publication process is followed until the copy-editing and paper-formatting stage. After that, the publisher deposits the content in the repository as a means of making it available. The publisher then needs to link to the paper from its own site rather than managing the content itself. In other words, the publisher uses the repository as a venue for publication, rather than maintaining its own infrastructure to support content delivery.
Such a model begins to look rather like an overlay model (see below), and involves a shift in the responsibilities of the different agents within the scholarly communication process. It does, however, fit with current trends of 'cloud computing' (where storage of content is outsourced to a remote provider).
This model is a theoretical construct and is not being used as yet. It does, however, form an interesting bridge to the third model, the overlay journal.

Model 3: 'Repository to Overlay Journal'
Model 3, shown in Figure 4, is a further step away from the conventional journal publication process. Here the author produces an article without necessarily submitting it to a particular journal. However, when an initial version of the paper is complete, it is deposited by the author in a repository. The repository then makes the paper publicly available.
At this stage, the paper is identified by an overlay journal for possible inclusion. This could either be done by the author submitting the paper for consideration by the journal, or by the journal itself identifying the paper independently (or both). The journal editor then engages referees in the normal way to review the paper. Assuming the paper is accepted, it may then be revised by the author in response to referees' comments. At this stage it would also be possible to include a step in which the publisher copy-edits and formats the paper. In this case, the paper is then deposited within the repository as a publisher-produced version; otherwise, an author-produced version may be deposited. When the repository makes the final version of the paper available, the journal links to the paper from its own site. It does not, however, itself hold the content. Nevertheless, either the journal or repository may then expose the paper to additional post-publication quality measures, such as citation analyses or interactive open peer discussion.
Something like this model has been described by John Smith 13 and Arthur Smith. 14 It is, as yet, the least mature of the models in terms of actual implementations, and perhaps the furthest from the conventional publishing model. There are, however, a number of pilot projects which are testing out this model, notably RIOJA (Repository Interface for Overlaid Journal Archives) based at University College London. 15 This model involves a greater degree of disaggregation of the different functions of scholarly communication, followed by their reaggregation in different combinations. However, the precise funding and business models (as well as the technologies which would support such a model) still need to be worked through. RIOJA has carried out some interesting work on the latter, developing a transfer protocol between the repository (in this case arXiv) and an overlay journal. Early work on business models suggests that some kind of author-side payment may be necessary to sustain such a model.

Scholarly communication: functions
The key features of these models may be further clarified when they are measured up against the now widely accepted list of functions of scholarly communication. These functions, first identified by Roosendaal and Geurts, 16 are considered here under five headings: registration, certification, dissemination, archiving, and reward. Registration is associated with defining and recording responsibility for a piece of work in a public way. This may often be connected with asserting priority, that is, ensuring that work (and the ideas behind it) can be correctly attributed to a particular person or group. Certification is the quality assurance process which marks certain works as having been through particular quality-control processes, normally peer review. Dissemination is about the circulation of the work so that it finds its readership and makes an impact. Archiving relates to preserving long-term access to the content so that there is a reliable record of scientific and scholarly findings which can be read, cited and built on in the future. 9
Reward relates to the role played by journals in contributing to the reputation and status of authors in their subject community. Researchers want to publish their articles in the journals most respected within their community; most subject areas have widely accepted informal hierarchies of journals. These hierarchies often relate to, but do not always precisely correlate with, impact factors. A record of publication in important journals is a significant contributory factor in personal performance reviews, tenure and promotion applications, and grant proposals. Prosser summarizes the motivation of publishing in widely respected journals: 'The greater the kudos of the journal, the greater the chance of a successful future promotion or research grant.' 17
In the conventional scholarly publishing system, all of these functions are carried out by various parties focused on the journal itself. However, that changes with all of the models discussed in this paper. They all involve an extended role for the repository.
Models 1 and 3 have the repository as primarily responsible for registration. In these models, a paper is first made publicly available in the repository, in a verifiable form which can be subsequently cited. One of the key arguments for OA repositories is the speed with which they can make material available, and this is obviously a crucial factor in the registration function.
The role of the repository is also extended in all of the models in the area of dissemination or awareness. The repository acts as a major vehicle for making the content available, either before its formal publication in a journal (as in Model 1) or after (as in Model 2). In Model 3, the repository has an ongoing role in making the content available, although readers may often access it via an overlay journal service. In Models 2 and 3, the repository also takes responsibility for archiving. A long-term preservation function is built into the workflows. This is not necessarily the case in Model 1; responsibility for long-term preservation of the digital object is not explicitly built into any part of this model.
In all of the models, the certification function remains the responsibility of the journal. In fact, this activity of quality control becomes the key focus of the journal, along with (possibly) a continuation of some editorial functions. Reward also probably continues to derive from publication in (or under the brand of) the journal. It is a byproduct of the journal publication process, which is likely to continue to be closely allied with the quality-assurance process and the academic reputation of the journal that develops over time.
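The allocation of functions described in this section can be summarized as a small lookup table. The following Python sketch is illustrative only. The names `Agent`, `ALLOCATION`, and `models_where` are hypothetical; the Model 2 registration entry is an inference (the text assigns registration to the repository only for Models 1 and 3); and the dissemination entries simplify what is in practice a shared role.

```python
from enum import Enum


class Agent(Enum):
    JOURNAL = "journal"
    REPOSITORY = "repository"
    SHARED = "journal and repository"
    UNASSIGNED = "not explicitly assigned"


# One row per model, one entry per function of scholarly communication.
ALLOCATION = {
    "Model 1": {
        "registration": Agent.REPOSITORY,
        "certification": Agent.JOURNAL,
        "dissemination": Agent.SHARED,
        "archiving": Agent.UNASSIGNED,  # not built into any part of Model 1
        "reward": Agent.JOURNAL,
    },
    "Model 2": {
        "registration": Agent.JOURNAL,  # assumption: inferred, not stated in the text
        "certification": Agent.JOURNAL,
        "dissemination": Agent.SHARED,
        "archiving": Agent.REPOSITORY,
        "reward": Agent.JOURNAL,
    },
    "Model 3": {
        "registration": Agent.REPOSITORY,
        "certification": Agent.JOURNAL,
        # Simplification: the overlay journal links to the content but does not hold it.
        "dissemination": Agent.REPOSITORY,
        "archiving": Agent.REPOSITORY,
        "reward": Agent.JOURNAL,
    },
}


def models_where(function, agent):
    """Return the models in which `agent` carries out `function`."""
    return [model for model, row in ALLOCATION.items() if row[function] is agent]


if __name__ == "__main__":
    for function in ("registration", "certification", "dissemination", "archiving", "reward"):
        summary = ", ".join(f"{m}: {row[function].value}" for m, row in ALLOCATION.items())
        print(f"{function:13s} {summary}")
```

Read this way, certification and reward stay with the journal in every row, while the remaining functions shift towards the repository to differing degrees.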

Implementation issues
The models described here give rise to a number of key issues, some of which have a direct bearing on the possibility of their widespread adoption in the research community and publishing industry.

Repository infrastructure development
All of the models discussed in this paper rely on an infrastructure of working repositories set up according to good-practice guidelines and interlinked by agreed standard protocols. There has been a considerable growth in the number of OA repositories globally in the last four years, but there is still some way to go. A study carried out by Mark Ware in January 2004 identified 250 OA repositories (and other similar data providers), 45 of which were institutional repositories. 18 In November 2008, the OpenDOAR registry of OA repositories listed 1,287 repositories, 1,032 of which were institution-based. 19 Despite the fact that the majority of OA repositories (80%) are institution-based, it is noticeable that many of the current working examples of the three models discussed here involve subject-based repositories. This is possibly because most institutional repositories are still at relatively early stages of their development and do not yet contain large numbers of papers. It remains to be seen how this will pan out in any future publication models. However, what is clear is that all repositories need to be constructed according to agreed standards to ensure they are interoperable; this is essential so that repositories can easily be cross-searched and mined. Further practices also need to be developed to verify and certify that repositories themselves are complying with relevant standards, in order to ensure there is widespread confidence in the repository infrastructure as a whole.

Changing ideas of the 'journal', 'article', and 'publication'
One of the most obvious issues associated with the above models is that of a change to the idea of the 'journal'. Most importantly, rather than all of the different functions of scholarly communication being bundled up within the journal, they are 'deconstructed' and then recombined in different ways. The primary focus of the journal then becomes (to an even greater extent than in the conventional process) quality control. This is the case even in the overlay journal model, although the relationships between the different process stages change considerably. In all the models, the journal maintains some kind of role as a brand, something that is important to academic users.
With changes to the journal come changes to the status of the 'article'. It is, for example, conceivable in Model 3 that a single article could be associated with publication in a number of different overlay journals. This would break the normal one-to-one relationship between article and journal. The consequences of such a possibility need to be more thoroughly investigated.
The concept of 'publication' itself also begins to shift in these models. The traditional idea of publication being a single event, when a paper appears in a peer-reviewed journal, looks unnecessarily narrow in the context of these models. Since papers are made available at various different stages of the dissemination process in all of the models, publication itself begins to look more like a process than a single event. 20,21

Version identification and management
One consequence of the changing nature of publication is that new standards and processes need to be developed to manage the publication flow. Version identification is an essential feature of this. It is crucial that researchers are able quickly to identify the status of a paper (e.g. whether it has been peer reviewed or not) and are able to be confident that they can cite a given version of the paper in a particular form which will not be altered. Standards are beginning to emerge to facilitate this (such as the NISO recommendations 2 ) but further work is needed to achieve widespread acceptance and to embed such standards in practice.
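As an illustration of what version identification might involve, the sketch below tags each stage named earlier in this paper with whether it follows peer review. It is a hypothetical data structure, not part of the NISO recommendations: the class name `Version`, the attribute `peer_reviewed`, and the function `citation_label` are assumptions made for the example, and only the four stages mentioned in this paper are included.

```python
from enum import Enum


class Version(Enum):
    # Each value is (label used in this paper, whether the version follows peer review).
    AUTHORS_ORIGINAL = ("author's original", False)
    SUBMITTED_MANUSCRIPT = ("submitted manuscript under review", False)
    ACCEPTED_MANUSCRIPT = ("accepted manuscript", True)
    VERSION_OF_RECORD = ("version of record", True)

    def __init__(self, label, peer_reviewed):
        self.label = label
        self.peer_reviewed = peer_reviewed

    @property
    def eprint_type(self):
        """Map the stage onto the preprint/postprint distinction."""
        return "postprint" if self.peer_reviewed else "preprint"


def citation_label(title, version):
    """Build a short status string a reader could see alongside a deposited paper."""
    status = "peer reviewed" if version.peer_reviewed else "not peer reviewed"
    return f"{title} [{version.label}; {status}]"


if __name__ == "__main__":
    for version in Version:
        print(citation_label("Example paper", version), "-", version.eprint_type)
```

Attaching the status to the version itself, rather than to the paper, reflects the point that a reader needs to know which form of a paper is being cited.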

Quality control/assurance practices
Models 1-3 all involve some separation of quality control from the dissemination of content. In all of the models, however, it is assumed that quality control still takes place primarily in the form of peer review. Peer review, for all of its failings, is still generally recognized to be the best form of quality assurance available to the academic community. 22 However, the models also allow for other forms of post-publication measurement (such as citation or usage metrics) to be easily deployed. Stevan Harnad has shown how OA repositories might be used as vehicles for citation analysis encompassing all of the literature they contain, rather than the subset of the literature held in conventional citation indexes. 23,24 Such post-publication methods of assessment are likely to be used to a greater extent in the future to complement traditional peer review, and to become a valuable way of identifying the outputs upon which a subject community is building its ongoing work.
One issue raised by the models discussed is the way in which the content of a given article can be improved during the scholarly communication process. All of the models allow for the version of an article to be improved following peer review. They all also have the potential to allow for editorial and formatting changes to be made under the auspices of the journal. The precise way in which this could happen is, however, perhaps the least clear in the overlay journal model and requires further work. Nevertheless it is important to note that there is no reason in principle why this model should exclude it.

Business and funding models
The traditional subscription model of journal publishing is assumed to be sustainable within Model 1, but would be difficult to sustain under Models 2 and 3. Model 2 explicitly involves OA journal publishing, in the form of either OA or hybrid OA journals charging author-side fees. Model 3 has a less clear business model attached to it, but is also likely to involve author-side payment. Further work is required to identify key costs and funding streams that might support such a model. Funding models to support repositories in particular require further work. Funding from research funders or institutions to support the creation and maintenance of repositories is perhaps the most likely model. However, arriving at precise costings for repositories, and thus identifying appropriate levels of funding, remain significant challenges. To help address this, Alma Swan has recently provided an overview of the business case for repositories, looking at costs and value. 25 Research funders and institutions also need to create appropriate funding streams to support OA journal publishing. At present, institutions have ways of pushing funding in the direction of a number of activities to underpin research, such as the purchase of library materials. However, few institutions have equivalent funding streams for the payment of OA fees. Institutions and research funders need to work in partnership to agree on practices allowing for research funding to be used for the payment of OA charges, and ensuring research institutions make the funds accessible to authors. In the UK, work carried out in this area by the Wellcome Trust might be seen as an exemplar. 11 Recently, Universities UK has published a briefing document on these issues for individual institutions interested in setting up an OA publication fund. 26

Developing value-added features
One of the major potential advantages of OA is the opportunity it opens up for adding value to the scholarly communication process. Search and retrieval of content could become more straightforward in an interoperable OA environment, with more full content, for example, being opened up to harvesting by Web search engines. Furthermore, the potential to link the published output with the data that underpins it is a realistic possibility for a wide range of disciplines. New forms of analysis, such as text mining, are also facilitated when content is made freely available; automated text-mining software can more easily navigate content in various locations if it is not behind access barriers. All of the models discussed in this paper would allow for such developments, although they are to some extent built in to Models 2 and 3 (since in these models the repository formats the content and makes it openly available in such a way as to promote automatic analysis).

Long-term preservation
Long-term preservation of digital objects remains a significant challenge, whatever the model of publication or dissemination. Models 2 and 3 attempt to address this challenge by placing responsibility for long-term preservation firmly with the repository. However, significant technical and procedural challenges remain. Although a great deal of useful work has been carried out on digital preservation in the last five years by libraries and publishers (e.g. to preserve e-journal content), the impact on professional practice in shaping production services remains patchy.

Policy frameworks
The adoption of particular models of scholarly communication depends, to a certain extent, on the policies of relevant bodies, particularly groups such as research funders, research institutions, and learned societies. Policies can have a significant impact on behaviours, although it can take time for policy changes to feed through to the actions of individuals. Between them, these groups are responsible for a range of policies which help to shape the behaviour of researchers in a variety of ways, including publishing practices.
One particular policy requirement, which is likely to have a significant impact on publication practices, is research evaluation. In the UK, the planned Research Excellence Framework (REF) may soon begin to affect behaviours (as many forms of measurement do). The REF will involve emphasis on metrics-based assessment of research quality, particularly bibliometrics based on citations. 27 It is likely, at least initially, that the REF exercise will be based on existing citation data sources such as the Web of Knowledge 28 or Scopus. 29 A possible unintended consequence of this is that the current pressure on authors to publish their work only (or primarily) in the traditional journals included within these indexes will be strengthened. This could have the effect of actually stifling innovation in scholarly communication in the UK.

Responsibilities
The responsibilities of the different agents involved in the scholarly communication process are different in these models from those in the traditional journal publication process. The models involve interactions between the author, the journal, and the repository. Within these categories different professional groups (including publishers, librarians, and IT professionals) and organizations (learned societies, academic institutions, publishing houses, funding agencies) all have potential roles to play. The skills and capabilities of these professional groups, and capacities of the organizations, need to be reviewed and adapted as changes occur in the scholarly communication process.
For example, there could be a change in the role of learned societies. As bodies which represent the interests of a particular professional or academic community, with a mission to disseminate information about a given subject area, they have a potentially significant role to play in the certification function. Societies might, for example, develop to become providers of overlay services. However, this would undoubtedly require significant cultural and economic shifts for them as organizations.

Cultural change
In any change process, cultural issues are as important as (if not even more important than) technical challenges. Many of the potential changes discussed here will only be achieved as research and publishing cultures shift. Changes in cultures are determined by a highly complex set of factors, only some of which can be easily planned. It remains to be seen how the research community will respond to the challenges and opportunities it currently faces. In the ultimate analysis, it will be the working practices of researchers that determine the future shape of publishing and dissemination in any subject community.

Conclusion
Repositories and journals may interact in a future OA scholarly communication and dissemination environment. The models presented here provide illustrations of how this might work. The models are, however, more than just thought experiments; working exemplars already exist. The success of these on-the-ground implementations, whether at pilot or production stages, needs to be monitored and performance data gathered. Further experimentation and testing should also be encouraged. What is learned from such activities should be widely disseminated amongst key stakeholders who have interests in supporting the scholarly communication process. There are opportunities for all the key stakeholders to continue, and perhaps redefine, their roles in the research communication system. Achieving an optimum system (or systems) may not be painless for all of the stakeholders, but it will result in benefits for the research community and for anyone else who is interested in the outcomes of research.