ePubs: an Institutional Repository for Large-Scale Experimental Facilities

Introduction

ePubs is the institutional repository for the Daresbury and Rutherford Appleton Laboratories, national laboratories which provide access to large-scale experimental facilities for the UK and international scientific community. ePubs has been in production since May 2004 and is intended to hold the scientific and technical output of staff and visitors to the facilities. In this paper, we review the history of ePubs and some key lessons learnt from its development.

ePubs: the start

In 2002 the CCLRC Library Service undertook a feasibility study to examine the needs for an "e-publication archive". This study was undertaken in Spring 2003 and consulted a wide range of internal stakeholders. The key conclusions were:

  • There was a perceived need for a single location for the published output of CCLRC.
  • For the departments with large-scale experimental facilities, collecting all science outputs associated with facility use was more important than limiting the collection to outputs produced by internal staff.
  • It was important to be able to link to internal procedures and systems such as annual reporting and the staff record system.
  • It was important to find selling points for the authors themselves to encourage self-deposit.

Over the next year a system was written in-house, as at that time there was no external product which could satisfy the needs of the organisation. The ePublications archive went live externally in May 2004, and its name was affectionately shortened to ePubs soon afterwards.

Resourcing

As ePubs has matured from a project into a valued service within the Library's portfolio, the team which supports it has changed and its processes have been formalised. At present the ePubs team divides into four areas:

  • System administration for the server and Oracle database: technical specialists based outside the Library.
  • Content management: the members of Library staff who are responsible for the quality of the content.
  • Management and stakeholders: the service director, who liaises with the wider community and is responsible for resourcing.
  • Development of the software: the software developer and the project manager for software enhancements. The project manager is a member of Library staff.

With a wider and more disparate team, communication becomes more important. It is reinforced by quarterly whole-team meetings and by formal procedures such as Change Control for the software and underlying infrastructure. The advantage of this team structure is that the separate parts are managed by experts, and thus a production-quality service is provided.

ePubs is embedded within the Library and Information Services five-year strategy and will be resourced to sustain the service over this timescale.

Content

ePubs has been in production for five years and, as of December 2007, holds 24,750 metadata records, of which 1,055 have full text. These entries span the fifty years that the Rutherford Appleton Laboratory has been in existence, owing to a policy of allowing retrospective deposit of related material.

The content has been slowly increasing over the last five years but there have been a couple of internal developments which have increased the profile of ePubs and embedded it further into the internal business processes. The first is related to an internal metrics exercise to measure the outputs of the departments.

The second is the general theme within the facilities of holding information about a particular project from the cradle to the grave. An integrated view is taken so that it is possible to track research from the research proposal, which requests beam time on an STFC facility, through the data collection and analysis phase, to the publication of results. Ultimately, this will complete the cycle by linking the next research proposal to the results of the previous one within one system.

Library staff carry out basic metadata checking on record inputs. The database can also be checked for data validation problems; these checks cover empty fields, duplicates and journal titles.
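As an illustration only, the three kinds of check mentioned above (empty fields, duplicates and journal titles) could be sketched as follows. This is a minimal sketch under stated assumptions, not the ePubs implementation: the record layout, the list of required fields, the journal authority list and the sample records are all invented for the example.

```python
from collections import defaultdict

# Hypothetical record layout and required fields; the real ePubs schema
# is not described in this paper.
REQUIRED_FIELDS = ("title", "authors", "year")


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_empty_fields(record):
    """Return the names of required fields that are missing or blank."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def find_duplicates(records):
    """Group the ids of records that share a normalised title and year."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalise(rec.get("title") or ""), rec.get("year"))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def find_unknown_journals(records, known_journals):
    """Return ids of records whose journal title is not in the authority list."""
    known = {normalise(name) for name in known_journals}
    return [
        rec["id"]
        for rec in records
        if rec.get("journal") and normalise(rec["journal"]) not in known
    ]


# Invented sample records, for illustration only.
records = [
    {"id": 1, "title": "Neutron scattering study", "authors": ["A. Smith"],
     "year": 2005, "journal": "Physica B"},
    {"id": 2, "title": "Neutron  Scattering Study", "authors": ["A. Smith"],
     "year": 2005, "journal": "Physica  B"},
    {"id": 3, "title": "", "authors": [], "year": 2006, "journal": "Phys B"},
]

empty = find_empty_fields(records[2])
duplicates = find_duplicates(records)
unknown = find_unknown_journals(records, ["Physica B"])
```

Here records 1 and 2 are flagged as likely duplicates, and record 3 is flagged both for empty fields and for a journal title that does not match the authority list. Flagged records would still need review by Library staff rather than automatic correction.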

Part of the ePubs input process is the ability to link the record to the institution's internal PEOPLE information system, which contains information about staff and users. This means that authors who are inconsistent about their name have all the forms of their name grouped under the same heading in an author index. The metadata record itself holds the name as it appeared in the publication, for accuracy, but it is easier to locate the complete publishing record of a particular author.
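The grouping described above can be sketched in a few lines. This is a hedged illustration, not the ePubs code: it assumes each author entry keeps the name as published plus an optional identifier from the PEOPLE system, and the field names (`as_published`, `person_id`), the identifier 101 and the sample names are all hypothetical.

```python
from collections import defaultdict


def build_author_index(records, person_names):
    """Group records under one heading per person.

    person_names maps a (hypothetical) PEOPLE identifier to the preferred
    form of the name. Authors without a link are indexed under the name
    exactly as it was published.
    """
    index = defaultdict(lambda: {"variants": set(), "record_ids": []})
    for rec in records:
        for author in rec["authors"]:
            person_id = author.get("person_id")
            heading = person_names.get(person_id, author["as_published"])
            entry = index[heading]
            # The published form is kept untouched on the record and is
            # only collected here as a variant under the shared heading.
            entry["variants"].add(author["as_published"])
            entry["record_ids"].append(rec["id"])
    return dict(index)


# Invented sample data, for illustration only.
person_names = {101: "Bloggs, Joe"}
records = [
    {"id": 1, "authors": [{"as_published": "J. Bloggs", "person_id": 101}]},
    {"id": 2, "authors": [{"as_published": "Joe Bloggs", "person_id": 101},
                          {"as_published": "A. Visitor"}]},
]

index = build_author_index(records, person_names)
```

Both published forms end up under the single heading "Bloggs, Joe", while the unlinked author keeps a separate heading under the published name.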

Issues

A number of issues arose in the development of ePubs which should be highlighted.

Authors' attitudes to metadata input
One of the later developments was to provide a simplified, one-page entry form, as users were put off by having to go through three input screens, even though many of the fields were not mandatory. It is strange to reflect that the mere presence of fields put many people off starting.

Naming the repository: ePubs
ePubs became ePubs due to laziness in the team (it is shorter than saying "ePublications archive"). As we started to ramp up the publicity there was an internal discussion about whether it was a suitable name for a corporate system. However, this discussion came too late in the process, as the term was well embedded and would have been too difficult to change. In retrospect we should have considered this issue earlier.

Tracking organisational change
As previously mentioned, there are records dating back to 1957, and whilst there have been laboratories at RAL and DL throughout this time, the umbrella organisation has changed and there have been many internal changes too. We are still discussing the best way of reflecting organisational change whilst ensuring this does not affect authors negatively. An example of this is the Energy Research Unit (ERU), which has been in existence for over twenty years but has been in six departments over this time. The ERU requires one cohesive list of its publications. The formation of STFC from CCLRC and PPARC has caused further issues along these lines.
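One possible way of modelling this, offered purely as a sketch and not as the approach adopted for ePubs, is to attach each record to a stable unit and to keep the unit's changing department in a separate dated history. The department names and years below are placeholders, not the actual history of the ERU.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Membership:
    """A period during which a unit belonged to a department."""
    unit: str
    department: str
    start_year: int
    end_year: Optional[int] = None  # None means the period is still current


def department_in(memberships, unit, year):
    """Return the department a unit belonged to in a given year, if known."""
    for m in memberships:
        if (m.unit == unit and m.start_year <= year
                and (m.end_year is None or year <= m.end_year)):
            return m.department
    return None


def publications_for_unit(records, unit):
    """Return one list for the unit, whatever department it was in at the time."""
    matching = [rec for rec in records if rec["unit"] == unit]
    return sorted(matching, key=lambda rec: rec["year"])


# Placeholder history and records, for illustration only.
memberships = [
    Membership("ERU", "Department A", 1985, 1994),
    Membership("ERU", "Department B", 1995),
]
records = [
    {"id": 2, "unit": "ERU", "year": 2000},
    {"id": 1, "unit": "ERU", "year": 1990},
    {"id": 3, "unit": "Another unit", "year": 1990},
]

eru_publications = publications_for_unit(records, "ERU")
```

With this separation the unit's publication list stays cohesive, and the department at the time of publication can still be derived when a departmental bibliography is needed.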

Successes

A number of key success points can also be highlighted to influence future developments of institutional repositories.

Historic information
One of the factors which has encouraged take-up was the decision to put 7000 metadata records into our pilot system. Being able to demonstrate a system with a large amount of data in it helped to sell it to our stakeholders and brought forth offers of similar data for other science areas. It has meant that there is a relatively low proportion of full text entries but it is used for operational issues such as department bibliographies.

Institution buy-in
We have concentrated on areas which are most useful to our user community. For us this has meant inputting historic information and offering the same functionality to the various departments, whilst transferring the support costs and maintenance to the Library and Information Services.

Publicising download figures
ePubs captures the download figures for full text, and these are available to internal, logged-in staff. Showing how often a publication is downloaded has encouraged other authors to deposit their own full text.

Conclusions

ePubs is embedded within the organisation but needs to engage the authors to increase the amount of full text deposited.

Catherine Jones, Brian Matthews and Linda Gilbert
Science and Technology Facilities Council