Measuring Success

Ultimately, the measure of a repository's success is its utility and the utilisation of the material it holds. This is the end we are trying to achieve. For many of us, however, our day-to-day preoccupation is the means to that end - getting content into the repository in the first place, and the changes to institutional culture that this entails. There are a number of possible ways of measuring the success of these day-to-day activities, and it follows that these metrics could be used to set targets for the development of the repository. This section describes some of these potential metrics and, where possible, tries to give a feel for what real repositories are actually achieving.

Publication lists

Confining ourselves to publications - other content types will have analogous criteria - the obvious ideal is that an Open Access eprint of every academic publication produced by an institution should be deposited in its repository. In practice, any number of factors conspire against achieving that ideal. Some of these factors also apply to conventional metadata-only bibliographies, but these bibliographies tend to have more complete coverage, partly because there is less clerical effort involved, and partly because there may be management directives and administrative procedures in place for gathering the information. Such bibliographies provide a benchmark by which we can judge the completeness of OA repositories, and help facilitate comparative assessments of the usage and impact of OA eprints versus non-OA publications. Here are some possible sources for institutional bibliographies:

  • Institution-wide Bibliographies
    • Publications list on the institutional website - These are usually publicly accessible, but in some cases access may be restricted to institutional users only
    • Database maintained by research managers or some other administrative office for the purposes of Research Assessment Exercises (RAE). These may or may not be available to people outside the relevant office.
  • Departmental Publications Lists - Usually part of the department's website. They may also include forthcoming publications
  • Public Subject Databases - Here we are talking about databases such as Medline, Inspec, Chemical Abstracts, etc., where it is possible to search by institution name.

In an ideal world, the institutional repository would be the institution-wide bibliography, or at least be used as the data source for that and for departmental bibliographies. In reality, a full-text-only repository is highly unlikely to cover 100% of the institution's publications, and therefore cannot fulfil this purpose. Consequently, either the repository would have to accommodate metadata-only records as well as full-text items (which might not be acceptable because it dilutes the raison d'être of an Open Access repository), or a supplementary database is required.

For most current statistical purposes, an institution-wide bibliography is probably the most useful benchmark - certainly the most comprehensive. Departmental lists and the public subject databases can provide useful representative samples. Furthermore, subject databases could even provide a means of comparing the performance of different institutions' repositories, because their compilation methods should be consistent across institutions, whereas department-compiled lists are much more variable.

Matching repository records to bibliography records cannot be fully automated. You would be fortunate indeed if both the repository and the bibliography recorded unique identifiers such as DOIs (Digital Object Identifiers), and textual fields are unlikely to be sufficiently unique or consistently formatted to facilitate reliable matching. Given the manual effort involved, it is advisable to record the mapping between items in the two resources in a database or table, so that repeat analyses only require new items to be processed.
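
Matching can nonetheless be semi-automated. The sketch below (Python, standard library only) is a minimal illustration: it matches a repository export against a bibliography export by DOI where both sides record one, and falls back to fuzzy title comparison otherwise. The CSV exports and the column names (id, doi, title) are hypothetical, and fuzzy candidates should still be reviewed by hand before being written to the persistent mapping.

    import csv
    from difflib import SequenceMatcher

    def load(path):
        # Hypothetical CSV export with columns: id, doi, title
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    def similarity(a, b):
        # Compare titles case-insensitively with whitespace normalised
        norm = lambda t: " ".join(t.lower().split())
        return SequenceMatcher(None, norm(a), norm(b)).ratio()

    def match(repo_rows, bib_rows, threshold=0.95):
        """Return (matches, unmatched); matches maps repository id -> bibliography id."""
        matches, unmatched = {}, []
        bib_by_doi = {r["doi"].lower(): r for r in bib_rows if r.get("doi")}
        for rec in repo_rows:
            doi = (rec.get("doi") or "").lower()
            if doi in bib_by_doi:
                matches[rec["id"]] = bib_by_doi[doi]["id"]   # reliable identifier match
                continue
            # Fuzzy title fallback - treat anything below the threshold as unmatched
            best = max(bib_rows, key=lambda b: similarity(rec["title"], b["title"]))
            if similarity(rec["title"], best["title"]) >= threshold:
                matches[rec["id"]] = best["id"]
            else:
                unmatched.append(rec["id"])                  # left for manual matching
        return matches, unmatched

Recording the confirmed pairs in, say, a mapping file means a repeat analysis only has to run match() over records added since the last run.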

Compliance rates

Repositories that accommodate metadata-only records - such as Southampton's - can be compared directly with departmental or institution-wide publication lists. A simple percentage of publications included in the repository should be a suitable metric. This calculation should also be transposed, however - that is, the percentage of repository items appearing in the publication list should be calculated as well - since inevitably there will always be some relevant items that appear in one list and not the other, and vice versa. If both systems are equally efficient, we would expect both percentages to be about the same; otherwise the more complete database would yield the higher percentage.

Metadata is all well and good, but we really want to give people access to the full items, so a more interesting measure of the completeness of a repository is the number of items in a departmental or institution-wide bibliography for which the repository holds an OA eprint. Again this could be expressed as a percentage. There is precious little data available on how UK repositories are performing in this respect. The feeling is that this percentage is generally quite low, although we are starting to see evidence that certain factors such as mandatory deposition, mediated deposition, better software, and advocacy campaigns may improve things.
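
Both percentages - and the more interesting OA full-text rate - reduce to simple set arithmetic once the record matching described earlier has been done. A minimal sketch in Python, with invented identifiers standing in for real matched records:

    # Illustrative identifiers only - substitute the results of your own matching
    bib_ids = {"b1", "b2", "b3", "b4", "b5"}     # institution-wide bibliography
    repo_ids = {"b2", "b3", "b4", "b6"}          # repository records, mapped to bib ids
    fulltext_ids = {"b3", "b4"}                  # repository items with an OA eprint

    matched = bib_ids & repo_ids

    pct_bib_in_repo = 100 * len(matched) / len(bib_ids)    # 60.0 - simple coverage
    pct_repo_in_bib = 100 * len(matched) / len(repo_ids)   # 75.0 - transposed calculation
    pct_oa_eprints = 100 * len(fulltext_ids & bib_ids) / len(bib_ids)  # 40.0

    print(pct_bib_in_repo, pct_repo_in_bib, pct_oa_eprints)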

One of the better examples of compliance statistics outside the UK comes from CERN, which showed that its repository contains open access full-text copies of nearly three quarters of its own recently-authored documents (Yeomans, 2006).

[Figure: CERN repository content in 2006 - chart showing the proportion of full-text items deposited in the CERN Document Server.]

It would be interesting to have similar statistics for as many UK institutions as possible. We will provide details here as and when they become available.

What compliance targets should you set? The ideal is of course 100%, and that is the figure to quote in advocacy campaigns. However, this is a target that is unlikely ever to be met, so it is better to set achievable targets for internal purposes. For a repository that includes metadata-only records as well as full items, an achievable target could be that it holds at least 95% of all the institution's publications. 95% also seems appropriate for repositories with an institutional mandate requiring full texts to be deposited. (For instance, the Electronics & Computer Science EPrints Service at the University of Southampton, which has a full-text mandate, has achieved over 90% compliance.) For a repository without a mandate that only holds full-text items, a practical internal target might be 70%, although this suggestion will need revising as case studies become available. In either case, it will take time to reach the target, so it may be helpful to set intermediate targets for various stages of the repository project. At the moment we have insufficient data to be able to recommend suitable intermediate targets, but they could, for instance, be set pro rata according to the percentage of people reached by a scheduled advocacy programme.

Counting Authors

Another ideal target is for it to become the norm for every institutional author to deposit their material in the institutional repository. This is, of course, another way of saying that every publication should go in the archive. However, this slant offers an alternative metric for assessing success - the percentage of known or potential institutional authors who have actually deposited material in the repository. Two possible ways of calculating this are:

  1. Comparing a list of authors of deposited items against a comprehensive list of potential authors and determining the relevant percentage
  2. Counting both the number of potential authors and the number of different actual authors of deposited items, and again determining the percentage

With Option No.1, it might also be appropriate to incorporate weightings into the calculation, based on the positions of authors in the institutional hierarchy - giving senior staff more weight - on the arguable assumption that more senior positions equate to higher profile research.
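
To make the weighting idea concrete, here is a hedged sketch of Option No.1 in Python. The grades, weights, and deposit flags are all invented, and any real weighting scheme would need local justification; unweighted Option No.1 is the same calculation with every weight set to 1.

    # Weights and author records are invented for illustration
    weights = {"professor": 3.0, "reader": 2.0, "lecturer": 1.0, "postdoc": 1.0}

    # (name, grade, has_deposited) entries from the potential-author list
    potential_authors = [
        ("A. Smith", "professor", True),
        ("B. Jones", "lecturer", False),
        ("C. Patel", "reader", True),
        ("D. Lee", "postdoc", False),
    ]

    total = sum(weights[grade] for _, grade, _ in potential_authors)
    deposited = sum(weights[grade] for _, grade, dep in potential_authors if dep)
    print(f"Weighted author compliance: {100 * deposited / total:.1f}%")   # 71.4%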

First you have to list and/or count your potential authors, and this is not necessarily easy. Who are your authors? Most academic staff will be publishing, but not all. Postgraduate research students (PhD and Masters, but not those on taught Masters courses), postdocs, and research associates will almost certainly be publishing their work. Additionally, some non-academic professional and technical staff may also publish, and possibly even the odd technician, undergraduate, or taught Masters student.

Because it is clearly not practical to assess the publishing status of each individual member of an institution, it is highly unlikely that you can compile an accurate list of potential authors, or determine a true figure for their number. This makes Option No.1 above less appealing. Option No.2, however, is more feasible because it is only necessary to arrive at a reasonably accurate estimate of the number of potential authors, based on certain categories of person. We suggest counting:

  • Academic staff - professors, lecturers & readers, etc.
  • Research associates & postdocs
  • Research students - doctoral and Masters, but not taught Masters
  • Managers of non-academic departments

This list omits the groups who publish only infrequently, but hopefully they will be counter-balanced by those members of the included groups who do not publish.
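
With those category counts in hand, Option No.2 reduces to a single ratio, as in this sketch; every figure here is a placeholder.

    # Placeholder headcounts - obtain real totals from Registration, Payroll, etc.
    potential = {
        "academic staff": 850,
        "research associates & postdocs": 400,
        "research students": 1200,   # doctoral and research Masters only
        "non-academic department managers": 50,
    }
    estimated_potential_authors = sum(potential.values())   # 2500

    # Distinct author names extracted from deposited records, after de-duplication
    distinct_depositing_authors = 600

    pct = 100 * distinct_depositing_authors / estimated_potential_authors
    print(f"Author-based compliance: {pct:.1f}%")            # 24.0%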

Where can you get this information? Information on students will almost certainly have to be obtained from Registration, and you may be able to get staff information from Payroll. It is possible that data protection issues may prevent these departments from providing you with lists of names; however, they ought to be able to provide you with total numbers for each category of person. It is difficult to think of an alternative data source for student information, but it may be possible to get staff information from staff lists on departmental websites, University Calendars, and the like.

Multiple authorship is the main problem that can affect metrics based on authors. Because papers often have several authors, it is quite possible for a person who has never even looked at the institutional repository to nonetheless have material deposited in it (by one of their co-authors). Such an author would appear on the lists of both potential and actual depositing authors. Is this acceptable? You decide. This complication probably makes Option No.2 above even more attractive.

We are not aware of any published author-based compliance rates for UK or overseas repositories, but our feeling is that most UK repositories would currently yield low scores for this metric.

Deposition rates

At its simplest, the deposition rate tends to be quoted as the number of items deposited per year, although different time units could be used. This is not a statistic that is particularly relevant to compliance, but it can be important when specifying the resources needed to support the repository. Such estimates should be based on the potential deposition rate, not the actual rate, because the target is 100% compliance. Any short-term surplus of resources can be assigned to ingest and advocacy initiatives designed to achieve the desired target.
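
A back-of-envelope sketch of such a resource estimate, with all figures assumed for illustration:

    # All figures are illustrative assumptions, to be replaced with local data
    expected_publications_per_year = 2000   # potential rate, from the bibliography
    minutes_per_deposit = 15                # checking and processing effort per item

    hours_per_year = expected_publications_per_year * minutes_per_deposit / 60
    fte = hours_per_year / 1500             # assume ~1,500 productive hours per FTE-year
    print(f"{hours_per_year:.0f} staff-hours per year, roughly {fte:.2f} FTE")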

Turnround times

It is to everyone's benefit to get papers into a repository as quickly as possible. There are two starting points to bear in mind. The first is the date of publication (or possibly even acceptance for publication), from which point it is up to the author(s) to start the deposition process. Repository administrators can have little influence over this, other than continuing to plug the message to authors about the need to deposit. The second is the time it takes from the start of the deposition process to the item becoming publicly accessible in the repository. This is influenced by administrators because of their involvement in mediation or moderation. The duration of the following steps could be logged:

  • Time to acknowledge new submissions (prior to processing). This could be an immediate automated email response
  • Time to check copyright and formatting
  • Time to resolve copyright issues - where necessary
  • Time to process the submitted files (mediated services)
  • Time to approve the deposited item (self-archiving)

The sum of all these is the overall time to public accessibility. Service level agreements could be formulated for all stages except the resolution of copyright issues, which can take a long time if publishers are unresponsive to enquiries.
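
One simple way to log these durations is to timestamp the completion of each stage and difference adjacent entries, as in this sketch (stage names and timestamps are invented examples):

    from datetime import datetime

    # Invented timestamps; stage names follow the list above
    log = {
        "submitted":         datetime(2010, 11, 1, 9, 0),
        "acknowledged":      datetime(2010, 11, 1, 9, 1),    # automated email response
        "copyright_checked": datetime(2010, 11, 2, 14, 0),
        "files_processed":   datetime(2010, 11, 3, 10, 0),
        "approved_and_live": datetime(2010, 11, 3, 16, 30),
    }

    stages = list(log)
    for earlier, later in zip(stages, stages[1:]):
        print(f"{earlier} -> {later}: {log[later] - log[earlier]}")
    print("Total time to public access:", log["approved_and_live"] - log["submitted"])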

References

Joanne Yeomans (2006). CERN's Open Access E-print Coverage in 2006: Three Quarters Full and Counting. High Energy Physics Libraries Webzine, No. 12, March 2006.
http://library.cern.ch/HEPLW/12/papers/2/ (accessed 19 November 2010)