Considering dark deposit

I have a slight feeling of dread.

In the inbox of the email address associated with MPOW’s institutional repository are more than a dozen notifications that a faculty member has deposited their research work for inclusion. I should be happy about this. I should be delighted that a liaison librarian spoke highly enough of the importance of the institutional repository at a faculty departmental meeting and inspired a researcher to fill in a multitude of forms so their work can be made freely available to readers.

But I don’t feel good about this because a cursory look at the journals in which this faculty member has published suggests that we can include none of the material in our IR due to restrictive publisher terms.

This is not a post about the larger challenges of Open Access in the current scholarly landscape. This post is a consideration of a change of practice regarding IR deposit, partly inspired by the article, Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records.

Institutional repository managers are continuously looking for new ways to demonstrate the value of their repositories. One way to do this is to create a more inclusive repository that provides reliable information about the research output produced by faculty affiliated with the institution.

Bjork, K., Cummings-Sauls, R., & Otto, R. (2019). Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records. Journal of Librarianship and Scholarly Communication, 7(1). DOI: http://doi.org/10.7710/2162-3309.2220

I read the Opening Up… article with interest because a couple of years ago, when I was the liaison librarian for biology, I ran an informal pilot in which I tried to capture the corpus of the biology department. During this time, for articles from publishers that did not allow deposit of the publisher PDF version and from authors who were not interested in depositing a manuscript version, I published the metadata of these works instead.

But part way through this pilot, I abandoned the practice. I did so for a number of reasons. One reason was that the addition of their work to the Institutional Repository did not seem to prompt faculty to start depositing their research of their own volition. This was not surprising as BePress doesn’t allow for the integration of author profiles directly into its platform (one must purchase a separate product for author profiles and the ability to generate RSS feeds at the author level). So I was not particularly disappointed with this result. While administrators are increasingly interested in demonstrating research outputs at the department and institutional level, you can still generalize faculty as more invested in subject-based repositories.

But during this trial I uncovered a more troubling reason that suggested that uploading citations might be problematic. I came to understand that most document harvesting protocols – such as OAI-PMH and OpenAIRE – do not provide any means by which one can differentiate between metadata-only records and full text records. Our library system harvests our IR and it assumes that every item in IR has a full-text object associated with it. Other services that harvest our IR do the same. To visit the IR is to expect the full text of a text.
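To make the problem concrete, here is a minimal sketch of the guesswork a harvester is left with when it receives a simple Dublin Core (`oai_dc`) record. Everything specific in it is an assumption for illustration: the sample record, the `ir.example.edu` URL, the function name, and the convention of writing an access-level term into `dc:rights` are hypothetical, and real aggregators may look for different signals.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for simple Dublin Core records delivered over OAI-PMH.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical metadata-only record; the URL and values are made up.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example article</dc:title>
  <dc:identifier>https://ir.example.edu/biology/42</dc:identifier>
  <dc:rights>info:eu-repo/semantics/closedAccess</dc:rights>
</oai_dc:dc>"""


def looks_like_full_text(record_xml: str) -> bool:
    """Guess whether a record has an openly readable full-text file.

    This is a heuristic, not part of any protocol: it assumes the
    repository writes an access-level term into dc:rights, and falls
    back to checking whether any dc:identifier ends in ".pdf".
    """
    root = ET.fromstring(record_xml)
    rights = [(el.text or "").strip().lower() for el in root.findall("dc:rights", NS)]
    identifiers = [(el.text or "").strip().lower() for el in root.findall("dc:identifier", NS)]

    # Assumed local convention: closed or embargoed terms mean no open file.
    if any(r.endswith(("closedaccess", "embargoedaccess")) for r in rights):
        return False
    if any(r.endswith("openaccess") for r in rights):
        return True
    # No rights statement at all: fall back to looking for a PDF link.
    return any(i.endswith(".pdf") for i in identifiers)


print(looks_like_full_text(SAMPLE_RECORD))
```

A harvester that skips a check like this simply treats the sample record as if it had a file behind it, which is the behaviour I ran into.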

But the reason that made me stop the experiment pretty much immediately was reading this little bit of hearsay on Twitter:

Google and Google Scholar are responsible for the vast majority of our IR’s traffic and use. In many disciplines the percentage of Green OA articles as a percentage of total faculty output is easily less than 25%. To publish citations when the fulltext of a pre-print manuscript is not made available to the librarian is ultimately going to test whether Google Scholar really does have a full-text threshold. And then what do we do when we find our work suddenly gone from search results?

Yet, the motivation to try to capture the whole of a faculty’s work still remains. An institutional repository should be a reflection of all the research and creative work of the institution that hosts it.

If an IR is not able to do this work, an institution is more likely to invest in a CRIS – a Current Research Information System – to represent the research outputs of the organization.

Remember when I wrote this in my post from March of this year?

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

– a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)

– a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)

– a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)

– a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

If the map becomes the territory then we will be lost

The marketplace for CRIS is no different:

But I think the investment in two separate products – a CRIS to capture the citations of a faculty’s research and creative output and an IR to capture the fulltext of the same – still seems a shame to pursue. Rather than invest a large sum of money for the quick win of a CRIS, we should invest those funds into an IR that can support data re-use, institutionally.

(What is the open version of the CRIS? To be honest, I don’t know this space very well. From what I know at the moment, I would suggest it might be the institutional repository + ORCiD and/or VIVO.)

I am imagining a scenario in which every article-level work that a faculty member of an institution has produced is captured in the institutional repository. Articles that are not allowed to be made open access are embargoed until they are in the public domain.

But to be honest, I’m a little spooked because I don’t see many other institutions engaging in this practice. Dark deposit does exist in the literature but it largely appears in the early years of the conversations around scholarly communications practice. The most widely cited article about the topic (from my reading, not from a proper literature review) is this 2011 article called The importance of dark deposit from Stuart Shieber. His blog is licensed as CC-BY, so I’m going to take advantage of this generosity and re-print the seven reasons why dark is better than missing:

  1. Posterity: Repositories have a role in providing access to scholarly articles of course. But an important part of the purpose of a repository is to collect the research output of the institution as broadly as possible. Consider the mission of a university archives, well described in this Harvard statement: “The Harvard University Archives (HUA) supports the University’s dual mission of education and research by striving to preserve and provide access to Harvard’s historical records; to gather an accurate, authentic, and complete record of the life of the University; and to promote the highest standards of management for Harvard’s current records.” Although the role of the university archives and the repository are different, that part about “gather[ing] an accurate, authentic, and complete record of the life of the University” reflects this role of the repository as well. Since at any given time some of the articles that make up that output will not be distributable, the broadest collection requires some portion of the collection to be dark.
  2. Change: The rights situation for any given article can change over time — especially over long time scales, librarian time scales — and having materials in the repository dark allows them to be distributed if and when the rights situation allows. An obvious case is articles under a publisher embargo. In that case, the date of the change is known, and repository software can typically handle the distributability change automatically. There are also changes that are more difficult to predict. For instance, if a publisher changes its distribution policies, or releases backfiles as part of a corporate change, this might allow distribution where not previously allowed. Having the materials dark means that the institution can take advantage of such changes in the rights situation without having to hunt down the articles at that (perhaps much) later date.
  3. Preservation: Dark materials can still be preserved. Preservation of digital objects is by and large an unknown prospect, but one thing we know is that the more venues and methods available for preservation, the more likely the materials will be preserved. Repositories provide yet another venue for preservation of their contents, including the dark part.
  4. Discoverability: Although the articles themselves can’t be distributed, their contents can be indexed to allow for the items in the repository to be more easily and accurately located. Articles deposited dark can be found based on searches that hit not only the title and abstract but the full text of the article. And it can be technologically possible to pass on this indexing power to other services indexing the repository, such as search engines.
  5. Messaging: When repositories allow both open and dark materials, the message to faculty and researchers can be made very simple: Always deposit. Everything can go in; the distribution decision can be made separately. If authors have to worry about rights when making the decision whether to deposit in the first place, the cognitive load may well lead them to just not deposit. Since the hardest part about running a successful repository is getting a hold of the articles themselves, anything that lowers that load is a good thing. This point has been made forcefully by Stevan Harnad. It is much easier to get faculty in the habit of depositing everything than in the habit of depositing articles subject to the exigencies of their rights situations.
  6. Availability: There are times when an author has distribution rights only to unavailable versions of an article. For instance, an author may have rights to distribute the author’s final manuscript, but not the publisher’s version. Or an art historian may not have cleared rights for online distribution of the figures in an article and may not be willing to distribute a redacted version of the article without the figures. The ability to deposit dark enables depositing in these cases too. The publisher’s version or unredacted version can be deposited dark.
  7. Education: Every time an author deposits an article dark is a learning moment reminding the author that distribution is important and distribution limitations are problematic.

There is an additional reason for pursuing a change of practice to dark deposit that I believe is very significant:

There are at least six types of university OA policy. Here we organize them by their methods for avoiding copyright troubles…

3. The policy seeks no rights at all, but requires deposit in the repository. If the institution already has permission to make the work OA, then it makes it OA from the moment of deposit. Otherwise the deposit will be “dark” (non-OA) (See p. 24) until the institution can obtain permission to make it OA. During the period of dark deposit, at least the metadata will be OA.

Good Practices For University Open-Access Policies, Stuart Shieber and Peter Suber, 2013

At least the metadata will be OA is a very good reason to do dark deposit. It might be reason enough. I share much of Ryan Regier’s enthusiasm for Open Citations, which he explains in his post, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open:

Having a more complete picture of how much an article has been cited by other articles is an immediate clear benefit of Open Citations. Right now you can get a piece of that via the above tools I’ve listed and, maybe, a piece is all you need. If you’ve got an article that’s been cited 100s of times, likely you aren’t going to look through each of those citing articles. However, if you’ve got an article or a work that only been cited a handful of times, likely you will be much more aware of what those citing articles are saying about your article and how they are using your information.

Ryan Regier, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open

Regier takes Elsevier to task, because Elsevier is one of the few major publishers remaining that refuses to make their citations OA.

I4OC requests that all scholarly publishers make references openly available by providing access to the reference lists they submit to Crossref. At present, most of the large publishers—including the American Physical Society, Cambridge University Press, PLOS, SAGE, Springer Nature, and Wiley—have opened their reference lists. As a result, half of the references deposited in Crossref are now freely available. We urge all publishers who have not yet opened their reference lists to do so now. This includes the American Chemical Society, Elsevier, IEEE, and Wolters Kluwer Health. By far the largest number of closed references can be found in journals published by Elsevier: of the approximately half a billion closed references stored in Crossref, 65% are from Elsevier journals. Opening these references would place the proportion of open references at nearly 83%.

Open citations: A letter from the scientometric community to scholarly publishers
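The 83% figure in the letter can be checked with a little arithmetic. The sketch below assumes that “half” means exactly 50% of the references in Crossref are closed and takes the 65% Elsevier share at face value; the function name is mine, not the letter’s.

```python
def open_share_after_release(closed_share: float, publisher_share_of_closed: float) -> float:
    """Share of all references that would be open if one publisher opened theirs."""
    open_share_now = 1.0 - closed_share
    newly_opened = closed_share * publisher_share_of_closed
    return open_share_now + newly_opened


# Assumed inputs taken from the letter: half of references are closed,
# and 65% of those closed references come from Elsevier journals.
share = open_share_after_release(closed_share=0.5, publisher_share_of_closed=0.65)
print(f"{share:.1%}")  # 82.5%
```

That works out to 82.5%, which matches the letter’s “nearly 83%”.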

There would be so much value unleashed if we could release the citations to our faculty’s research as open access.

Open Citations could lead to new ways of exploring and understanding the scholarly ecosystem. Some of these potential tools were explored by Aaron Tay in his post, More about open citations — Citation Gecko, Citation extraction from PDF & LOC-DB.

Furthermore, releasing citations as OA would enable them to be added to platforms such as Wikidata and made available for visualization using the Scholia tool, pictured above.

So that’s where I’m at.

I want to change the practice at MPOW to include all published faculty research, scholarship, and creative work in the Institutional Repository and if we are unable to publish these works as open access in our IR, we will include them as embargoed, dark deposit until they are confidently in the public domain. I want the Institutional Repository to live up to its name and have all the published work of the Institution.

Is this a good idea, or no? Are there pitfalls that I have not foreseen? Is my reasoning shaky? Please let me know.

One thought on “Considering dark deposit”

  1. Hi

    Some comments

    “But during this trial I uncovered a more troubling reason that suggested that uploading citations might be problematic. I came to understand that most document harvesting protocols – such as OAI-PMH and OpenAIRE – do not provide any means by which one can differentiate between metadata-only records and full text records. Our library system harvests our IR and it assumes that every item in IR has a full-text object associated with it. Other services that harvest our IR do the same. To visit the IR is to expect the full text of a text.”

    This little fact stunned me too when I learnt of it. The workaround these days is to put something in the license field and you can easily do it for your discovery service, but there are different standards on what different aggregators look for. Thankfully, GS and JISC CORE harvesters are smarter and actually crawl for PDF.

    “I am imagining a scenario in which every article-level work that a faculty member of an institution has produced is captured in the institutional repository. Articles that are not allowed to be made open access are embargoed until they are in the public domain.”

    Which version are you proposing to keep in the dark archive? The final published version when only the AAM is allowed? Or in cases where nothing is allowed to be embargoed? The main problem I see is that while there are some journals with embargoes there are some that prevent archiving in perpetuity.

    “At least the metadata will be OA is a very good reason to do dark deposit. It might be reason enough. ”
    I might be misunderstanding you, but in most cases, even if publishers do not allow self archiving, they normally provide metadata of the article via Crossref, which I think is open enough (and sure some have not so good quality).

    Sure, some like Elsevier don’t provide open references data, but I’m not sure how putting the metadata of a paper in your IR helps, unless you are talking about including metadata of the references as well? There’s a project that tries to encourage this https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/
