When I was a child, the walls of books in the adult section of our modest public library always filled me with unease and even dread. So many books that I would never read. So many books that I suspected – even then – were never read. I was under the impression that all the books were so old that the authors must all be dead. Unlike my refuge – the children’s section of the library, partitioned by a glass door set in a glass wall – this section of the library was dark and largely silent. The books were ghosts.
I am imagining a library that is made up of two distinct sections. These sections may be on separate floors. They may be in separate buildings. But these sections must be separated and distinct.
One of these sections would be ‘The Library of the Living’. It would be composed of works by authors who still walked the earth, somewhere, among us. The other section would be ‘The Library of the Dead’.
When an author passes from the earthly realm, a librarian takes their work from the Library of the Living and brings it, silently, to the Library of the Dead.
And at the end of this text was this:
“We don’t have much time, you know. We need to find the others. We need to find mentors. We need to be mentors. We don’t have much time.”
In the inbox of the email address associated with MPOW’s institutional repository are more than a dozen notifications that a faculty member has deposited their research work for inclusion. I should be happy about this. I should be delighted that a liaison librarian spoke highly enough of the importance of the institutional repository at a faculty departmental meeting and inspired a researcher to fill in a multitude of forms so their work can be made freely available to readers.
But I don’t feel good about this because a cursory look at the journals this faculty member has published in suggests that we can include none of the material in our IR due to restrictive publisher terms.
Institutional repository managers are continuously looking for new ways to demonstrate the value of their repositories. One way to do this is to create a more inclusive repository that provides reliable information about the research output produced by faculty affiliated with the institution.
Bjork, K., Cummings-Sauls, R., & Otto, R. (2019). Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records. Journal of Librarianship and Scholarly Communication, 7(1). https://doi.org/10.7710/2162-3309.2220
I read the Opening Up… article with interest because a couple of years ago, when I was the liaison librarian for biology, I ran an informal pilot in which I tried to capture the corpus of the biology department. During this pilot, for articles whose publishers did not allow deposit of the publisher PDF and whose authors were not interested in depositing a manuscript version, I published the metadata of these works instead.
But part way through this pilot, I abandoned the practice. I did so for a number of reasons. One reason was that the addition of their work to the Institutional Repository did not seem to prompt faculty to start depositing their research of their own volition. This was not surprising, as BePress doesn’t allow for the integration of author profiles directly into its platform (one must purchase a separate product for author profiles and the ability to generate RSS feeds at the author level). So I was not particularly disappointed with this result. While administrators are increasingly interested in demonstrating research outputs at the department and institutional level, faculty can still be generalized as more invested in subject-based repositories.
But during this trial I uncovered a more troubling reason to suspect that uploading citations might be problematic. I came to understand that most document harvesting protocols and guidelines – such as OAI-PMH and OpenAIRE – do not provide any means by which one can differentiate between metadata-only records and full-text records. Our library system harvests our IR and assumes that every item in the IR has a full-text object associated with it. Other services that harvest our IR do the same. To visit the IR is to expect the full text of a text.
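This limitation is easy to see if you look at what a harvester actually receives. The sketch below parses a hand-written, simplified sample of an OAI-PMH ListRecords response (the repository URLs and titles are invented for illustration): a record backed by a full-text PDF and a citation-only record carry the same Dublin Core shape, and no standard element tells the harvester which is which.

```python
# A minimal sketch with an invented sample response, showing why an OAI-PMH
# harvester cannot tell a metadata-only record from a full-text one: the
# oai_dc payload looks the same either way.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Two hypothetical records: the first has a full-text PDF behind it,
# the second is citation-only. Nothing in the metadata says which is which.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <dc xmlns="http://purl.org/dc/elements/1.1/">
          <title>An Article With Full Text</title>
          <identifier>https://ir.example.edu/biology/1</identifier>
        </dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <dc xmlns="http://purl.org/dc/elements/1.1/">
          <title>A Citation-Only Record</title>
          <identifier>https://ir.example.edu/biology/2</identifier>
        </dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE)
records = root.findall(f".//{OAI}record")
for rec in records:
    # A harvester sees only titles, identifiers, and the like -- there is
    # no standard element that flags "full text attached".
    print(rec.find(f".//{DC}title").text)
```

A downstream service has little choice but to assume every harvested record resolves to full text, which is exactly the problem described above.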
But the reason that made me stop the experiment pretty much immediately was reading this little bit of hearsay on Twitter:
Google and Google Scholar are responsible for the vast majority of our IR’s traffic and use. In many disciplines the percentage of Green OA articles as a percentage of total faculty output is easily less than 25%. To publish citations when the full text of a pre-print manuscript is not made available to the librarian is ultimately going to test whether Google Scholar really does have a full-text threshold. And then what do we do when we find our work suddenly gone from search results?
Yet, the motivation to try to capture the whole of a faculty’s work still remains. An institutional repository should be a reflection of all the research and creative work of the institution that hosts it.
But the investment in two separate products – a CRIS to capture the citations of a faculty’s research and creative output and an IR to capture the full text of the same – still seems a shame to pursue. Rather than invest a large sum of money for the quick win of a CRIS, we should invest those funds into an IR that can support data re-use, institutionally.
(What is the open version of the CRIS? To be honest, I don’t know this space very well. From what I know at the moment, I would suggest it might be the institutional repository + ORCiD and/or VIVO.)
I am imagining a scenario in which every article-level work that a faculty member of an institution has produced is captured in the institutional repository. Articles that are not allowed to be made open access are embargoed until they are in the public domain.
But to be honest, I’m a little spooked because I don’t see many other institutions engaging in this practice. Dark deposit does exist in the literature, but it largely appears in the early years of the conversations around scholarly communications practice. The most widely cited article on the topic (from my reading, not from a proper literature review) is a 2011 piece called The importance of dark deposit from Stuart Shieber. His blog is licensed CC-BY, so I’m going to take advantage of this generosity and re-print the seven reasons why dark is better than missing:
Posterity: Repositories have a role in providing access to scholarly articles of course. But an important part of the purpose of a repository is to collect the research output of the institution as broadly as possible. Consider the mission of a university archives, well described in this Harvard statement: “The Harvard University Archives (HUA) supports the University’s dual mission of education and research by striving to preserve and provide access to Harvard’s historical records; to gather an accurate, authentic, and complete record of the life of the University; and to promote the highest standards of management for Harvard’s current records.” Although the role of the university archives and the repository are different, that part about “gather[ing] an accurate, authentic, and complete record of the life of the University” reflects this role of the repository as well. Since at any given time some of the articles that make up that output will not be distributable, the broadest collection requires some portion of the collection to be dark.
Change: The rights situation for any given article can change over time — especially over long time scales, librarian time scales — and having materials in the repository dark allows them to be distributed if and when the rights situation allows. An obvious case is articles under a publisher embargo. In that case, the date of the change is known, and repository software can typically handle the distributability change automatically. There are also changes that are more difficult to predict. For instance, if a publisher changes its distribution policies, or releases backfiles as part of a corporate change, this might allow distribution where not previously allowed. Having the materials dark means that the institution can take advantage of such changes in the rights situation without having to hunt down the articles at that (perhaps much) later date.
Preservation: Dark materials can still be preserved. Preservation of digital objects is by and large an unknown prospect, but one thing we know is that the more venues and methods available for preservation, the more likely the materials will be preserved. Repositories provide yet another venue for preservation of their contents, including the dark part.
Discoverability: Although the articles themselves can’t be distributed, their contents can be indexed to allow for the items in the repository to be more easily and accurately located. Articles deposited dark can be found based on searches that hit not only the title and abstract but the full text of the article. And it can be technologically possible to pass on this indexing power to other services indexing the repository, such as search engines.
Messaging: When repositories allow both open and dark materials, the message to faculty and researchers can be made very simple: Always deposit. Everything can go in; the distribution decision can be made separately. If authors have to worry about rights when making the decision whether to deposit in the first place, the cognitive load may well lead them to just not deposit. Since the hardest part about running a successful repository is getting a hold of the articles themselves, anything that lowers that load is a good thing. This point has been made forcefully by Stevan Harnad. It is much easier to get faculty in the habit of depositing everything than in the habit of depositing articles subject to the exigencies of their rights situations.
Availability: There are times when an author has distribution rights only to unavailable versions of an article. For instance, an author may have rights to distribute the author’s final manuscript, but not the publisher’s version. Or an art historian may not have cleared rights for online distribution of the figures in an article and may not be willing to distribute a redacted version of the article without the figures. The ability to deposit dark enables depositing in these cases too. The publisher’s version or unredacted version can be deposited dark.
Education: Every time an author deposits an article dark is a learning moment reminding the author that distribution is important and distribution limitations are problematic.
There is an additional reason for pursuing a change of practice to dark deposit that I believe is very significant:
There are at least six types of university OA policy. Here we organize them by their methods for avoiding copyright troubles…
3. The policy seeks no rights at all, but requires deposit in the repository. If the institution already has permission to make the work OA, then it makes it OA from the moment of deposit. Otherwise the deposit will be “dark” (non-OA) (See p. 24) until the institution can obtain permission to make it OA. During the period of dark deposit, at least the metadata will be OA.
Having a more complete picture of how much an article has been cited by other articles is an immediate, clear benefit of Open Citations. Right now you can get a piece of that via the tools I’ve listed above and, maybe, a piece is all you need. If you’ve got an article that’s been cited hundreds of times, you likely aren’t going to look through each of those citing articles. However, if you’ve got an article or a work that has only been cited a handful of times, you will likely be much more aware of what those citing articles are saying about your article and how they are using your information.
Regier takes Elsevier to task, because Elsevier is one of the few major publishers remaining that refuses to make their citations OA.
I4OC requests that all scholarly publishers make references openly available by providing access to the reference lists they submit to Crossref. At present, most of the large publishers—including the American Physical Society, Cambridge University Press, PLOS, SAGE, Springer Nature, and Wiley—have opened their reference lists. As a result, half of the references deposited in Crossref are now freely available. We urge all publishers who have not yet opened their reference lists to do so now. This includes the American Chemical Society, Elsevier, IEEE, and Wolters Kluwer Health. By far the largest number of closed references can be found in journals published by Elsevier: of the approximately half a billion closed references stored in Crossref, 65% are from Elsevier journals. Opening these references would place the proportion of open references at nearly 83%.
Furthermore, releasing citations as OA would enable them to be added to platforms such as Wikidata and made available for visualization using the Scholia tool, pictured above.
So that’s where I’m at.
I want to change the practice at MPOW to include all published faculty research, scholarship, and creative work in the Institutional Repository, and if we are unable to publish these works as open access in our IR, we will include them as embargoed, dark deposits until they are confidently in the public domain. I want the Institutional Repository to live up to its name and hold all the published work of the Institution.
Is this a good idea, or no? Are there pitfalls that I have not foreseen? Is my reasoning shaky? Please let me know.
Introduction: Secret Feminist Agenda & Masters of Text
I am an academic librarian who has earned permanence – which is the word we use at the University of Windsor to describe the librarian-version of tenure. When I was hired, there was no explicit requirement for librarians to publish in peer-reviewed journals. Nowadays, newly hired librarians at my place of work have an expectation to create peer-reviewed scholarship, although the understanding of how much and what kinds of scholarship count has not been strictly defined.
On my official CV, which I update and submit to my institution every year, these peer-reviewed articles are listed individually. Under “Non-refereed publications”, I have a single line for each of my blogs. And yet, I have done so much more writing on blogging platforms than in my peer-reviewed work (over 194K words from 2006-2016 alone). And my public writing has been shared, saved, and read many, many times more than my peer-reviewed scholarship.
Now, as I have previously stated, I already have permanence. So why should I care if my blog writing counts in my work as an academic librarian?
That was my thinking, so I didn’t care. That is, until a couple of weeks ago when a podcast changed my mind.
McGregor’s podcast is part of a larger SSHRC-funded partnership called Spoken Web that “aims to develop a coordinated and collaborative approach to literary historical study, digital development, and critical and pedagogical engagement with diverse collections of spoken recordings from across Canada and beyond”.
What did we learn about scholarly podcasting… How and when and where we create new knowledge, that’s what we call scholarship, generally, right?
Secret Feminist Agenda, 3.26
Their conversation about what counts as scholarship and how it can be valued is a great listen. And it opened the possibility in my mind to consider this writing a form of creative, critical work.
While most of my public writing is explanatory or persuasive in nature, there is definitely a subset of my work that I would consider a form of creative practice. I know that these works are creative because when I sit down to write them, I don’t have an idea of the final form of the text until it is finished. I am compelled to work through ideas that I feel might have something to them, but the only way to tell is to get closer.
The second passage that struck me comes in at the 52:26 mark, when Hannah tells this story:
Hannah: I met a prof at the Modernist Studies Association Conference a few years ago who was telling me that he does a comic book podcast with a friend of his and they’ve been doing it for years and it has quite a popular following, and I was like, “Oh, awesome! Do you count that as your scholarly output?” and he said “No, I don’t need to. I have tenure.” And I was like, “Well, but, couldn’t you use tenure as a way to break space open for those who don’t but want to be doing that kind of work? Isn’t there another way to think about what it means to have security as a position from which you can radicalize?”, but that so often doesn’t seem to prove to be the case.
Ames: “Well, and now we’re back to that’s feminist thinking – what you said there and what that person is illustrating is not feminist thinking…”
Secret Feminist Agenda, 3.26
Oof. Hearing that bit was a bit of a gut-punch.
I can and will do better.
That being said, I’m not entirely sure how my corpus of public writing should be accounted for. Obviously, the volume of words produced is not an appropriate measure. Citation counts from scholarly works might be deemed a valuable measure, but as many scholars deliberately exclude public writing from their bibliographies, I feel this metric systematically undervalues this type of writing. And while page views and social media counts should stand for something, I don’t think you can make the case that popularity is an equivalent of quality.
And here is the script and the slides that I presented:
Good afternoon. Thank you for the opportunity to introduce you to OpenRefine.
Even if you have already heard of OpenRefine before, I hope you will still find this session useful, as I have tried to make an argument for why librarians should investigate technologies like OpenRefine for scholarly research purposes.
This talk has three parts.
I like to call OpenRefine the most popular library tool that you’ve never heard of.
After this introduction, I hope that this statement will become just a little less true.
OpenRefine began as Google Refine and before that it was Freebase Gridworks, created by David François Huynh in January 2010 while he was working at Metaweb, the company that was responsible for Freebase.
In July 2010, Metaweb was acquired by Google, and Freebase Gridworks was rebranded as Google Refine. While the code was always open source, Google supported the project until 2012. From that point on, the project became a community-supported open source product and as such was renamed OpenRefine.
As an aside, Freebase was officially shut down by Google in 2016 and the data from that project was transferred to Wikidata, Wikimedia’s structured data project.
OpenRefine is software written in Java. The program is downloaded onto your computer and accessed through the browser. OpenRefine includes its own web server software, so it is not necessary to be connected to the internet in order to make use of OpenRefine.
At the top of the slide is a screen capture of what you first see when you start the program. The dark black window is what opens behind the scenes if you are interested in monitoring the various processes that you are putting OpenRefine through. And in the corner of the slide, you can see a typical view of OpenRefine with data in it.
OpenRefine has been described by its creator as “a power tool for working with messy data”.
Wikipedia calls OpenRefine “a standalone open source desktop application for data cleanup and transformation to other formats, the activity known as data wrangling”.
OpenRefine is used to standardize and clean data across your file or spreadsheet.
That being said, once you know the power of OpenRefine, you will, like me, see all these other potential uses for the tool outside of metadata cleanup. In August of this year, I read this tweet from fellow Canadian scholarly communications librarian Ryan Regier and sent some links with instructions that illustrated how OpenRefine could help with this research question.
When introducing new technology to others, it’s very important not to oversell it and to manage expectations.
But I’m not the only one who feels strongly about the power of OpenRefine. For good reasons, which we will explore in the second section of this talk.
If you asked me what is the most popular technology used by librarians in their work and support of scholarship, I would say that one answer could be Microsoft Excel. Many librarians I know do their collections work, their electronic resources work, and their data work in Excel and they are very good at it.
But there are some very good reasons to reconsider using Excel for our work.
This slide outlines what I consider some of the strongest reasons to consider using OpenRefine. First, the software is able to handle more types of data than Excel can. Excel can handle rows of data. OpenRefine can handle rows and records of data.
For many day-to-day uses of Excel it is unlikely you will run into the maximum capacity of the software, but for those who work with large data sets, a limit of a million-and-change rows might and can be a problem.
But the most important reason why we should consider OpenRefine is the same reason why it’s fundamentally different than Excel. Unlike spreadsheet software like Excel, no formulas are stored in the cells of OpenRefine.
Instead, formulas are used to transform the data, and these formulas are tracked as scripts.
Not only do the cells of Excel contain formulas that transform the data presented in ways that are not always clear, Excel sometimes transforms your data without clearly demonstrating that it is doing so. According to a paper from 2016, roughly one fifth of genomics papers with supplementary gene lists contained errors from Excel converting gene names such as SEPT10 into dates.
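To make the failure mode concrete, here is a toy simulation of that auto-conversion behaviour. This is not Excel’s actual parsing code – just an illustrative sketch of the heuristic: a string that looks like a month abbreviation followed by a day number gets silently coerced into a date, which is exactly what happens to gene symbols like SEPT10 and MARCH1.

```python
# A toy simulation (NOT Excel's real parser) of spreadsheet date coercion:
# strings shaped like "<month-abbrev><day>" are silently turned into dates,
# so gene symbols like SEPT10 or MARCH1 stop being gene symbols.
import re

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "MARCH": 3, "APR": 4, "MAY": 5,
          "JUN": 6, "JUL": 7, "AUG": 8, "SEP": 9, "SEPT": 9, "OCT": 10,
          "NOV": 11, "DEC": 12}

def excel_like_coerce(value: str):
    """Return a (month, day) tuple if the value looks like a date to this
    toy heuristic; otherwise return the value unchanged."""
    m = re.fullmatch(r"([A-Za-z]+)(\d{1,2})", value.strip())
    if m and m.group(1).upper() in MONTHS:
        return (MONTHS[m.group(1).upper()], int(m.group(2)))
    return value

print(excel_like_coerce("SEPT10"))  # the gene symbol becomes (9, 10)
print(excel_like_coerce("BRCA1"))   # unaffected: "BRCA" is not a month
```

The standard defence – importing such columns explicitly as text – is precisely the kind of deliberate, recorded step that a scripted tool like OpenRefine encourages.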
I want to be clear: I am not saying that Excel is bad, or that people who use Excel are bad.
We can all employ good data practices whether we use spreadsheets or data wrangling tools such as OpenRefine. I believe we have to meet people where they are with their data tool choices. This is part of the approach taken by the good people responsible for this series of lessons as part of the Ecology Curriculum of Data Carpentry.
And with that, I just want to take the briefest of moments to thank the good people behind Software Carpentry and Data Carpentry – collectively now known as The Carpentries – as I am pretty sure it was their work that introduced me to the world of OpenRefine.
This slide is taken from the Library Carpentry OpenRefine lesson. There is too much text on the slide to read, but the gist of the message is this: OpenRefine saves every change you make to your dataset, and these changes are saved as scripts. After you clean up a set of messy data, you can export the script of the transformations you made and then apply that script to other, similarly messy data sets.
Not only does this ability save the time of the wrangler, the ability to save scripts separately from the data itself lends itself to Reproducible Science.
Here is a screenshot of a script captured in OpenRefine in both English and in JSON.
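In case the screenshot isn’t legible, here is a simplified, hand-written example of the kind of JSON operation history OpenRefine exports. Real exports carry more fields, and the column name and expression here are hypothetical; the sketch simply loads the history and reads it back, the way you might inspect one before applying it to a fresh data set.

```python
# An illustrative (hand-written, simplified) OpenRefine operation history.
# Real exports include more fields; the column name here is hypothetical.
import json

HISTORY = """
[
  {
    "op": "core/text-transform",
    "columnName": "Journal Title",
    "expression": "value.trim()",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on column Journal Title using value.trim()"
  }
]
"""

# Because the transformations live in this script rather than in the cells,
# the same cleanup can be re-applied (via Undo/Redo > Apply) to next month's
# equally messy export without redoing the work by hand.
operations = json.loads(HISTORY)
for op in operations:
    print(op["op"], "->", op["description"])
```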
It is difficult for me to express how important and how useful it is for OpenRefine to separate the work from data in this way.
This is the means by which librarians can share workflows outside of their organizations without worrying about accidentally sharing private data.
As more librarians start using more complicated data sets and data tools for supporting research and their own research, the more opportunities there will be for embodying, demonstrating, and teaching good data practices.
I remember the instance in which I personally benefited from someone sharing their work with OpenRefine. It was a blog post from Angela Galvan, which walked me through the process of looking up a list of ISSNs through the Sherpa Romeo API and using the formula on the screen to quickly and clearly show whether a particular journal allowed publisher PDFs to be added to the institutional repository or not.
And with that, here’s a bit of a tour of how libraries are using OpenRefine in their work.
I haven’t spent much time highlighting it, but one of the most appreciated features of OpenRefine is its data visualizations, which allow the wrangler to find differences that make a difference in the data.
The slide features two screen captures. In the lower screen, OpenRefine has used fuzzy matching algorithms to discover variations of entries that were statistically likely meant to be the same.
I mentioned previously that I had used OpenRefine to access the Sherpa/Romeo API. This ability of OpenRefine to give API access to users who may not be entirely comfortable with command-line scripting or programming cannot be overstated. That’s why lesson plans that use OpenRefine to perform such tasks as web scraping, as pictured here, are appreciated.
With OpenRefine, libraries are finding ways to use reconciliation services for local projects. I am just going to read the last bit of the last line of this abstract for emphasis: a hack using OpenRefine yielded a 99.6% authority reconciliation and a stable process for monthly data verification. And as you now know, this was likely done through OpenRefine scripts.
OpenRefine has proved useful in preparing linked data…
And if staff feel more comfortable using spreadsheets, OpenRefine can be used to convert those spreadsheets into forms such as MODS XML.
Knowing the history of OpenRefine, you might not be surprised to learn that it has built in capabilities to reconcile data to controlled vocabularies…
But you might be pleasantly surprised to learn that OpenRefine can reconcile data from VIVO, JournalTOC, VIAF, and FAST from OCLC.
But the data reconciliation service that I’m particularly following is from Wikidata.
In this video example, John Little uses data from VIAF and Wikidata to gather authoritative versions of author names plus additional information including their place of birth.
I think it’s only appropriate that OpenRefine connects to Wikidata when you remember that both projects had their origins in the Freebase project.
Wikidata is worthy of its own talk – and maybe even its own conference – but since we are very close to the end of this presentation, let me introduce you to Wikidata as structured linked data that anyone can use and improve.
I was introduced to the power of Wikidata and how its work could extend our library work by librarians such as Dan Scott and Stacy Allison-Cassin. In this slide, you can see a screen capture from Dan’s presentation that highlights that the power of Wikidata is that it doesn’t just collect formal institutional identifiers, such as those from LC or ISNI, but also identifiers from sources such as AllMusic.
And this is the example that I would like to end my presentation on. The combination of OpenRefine and Wikidata – working together – allows the librarian not only to explore, clean up, and normalize their data sets, but also to extend our data and to connect it to the world.
The trouble is mine: I am not interested in a measured account of the lives of coders in America. I think the status quo for computing is dismal.
The way that we require people to think like a computer in order to make use of computers is many things. It is de-humanizing. It is unnecessary hardship. It feels wrong.
This is why Bret Victor’s Inventing on Principle (2012) presentation was (and remains) so incredible to me. Victor sets out to demonstrate that creators need (computer) tools that provide them with the most direct and immediately responsive connection to their creation possible:
If we look to the past, we can find alternative futures to computing that might have served us better, if we had only gone down a different path. Here’s Bret Victor’s The Future of Programming 1973 (2013), which you should watch a few minutes of, if just to appreciate his alternative to PowerPoint:
At around the 11-minute mark, Evans sets the scene for the first unveiling of Tim Berners-Lee’s World Wide Web, and it’s a great story because when Berners-Lee first demonstrated his work at Hypertext ’91, the other attendees were not particularly impressed. Evans explains why.
So why am I telling you all about the history of computing on my library-themed blog? Well, one reason is that our profession has not done a great job of knowing our own (female) history of computing.
(There was a now-deleted post from a librarian blog from 2012 that comes to mind. I’m not entirely sure of the etiquette of quoting deleted posts, so I will paraphrase the post as the following text…)
Despite librarianship being a feminized and predominantly female profession, [the author of the aforementioned blog post] remarked that she was never introduced to the following women in library school, despite their accomplishments: Suzanne Briet, Karen Spärck Jones, Henriette Avram, and Elaine Svenonius. And if my memory can be trusted, the same was true for me.
Is there a connection between the more human(e) type of computing that Bret Victor advocates for, the computing innovations from women that Claire Evans wants us to learn from, and these lesser-known women of librarianship and its adjacent fields in computing? I think there might be.
When most scientists were trying to make people use code to talk to computers, Karen Sparck Jones taught computers to understand human language instead.
In so doing, her technology established the basis of search engines like Google.
A self-taught programmer with a focus on natural language processing, and an advocate for women in the field, Sparck Jones also foreshadowed by decades Silicon Valley’s current reckoning, warning about the risks of technology being led by computer scientists who were not attuned to its social implications.
“A lot of the stuff she was working on until five or 10 years ago seemed like mad nonsense, and now we take it for granted,” said John Tait, a longtime friend who works with the British Computer Society.
I have already given my fair share of future-of-the-library talks, so I think it is for the best if someone else takes up the challenge of looking into the work of librarians past to see if we can’t refactor our present into a better future.
Libraries are haunted houses. As our patrons move through scenes and illusions that took years of labor to build and maintain, we workers are hidden, erasing ourselves in the hopes of providing a seamless user experience, in the hopes that these patrons will help defend Libraries against claims of death or obsolescence. However, ‘death of libraries’ arguments that equate death with irrelevance are fundamentally mistaken. If we imagine that a collective fear has come true and libraries are dead, it stands to reason that library workers are ghosts. Ghosts have considerable power and ubiquity in the popular imagination, making death a site of creative possibility. Using the scholarly lens of haunting, I argue that we can experience time creatively, better positioning ourselves to resist the demands of neoliberalism by imagining and enacting positive futurities.
I also think libraries can be described as haunted but for other reasons than Settoducato suggests. That doesn’t mean I think Settoducato is wrong or the article is bad. On the contrary – I found the article delightful and I learned a lot from it. For example, having not read Foucault myself, this was new to me:
In such examples, books are a necessary component of the aesthetic of librarianship, juxtaposing the material (books and physical space) with the immaterial (ghosts). Juxtaposition is central to Michel Foucault’s concept of heterotopias, places he describes as “capable of juxtaposing in a single real place several spaces, several sites that are in themselves incompatible” (1984, 6). Foucault identifies cemeteries, libraries, and museums among his examples of heterotopias, as they are linked by unique relationships to time and memory. Cemeteries juxtapose life and death, loss (of life) and creation (of monuments), history and modernity as their grounds become increasingly populated. Similarly, libraries and museums embody “a sort of perpetual and indefinite accumulation of time in an immobile place,” organizing and enclosing representations of memory and knowledge (Foucault 1984, 7).
There are other passages in Intersubjectivity that I think could be expanded upon. For example, while I completely agree with its claim that the labour of library staff is largely invisible, I believe that invisibility was prevalent long before neoliberalism. The librarian has been subservient to those who endow the books for hundreds of years.
Richard Bentley, for his part, continued to run into problems with libraries. Long after the quarrel of the ancients and moderns had fizzled, he installed a young cousin, Thomas Bentley, as keeper of the library of Cambridge’s Trinity College. At Richard’s urging, the young librarian followed the path of a professional, pursuing a doctoral degree and taking long trips to the Continent in search of new books for the library. The college officers, however, did not approve of his activities. The library had been endowed by Sir Edward Stanhope, whose own ideas about librarianship were decidedly more modest than those of the Bentleys. In 1728, a move was made to remove the younger Bentley, on the ground that his long absence, studying and acquiring books in Rome and elsewhere, among other things, disqualified him from the post. In his characteristically bullish fashion, Richard Bentley rode to his nephew’s defense. In a letter, he admits that “the keeper [has not] observed all the conditions expressed in Sir Edward Stanhope’s will,” which had imposed a strict definition of the role of librarian. Bentley enumerates Sir Edward’s stipulations, thereby illuminating the sorry state of librarianship in the eighteenth century. The librarian is not to teach or hold office in the college; he shall not be absent from his appointed place in the library more than forty days out of the year; he cannot hold a degree above that of master of arts; he is to watch each library reader, and never let one out of his sight
“He is to watch each library reader” is a key phrase here. From the beginning, librarians and library staff were installed as instruments of surveillance as a means to protect property.
Even today, I hear of university departments that wish to make a collection of material available for the use of faculty and students, and are so committed to this end that they secure a room for it, which is no small feat on a campus nowadays. But then the faculty or students refuse to contribute their most precious works, because they realize that materials in an open and accessible room will be subject to theft or vandalism.
Same as it ever was.
Presently, a handful of municipal libraries in Denmark operate with open service models. These open libraries rely on the self-service of patrons and have no library staff present—loans, returns, admittance and departing the physical library space are regulated through automated access points. Many public library users are familiar with self-check out kiosks and access to the collections database through a personal computing station, but few patrons have ever been in a public library without librarians, staff workers or security personnel. Libraries that rely on self-service operation models represent a new kind of enclosed environment in societies of control. Such automated interior spaces correspond to a crisis in libraries and other institutions of memory like museums or archives. Under the guise of reform, longer service hours, and cost-saving measures, libraries with rationalized operating models conscript their users into a new kind of surveillance….
The open library disciplines and controls the user by eliminating the librarian, enrolling the user into a compulsory self-service to engage with the automated space. The power of this engagement is derived from a regime of panoptic access points that visualize, capture and document the user’s path and her ability to regulate herself during every movement and transition in the library—from entering, searching the catalog, browsing the web, borrowing information resources, to exiting the building.
Because of these technologies, many, many spaces are going to feel haunted. Not just libraries:
The other day, after watching Crimson Peak for the first time, I woke up with a fully-fleshed idea for a Gothic horror story about experience design. And while the story would take place in the past, it would really be about the future. Why? Because the future itself is Gothic.
First, what is Gothic? Gothic (or “the Gothic” if you’re in academia) is a Romantic mode of literature and art. It’s a backlash against the Enlightenment obsession with order and taxonomy. It’s a radical imposition of mystery on an increasingly mundane landscape. It’s the anticipatory dread of irrational behaviour in a seemingly rational world. But it’s also a mode that places significant weight on secrets — which, in an era of diminished privacy and ubiquitous surveillance, resonates ever more strongly….
… Consider the disappearance of the interface. As our devices become smaller and more intuitive, our need to see how they work in order to work them goes away. Buttons have transformed into icons, and icons into gestures. Soon gestures will likely transform into thoughts, with brainwave-triggers and implants quietly automating certain functions in the background of our lives. Once upon a time, we valued big hulking chunks of technology: rockets, cars, huge brushed-steel hi-fis set in ornate wood cabinets, thrumming computers whose output could heat an office, even odd little single-purpose kitchen widgets. Now what we want is to be Beauty in the Beast’s castle: making our wishes known to the household gods, and watching as the “automagic” takes care of us. From Siri to Cortana to Alexa, we are allowing our lives and livelihoods to become haunted by ghosts without shells.
How can we resist this future that is being made for us but not with us? One of my favourite passages of Intersubjectivity suggests a rich field of possibility that I can’t wait to explore further:
However, it does not have to be this way. David Mitchell and Sharon Snyder also take up the questions of embodiment and productivity, examining through a disability studies lens the ways in which disabled people have historically been positioned as outside the laboring masses due to their “non-productive bodies” (2010, 186). They posit that this distinction transforms as the landscape of labor shifts toward digital and immaterial outputs from work in virtual or remote contexts, establishing the disabled body as a site of radical possibility. Alison Kafer’s crip time is similarly engaged in radical re-imagining, challenging the ways in which “‘the future’ has been deployed in the service of compulsory able-bodiedness and able-mindedness” (2013, 26-27). That is, one’s ability to exist in the future, or live in a positive version of the future is informed by the precarity of their social position. The work of theorists like Mitchell, Snyder, and Kafer is significant because it insists on a future in which disabled people not only exist, but also thrive despite the pressures of capitalism.
In Harry Potter and the Deathly Hallows (sorry), Helga Hufflepuff’s goblet is stored in a vault at Gringotts that’s been cursed so that every time you touch one of the objects in it, dozens of copies are created. On the cover of the original U.K. edition of the book, Harry, Ron and Hermione are pictured scrambling atop a wave of facsimile’d treasure. I’ve started thinking about special collections digitization like this. Digitization doesn’t streamline or simplify library collections; rather, it multiplies them, as every interaction creates additional objects for curation and preservation.
[The] amount of data that can be conjured from any given thing is almost limitless. Pick up a plain grey rock from the side of the road, and in moments you can make a small dataset about it: size, weight, colour, texture, shape, material. If you take that rock to a laboratory these data can be made greatly more precise, and instrumentation beyond our own human sensorium can add to the list of records: temperature, chemical composition, carbon date. From here there is a kind of fractal unfolding of information that begins to occur, where each of these records in turn manifest their own data. The time at which the measurement was made, the instrument used to record it, the person who performed the task, the place where the analysis was performed. In turn, each of these new meta-data records can carry its own data: the age of the person who performed the task, the model of the instrument, the temperature of the room. Data begets data, which begets meta data, repeat, repeat, repeat. It’s data all the way down.
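That fractal unfolding is easy to see in miniature. Here is a toy sketch of my own (the rock, the field names, and the instrument are all invented for illustration): every observation carries records about itself, and those records carry records in turn.

```python
# A toy illustration: every observation about the rock carries
# its own metadata, which in turn carries metadata, and so on.
rock = {
    "colour": {
        "value": "grey",
        "metadata": {
            "measured_at": {
                "value": "2019-01-07T14:03:00",
                "metadata": {"clock": "lab wall clock"},
            },
            "instrument": {
                "value": "spectrophotometer",
                "metadata": {"model": "hypothetical-X100"},
            },
        },
    },
}

def count_records(node):
    """Count every leaf record in the nested structure."""
    if not isinstance(node, dict):
        return 1
    return sum(count_records(v) for v in node.values())

print(count_records(rock))  # → 5: one plain value has already begotten four more
```

One observed colour has already produced a timestamp, a clock, an instrument, and a model number, and nothing stops the recursion from going deeper.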
That which computation sets out to map and model, it eventually takes over. Google sets out to index all human knowledge and becomes the source and the arbiter of that knowledge: it became what people think. Facebook set out to map the connections between people – the social graph – and became the platform for those connections, irrevocably reshaping societal relationships. Like an air control system mistaking a flock of birds for a fleet of bombers, software is unable to distinguish between the model of the world and reality – and, once conditioned, neither are we.
I am here to bring to your attention two developments that have me worried:
The Social Graph of Scholarly Communications is becoming more tightly bound into institutional metrics that have an increasing influence on institutional funding
The publishers of the Social Graph of Scholarship are beginning to enclose the Social Graph, excluding the infrastructure of libraries and other independent, non-profit organizations
Normally, I would try to separate these ideas into two dedicated posts, but in this case I want to bring them together in writing, because if these two trends converge, things will become very bad, very quickly.
Let me start with the first trend:
1. The social graph that binds
When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:
a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)
a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)
a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)
a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)
(My apologies for not sharing the text that goes with the slides. Since January of this year, I have been the Head of the Information Services Department at my place of work. In addition to this responsibility, much of my time this year has been spent covering the work of colleagues currently on leave. Finding time to write has been a challenge.)
In Ontario, each institution of higher education must negotiate a ‘Strategic Mandate Agreement’ with its largest funding body, the provincial government. Universities are currently in the second iteration of these agreements and are preparing for the third round. These agreements are considered fraught by many, including Marc Spooner, a professor in the faculty of education at the University of Regina, who wrote the following in an opinion piece in University Affairs:
The agreement is designed to collect quantitative information grouped under the following broad themes: a) student experience; b) innovation in teaching and learning excellence; c) access and equity; d) research excellence and impact; and e) innovation, economic development and community engagement. The collection of system-wide data is not a bad idea on its own. For example, looking at metrics like student retention data between years one and two, proportion of expenditures on student services, graduation rates, data on the number and proportion of Indigenous students, first-generation students and students with disabilities, and graduate employment rates, all can be helpful.
Where the plan goes off-track is with the system-wide metrics used to assess research excellence and impact: 1) Tri-council funding (total and share by council); 2) number of papers (total and per full-time faculty); and 3) number of citations (total and per paper). A tabulation of our worth as scholars is simply not possible through narrowly conceived, quantified metrics that merely total up research grants, peer-reviewed publications and citations. Such an approach perversely de-incentivises time-consuming research, community-based research, Indigenous research, innovative lines of inquiry and alternative forms of scholarship. It effectively displaces research that “matters” with research that “counts” and puts a premium on doing simply what counts as fast as possible…
Even more alarming – and what is hardly being discussed – is how these damaging and limited terms of reference will be amplified when the agreement enters its third phase, SMA3, from 2020 to 2023. In this third phase, the actual funding allotments to universities will be tied to their performance on the agreement’s extremely deficient metrics.
The measure by which citation counts for each institution will be assessed has already been decided. The Ontario government has stated that it is going to use Elsevier’s Scopus (although I presume they really mean SciVal).
What could possibly go wrong? To answer that question, let’s look at the second trend: enclosure.
2. Enclosing the social graph
The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from off the goose.
As someone who spends a great deal of time ensuring that the scholarship of the University of Windsor’s Institutional Repository meets the stringent restrictions set by publishers, it’s hard not to feel a slap in the face when reading Springer Nature Syndicates Content to ResearchGate.
ResearchGate has been accused of “massive infringement of peer-reviewed, published journal articles.”
They say that the networking site is illegally obtaining and distributing research papers protected by copyright law. They also suggest that the site is deliberately tricking researchers into uploading protected content.
It is not uncommon to find selective enforcement of copyright within the scholarly communication landscape. Publishers have turned a blind eye to the copyright infringement of ResearchGate and Academia.edu for years, while targeting course reserve systems set up by libraries.
Any commercial system that is part of the scholarly communication workflow can be acquired for strategic purposes.
One of the least understood and thus least appreciated functions of calibre is that it uses the Open Publication Distribution System (OPDS) standard (opds-spec.org) to allow one to easily share e-books (at least those without Digital Rights Management software installed) to e-readers on the same local network. For example, on my iPod Touch, I have the e-reader program Stanza (itunes.apple.com/us/app/stanza/id284956128) installed and from it, I can access the calibre library catalogue on my laptop from within my house, since both are on the same local WiFi network. And so can anyone else in my family from their own mobile device. It’s worth noting that Stanza was bought by Amazon in 2011 and according to those who follow the digital e-reader market, it appears that Amazon may have done so solely for the purpose of stunting its development and sunsetting the software (Hoffelder, 2013).
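For the curious: an OPDS catalogue is just an Atom XML feed, so listing its entries takes only a few lines. Here is a minimal sketch of my own — the miniature feed below is made up for illustration, though calibre’s built-in content server does expose a real feed like it (typically at /opds).

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def list_opds_entries(feed_xml):
    """Return (title, epub_href) pairs from an OPDS acquisition feed.

    OPDS is plain Atom: each <entry> is a book, and its acquisition
    <link> carries the e-book's MIME type and download path.
    """
    root = ET.fromstring(feed_xml)
    books = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        href = None
        for link in entry.iter(ATOM + "link"):
            if link.get("type") == "application/epub+zip":
                href = link.get("href")
        books.append((title, href))
    return books

# A made-up miniature feed, for illustration only:
sample_feed = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>calibre library</title>
  <entry>
    <title>A Sample Novel</title>
    <link type="application/epub+zip" href="/get/epub/1"/>
  </entry>
</feed>"""

print(list_opds_entries(sample_feed))  # [('A Sample Novel', '/get/epub/1')]
```

This openness is exactly what made tools like Stanza possible: any e-reader on the network that speaks Atom can browse the library.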
And sometimes companies acquire products to provide a tightly integrated suite of services and seamless workflow.
And, indeed, whatever model the university may select, if individual researchers determine that seamlessness is valuable to them, will they in turn license access to a complete end-to-end service for themselves or on behalf of their lab? So, the university’s efforts to ensure a more competitive overall marketplace through componentization may ultimately serve only to marginalize it.
The repository must be registered in the Directory of Open Access Repositories (OpenDOAR) or in the process of being registered.
In addition, the following criteria for repositories are required:
Automated manuscript ingest facility
Full text stored in XML in JATS standard (or equivalent)
Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.
Open API to allow others (including machines) to access the content
QA process to integrate full text with core abstract and indexing services (for example PubMed)
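Out of curiosity about what that “JATS standard (or equivalent)” criterion looks like in practice, here is a simplified, hand-written sketch. The fragment below follows the JATS front-matter pattern but is abbreviated, not a complete valid instance, and the DOI and license are invented; the point is that the DOI, version (AAM/VoR), and license the criteria call for live in machine-readable fields.

```python
import xml.etree.ElementTree as ET

# A simplified, hand-written fragment in the style of JATS front matter.
# Real JATS is far richer; this shows only the fields the criteria call out.
jats_front = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.5678</article-id>
      <article-version>AAM</article-version>
      <permissions>
        <license xmlns:xlink="http://www.w3.org/1999/xlink"
                 xlink:href="https://creativecommons.org/licenses/by/4.0/"/>
      </permissions>
    </article-meta>
  </front>
</article>"""

root = ET.fromstring(jats_front)
doi = root.findtext(".//article-id[@pub-id-type='doi']")
version = root.findtext(".//article-version")
print(doi, version)  # 10.1234/example.5678 AAM
```

Because the format is standardized, a repository (or a machine hitting its open API) can pull the DOI and deposited version out of every record the same way.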
‘Automated manuscript ingest facility’ probably gives me the most pause. Automated means a direct pipeline from publisher to institutional repository that could be based on a publisher’s interpretation of fair use/fair dealing, and we don’t know what the ramifications of that decision-making might be. I feel trepidation because I believe we are already experiencing the effects of a tighter integration between manuscript services and the IR.
Many publishers – including Wiley, Taylor and Francis, IEEE, and IOP – already use a third-party manuscript service called ScholarOne. ScholarOne integrates the iThenticate service, which reports what percentage of a manuscript has already been published. Journal editors have the option to set to what extent a paper can make use of a researcher’s prior work, including their thesis. Manuscripts that exceed these thresholds can be automatically rejected without human intervention from the editor. We are only just starting to understand how this workflow will affect the willingness of young scholars to make their theses and dissertations open access.
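To make the mechanics concrete, here is a hypothetical sketch of that kind of automated screening. The function, field names, and 30% threshold are my own invention for illustration, not iThenticate’s actual API; the point is that the rejection decision can happen before any human reads the paper.

```python
def screen_manuscript(similarity_report, threshold=0.30):
    """Auto-reject a manuscript whose total similarity score exceeds
    the journal's configured threshold.

    similarity_report: mapping of matched source -> fraction of the
    manuscript that overlaps with that source.
    """
    total_overlap = sum(similarity_report.values())
    if total_overlap > threshold:
        return ("reject", total_overlap)  # never reaches a human editor
    return ("forward_to_editor", total_overlap)

# A thesis chapter reworked into an article can trip the wire on its own:
report = {"author_own_thesis_2017": 0.42, "misc_web_sources": 0.03}
print(screen_manuscript(report)[0])  # 'reject'
```

Under a policy like this, the more openly a young scholar has shared their thesis, the more likely their derived article is to be flagged as already published.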
It is also worth noting that ScholarOne is owned by Clarivate Analytics, the parent company of Web of Science, InCites, Journal Citation Reports, and others. On one hand, having a non-publisher act as a third party to the publishing process is probably ideal, since it reduces the chance of a conflict of interest. On the other hand, I’m very unhappy with Clarivate Analytics’ product Kopernio, which provides “fast, one-click access to millions of research papers” and “integrates with Web of Science, Google Scholar, PubMed” and 20,000 other sites (including ResearchGate and Academia.edu, natch). There are prominent links to Kopernio within Web of Science that essentially position the product as a direct competitor to a university library’s link resolver service, and in doing so remove the library from the scholarly workflow – other than the fact that the library pays for the product’s placement.
The winner takes it all
The genius — sometimes deliberate, sometimes accidental — of the enterprises now on such a steep ascent is that they have found their way through the looking-glass and emerged as something else. Their models are no longer models. The search engine is no longer a model of human knowledge, it is human knowledge. What began as a mapping of human meaning now defines human meaning, and has begun to control, rather than simply catalog or index, human thought. No one is at the controls. If enough drivers subscribe to a real-time map, traffic is controlled, with no central model except the traffic itself. The successful social network is no longer a model of the social graph, it is the social graph. This is why it is a winner-take-all game.
It appears that I haven’t written a single post on this blog since July of 2018. Perhaps it is all the talk of resolutions around me but I sincerely would like to write more in this space in 2019. And the best way to do that is to just start.
This week on Function, we take a look at the rising labor movement in tech by hearing from those whose advocacy was instrumental in setting the foundation for what we see today around the dissent from tech workers.
Anil talks to Leigh Honeywell, CEO and founder of Tall Poppy and creator of the Never Again pledge, about how her early work, along with others, helped galvanize tech workers to connect the dots between different issues in tech.
I thought I was familiar with most of Leigh’s work but I realized that wasn’t the case because somehow her involvement with the Never Again pledge escaped my attention.
Here’s the pledge’s Introduction:
We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their Constitutionally-protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable.
We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out. We see how IBM collaborated to digitize and streamline the Holocaust, contributing to the deaths of six million Jews and millions of others. We recall the internment of Japanese Americans during the Second World War. We recognize that mass deportations precipitated the very atrocity the word genocide was created to describe: the murder of 1.5 million Armenians in Turkey. We acknowledge that genocides are not merely a relic of the distant past—among others, Tutsi Rwandans and Bosnian Muslims have been victims in our lifetimes.
Today we stand together to say: not on our watch, and never again.
The episode reminded me that while I am not an employee in the United States who is directly complicit with the facilitation of deportation, as a Canadian academic librarian I am not entirely free from some degree of complicity, as I am employed at a university that subscribes to WESTLAW.
The Intercept is reporting on Thomson Reuters’ response to Privacy International’s letter to TRI CEO Jim Smith expressing the watchdog group’s “concern” over the company’s involvement with ICE. According to The Intercept article, “Thomson Reuters Special Services sells ICE ‘a continuous monitoring and alert service that provides real-time jail booking data to support the identification and location of aliens’ as part of a $6.7 million contract, and West Publishing, another subsidiary, provides ICE’s ‘Detention Compliance and Removals’ office with access to a vast license-plate scanning database, along with agency access to the Consolidated Lead Evaluation and Reporting, or CLEAR, system.” The two contracts together are worth $26 million. The article observes that “the company is ready to defend at least one of those contracts while remaining silent on the rest.”
I also work at a library that subscribes to products that are provided by Elsevier and whose parent company is the RELX Group.
In 2015, Reed Elsevier rebranded itself as RELX and moved further away from traditional academic and professional publishing. This year, the company purchased ThreatMetrix, a cybersecurity company that specializes in tracking and authenticating people’s online activities, which even tech reporters saw as a notable departure from the company’s prior academic publishing role.
In some libraries, there are particular collections in which the objects are organized by the order in which they were acquired (at my place of work, our relatively small collection of movies on DVD is ordered this way). This practice makes it easy for a person to quickly see what has most recently been received or newly published. Such collections are easy to start and maintain, as you just have to sort them by ‘acquisition number’.
But you would be hard-pressed to find a good reason to organize a large amount of material this way. Eventually a collection grows too large to browse in its entirety, and people tell you that they would rather browse the collection by author name, or by publication year, or by subject. But to allow for this means organizing the collection, and let me tell you, my non-library-staff friends, such organization is a lot of bother — it takes time, thought and consistent diligence.
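The difference is easy to see in miniature. In this sketch (the records and field names are invented), sorting by acquisition number comes for free, while every other browse order works only because someone recorded the fields to sort on, consistently, for every item:

```python
# Invented sample records; 'acq' is the acquisition number.
collection = [
    {"acq": 1, "author": "Morrison, Toni", "subject": "Fiction", "year": 1987},
    {"acq": 2, "author": "Bridle, James", "subject": "Technology", "year": 2018},
    {"acq": 3, "author": "Foucault, Michel", "subject": "Philosophy", "year": 1984},
]

# Browsing by what's newest is trivial: the shelf order *is* the sort order.
newest_first = sorted(collection, key=lambda r: r["acq"], reverse=True)

# Browsing by author only works if someone entered every author consistently
# ("Morrison, Toni", never "Toni Morrison") -- that is the hidden labour.
by_author = sorted(collection, key=lambda r: r["author"])

print([r["acq"] for r in newest_first])      # [3, 2, 1]
print(by_author[0]["author"])                # 'Bridle, James'
```

The sort call itself is one line; the bother is the cataloguing discipline that makes the sort key exist at all.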
Which is why we are where we are with today’s state of the web.
Early homepages were like little libraries…
A well-organized homepage was a sign of personal and professional pride — even if it was nothing but a collection of fun gifs, or instructions on how to make the best potato guns, or homebrew research on gerbil genetics.
Dates didn’t matter all that much. Content lasted longer; there was less of it. Older content remained in view, too, because the dominant metaphor was table of contents rather than diary entry.
Everyone with a homepage became a de facto amateur reference librarian.
Movable Type didn’t just kill off blog customization.
It (and its competitors) actively killed other forms of web production.
Non-diarists — those folks with the old school librarian-style homepages — wanted those super-cool sidebar calendars just like the bloggers did. They were lured by the siren of easy use. So despite the fact that they weren’t writing daily diaries, they invested time and effort into migrating to this new platform.
They soon learned the chronostream was a decent servant, but a terrible master.
We no longer build sites. We generate streams.
All because building and maintaining a library is hard work.