The Library of the Living and the Library of the Dead

I am in the process of re-organizing my Google Drive and, in doing so, I stumbled upon a bit of writing from 2013 that would have been a perfect addition to Haunted libraries, invisible labour, and the librarian as an instrument of surveillance, a post I wrote earlier this year:

When I was a child, the walls of books in the adult section of our modest public library always filled me with unease and even dread. So many books that I would never read. So many books that I suspected – even then – were never read. I was under the impression that all the books were so old that the authors must all be dead. Unlike my refuge – the children’s section of the library, partitioned by a glass door set in a glass wall – this section of the library was dark and largely silent. The books were ghosts.

I am imagining a library that is made up of two distinct sections. These sections may be on separate floors. They may be in separate buildings. But these sections must be separated and distinct.

One of these sections would be ‘The Library of the Living’. It would be composed of works by authors who still walked on the earth, somewhere, among us. The other section would be ‘The Library of the Dead’.

When an author passes from the earthly realm, a librarian takes their work from the Library of the Living and brings it, silently, to the Library of the Dead.

And at the end of this text was this:

“We don’t have much time, you know. We need to find the others. We need to find mentors. We need to be mentors. We don’t have much time.”

Considering dark deposit

I have a slight feeling of dread.

In the inbox of the email address associated with MPOW’s institutional repository are more than a dozen notifications that a faculty member has deposited their research work for inclusion. I should be happy about this. I should be delighted that a liaison librarian spoke highly enough of the importance of the institutional repository at a faculty departmental meeting and inspired a researcher to fill in a multitude of forms so their work can be made freely available to readers.

But I don’t feel good about this because a cursory look at the journals this faculty member has published in suggests that we can include none of the material in our IR due to restrictive publisher terms.

This is not a post about the larger challenges of Open Access in the current scholarly landscape. This post is a consideration of a change of practice regarding IR deposit, partly inspired by the article, Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records.

Institutional repository managers are continuously looking for new ways to demonstrate the value of their repositories. One way to do this is to create a more inclusive repository that provides reliable information about the research output produced by faculty affiliated with the institution.

Bjork, K., Cummings-Sauls, R., & Otto, R. (2019). Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records. Journal of Librarianship and Scholarly Communication, 7(1). DOI: http://doi.org/10.7710/2162-3309.2220

I read the Opening Up… article with interest because a couple of years ago, when I was the liaison librarian for biology, I ran an informal pilot in which I tried to capture the research corpus of the biology department. During this pilot, for articles from publishers that did not allow deposit of the publisher PDF and whose authors were not interested in depositing a manuscript version, I published the metadata of those works instead.

But part way through this pilot, I abandoned the practice. I did so for a number of reasons. One reason was that the addition of their work to the Institutional Repository did not seem to prompt faculty to start depositing their research of their own volition. This was not surprising, as BePress doesn’t allow for the integration of author profiles directly into its platform (one must purchase a separate product for author profiles and the ability to generate RSS feeds at the author level), so I was not particularly disappointed with this result. And while administrators are increasingly interested in demonstrating research outputs at the department and institutional level, faculty generally remain more invested in subject-based repositories.

But during this trial I uncovered a more troubling reason why uploading citations might be problematic. I came to understand that most harvesting protocols and profiles – such as OAI-PMH and OpenAIRE – do not provide any means of differentiating between metadata-only records and full-text records. Our library system harvests our IR and assumes that every item in the IR has a full-text object associated with it. Other services that harvest our IR do the same. To visit the IR is to expect the full text of a text.
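To make the problem concrete, here is a minimal sketch of what a harvester sees (the repository URL is a placeholder, not our actual endpoint). The point is that a plain oai_dc record carries no element that tells the harvester whether a full-text file sits behind it:

```python
# A minimal OAI-PMH harvesting sketch (illustrative only).
# The base URL below is a placeholder, not our actual repository endpoint.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://ir.example.edu/do/oai/"  # placeholder

response = requests.get(BASE_URL, params={
    "verb": "ListRecords",       # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",  # simple Dublin Core, the lowest common denominator
})
root = ET.fromstring(response.content)

ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Every record exposes the same Dublin Core elements whether or not a
# full-text object sits behind it; nothing below says "metadata-only".
for record in root.findall(".//oai:record", ns):
    title = record.findtext(".//dc:title", default="(no title)", namespaces=ns)
    identifiers = [e.text for e in record.findall(".//dc:identifier", ns)]
    print(title, identifiers)
```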

But the reason that made me stop the experiment pretty much immediately was reading this little bit of hearsay on Twitter:

Google and Google Scholar are responsible for the vast majority of our IR’s traffic and use. In many disciplines the percentage of Green OA articles as a share of total faculty output is easily less than 25%. To publish citations when the full text of a manuscript is not made available to the librarian is ultimately to test whether Google Scholar really does have a full-text threshold. And then what do we do when we find our work suddenly gone from search results?

Yet, the motivation to try to capture the whole of a faculty’s work still remains. An institutional repository should be a reflection of all the research and creative work of the institution that hosts it.

If an IR is not able to do this work, an institution is more likely to invest in a CRIS – a Current Research Information System – to represent the research outputs of the organization.

Remember when I wrote this in my post from March of this year?

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

– a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)

– a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)

– a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)

– a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

If the map becomes the territory then we will be lost

The marketplace for CRIS is no different:

But I think the investment in two separate products – a CRIS to capture the citations of a faculty’s research and creative output and an IR to capture the full text of the same – still seems a shame to pursue. Rather than invest a large sum of money for the quick win of a CRIS, we should invest those funds into an IR that can support data re-use, institutionally.

(What is the open version of the CRIS? To be honest, I don’t know this space very well. From what I know at the moment, I would suggest it might be the institutional repository + ORCiD and/or VIVO.)

I am imagining a scenario in which every article-level work that a faculty member of an institution has produced is captured in the institutional repository. Articles that are not allowed to be made open access are embargoed until they are in the public domain.

But to be honest, I’m a little spooked because I don’t see many other institutions engaging in this practice. Dark deposit does exist in the literature, but it largely appears in the early years of the conversations around scholarly communications practice. The most widely cited article on the topic (based on my reading, not a proper literature review) is a 2011 piece called The importance of dark deposit by Stuart Shieber. His blog is licensed as CC-BY, so I’m going to take advantage of this generosity and re-print his seven reasons why dark is better than missing:

  1. Posterity: Repositories have a role in providing access to scholarly articles of course. But an important part of the purpose of a repository is to collect the research output of the institution as broadly as possible. Consider the mission of a university archives, well described in this Harvard statement: “The Harvard University Archives (HUA) supports the University’s dual mission of education and research by striving to preserve and provide access to Harvard’s historical records; to gather an accurate, authentic, and complete record of the life of the University; and to promote the highest standards of management for Harvard’s current records.” Although the role of the university archives and the repository are different, that part about “gather[ing] an accurate, authentic, and complete record of the life of the University” reflects this role of the repository as well. Since at any given time some of the articles that make up that output will not be distributable, the broadest collection requires some portion of the collection to be dark.
  2. Change: The rights situation for any given article can change over time — especially over long time scales, librarian time scales — and having materials in the repository dark allows them to be distributed if and when the rights situation allows. An obvious case is articles under a publisher embargo. In that case, the date of the change is known, and repository software can typically handle the distributability change automatically. There are also changes that are more difficult to predict. For instance, if a publisher changes its distribution policies, or releases backfiles as part of a corporate change, this might allow distribution where not previously allowed. Having the materials dark means that the institution can take advantage of such changes in the rights situation without having to hunt down the articles at that (perhaps much) later date.
  3. Preservation: Dark materials can still be preserved. Preservation of digital objects is by and large an unknown prospect, but one thing we know is that the more venues and methods available for preservation, the more likely the materials will be preserved. Repositories provide yet another venue for preservation of their contents, including the dark part.
  4. Discoverability: Although the articles themselves can’t be distributed, their contents can be indexed to allow for the items in the repository to be more easily and accurately located. Articles deposited dark can be found based on searches that hit not only the title and abstract but the full text of the article. And it can be technologically possible to pass on this indexing power to other services indexing the repository, such as search engines.
  5. Messaging: When repositories allow both open and dark materials, the message to faculty and researchers can be made very simple: Always deposit. Everything can go in; the distribution decision can be made separately. If authors have to worry about rights when making the decision whether to deposit in the first place, the cognitive load may well lead them to just not deposit. Since the hardest part about running a successful repository is getting a hold of the articles themselves, anything that lowers that load is a good thing. This point has been made forcefully by Stevan Harnad. It is much easier to get faculty in the habit of depositing everything than in the habit of depositing articles subject to the exigencies of their rights situations.
  6. Availability: There are times when an author has distribution rights only to unavailable versions of an article. For instance, an author may have rights to distribute the author’s final manuscript, but not the publisher’s version. Or an art historian may not have cleared rights for online distribution of the figures in an article and may not be willing to distribute a redacted version of the article without the figures. The ability to deposit dark enables depositing in these cases too. The publisher’s version or unredacted version can be deposited dark.
  7. Education: Every time an author deposits an article dark is a learning moment reminding the author that distribution is important and distribution limitations are problematic.

There is an additional reason for pursuing a change of practice to dark deposit that I believe is very significant:

There are at least six types of university OA policy. Here we organize them by their methods for avoiding copyright troubles…

3. The policy seeks no rights at all, but requires deposit in the repository. If the institution already has permission to make the work OA, then it makes it OA from the moment of deposit. Otherwise the deposit will be “dark” (non-OA) (See p. 24) until the institution can obtain permission to make it OA. During the period of dark deposit, at least the metadata will be OA.

Good Practices For University Open-Access Policies, Stuart Shieber and Peter Suber, 2013

“At least the metadata will be OA” is a very good reason to do dark deposit. It might be reason enough. I share much of Ryan Regier’s enthusiasm for Open Citations, which he explains in his post, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open.

Having a more complete picture of how much an article has been cited by other articles is an immediate clear benefit of Open Citations. Right now you can get a piece of that via the above tools I’ve listed and, maybe, a piece is all you need. If you’ve got an article that’s been cited 100s of times, likely you aren’t going to look through each of those citing articles. However, if you’ve got an article or a work that only been cited a handful of times, likely you will be much more aware of what those citing articles are saying about your article and how they are using your information.

Ryan Regier, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open

Regier takes Elsevier to task because Elsevier is one of the few major publishers remaining that refuse to make their citations open.

I4OC requests that all scholarly publishers make references openly available by providing access to the reference lists they submit to Crossref. At present, most of the large publishers—including the American Physical Society, Cambridge University Press, PLOS, SAGE, Springer Nature, and Wiley—have opened their reference lists. As a result, half of the references deposited in Crossref are now freely available. We urge all publishers who have not yet opened their reference lists to do so now. This includes the American Chemical Society, Elsevier, IEEE, and Wolters Kluwer Health. By far the largest number of closed references can be found in journals published by Elsevier: of the approximately half a billion closed references stored in Crossref, 65% are from Elsevier journals. Opening these references would place the proportion of open references at nearly 83%.

Open citations: A letter from the scientometric community to scholarly publishers
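A quick back-of-the-envelope check of those figures, assuming a total of roughly one billion references deposited in Crossref at the time (an assumption on my part; the letter only gives the two halves):

```python
# Back-of-the-envelope check of the figures quoted in the I4OC letter above.
# The ~1 billion total is an assumption; the letter only says half of the
# deposited references are open and "approximately half a billion" are closed.
total = 1.0e9
closed = 0.5e9                   # "approximately half a billion closed references"
open_refs = total - closed       # "half of the references ... are now freely available"
elsevier_closed = 0.65 * closed  # "65% are from Elsevier journals"

share_if_opened = (open_refs + elsevier_closed) / total
print(f"{share_if_opened:.1%}")  # 82.5% -- in line with the letter's "nearly 83%"
```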

There would be so much value unleashed if we could release the citations to our faculty’s research as open access.

Open Citations could lead to new ways of exploring and understanding the scholarly ecosystem. Some of these potential tools were explored by Aaron Tay in his post, More about open citations — Citation Gecko, Citation extraction from PDF & LOC-DB.

Furthermore, releasing citations as OA would enable them to be added to platforms such as Wikidata and made available for visualization using the Scholia tool, pictured above.

So that’s where I’m at.

I want to change the practice at MPOW to include all published faculty research, scholarship, and creative work in the Institutional Repository; where we are unable to publish these works as open access in our IR, we will include them as embargoed dark deposits until they are confidently in the public domain. I want the Institutional Repository to live up to its name and hold all the published work of the institution.

Is this a good idea, or no? Are there pitfalls that I have not foreseen? Is my reasoning shaky? Please let me know.

Making blog posts count as part of a not-so-secret feminist agenda

Introduction:
Secret Feminist Agenda & Masters of Text

I am an academic librarian who has earned permanence – which is the word we use at the University of Windsor to describe the librarian-version of tenure. When I was hired, there was no explicit requirement for librarians to publish in peer-reviewed journals. Nowadays, newly hired librarians at my place of work have an expectation to create peer-reviewed scholarship, although the understanding of how much and what kinds of scholarship count has not been strictly defined.

While I have written a few peer reviewed articles, most of my writing for librarians has been on this particular blog (since 2016) and for ten years prior to that at New Jack Librarian (with one article, in between blogs, hosted on Medium).

On my official CV, which I update and submit to my institution every year, these peer-reviewed articles are listed individually. Under “Non-refereed publications”, I have a single line for each of my blogs. And yet I have done so much more writing on blogging platforms than in my peer-reviewed work (over 194K words from 2006-2016 alone), and my public writing has been shared, saved, and read many, many times more than my peer-reviewed scholarship.

Now, as I have previously stated, I already have permanence. So why should I care if my blog writing counts in my work as an academic librarian?

That was my thinking, so I didn’t care. That is, until a couple of weeks ago when a podcast changed my mind.

That podcast was Hannah McGregor’s Secret Feminist Agenda.

Secret Feminist Agenda is a weekly podcast about the insidious, nefarious, insurgent, and mundane ways we enact our feminism in our daily lives.

About, Secret Feminist Agenda

McGregor’s podcast is part of a larger SSHRC-funded partnership called Spoken Web that “aims to develop a coordinated and collaborative approach to literary historical study, digital development, and critical and pedagogical engagement with diverse collections of spoken recordings from across Canada and beyond”.

The episode that changed my mind was 3.26, which is dedicated largely to a conversation between McGregor and another academic and podcaster, Ames Hawkins. In it, there were two particular moments that reconfigured my thinking.

The value of creative practice

The first was when their conversation turned to scholarly creative practice (around the 16 minute mark):

What did we learn about scholarly podcasting… How and when and where we create new knowledge, that’s what we call scholarship, generally, right?

Secret Feminist Agenda, 3.26

Their conversation about what counts as scholarship and how it can be valued is a great listen. And it opened the possibility in my mind to consider this writing a form of creative, critical work.

While most of my public writing is explanatory or persuasive in nature, there is definitely a subset of my work that I would consider a form of creative practice. I know that these works are creative because when I sit down to write them, I don’t have an idea of the final form of the text until it is finished. I am compelled to work through ideas that I feel might have something to them, but the only way to tell is to get closer.

These more creative writings tend to be my least-popular works that are never shared by others on social media. Examples include How Should Reality Be (my own version of Reality Hunger) and G H O S T S T O R I E S.

And yet, those writings were necessary precursors to later works that were built from those first iterations and have ended up being well-received: Libraries are for use. And by use, I mean copying and Haunted libraries, invisible labour, and the librarian as an instrument of surveillance. These second iterations are more formal but not works of formal scholarship. I still think they fall under the category of creative, critical work.

Both writing and conversation can act as a practice to discover and uncover ideas in a way that feels very different from the staking of intellectual territory and the making of claims that characterize so much scholarship.

Using tenure to break space open

The second passage that struck me comes in at the 52:26 mark, when Hannah tells this story:

Hannah: I met a prof at the Modernist Studies Association Conference a few years ago who was telling me that he does a comic book podcast with a friend of his and they’ve been doing it for years and it has quite a popular following, and I was like, “Oh, awesome! Do you count that as your scholarly output?” and he said “No, I don’t need to. I have tenure.” And I was like, “Well, but, couldn’t you use tenure as a way to break space open for those who don’t but want to be doing that kind of work? Isn’t there another way to think about what it means to have security as a position from which you can radicalize?”, but that so often doesn’t seem to prove to be the case.

Ames: “Well, and now we’re back to that’s feminist thinking – what you said there and what that person is illustrating is not feminist thinking…”

Secret Feminist Agenda, 3.26

Oof. Hearing that was a bit of a gut-punch.

I can and will do better.

That being said, I’m not entirely sure how my corpus of public writing should be accounted for. Obviously, the volume of words produced is not an appropriate measure. Citation counts from scholarly works might be deemed a valuable measure, but because many scholars deliberately exclude public writing from their bibliographies, I feel this metric systematically undervalues this type of writing. And while page views and social media counts should stand for something, I don’t think you can make the case that popularity is equivalent to quality.

Secret Feminist Agenda goes through its own form of peer review.

I would love to see something similar for library blogs such as my own. There is work that needs to be done.

Open Refine for Librarians

On October 24th, 2018, I gave a half-hour online presentation as part of a virtual conference from NISO called That Cutting Edge: Technology’s Impact on Scholarly Research Processes in the Library.

My presentation was called:

And here is the script and the slides that I presented:

Good afternoon. Thank you for the opportunity to introduce you to OpenRefine.

Even if you have already heard of OpenRefine before, I hope you will still find this session useful, as I have tried to make an argument for why librarians should investigate technologies like OpenRefine for scholarly research purposes.

This talk has three parts.

I like to call OpenRefine the most popular library tool that you’ve never heard of.

After this introduction, I hope that statement will be just a little less true.

You can download OpenRefine from its official website OpenRefine.org

OpenRefine began as Google Refine and before that it was Freebase Gridworks, created by David François Huynh in January 2010 while he was working at Metaweb, the company that was responsible for Freebase.

In July 2010, Freebase and Freebase Gridworks were bought by Google, which adopted the technology and rebranded Freebase Gridworks as Google Refine. While the code was always open source, Google supported the project until 2012. From that point on, the project became a community-supported open source product and was renamed OpenRefine.

As an aside, Freebase was officially shut down by Google in 2016 and the data from that project was transferred to Wikidata, Wikimedia’s structured data project.  


OpenRefine is software written in Java. The program is downloaded onto your computer and accessed through your browser. OpenRefine includes its own web server software, so it is not necessary to be connected to the internet in order to make use of OpenRefine.

At the top of the slide is a screen capture of what you first see when you start the program. The black console window is what opens behind the scenes, in case you are interested in monitoring the various processes that you are putting OpenRefine through. And in the corner of the slide, you can see a typical view of OpenRefine with data in it.

OpenRefine has been described by its creator as “a power tool for working with messy data”

Wikipedia calls OpenRefine “a standalone open source desktop application for data cleanup and transformation to other formats, the activity known as data wrangling”.

OpenRefine is used to standardize and clean data across your file or spreadsheet.

This slide is a screenshot from the Library Carpentry set of lessons on OpenRefine. I like how they introduce you to the software by explaining some common scenarios in which you might use it.

These scenarios include the following (a rough code sketch of the same tasks appears just after this list):

  • When you want to know how many times a particular value (name, publisher, subject) appears in a column in your data
  • When you want to know how values are distributed across your whole data set
  • When you have a list of dates which are formatted in different ways, and want to change all the dates in the list to a single common date format
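As an aside for readers who think in code, here is a rough Python (pandas) sketch of those same three scenarios. It is only an illustration of the tasks; OpenRefine itself does this through facets and GREL expressions in the browser, and the file and column names below are made up:

```python
# Illustrative pandas equivalents of the three Library Carpentry scenarios.
# "records.csv", "publisher", and "date" are hypothetical names.
import pandas as pd

df = pd.read_csv("records.csv", dtype=str)

# 1. How many times does a particular value appear in a column?
publisher_counts = df["publisher"].value_counts()
print(publisher_counts.get("Open Library Press", 0))  # hypothetical publisher

# 2. How are values distributed across the whole data set?
print(publisher_counts)  # the full frequency table, akin to a text facet

# 3. Normalize inconsistently formatted dates to one common format.
df["date_clean"] = pd.to_datetime(df["date"], errors="coerce").dt.strftime("%Y-%m-%d")
print(df[["date", "date_clean"]].head())
```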

The software developers who maintain and extend the powers of OpenRefine run regular surveys to learn more about their fellow users, and in 2018 they found that their largest community was from libraries – a group that did not even register as its own category in the original 2012 survey.

So we know that the largest OpenRefine user group is librarians. Do we know how OpenRefine use measures up *within* the population of librarians? Unfortunately we don’t.

While we don’t expect such a specialized tool to be widely used across all the different types of librarians and library work, we can see from this recent survey of metadata managers at OCLC Research Library Partners that OpenRefine is the second most popular tool used, after MarcEdit.

That being said, once you know the power of OpenRefine, you will, like me, see all sorts of other potential uses for the tool outside of metadata cleanup. In August of this year, I read this tweet from fellow Canadian scholarly communications librarian Ryan Regier and sent some links with instructions that illustrated how OpenRefine could help with this research question.


When introducing new technology to others, it’s very important not to oversell it and to manage expectations.

LINK

But I’m not the only one who feels strongly about the power of OpenRefine – and for good reasons, which we will explore in the second section of this talk.

If you asked me what is the most popular technology used by librarians in their work and support of scholarship, I would say that one answer could be Microsoft Excel. Many librarians I know do their collections work, their electronic resources work, and their data work in Excel and they are very good at it.

But there are some very good reasons to reconsider using Excel for our work.

This slide outlines what I consider some of the strongest reasons to consider using OpenRefine. First, the software is able to handle more types of data than Excel can. Excel can handle rows of data. OpenRefine can handle both rows and records of data.

For many day-to-day uses of Excel it is unlikely you will run into the maximum capacity of the software, but for those who work with large data sets, a limit of a million and change rows can be a problem.

But the most important reason why we should consider OpenRefine is the same reason why it’s fundamentally different from Excel. Unlike spreadsheet software such as Excel, no formulas are stored in the cells of OpenRefine.

Instead, formulas are used to transform the data, and these transformations are tracked as scripts.

Don’t use Excel / Genome Biology

Not only do the cells of Excel contain formulas that transform the data presented in ways that are not always clear, Excel sometimes transforms your data without clearly demonstrating that it is doing so. According to a paper from 2016, roughly one-fifth of genomics papers with supplementary Excel gene lists contained errors from Excel transforming gene names such as SEPT10 into dates.

I want to be clear: I am not saying that Excel is bad, or that people who use Excel are bad.

We can all employ good data practices whether we use spreadsheets or data wrangling tools such as OpenRefine. I believe we have to meet people where they are with their data tool choices. This is part of the approach taken by the good people responsible for this series of lessons, part of the Ecology curriculum of Data Carpentry.

And with that, I just want to take the briefest of moments to thank the good people behind Software Carpentry and Data Carpentry – collectively now known as The Carpentries – as I am pretty sure it was their work that introduced me to the world of OpenRefine.

This slide is taken from the Library Carpentry OpenRefine lesson. There is too much text on the slide to read, but the gist of the message is this: OpenRefine saves every change you make to your dataset, and these changes are saved as scripts. After you clean up a set of messy data, you can export your script of the transformations you made and then apply that script to other, similarly messy data sets.

Not only does this save the time of the wrangler, the ability to save scripts separately from the data itself also lends itself to reproducible science.

Here is a screenshot of a script captured in OpenRefine in both English and in JSON.
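To give a flavour of what that JSON looks like, here is a rough, hand-written approximation of a single exported operation, written out as a Python literal. The field names come from memory and may not match exactly what your version of OpenRefine exports, so treat it as a sketch rather than a reference:

```python
# A rough approximation of one step in an exported OpenRefine operation
# history, written out as a Python literal. The field names are from memory
# and may not match exactly what your version of OpenRefine exports.
import json

operation_history = [
    {
        "op": "core/text-transform",        # the kind of operation performed
        "columnName": "date",               # the column it was applied to
        "expression": "value.toDate()",     # the GREL expression that was run
        "onError": "keep-original",
        "repeat": False,
        "repeatCount": 10,
        "description": "Text transform on cells in column date using expression value.toDate()",
    }
]

# In practice you copy this JSON out of the Undo/Redo tab of one project and
# paste it into "Apply..." in another project with similarly messy data.
print(json.dumps(operation_history, indent=2))
```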

It is difficult for me to express how important and how useful it is for OpenRefine to separate the work from the data in this way.

This is the means by which librarians can share workflows outside of their organizations without worrying about accidentally sharing private data.

link

As more librarians start using more complicated data sets and data tools to support research – and to do research of their own – there will be more opportunities for embodying, demonstrating, and teaching good data practices.

I remember the instance in which I personally benefited from someone sharing their work with OpenRefine. It was this blog post from Angela Galvan, which walked me through the process of looking up a list of ISSNs, running that list through the Sherpa Romeo API, and using the formula on the screen to quickly and clearly show whether a particular journal allowed publisher PDFs to be added to the institutional repository or not.
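For those who prefer code to screenshots, the sketch below shows the general shape of that workflow in Python rather than OpenRefine: take a list of ISSNs, query the Sherpa Romeo API for each, and record what the policy allows. The endpoint, parameter names, and response structure are my assumptions (Galvan’s post did the work inside OpenRefine against the API of the day), so verify them against the current documentation; the API key and ISSNs are placeholders.

```python
# A rough sketch of the ISSN-to-policy lookup in Python rather than OpenRefine.
# The endpoint, parameter names, and response shape are assumptions on my part;
# check them against the current Sherpa Romeo API documentation before relying
# on any of this. The API key and ISSNs are placeholders.
import requests

API_URL = "https://v2.sherpa.ac.uk/cgi/retrieve"  # assumed v2 endpoint
API_KEY = "YOUR-API-KEY"                          # placeholder
issns = ["0000-0000", "1111-1111"]                # placeholder ISSNs from a spreadsheet

for issn in issns:
    response = requests.get(API_URL, params={
        "item-type": "publication",
        "format": "Json",
        "api-key": API_KEY,
        "filter": f'[["issn","equals","{issn}"]]',
    })
    data = response.json()
    # Whether the published PDF may be archived lives somewhere in the policy
    # details under data["items"]; inspect a sample response before parsing it
    # in earnest, as the schema may differ from what is assumed here.
    print(issn, len(data.get("items", [])))
```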

And with that, here’s a bit of a tour of how libraries are using OpenRefine in their work.

This is from a walkthrough by Owen Stephens in which he used both MarcEdit and OpenRefine to find errors in 50,000 bibliographic records and make the corrections necessary so that they would be able to be loaded into a new library management system.

I haven’t spent much time highlighting it, but some of the most appreciated features of OpenRefine are its data visualizations, which allow the wrangler to find the differences that make a difference in the data.

The slide features two screen captures. In the lower screen, OpenRefine has used fuzzy matching algorithms to discover variant entries that are statistically likely to be the same.

link

I mentioned previously that I had used OpenRefine to query the Sherpa/Romeo API. OpenRefine’s ability to give API access to users who may not be entirely comfortable with command-line scripting or programming should not be underestimated. That’s why lesson plans that use OpenRefine to perform tasks such as web scraping, as pictured here, are appreciated.

link

With OpenRefine, libraries are finding ways to use reconciliation services for local projects. I am just going to read the last bit of the last line of this abstract for emphasis: a hack using OpenRefine yielded a 99.6% authority reconciliation and a stable process for monthly data verification. And as you now know, this was likely done through OpenRefine scripts.

OpenRefine has proved useful in preparing linked data…

link

And if staff feel more comfortable using spreadsheets, OpenRefine can be used to convert those spreadsheets into forms such as MODS XML.

link

Knowing the history of OpenRefine, you might not be surprised to learn that it has built-in capabilities to reconcile data to controlled vocabularies…

link

But you might be pleasantly surprised to learn that OpenRefine can reconcile data from VIVO, JournalTOC, VIAF, and FAST from OCLC.

link

But the data reconciliation service that I’m particularly following is from Wikidata.

In this video example, John Little uses data from VIAF and Wikidata to gather authoritative versions of author names plus additional information including their place of birth.
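As a small illustration of the same idea outside OpenRefine, the sketch below asks the Wikidata SPARQL endpoint for the item carrying a given VIAF identifier (property P214) and returns its label and place of birth (property P19). The VIAF ID is a placeholder; substitute one from your own authority data:

```python
# Query Wikidata for the item carrying a given VIAF identifier (property P214)
# and fetch its English label and place of birth (property P19).
# The VIAF ID below is a placeholder; substitute one from your own data.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
viaf_id = "12345678"  # placeholder

query = f"""
SELECT ?person ?personLabel ?birthPlaceLabel WHERE {{
  ?person wdt:P214 "{viaf_id}" .
  OPTIONAL {{ ?person wdt:P19 ?birthPlace . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "openrefine-wikidata-sketch/0.1 (example@example.org)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["personLabel"]["value"],
          row.get("birthPlaceLabel", {}).get("value", "(no place of birth recorded)"))
```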

I think it’s only appropriate that OpenRefine connects to Wikidata when you remember that both projects had their origins in the Freebase project.

link

Wikidata is worthy of its own talk – and maybe even its own conference – but since we are very close to the end of this presentation, let me introduce you to Wikidata as structured linked data that anyone can use and improve.

link

I was introduced to the power of Wikidata, and how its work could extend our library work, by librarians such as Dan Scott and Stacy Allison Cassin. In this slide, you can see a screen capture from Dan’s presentation that highlights that the power of Wikidata is that it doesn’t just collect formal institutional identifiers such as those from LC or ISNI, but also identifiers from sources such as AllMusic.

link

And this is the example that I would like to end my presentation on. The combination of OpenRefine and Wikidata – working together – allows the librarian not only to explore, clean up, and normalize their data sets, but also to extend that data and to connect it to the world.

It really is magic.

Back to the Future of Libraries

I am in the process of reading Clive Thompson’s Coders: The Making of a New Tribe and the Remaking of the World and I have to say that I am, so far, disappointed with the book. I am a fan of Thompson’s technology journalism and I really enjoyed his earlier work, Smarter Than You Think: How Technology is Changing Our Minds for the Better, so I thought I would be a good reader and order the book from my local as soon as it came out. And it’s not a bad book. The book does what it says it’s going to do on the tin: it is a book about the tribe of coders.

The trouble is mine: I am not interested in a measured account of the lives of coders in America. I think the status quo for computing is dismal.

The way that we require people to think like a computer in order to make use of computers is many things. It is de-humanizing. It is unnecessary hardship. It feels wrong.

This is why Bret Victor’s Inventing on Principle (2012) presentation was (and remains) so incredible to me. Victor sets out to demonstrate that creators need (computer) tools that provide them with the most direct and immediately responsive connection to their creation possible:

If we look to the past, we can find alternative futures to computing that might have served us better, if we had only gone down a different path. Here’s Bret Victor’s The Future of Programming 1973 (2013), which you should watch a few minutes of, if only to appreciate his alternative to PowerPoint:

Here’s another video that looks to the past to see what other futures have been forsaken, and it is definitely worth your time. It is of and from Claire Evans – author of Broad Band: The Untold Story of the Women Who Made the Internet – who spoke at XOXO 2018.

At around the 11 minute mark, Evans sets the scene for the first unveiling of Tim Berners-Lee’s World Wide Web, and it’s a great story because when Berners-Lee first demonstrated his work at Hypertext ’91, the other attendees were not particularly impressed. Evans explains why.

So why am I telling you all about the history of computing on my library-themed blog? Well, one reason is that our profession has not done a great job of knowing our own (female) history of computing.

Case in point: until this twitter exchange, I had not had the pleasure of knowing Linda Smith or her work:

(There is a now-deleted post from a librarian blog from 2012 that comes to mind. I’m not entirely sure of the etiquette of quoting deleted posts, so I will paraphrase the post as the following text…)

Despite librarianship being a feminized and predominantly female profession, [the author of the aforementioned blog post] remarked that she was never introduced to the following women in library school, despite their accomplishments: Suzanne Briet, Karen Spärck-Jones, Henriette Avram, and Elaine Svenonius. And if my memory can be trusted, I believe the same was true for me.

Is there a connection between the more human(e) type of computing that Bret Victor advocates for, the computing innovations from women that Claire Evans wants us to learn from, and these lesser-known women of librarianship and its adjacent fields in computing? I think there might be.

From the Overlooked series of obituaries from The New York Times, for Karen Spärck-Jones:

When most scientists were trying to make people use code to talk to computers, Karen Sparck Jones taught computers to understand human language instead.

In so doing, her technology established the basis of search engines like Google.

A self-taught programmer with a focus on natural language processing, and an advocate for women in the field, Sparck Jones also foreshadowed by decades Silicon Valley’s current reckoning, warning about the risks of technology being led by computer scientists who were not attuned to its social implications

“A lot of the stuff she was working on until five or 10 years ago seemed like mad nonsense, and now we take it for granted,” said John Tait, a longtime friend who works with the British Computer Society.

“Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines” by Nellie Bowles, The New York Times, Jan. 2, 2019

I have already given my fair share of future-of-the-library talks, so I think it is for the best if someone else takes up the challenge of looking into the work of librarians past to see if we can’t refactor our present into a better future.

Haunted libraries, invisible labour, and the librarian as an instrument of surveillance

This post was inspired by the article Intersubjectivity and Ghostly Library Labor by Liz Settoducato which was published earlier this month on In the library with the lead pipe. The article, in brief:

Libraries are haunted houses. As our patrons move through scenes and illusions that took years of labor to build and maintain, we workers are hidden, erasing ourselves in the hopes of providing a seamless user experience, in the hopes that these patrons will help defend Libraries against claims of death or obsolescence. However, ‘death of libraries’ arguments that equate death with irrelevance are fundamentally mistaken. If we imagine that a collective fear has come true and libraries are dead, it stands to reason that library workers are ghosts. Ghosts have considerable power and ubiquity in the popular imagination, making death a site of creative possibility. Using the scholarly lens of haunting, I argue that we can experience time creatively, better positioning ourselves to resist the demands of neoliberalism by imagining and enacting positive futurities.

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

I also think libraries can be described as haunted but for other reasons than Settoducato suggests. That doesn’t mean I think Settoducato is wrong or the article is bad. On the contrary – I found the article delightful and I learned a lot from it. For example, having not read Foucault myself, this was new to me:

In such examples, books are a necessary component of the aesthetic of librarianship, juxtaposing the material (books and physical space) with the immaterial (ghosts). Juxtaposition is central to Michel Foucault’s concept of heterotopias, places he describes as “capable of juxtaposing in a single real place several spaces, several sites that are in themselves incompatible” (1984, 6). Foucault identifies cemeteries, libraries, and museums among his examples of heterotopias, as they are linked by unique relationships to time and memory. Cemeteries juxtapose life and death, loss (of life) and creation (of monuments), history and modernity as their grounds become increasingly populated. Similarly, libraries and museums embody “a sort of perpetual and indefinite accumulation of time in an immobile place,” organizing and enclosing representations of memory and knowledge (Foucault 1984, 7).

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

That passage felt true to me. As I once confessed to an avocado on the Internet…

There are other passages in Intersubjectivity that I think could be expanded upon. For example, while I completely agree with its observation that the labour of library staff is largely invisible, I believe that particular invisibility was prevalent long before neoliberalism. The librarian has been subservient to those who endow the books for hundreds of years.

Richard Bentley, for his part, continued to run into problems with libraries. Long after the quarrel of the ancients and moderns had fizzled, he installed a young cousin, Thomas Bentley, as keeper of the library of Cambridge’s Trinity College. At Richard’s urging, the young librarian followed the path of a professional, pursuing a doctoral degree and taking long trips to the Continent in search of new books for the library. The college officers, however, did not approve of his activities. The library had been endowed by Sir Edward Stanhope, whose own ideas about librarianship were decidedly more modest than those of the Bentleys. In 1728, a move was made to remove the younger Bentley, on the ground that his long absence, studying and acquiring books in Rome and elsewhere, among other things, disqualified him from the post. In his characteristically bullish fashion, Richard Bentley rode to his nephew’s defense. In a letter, he admits that “the keeper [has not] observed all the conditions expressed in Sir Edward Stanhope’s will,” which had imposed a strict definition of the role of librarian. Bentley enumerates Sir Edward’s stipulations, thereby illuminating the sorry state of librarianship in the eighteenth century. The librarian is not to teach or hold office in the college; he shall not be absent from his appointed place in the library more than forty days out of the year; he cannot hold a degree above that of master of arts; he is to watch each library reader, and never let one out of his sight

Library: An Unquiet History by Matthew Battles

“He is to watch each library reader” is a key phrase here. From the beginning, librarians and library staff were installed as instruments of surveillance as a means to protect property.

Even to this day, I will hear of university departments that wish to make a collection of material available for the use of faculty and students and are so committed to this end that they will secure a room, which is no small feat on a campus nowadays. But then the faculty or students refuse to share their most precious works because they realize that their materials, in an open and accessible room, will be subject to theft or vandalism.

Same as it ever was.

“Social security cards unlock the library’s door.” Image from Amelia Acker.

Presently, a handful of municipal libraries in Denmark operate with open service models. These open libraries rely on the self-service of patrons and have no library staff present—loans, returns, admittance and departing the physical library space are regulated through automated access points. Many public library users are familiar with self-check out kiosks and access to the collections database through a personal computing station, but few patrons have ever been in a public library without librarians, staff workers or security personnel. Libraries that rely on self-service operation models represent a new kind of enclosed environment in societies of control. Such automated interior spaces correspond to a crisis in libraries and other institutions of memory like museums or archives. Under the guise of reform, longer service hours, and cost-saving measures, libraries with rationalized operating models conscript their users into a new kind of surveillance….

The open library disciplines and controls the user by eliminating the librarian, enrolling the user into a compulsory self-service to engage with the automated space. The power of this engagement is derived from a regime of panoptic access points that visualize, capture and document the user’s path and her ability to regulate herself during every movement and transition in the library—from entering, searching the catalog, browsing the web, borrowing information resources, to exiting the building.

Soft Discipline and Open Libraries in Denmark, Amelia Acker. Posted on Saturday, November 3, 2012, at 5:00 pm.

That was written in 2012.

The tools for monitoring and affecting space have proliferated widely in the ‘smart home’ category since then. We have services such as AirBnB that allow all manner of spaces to be made available to others. We have technologies such as Nest that act as combination thermostats, smoke detectors, and security systems, and that learn as they go, using AI to discover patterns of use not readily apparent to the human mind. And then we have the spooky and unpredictable spaces where these technologies interact with each other.

Because of these technologies, many, many spaces are going to feel haunted. Not just libraries:

The other day, after watching Crimson Peak for the first time, I woke up with a fully-fleshed idea for a Gothic horror story about experience design. And while the story would take place in the past, it would really be about the future. Why? Because the future itself is Gothic.

First, what is Gothic? Gothic (or “the Gothic” if you’re in academia) is a Romantic mode of literature and art. It’s a backlash against the Enlightenment obsession with order and taxonomy. It’s a radical imposition of mystery on an increasingly mundane landscape. It’s the anticipatory dread of irrational behaviour in a seemingly rational world. But it’s also a mode that places significant weight on secrets — which, in an era of diminished privacy and ubiquitous surveillance, resonates ever more strongly….

… Consider the disappearance of the interface. As our devices become smaller and more intuitive, our need to see how they work in order to work them goes away. Buttons have transformed into icons, and icons into gestures. Soon gestures will likely transform into thoughts, with brainwave-triggers and implants quietly automating certain functions in the background of our lives. Once upon a time, we valued big hulking chunks of technology: rockets, cars, huge brushed-steel hi-fis set in ornate wood cabinets, thrumming computers whose output could heat an office, even odd little single-purpose kitchen widgets. Now what we want is to be Beauty in the Beast’s castle: making our wishes known to the household gods, and watching as the “automagic” takes care of us. From Siri to Cortana to Alexa, we are allowing our lives and livelihoods to become haunted by ghosts without shells.

Our Gothic Future, Madeline Ashby, February 25, 2016.

How can we resist this future that is being made for us but not with us? One of my favourite passages of Intersubjectivity suggests a rich field of possibility that I can’t wait to explore further:

However, it does not have to be this way. David Mitchell and Sharon Snyder also take up the questions of embodiment and productivity, examining through a disability studies lens the ways in which disabled people have historically been positioned as outside the laboring masses due to their “non-productive bodies” (2010, 186). They posit that this distinction transforms as the landscape of labor shifts toward digital and immaterial outputs from work in virtual or remote contexts, establishing the disabled body as a site of radical possibility. Alison Kafer’s crip time is similarly engaged in radical re-imagining, challenging the ways in which “‘the future’ has been deployed in the service of compulsory able-bodiedness and able-mindedness” (2013, 26-27). That is, one’s ability to exist in the future, or live in a positive version of the future is informed by the precarity of their social position. The work of theorists like Mitchell, Snyder, and Kafer is significant because it insists on a future in which disabled people not only exist, but also thrive despite the pressures of capitalism.

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

[An aside: a research library filled with non-productive objects can also be seen to resist capitalism. ]

In conclusion, I would like to answer this dear student who asked this important question:

The answer is: yes.
The library staff are the ghosts in the machine.

Digitization is a multiplier and metadata is a fractal

In Harry Potter and the Deathly Hallows (sorry), Helga Hufflepuff’s goblet is stored in a vault at Gringotts that’s been cursed so that every time you touch one of the objects in it, dozens of copies are created. On the cover of the original U.K. edition of the book, Harry, Ron and Hermione are pictured scrambling atop a wave of facsimile’d treasure. I’ve started thinking about special collections digitization like this. Digitization doesn’t streamline or simplify library collections; rather, it multiplies them, as every interaction creates additional objects for curation and preservation

The above is from Harry Potter and the Responsible Version Control of Digital Surrogates and it is one of the few examples that I know of that uses the Harry Potter and the… trope appropriately. It is a post written some months past by Emma Stanford, Digital Curator at the Bodleian Libraries, but it came to my mind this week after reading this from Jer Thorp’s newsletter a couple of days ago:

The amount of data that can be conjured from any given thing is almost limitless. Pick up a plain grey rock from the side of the road, and in moments you can make a small dataset about it: size, weight, colour, texture, shape, material. If you take that rock to a laboratory these data can be made greatly more precise, and instrumentation beyond our own human sensorium can add to the list of records: temperature, chemical composition, carbon date. From here there is a kind of fractal unfolding of information that begins to occur, where each of these records in turn manifest their own data. The time at which the measurement was made, the instrument used to record it, the person who performed the task, the place where the analysis was performed. In turn, each of these new meta-data records can carry its own data: the age of the person who performed the task, the model of the instrument, the temperature of the room. Data begets data, which begets meta data, repeat, repeat, repeat. It’s data all the way down.

We use computers because they are supposed to make our lives more efficient, but at every layer they are applied, they introduce complexity. This is one of the takeaways I gained from reading the “Designing Freedom” Massey Lectures by cyberneticist Stafford Beer.

The book is very interesting but also a somewhat frustrating read and so if you are interested in learning more, I’d suggest this podcast episode dedicated to the book from the cybernetic marxists of General Intellect Unit.

Yes. There is now a podcast episode for everything.

If the map becomes the territory then we will be lost

That which computation sets out to map and model it eventually takes over. Google sets out to index all human knowledge and becomes the source and the arbiter of that knowledge: it became what people think. Facebook set out to map the connections between people – the social graph – and became the platform for those connections, irrevocably reshaping societal relationships. Like an air control system mistaking a flock of birds for a fleet of bombers, software is unable to distinguish between the model of the world and reality – and, once conditioned, neither are we.

James Bridle, New Dark Age, p.39.

I am here to bring your attention to two developments that are making me worried:

  1. The Social Graph of Scholarly Communications is becoming more tightly bound into institutional metrics that have an increasing influence on institutional funding
  2. The publishers of the Social Graph of Scholarship are beginning to enclose the Social Graph, excluding the infrastructure of libraries and other independent, non-profit organizations

Normally, I would try to separate these ideas into two dedicated posts, but in this case I want to bring them together in writing because if these two trends converge, things will become very bad, very quickly.

Let me start with the first trend:

1. The social graph that binds

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

  1. a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)
  2. a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)
  3. a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)
  4. a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

Workflow is the new content.

That line – workflow is the new content – is from Lorcan Dempsey and it was brought to my attention by Roger Schonfeld. For Open Access week, I gave a presentation on this idea of being mindful of workflow and tool choices in a presentation entitled, A Field Guide to Scholarly Communications Ecosystems. The slides are below.

(My apologies for not sharing the text that goes with the slides. Since January of this year, I have been the Head of the Information Services Department at my place of work. In addition to this responsibility, much of my time this year has been spent covering the work of colleagues currently on leave. Finding time to write has been a challenge.)

In Ontario, each institution of higher education must negotiate a ‘Strategic Mandate Agreement’ with its largest funding body, the provincial government. Universities are currently in the second iteration of these agreements and are preparing for the third round. These agreements are considered fraught by many, including Marc Spooner, a professor in the faculty of education at the University of Regina, who wrote the following in an opinion piece in University Affairs:

The agreement is designed to collect quantitative information grouped under the following broad themes: a) student experience; b) innovation in teaching and learning excellence; c) access and equity; d) research excellence and impact; and e) innovation, economic development and community engagement. The collection of system-wide data is not a bad idea on its own. For example, looking at metrics like student retention data between years one and two, proportion of expenditures on student services, graduation rates, data on the number and proportion of Indigenous students, first-generation students and students with disabilities, and graduate employment rates, all can be helpful.

Where the plan goes off-track is with the system-wide metrics used to assess research excellence and impact: 1) Tri-council funding (total and share by council); 2) number of papers (total and per full-time faculty); and 3) number of citations (total and per paper). A tabulation of our worth as scholars is simply not possible through narrowly conceived, quantified metrics that merely total up research grants, peer-reviewed publications and citations. Such an approach perversely de-incentivises time-consuming research, community-based research, Indigenous research, innovative lines of inquiry and alternative forms of scholarship. It effectively displaces research that “matters” with research that “counts” and puts a premium on doing simply what counts as fast as possible…

Even more alarming – and what is hardly being discussed – is how these damaging and limited terms of reference will be amplified when the agreement enters its third phase, SMA3, from 2020 to 2023. In this third phase, the actual funding allotments to universities will be tied to their performance on the agreement’s extremely deficient metrics.

“Ontario university strategic mandate agreements: a train wreck waiting to happen”, Marc Spooner, University Affairs, January 23, 2018

The measure by which citation counts for each institution will be assessed has already been decided. The Ontario government has already stated that it is going to use Elsevier’s Scopus (although I presume they really meant SciVal).

What could possibly go wrong? To answer that question, let’s look at the second trend: enclosure.

2. Enclosing the social graph

The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from off the goose.

Anonymous, “The Goose and the Commons”

As someone who spends a great deal of time ensuring that the scholarship in the University of Windsor’s institutional repository meets the stringent restrictions set by publishers, I find it hard not to feel slapped in the face when reading Springer Nature Syndicates Content to ResearchGate.

ResearchGate has been accused of “massive infringement of peer-reviewed, published journal articles.”

They say that the networking site is illegally obtaining and distributing research papers protected by copyright law. They also suggest that the site is deliberately tricking researchers into uploading protected content.

Who is the ‘they’ of the above quote? Why, ‘they’ are the publishers: the American Chemical Society and Elsevier.

It is not uncommon to find selective enforcement of copyright within the scholarly communication landscape. Publishers have turned a blind eye to the copyright infringement of ResearchGate and Academia.edu for years, while targeting course reserve systems set up by libraries.

Any commercial system that is part of the scholarly communication workflow can be acquired for strategic purposes.

As I noted in my contribution to Grinding the Gears: Academic Librarians and Civic Responsibility, sometimes companies purchase competing companies as a means to control their development and even to shut their products down.

One of the least understood and thus least appreciated functions of calibre is that it uses the Open Publication Distribution System (OPDS) standard (opds-spec.org) to allow one to easily share e-books (at least those without Digital Rights Management software installed) to e-readers on the same local network. For example, on my iPod Touch, I have the e-reader program Stanza (itunes.apple.com/us/app/stanza/id284956128) installed and from it, I can access the calibre library catalogue on my laptop from within my house, since both are on the same local WiFi network. And so can anyone else in my family from their own mobile device. It’s worth noting that Stanza was bought by Amazon in 2011 and according to those who follow the digital e-reader market, it appears that Amazon may have done so solely for the purpose of stunting its development and sunsetting the software (Hoffelder, 2013).

“Grinding the Gears: Academic Librarians and Civic Responsibility”, Lisa Sloniowski, Mita Williams, Patti Ryan, Urban Library Journal, Vol. 19, No. 1 (2013). Special Issue: Libraries, Information and the Right to the City: Proceedings of the 2013 LACUNY Institute.
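
Since that passage leans on OPDS, here is a minimal sketch of what consuming an OPDS catalogue actually involves. An OPDS catalogue is just an Atom feed over HTTP, so a dozen lines of Python can list what a server on the local network is offering. The server address and the /opds path below are assumptions for illustration, not taken from the article.

```python
# A minimal sketch (not from the article above) of consuming an OPDS feed:
# fetch the Atom catalogue that an e-book server exposes on the local
# network and list the titles it offers.
# Assumptions: a calibre content server is reachable at 192.168.1.10:8080
# and serves its OPDS feed at /opds (address and path are illustrative).
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://192.168.1.10:8080/opds"   # hypothetical local server
ATOM = "{http://www.w3.org/2005/Atom}"

with urllib.request.urlopen(FEED_URL) as response:
    feed = ET.parse(response).getroot()

# Every OPDS catalogue is an Atom feed: each <entry> is a book or a
# sub-catalogue, identified by its <title> and its <link> elements.
for entry in feed.findall(f"{ATOM}entry"):
    title = entry.findtext(f"{ATOM}title", default="(untitled)")
    links = [link.get("href") for link in entry.findall(f"{ATOM}link")]
    print(title, links)
```

Any OPDS-aware e-reader does essentially the same thing before presenting you with a browsable shelf.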

And sometimes companies acquire products to provide a tightly integrated suite of services and seamless workflow.

And, indeed, whatever model the university may select, if individual researchers determine that seamlessness is valuable to them, will they in turn license access to a complete end-to-end service for themselves or on behalf of their lab?  So, the university’s efforts to ensure a more competitive overall marketplace through componentization may ultimately serve only to marginalize it.

“Big Deal: Should Universities Outsource More Core Research Infrastructure?”, Roger C. Schonfeld, January 4, 2018

Elsevier bought bepress in August of 2017, having acquired SSRN in May of 2016. The two services are currently exploring further “potential areas of integration, including developing a single upload experience, conducting expanded research into rankings and download integration, as well as sending content from Digital Commons to SSRN.”

Now, let’s get to the recent development that has me nervous.

10.2 Requirements for Plan S compliant Open Access repositories

The repository must be registered in the Directory of Open Access Repositories (OpenDOAR) or in the process of being registered.

In addition, the following criteria for repositories are required:

  • Automated manuscript ingest facility
  • Full text stored in XML in JATS standard (or equivalent)
  • Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.
  • Open API to allow others (including machines) to access the content
  • QA process to integrate full text with core abstract and indexing services (for example PubMed)
  • Continuous availability
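
Most of these criteria describe things repositories already do, or could do. The ‘Open API’ requirement, for instance, is usually met with a harvesting interface such as OAI-PMH. Here is a minimal sketch, against a hypothetical repository endpoint (the URL and the absence of resumption-token handling are simplifications of mine), of what that kind of machine access looks like:

```python
# A minimal sketch of the 'open API' style of access the criteria describe:
# harvesting Dublin Core metadata records from a repository's OAI-PMH
# endpoint. The endpoint URL is hypothetical; most IR platforms
# (DSpace, EPrints, Digital Commons) expose a similar interface.
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://repository.example.edu/oai/request"  # hypothetical
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Ask for all records in Dublin Core (ignoring resumption tokens for brevity).
url = f"{ENDPOINT}?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(url) as response:
    root = ET.parse(response).getroot()

# Each <record> carries a header (identifier, datestamp) and an oai_dc
# metadata block with title, creator, rights, and so on.
for record in root.iter(f"{OAI}record"):
    identifier = record.findtext(f"{OAI}header/{OAI}identifier")
    title = record.findtext(f".//{DC}title", default="(no title)")
    print(identifier, "-", title)
```

Aggregators such as OpenAIRE harvest repositories in more or less this way, which is why OpenAIRE compliance shows up in the same list.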

The ‘automated manuscript ingest facility’ probably gives me the most pause. Automated means a direct pipeline from publisher to institutional repository, one that could be based on a publisher’s interpretation of fair use/fair dealing, and we don’t know what the ramifications of that decision-making might be. I feel trepidation because I believe we are already experiencing the effects of a tighter integration between manuscript services and the IR.

Many publishers – including Wiley, Taylor and Francis, IEEE, and IOP – already use a third-party manuscript service called ScholarOne. ScholarOne integrates the iThenticate service, which produces reports of what percentage of a manuscript has already been published. Journal editors have the option of setting the extent to which a paper can reuse a researcher’s prior work, including their thesis. Manuscripts that exceed these thresholds can be automatically rejected without any human intervention from the editor. We are only just starting to understand how this workflow is going to affect the willingness of young scholars to make their theses and dissertations open access.
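
As far as I can tell, the mechanism is nothing more sophisticated than a threshold check. The sketch below is hypothetical (the field names and the cut-off are mine, not ScholarOne’s), but it captures why a thesis chapter reworked into an article can be screened out before an editor ever sees it:

```python
# A hypothetical sketch of threshold-based desk rejection. The similarity
# score stands in for an iThenticate-style overlap report; the cut-off and
# field names are invented for illustration, not taken from ScholarOne.
from dataclasses import dataclass

@dataclass
class Submission:
    manuscript_id: str
    similarity_percent: float   # % of text matching previously published work

def auto_screen(submission: Submission, max_similarity: float = 30.0) -> str:
    """Return a decision before any human editor sees the manuscript."""
    if submission.similarity_percent > max_similarity:
        # A thesis chapter reworked into an article can easily trip this rule,
        # even though reuse of one's own openly available thesis is legitimate.
        return "auto-reject"
    return "forward to editor"

print(auto_screen(Submission("MS-2019-001", similarity_percent=42.5)))
```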

It is also worth noting that ScholarOne is owned by Clarivate Analytics, the parent company of Web of Science, InCites, Journal Citation Reports, and others. On one hand, having a non-publisher act as a third party to the publishing process is probably ideal, since it reduces the chances of a conflict of interest. On the other hand, I am very unhappy with Clarivate Analytics’ product Kopernio, which provides “fast, one-click access to millions of research papers” and “integrates with Web of Science, Google Scholar, PubMed” and 20,000 other sites (including ResearchGate and Academia.edu, natch). There are prominent links to Kopernio within Web of Science that essentially position the product as a direct competitor to a university library’s link resolver service and, in doing so, remove the library from the scholarly workflow – other than the fact that the library pays for the product’s placement.

The winner takes it all

The genius — sometimes deliberate, sometimes accidental — of the enterprises now on such a steep ascent is that they have found their way through the looking-glass and emerged as something else. Their models are no longer models. The search engine is no longer a model of human knowledge, it is human knowledge. What began as a mapping of human meaning now defines human meaning, and has begun to control, rather than simply catalog or index, human thought. No one is at the controls. If enough drivers subscribe to a real-time map, traffic is controlled, with no central model except the traffic itself. The successful social network is no longer a model of the social graph, it is the social graph. This is why it is a winner-take-all game.

“Childhood’s End”, George Dyson, Edge, January 1, 2019

Blogging again and Never again

It appears that I haven’t written a single post on this blog since July of 2018. Perhaps it is all the talk of resolutions around me, but I sincerely would like to write more in this space in 2019. And the best way to do that is to just start.

In December of last year I listened to Episode 7 of Anil Dash’s Function Podcast: Fn 7: Behind the Rising Labor Movement in Tech.

This week on Function, we take a look at the rising labor movement in tech by hearing from those whose advocacy was instrumental in setting the foundation for what we see today around the dissent from tech workers.

Anil talks to Leigh Honeywell, CEO and founder of Tall Poppy and creator of the Never Again pledge, about how her early work, along with others, helped galvanize tech workers to connect the dots between different issues in tech.

Fn 7: Behind the Rising Labor Movement in Tech

I thought I was familiar with most of Leigh’s work, but I realized that wasn’t the case: somehow her involvement with the Never Again pledge had escaped my attention.

Here’s the pledge’s Introduction:

We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their Constitutionally-protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable.

We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out. We see how IBM collaborated to digitize and streamline the Holocaust, contributing to the deaths of six million Jews and millions of others. We recall the internment of Japanese Americans during the Second World War. We recognize that mass deportations precipitated the very atrocity the word genocide was created to describe: the murder of 1.5 million Armenians in Turkey. We acknowledge that genocides are not merely a relic of the distant past—among others, Tutsi Rwandans and Bosnian Muslims have been victims in our lifetimes.

Today we stand together to say: not on our watch, and never again.

“Our pledge”, Never Again.

The episode reminded me that while I am not an employee in the United States directly complicit in the facilitation of deportations, as a Canadian academic librarian I am not entirely free from some degree of complicity: I am employed at a university that subscribes to Westlaw.

The Intercept is reporting on Thomson Reuters’ response to Privacy International’s letter to TRI CEO Jim Smith expressing the watchdog group’s “concern” over the company’s involvement with ICE. According to The Intercept article “Thomson Reuters Special Services sells ICE ‘a continuous monitoring and alert service that provides real-time jail booking data to support the identification and location of aliens’ as part of a $6.7 million contract, and West Publishing, another subsidiary, provides ICE’s “Detention Compliance and Removals” office with access to a vast license-plate scanning database, along with agency access to the Consolidated Lead Evaluation and Reporting, or CLEAR, system.” The two contracts together are worth $26 million. The article observes that “the company is ready to defend at least one of those contracts while remaining silent on the rest.”

“Thomson Reuters defends $26 million contracts with ICE”
by Joe Hodnicki (Law Librarian Blog) on June 28, 2018

I also work at a library that subscribes to products that are provided by Elsevier and whose parent company is the RELX Group.

In 2015, Reed Elsevier rebranded itself as RELX and moved further away from traditional academic and professional publishing. This year [2018], the company purchased ThreatMetrix, a cybersecurity company that specializes in tracking and authenticating people’s online activities, which even tech reporters saw as a notable departure from the company’s prior academic publishing role.

“Surveillance and Legal Research Providers: What You Need to Know”, Sarah Lamdan, Medium, July 6, 2018.

Welcome to 2019. There is work to do and it’s time to start.

What ruined the web was the lack of good library software

In some libraries, particular collections are organized by the order in which the items were acquired (at my place of work, our relatively small collection of movies on DVD is ordered this way). This practice makes it easy to see at a glance what has most recently been received or newly published. Such collections are easy to start and maintain, as you just have to sort them by ‘acquisition number’.

But you would be hard pressed to find a good reason to organize a large amount of material this way. Eventually a collection grows too large to browse in its entirety, and people start telling you that they would rather browse by author name, or by publication year, or by subject. To allow for this means organizing the collection, and let me tell you, my non-library-staff friends, such organization is a lot of bother: it takes time, thought, and consistent diligence.
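
To put the same point in code: a ‘shelve it as it arrives’ collection needs nothing more than a sort on a single number, while a browse-by-author view presupposes that someone has already done the slow work of recording that metadata consistently. A toy sketch, with invented records:

```python
# A toy illustration of the difference. Ordering by acquisition number is a
# one-line sort; browsing by author only works because each record carries
# consistent author metadata, which is the time-consuming part. Records invented.
from itertools import groupby

catalogue = [
    {"acq_no": 101, "title": "Night Film", "author": "Pessl", "year": 2013},
    {"acq_no": 102, "title": "The Peripheral", "author": "Gibson", "year": 2014},
    {"acq_no": 103, "title": "Pattern Recognition", "author": "Gibson", "year": 2003},
]

# "New arrivals" view: trivial to maintain.
newest_first = sorted(catalogue, key=lambda item: item["acq_no"], reverse=True)

# "Browse by author" view: depends entirely on consistent metadata.
by_author = sorted(catalogue, key=lambda item: item["author"])
for author, works in groupby(by_author, key=lambda item: item["author"]):
    print(author, [w["title"] for w in works])
```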

Which is why we are where we are with today’s state of the web.

Early homepages were like little libraries…

A well-organized homepage was a sign of personal and professional pride — even if it was nothing but a collection of fun gifs, or instructions on how to make the best potato guns, or homebrew research on gerbil genetics.

Dates didn’t matter all that much. Content lasted longer; there was less of it. Older content remained in view, too, because the dominant metaphor was table of contents rather than diary entry.

Everyone with a homepage became a de facto amateur reference librarian.

Obviously, it didn’t last.

The above is from a short essay by Amy Hoy about Movable Type, one of the first blogging platforms, and how MT and other blogging platforms that facilitated easy chronological ordering of blog posts may have been the true culprits that ruined the web.

Movable Type didn’t just kill off blog customization.

It (and its competitors) actively killed other forms of web production.

Non-diarists — those folks with the old school librarian-style homepages — wanted those super-cool sidebar calendars just like the bloggers did. They were lured by the siren of easy use. So despite the fact that they weren’t writing daily diaries, they invested time and effort into migrating to this new platform.

They soon learned the chronostream was a decent servant, but a terrible master.

We no longer build sites. We generate streams.

All because building and maintaining a library is hard work.

[The above was first shared in my weekly newsletter, University of Winds, which provides three links to wonderful and thought-provoking things in the world every Saturday morning.]