OpenRefine for Librarians

On October 24th, 2018, I gave a half-hour online presentation as part of a virtual conference from NISO called That Cutting Edge: Technology’s Impact on Scholarly Research Processes in the Library.

My presentation was called:

And here is the script and the slides that I presented:

Good afternoon. Thank you for the opportunity to introduce you to OpenRefine.

Even if you have already heard of OpenRefine before, I hope you will still find this session useful, as I have tried to make an argument for why librarians should investigate technologies like OpenRefine for scholarly research purposes.

This talk has three parts.

I like to call OpenRefine the most popular library tool that you’ve never heard of.

After this introduction, I hope that this statement will become just a little less true.

You can download OpenRefine from its official website, OpenRefine.org.

OpenRefine began as Google Refine and before that it was Freebase Gridworks, created by David François Huynh in January 2010 while he was working at Metaweb, the company that was responsible for Freebase.

In July 2010, Freebase and Freebase Gridworks were bought by Google, which adopted the technology and rebranded Freebase Gridworks as Google Refine. While the code was always open source, Google supported the project until 2012. From that point on, the project became a community-supported open source product and as such was renamed OpenRefine.

As an aside, Freebase was officially shut down by Google in 2016 and the data from that project was transferred to Wikidata, Wikimedia’s structured data project.  


OpenRefine is software written in Java. The program is downloaded onto your computer and accessed through the browser. OpenRefine includes its own web server software, so it is not necessary to be connected to the internet in order to make use of OpenRefine.

At the top of the slide is a screen capture of what you first see when you start the program. The dark black window is what opens behind the scenes if you are interested in monitoring the various processes that you are putting OpenRefine through. And in the corner of the slide, you can see a typical view of OpenRefine with data in it.

OpenRefine has been described by its creator as “a power tool for working with messy data”.

Wikipedia calls OpenRefine “a standalone open source desktop application for data cleanup and transformation to other formats, the activity known as data wrangling”.

OpenRefine is used to standardize and clean data across your file or spreadsheet.

This slide is a screenshot from the Library Carpentry set of lessons on OpenRefine. I like how they introduce you to the software by explaining some common scenarios in which you might use the software.

These scenarios include:

  • When you want to know how many times a particular value (name, publisher, subject) appears in a column in your data
  • When you want to know how values are distributed across your whole data set
  • When you have a list of dates which are formatted in different ways, and want to change all the dates in the list to a single common date format (a transformation like the one sketched just below)
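To make that last scenario concrete: in OpenRefine this kind of cleanup is usually a single transformation written in GREL, OpenRefine’s expression language. A minimal sketch, applied through Edit cells → Transform… (the target format here is just an illustration), would be:

    value.toDate().toString("yyyy-MM-dd")

This parses each cell into a date object and writes it back out in one common format; cells that cannot be parsed can be left untouched by choosing “keep original” on error.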

The software developers who maintain and extend the powers of OpenRefine run regular surveys to learn more about their fellow users, and in 2018 they found that their largest community was from libraries – a group that did not even register as its own category in the original 2012 survey.

So we know that the largest OpenRefine user group is librarians. Do we know how OpenRefine use measures up *within* the population of librarians? Unfortunately we don’t.

While we don’t expect such a specialized tool to be widely used across all the different types of librarians and library work, we have seen from this recent survey of metadata managers at OCLC Research Library Partners that OpenRefine is the second most popular tool used, after MarcEdit.

That being said, once you know the power of OpenRefine, you will, like me, see all sorts of other potential uses for the tool outside of metadata cleanup. In August of this year, I read this tweet from fellow Canadian scholarly communications librarian Ryan Regier and sent some links with instructions illustrating how OpenRefine could help with this research question.


When introducing new technology to others, it’s very important not to oversell it and to manage expectations.

LINK

But I’m not the only one who feels strongly about the power of OpenRefine, and for good reasons, which we will explore in the second section of this talk.

If you asked me what is the most popular technology used by librarians in their work and support of scholarship, I would say that one answer could be Microsoft Excel. Many librarians I know do their collections work, their electronic resources work, and their data work in Excel and they are very good at it.

But there are some very good reasons to reconsider using Excel for our work.

This slide outlines what I consider some of the strongest reasons to consider using OpenRefine. First, the software is able to handle more types of data than Excel can. Excel can handle rows of data. OpenRefine can handle rows and records of data.

For many day-to-day uses of Excel it is unlikely you will run into the maximum capacity of the software, but for those who work with large data sets, a limit of a million and change rows (1,048,576, to be exact) can be a problem.

But the most important reason why we should consider OpenRefine is the same reason why it is fundamentally different from Excel. Unlike spreadsheet software such as Excel, OpenRefine stores no formulas in its cells.

Instead formulas are used to transform the data and these formulas are tracked as scripts.

Don’t use Excel / Genome Biology

Not only do the cells of Excel contain formulas that transform the presented data in ways that are not always clear, Excel sometimes transforms your data without clearly demonstrating that it is doing so. According to a paper from 2016, roughly one-fifth of genomics papers had supplementary datasets with errors from Excel converting gene names such as SEPT10 into dates.

I want to be clear, I am not saying that Excel is bad and people who use Excel are also bad.

We can all employ good data practices whether we use spreadsheets or data wrangling tools such as OpenRefine. I believe we have to meet people where they are with their data tool choices. This is part of the approach taken by the good people responsible for this series of lessons as part of the Ecology curriculum of Data Carpentry.

And with that, I just want to take the briefest of moments to thank the good people behind Software Carpentry and Data Carpentry – collectively now known as The Carpentries – as I am pretty sure it was their work that introduced me to the world of OpenRefine.

This slide is taken from the Library Carpentry OpenRefine lesson. There is too much text on the slide to read but the gist of the message is this: OpenRefine saves every change you make to your dataset and these changes are saved as scripts. After you clean up a set of messy data, you can export your script of the transformations you made and then you can apply that script to other similarly messy data sets.

Not only does this save the wrangler time; the ability to store scripts separately from the data itself also lends itself to reproducible science.

Here is a screenshot of a script captured in OpenRefine in both English and in JSON.
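If you cannot see the slide, the JSON version of such a script – the operation history that OpenRefine lets you extract and re-apply – looks roughly like the sketch below. The column name is hypothetical and the exact shape may vary slightly between versions; it reuses the date transformation from earlier:

    [
      {
        "op": "core/text-transform",
        "engineConfig": { "facets": [], "mode": "row-based" },
        "columnName": "publication_date",
        "expression": "grel:value.toDate().toString(\"yyyy-MM-dd\")",
        "onError": "keep-original",
        "repeat": false,
        "repeatCount": 10,
        "description": "Text transform on cells in column publication_date"
      }
    ]

Because the file describes operations rather than data, it can be replayed on a fresh export of similar data through the Undo/Redo tab’s Apply… button.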

It is difficult for me to express how important and how useful it is that OpenRefine separates the work from the data in this way.

This is the means by which librarians can share workflows outside of their organizations without worrying about accidentally sharing private data.

link

As more librarians start using more complicated data sets and data tools for supporting research and their own research, there will be more opportunities for embodying, demonstrating, and teaching good data practices.

I remember the instance in which I personally benefited from someone sharing their work with OpenRefine. It was this blog post from Angela Galvan, which walked me through the process of taking a list of ISSNs, running it through the Sherpa/RoMEO API, and using the formula on the screen to quickly and clearly show whether a particular journal allowed publisher PDFs to be added to the institutional repository or not.
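I won’t reproduce Galvan’s exact formula here, but the general pattern is easy to sketch: on the ISSN column you use Edit column → Add column by fetching URLs… with a GREL expression that builds the request, and then a second expression to pull the field you care about out of each response. The base URL and field name below are placeholders, not the real Sherpa/RoMEO endpoint:

    "https://api.example.org/romeo?issn=" + value.escape("url")

    value.parseJson()["permitted_version"]

The appeal is that none of this requires leaving OpenRefine or writing a standalone script, and the whole lookup becomes part of the project’s saved history.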

And with that, here’s a bit of a tour of how libraries are using OpenRefine in their work.

This is from a walkthrough by Owen Stephens in which he used both MarcEdit and OpenRefine to find errors in 50,000 bibliographic records and make the corrections necessary so that they would be able to be loaded into a new library management system.

I haven’t spent much time highlighting it, but one of the most appreciated features of OpenRefine is its set of data visualizations that allow the wrangler to find the differences that make a difference in the data.

The slide features two screen captures. In the lower screen, OpenRefine has used fuzzy matching algorithms to discover variations of entries that are statistically likely to be the same.

link

I had mentioned previously that I had used OpenRefine with the Sherpa/RoMEO API. OpenRefine’s ability to give access to APIs to users who may not be entirely comfortable with command-line scripting or programming should not be underestimated. That’s why lesson plans that use OpenRefine to perform tasks such as web scraping, as pictured here, are appreciated.

link

With OpenRefine, libraries are finding ways to use reconciliation services for local projects. I am just going to read the last bit of the last line of this abstract for emphasis: a hack using OpenRefine yielded a 99.6% authority reconciliation and a stable process for monthly data verification. And as you now know, this was likely done through OpenRefine scripts.

OpenRefine has proved useful in preparing linked data…

link

And if staff feel more comfortable using spreadsheets, OpenRefine can be used to convert those spreadsheets into forms such as MODS XML.

link

Knowing the history of OpenRefine, you might not be surprised to learn that it has built in capabilities to reconcile data to controlled vocabularies…

link

But you might be pleasantly surprised to learn that OpenRefine can reconcile data from VIVO, JournalTOC, VIAF, and FAST from OCLC.

link

But the data reconciliation service that I’m particularly following is from Wikidata.

In this video example, John Little uses data from VIAF and Wikidata to gather authoritative versions of author names plus additional information including their place of birth.

I think it’s only appropriate that OpenRefine connects to Wikidata when you remember that both projects have ties to the Freebase project.

link

Wikidata is worthy of its own talk – and maybe even its own conference – but since we are very close to the end of this presentation, let me introduce you to Wikidata as structured linked data that anyone can use and improve.

link

I was introduced to the power of Wikidata and how its work could extend our library work from librarians such as Dan Scott and Stacy Allison-Cassin. In this slide, you can see a screen capture from Dan’s presentation that highlights that the power of Wikidata is that it doesn’t just collect formal institutional identifiers such as those from LC or ISNI, but also identifiers from sources such as AllMusic.

link

And this is the example that I would like to end my presentation on. The combination of OpenRefine and Wikidata, working together, allows the librarian not only to explore, clean up, and normalize their data sets, but also to extend that data and connect it to the world.

It really is magic.

Back to the Future of Libraries

I am in the process of reading Clive Thompson’s Coders: The Making of a New Tribe and the Remaking of the World and I have to say that I am, so far, disappointed with the book. I am a fan of Thompson’s technology journalism and I really enjoyed his earlier work, Smarter Than You Think: How Technology Is Changing Our Minds for the Better, so I thought I would be a good reader and order the book from my local as soon as it came out. And it’s not a bad book. The book does what it says it’s going to do on the tin: it is a book about the tribe of coders.

The trouble is mine: I am not interested in a measured account of the lives of coders in America. I think the status quo for computing is dismal.

The way that we require people to have to think like a computer in order to make use of computers is many things. It is de-humanizing. It is unnecessary hardship. It feels wrong.

This is why Bret Victor’s Inventing on Principle (2012) presentation was (and remains) so incredible to me. Victor sets out to demonstrate that creators need (computer) tools that provide them with the most direct and immediately responsive connection to their creation as possible:

If we look to the past, we can find alternative futures to computing that might have served us better, if we had only gone down a different path. Here’s Bret Victor’s The Future of Programming 1973 (2013), which you should watch a few minutes of, if just to appreciate his alternative to PowerPoint:

Here’s another video that looks to the past to see what other futures had been forsaken that is definitely worth your time. It is of and from Claire Evans – author of Broad Band: The Untold Story of the Women Who Made the Internet – who spoke at XOXO 2018.

At around the 11-minute mark, Evans sets the scene for the first unveiling of Tim Berners-Lee’s World Wide Web, and it’s a great story because when Berners-Lee first demonstrated his work at Hypertext ’91, the other attendees were not particularly impressed. Evans explains why.

So why am I telling you all about the history of computing on my library-themed blog? Well, one reason is that our profession has not done a great job of knowing our own (female) history of computing.

Case in point: until this twitter exchange, I had not had the pleasure of knowing Linda Smith or her work:

(There was a now-deleted post from a librarian blog from 2012 that comes to mind. I’m not entirely sure of the etiquette of quoting deleted posts, so I will paraphrase the post as the following text…)

Despite librarianship being a feminized and predominantly female profession, [author of aforementioned blog post] remarked that she was never introduced in library school to the following women, despite their accomplishments: Suzanne Briet, Karen Spärck Jones, Henriette Avram, and Elaine Svenonius. And if my memory can be trusted, I believe the same was true for myself.

Is there a connection between the more human(e) type of computing that Bret Victor advocates for, the computing innovations from women that Claire Evans wants us to learn from, and these lesser-known women of librarianship and its adjacent fields in computing? I think there might be.

From the Overlooked series of obituaries from The New York Times, for Karen Spärck Jones:

When most scientists were trying to make people use code to talk to computers, Karen Sparck Jones taught computers to understand human language instead.

In so doing, her technology established the basis of search engines like Google.

A self-taught programmer with a focus on natural language processing, and an advocate for women in the field, Sparck Jones also foreshadowed by decades Silicon Valley’s current reckoning, warning about the risks of technology being led by computer scientists who were not attuned to its social implications.

“A lot of the stuff she was working on until five or 10 years ago seemed like mad nonsense, and now we take it for granted,” said John Tait, a longtime friend who works with the British Computer Society.

“Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines” by Nellie Bowles, The New York Times, Jan. 2, 2019

I have already given my fair share of future-of-the-library talks, so I think it is for the best if someone else takes up the challenge of looking into the work of librarians past to see if we can’t refactor our present into a better future.

Haunted libraries, invisible labour, and the librarian as an instrument of surveillance

This post was inspired by the article Intersubjectivity and Ghostly Library Labor by Liz Settoducato which was published earlier this month on In the library with the lead pipe. The article, in brief:

Libraries are haunted houses. As our patrons move through scenes and illusions that took years of labor to build and maintain, we workers are hidden, erasing ourselves in the hopes of providing a seamless user experience, in the hopes that these patrons will help defend Libraries against claims of death or obsolescence. However, ‘death of libraries’ arguments that equate death with irrelevance are fundamentally mistaken. If we imagine that a collective fear has come true and libraries are dead, it stands to reason that library workers are ghosts. Ghosts have considerable power and ubiquity in the popular imagination, making death a site of creative possibility. Using the scholarly lens of haunting, I argue that we can experience time creatively, better positioning ourselves to resist the demands of neoliberalism by imagining and enacting positive futurities.

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

I also think libraries can be described as haunted but for other reasons than Settoducato suggests. That doesn’t mean I think Settoducato is wrong or the article is bad. On the contrary – I found the article delightful and I learned a lot from it. For example, having not read Foucault myself, this was new to me:

In such examples, books are a necessary component of the aesthetic of librarianship, juxtaposing the material (books and physical space) with the immaterial (ghosts). Juxtaposition is central to Michel Foucault’s concept of heterotopias, places he describes as “capable of juxtaposing in a single real place several spaces, several sites that are in themselves incompatible” (1984, 6). Foucault identifies cemeteries, libraries, and museums among his examples of heterotopias, as they are linked by unique relationships to time and memory. Cemeteries juxtapose life and death, loss (of life) and creation (of monuments), history and modernity as their grounds become increasingly populated. Similarly, libraries and museums embody “a sort of perpetual and indefinite accumulation of time in an immobile place,” organizing and enclosing representations of memory and knowledge (Foucault 1984, 7).

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

That passage felt true to me. As I once confessed to an avocado on the Internet…

There are other passages in Intersubjectivity that I think could be expanded upon. For example, while I completely agree with its expression that the labour of library staff is largely invisible, I believe that particular invisibility was prevalent long before neoliberalism. The librarian has been subservient to those who endow the books for hundreds of years.

Richard Bentley, for his part, continued to run into problems with libraries. Long after the quarrel of the ancients and moderns had fizzled, he installed a young cousin, Thomas Bentley, as keeper of the library of Cambridge’s Trinity College. At Richard’s urging, the young librarian followed the path of a professional, pursuing a doctoral degree and taking long trips to the Continent in search of new books for the library. The college officers, however, did not approve of his activities. The library had been endowed by Sir Edward Stanhope, whose own ideas about librarianship were decidedly more modest than those of the Bentleys. In 1728, a move was made to remove the younger Bentley, on the ground that his long absence, studying and acquiring books in Rome and elsewhere, among other things, disqualified him from the post. In his characteristically bullish fashion, Richard Bentley rode to his nephew’s defense. In a letter, he admits that “the keeper [has not] observed all the conditions expressed in Sir Edward Stanhope’s will,” which had imposed a strict definition of the role of librarian. Bentley enumerates Sir Edward’s stipulations, thereby illuminating the sorry state of librarianship in the eighteenth century. The librarian is not to teach or hold office in the college; he shall not be absent from his appointed place in the library more than forty days out of the year; he cannot hold a degree above that of master of arts; he is to watch each library reader, and never let one out of his sight

Library: An Unquiet History by Matthew Battles

“He is to watch each library reader” is a key phrase here. From the beginning, librarians and library staff were installed as instruments of surveillance as a means to protect property.

Even to this day, I will hear of university departments who wish to make a collection of material available for the use of faculty and students and are so committed to this end that they will secure a room, which is no small feat on a campus nowadays. But then the faculty or students refuse to share their most precious works because they realize that their materials in an open and accessible room will be subject to theft or vandalism.

Same as it ever was.

“Social security cards unlock the library’s door.” Image from Amelia Acker.

Presently, a handful of municipal libraries in Denmark operate with open service models. These open libraries rely on the self-service of patrons and have no library staff present—loans, returns, admittance and departing the physical library space are regulated through automated access points. Many public library users are familiar with self-check out kiosks and access to the collections database through a personal computing station, but few patrons have ever been in a public library without librarians, staff workers or security personnel. Libraries that rely on self-service operation models represent a new kind of enclosed environment in societies of control. Such automated interior spaces correspond to a crisis in libraries and other institutions of memory like museums or archives. Under the guise of reform, longer service hours, and cost-saving measures, libraries with rationalized operating models conscript their users into a new kind of surveillance….

The open library disciplines and controls the user by eliminating the librarian, enrolling the user into a compulsory self-service to engage with the automated space. The power of this engagement is derived from a regime of panoptic access points that visualize, capture and document the user’s path and her ability to regulate herself during every movement and transition in the library—from entering, searching the catalog, browsing the web, borrowing information resources, to exiting the building.

Soft Discipline and Open Libraries in Denmark, Amelia Acker. Posted on Saturday, November 3, 2012, at 5:00 pm.

That was written in 2012.

The tools of monitoring and affecting space have proliferated widely in the ‘smart home’ category since then. We have services such as Airbnb that allow all manner of spaces to be made available to others. We have technologies such as Nest that act as combination thermostats, smoke detectors, and security systems, and that are learning systems, using AI to discover patterns of use not readily apparent to the human mind. And then we have the spooky and unpredictable spaces where these technologies interact with each other.

Because of these technologies, many, many spaces are going to feel haunted. Not just libraries:

The other day, after watching Crimson Peak for the first time, I woke up with a fully-fleshed idea for a Gothic horror story about experience design. And while the story would take place in the past, it would really be about the future. Why? Because the future itself is Gothic.

First, what is Gothic? Gothic (or “the Gothic” if you’re in academia) is a Romantic mode of literature and art. It’s a backlash against the Enlightenment obsession with order and taxonomy. It’s a radical imposition of mystery on an increasingly mundane landscape. It’s the anticipatory dread of irrational behaviour in a seemingly rational world. But it’s also a mode that places significant weight on secrets — which, in an era of diminished privacy and ubiquitous surveillance, resonates ever more strongly….

… Consider the disappearance of the interface. As our devices become smaller and more intuitive, our need to see how they work in order to work them goes away. Buttons have transformed into icons, and icons into gestures. Soon gestures will likely transform into thoughts, with brainwave-triggers and implants quietly automating certain functions in the background of our lives. Once upon a time, we valued big hulking chunks of technology: rockets, cars, huge brushed-steel hi-fis set in ornate wood cabinets, thrumming computers whose output could heat an office, even odd little single-purpose kitchen widgets. Now what we want is to be Beauty in the Beast’s castle: making our wishes known to the household gods, and watching as the “automagic” takes care of us. From Siri to Cortana to Alexa, we are allowing our lives and livelihoods to become haunted by ghosts without shells.

Our Gothic Future, Madeline Ashby, February 25, 2016.

How can we resist this future that is being made for us but not with us? One of my favourite passages of Intersubjectivity suggests a rich field of possibility that I can’t wait to explore further:

However, it does not have to be this way. David Mitchell and Sharon Snyder also take up the questions of embodiment and productivity, examining through a disability studies lens the ways in which disabled people have historically been positioned as outside the laboring masses due to their “non-productive bodies” (2010, 186). They posit that this distinction transforms as the landscape of labor shifts toward digital and immaterial outputs from work in virtual or remote contexts, establishing the disabled body as a site of radical possibility. Alison Kafer’s crip time is similarly engaged in radical re-imagining, challenging the ways in which “‘the future’ has been deployed in the service of compulsory able-bodiedness and able-mindedness” (2013, 26-27). That is, one’s ability to exist in the future, or live in a positive version of the future is informed by the precarity of their social position. The work of theorists like Mitchell, Snyder, and Kafer is significant because it insists on a future in which disabled people not only exist, but also thrive despite the pressures of capitalism.

Intersubjectivity and Ghostly Library Labor by Liz Settoducato, In the library with the lead pipe

[An aside: a research library filled with non-productive objects can also be seen to resist capitalism. ]

In conclusion, I would like to answer this dear student who asked this important question:

The answer is: yes.
The library staff are the ghosts in the machine.

Digitization is a multiplier and metadata is a fractal

In Harry Potter and the Deathly Hallows (sorry), Helga Hufflepuff’s goblet is stored in a vault at Gringotts that’s been cursed so that every time you touch one of the objects in it, dozens of copies are created. On the cover of the original U.K. edition of the book, Harry, Ron and Hermione are pictured scrambling atop a wave of facsimile’d treasure. I’ve started thinking about special collections digitization like this. Digitization doesn’t streamline or simplify library collections; rather, it multiplies them, as every interaction creates additional objects for curation and preservation

The above is from Harry Potter and the Responsible Version Control of Digital Surrogates and it is one of the few examples that I know of that uses the Harry Potter and the… trope appropriately. It is a post written by Emma Stanford, Digital Curator at the Bodleian Libraries, from some months past, but it came to my mind this week after reading this from Jer Thorp’s newsletter a couple of days ago:

The amount of data that can be conjured from any given thing is almost limitless. Pick up a plain grey rock from the side of the road, and in moments you can make a small dataset about it: size, weight, colour, texture, shape, material. If you take that rock to a laboratory these data can be made greatly more precise, and instrumentation beyond our own human sensorium can add to the list of records: temperature, chemical composition, carbon date. From here there is a kind of fractal unfolding of information that begins to occur, where each of these records in turn manifest their own data. The time at which the measurement was made, the instrument used to record it, the person who performed the task, the place where the analysis was performed. In turn, each of these new meta-data records can carry its own data: the age of the person who performed the task, the model of the instrument, the temperature of the room. Data begets data, which begets meta data, repeat, repeat, repeat. It’s data all the way down.

We use computers because they are supposed to make our lives more efficient but at every layer that they are applied they introduce complexity. This is one of the takeaways that I gained from reading the “Designing Freedom” Massey Lectures from cyberneticist Stafford Beer.

The book is very interesting but also a somewhat frustrating read and so if you are interested in learning more, I’d suggest this podcast episode dedicated to the book from the cybernetic marxists of General Intellect Unit.

Yes. There is now a podcast episode for everything.

If the map becomes the territory then we will be lost

That which computation sets out to map and model it eventually takes over. Google sets out to index all human knowledge and becomes the source and the arbiter of that knowledge: it became what people think. Facebook set out to map the connections between people – the social graph – and became the platform for those connections, irrevocably reshaping societal relationships. Like an air control system mistaking a flock of birds for a fleet of bombers, software is unable to distinguish between the model of the world and reality – and, once conditioned, neither are we.

James Bridle, New Dark Age, p.39.

I am here to bring your attention to two developments that are making me worried:

  1. The Social Graph of Scholarly Communications is becoming more tightly bound into institutional metrics that have an increasing influence on institutional funding
  2. The publishers of the Social Graph of Scholarship are beginning to enclose the Social Graph, excluding the infrastructure of libraries and other independent, non-profit organizations

Normally, I would try to separate these ideas into two dedicated posts but in this case, I want to bring them together in writing because if these two trends converge, things will become very bad, very quickly.

Let me start with the first trend:

1. The social graph that binds

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

  1. a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)
  2. a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)
  3. a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)
  4. a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

Workflow is the new content.

That line – workflow is the new content – is from Lorcan Dempsey and it was brought to my attention by Roger Schonfeld. For Open Access week, I gave a presentation on this idea of being mindful of workflow and tool choices in a presentation entitled, A Field Guide to Scholarly Communications Ecosystems. The slides are below.

(My apologies for not sharing the text that goes with the slides. Since January of this year, I have been the Head of the Information Services Department at my place of work. In addition to this responsibility, much of my time this year has been spent covering the work of colleagues currently on leave. Finding time to write has been a challenge.)

In Ontario, each institution of higher education must sign a ‘Strategic Mandate Agreement’ with its largest funding body, the provincial government. Universities are currently in the second iteration of these agreements and are preparing for the third round. These agreements are considered fraught by many, including Marc Spooner, a professor in the faculty of education at the University of Regina, who wrote the following in an opinion piece in University Affairs:

The agreement is designed to collect quantitative information grouped under the following broad themes: a) student experience; b) innovation in teaching and learning excellence; c) access and equity; d) research excellence and impact; and e) innovation, economic development and community engagement. The collection of system-wide data is not a bad idea on its own. For example, looking at metrics like student retention data between years one and two, proportion of expenditures on student services, graduation rates, data on the number and proportion of Indigenous students, first-generation students and students with disabilities, and graduate employment rates, all can be helpful.

Where the plan goes off-track is with the system-wide metrics used to assess research excellence and impact: 1) Tri-council funding (total and share by council); 2) number of papers (total and per full-time faculty); and 3) number of citations (total and per paper). A tabulation of our worth as scholars is simply not possible through narrowly conceived, quantified metrics that merely total up research grants, peer-reviewed publications and citations. Such an approach perversely de-incentivises time-consuming research, community-based research, Indigenous research, innovative lines of inquiry and alternative forms of scholarship. It effectively displaces research that “matters” with research that “counts” and puts a premium on doing simply what counts as fast as possible…

Even more alarming – and what is hardly being discussed – is how these damaging and limited terms of reference will be amplified when the agreement enters its third phase, SMA3, from 2020 to 2023. In this third phase, the actual funding allotments to universities will be tied to their performance on the agreement’s extremely deficient metrics.

“Ontario university strategic mandate agreements: a train wreck waiting to happen”, Marc Spooner, University Affairs, Jan 23 2018

The measure by which citation counts for each institution are going to be assessed has already been decided. The Ontario government has already stated that it is going to use Elsevier’s Scopus (although I presume they really meant SciVal).

What could possibly go wrong? To answer that question, let’s look at the second trend: enclosure.

2. Enclosing the social graph

The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from off the goose.

Anonymous, “The Goose and the Commons”

As someone who spends a great deal of time ensuring that the scholarship of the University of Windsor’s Institutional Repository meets the stringent restrictions set by publishers, it’s hard not to feel a slap in the face when reading Springer Nature Syndicates Content to ResearchGate.

ResearchGate has been accused of “massive infringement of peer-reviewed, published journal articles.”

They say that the networking site is illegally obtaining and distributing research papers protected by copyright law. They also suggest that the site is deliberately tricking researchers into uploading protected content.

Who are the ‘they’ of the above quote? Why, they are the publishers: the American Chemical Society and Elsevier.

It is not uncommon to find selective enforcement of copyright within the scholarly communication landscape. Publishers have cast a blind eye to the copyright infringement of ResearchGate and Academia.edu for years, while targeting course reserve systems set up by libraries.

Any commercial system that is part of the scholarly communication workflow can be acquired for strategic purposes.

As I noted in my contribution to Grinding the Gears: Academic Librarians and Civic Responsibility, sometimes companies purchase competing companies as a means to control their development and even to shut their products down.

One of the least understood and thus least appreciated functions of calibre is that it uses the Open Publication Distribution System (OPDS) standard (opds-spec.org) to allow one to easily share e-books (at least those without Digital Rights Management software installed) to e-readers on the same local network. For example, on my iPod Touch, I have the e-reader program Stanza (itunes.apple.com/us/app/stanza/id284956128) installed and from it, I can access the calibre library catalogue on my laptop from within my house, since both are on the same local WiFi network. And so can anyone else in my family from their own mobile device. It’s worth noting that Stanza was bought by Amazon in 2011 and according to those who follow the digital e-reader market, it appears that Amazon may have done so solely for the purpose of stunting its development and sunsetting the software (Hoffelder, 2013).

“Grinding the Gears: Academic Librarians and Civic Responsibility”, Lisa Sloniowski, Mita Williams, Patti Ryan, Urban Library Journal, Vol. 19, No. 1 (2013). Special Issue: Libraries, Information and the Right to the City: Proceedings of the 2013 LACUNY Institute.

And sometimes companies acquire products to provide a tightly integrated suite of services and seamless workflow.


And, indeed, whatever model the university may select, if individual researchers determine that seamlessness is valuable to them, will they in turn license access to a complete end-to-end service for themselves or on behalf of their lab?  So, the university’s efforts to ensure a more competitive overall marketplace through componentization may ultimately serve only to marginalize it.

“Big Deal: Should Universities Outsource More Core Research Infrastructure?”, Roger C. Schonfeld, January 4, 2018

Elsevier bought BePress in August of 2017. In May of 2016, Elsevier acquired SSRN. Bepress and SSRN are currently exploring further “potential areas of integration, including developing a single upload experience, conducting expanded research into rankings and download integration, as well as sending content from Digital Commons to SSRN.”

Now, let’s get to the recent development that has me nervous.

10.2 Requirements for Plan S compliant Open Access repositories

The repository must be registered in the Directory of Open Access Repositories (OpenDOAR) or in the process of being registered.

In addition, the following criteria for repositories are required:

  • Automated manuscript ingest facility
  • Full text stored in XML in JATS standard (or equivalent)
  • Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.
  • Open API to allow others (including machines) to access the content
  • QA process to integrate full text with core abstract and indexing services (for example PubMed)
  • Continuous availability

‘Automated manuscript ingest facility’ probably gives me the most pause. Automated means a direct pipeline from publisher to institutional repository that could be based on a publisher’s interpretation of fair use/fair dealing, and we don’t know what the ramifications of that decision making might be. I’m feeling trepidation because I believe we are already experiencing the effects of a tighter integration between manuscript services and the IR.

Many publishers – including Wiley, Taylor and Francis, IEEE, and IOP – already use a third-party manuscript service called ScholarOne. ScholarOne integrates the iThenticate service, which produces reports of what percentage of a manuscript has already been published. Journal editors have the option to set to what extent a paper can make use of a researcher’s prior work, including their thesis. Manuscripts that exceed these thresholds can be automatically rejected without intervention from the editor. We are only just starting to understand how this workflow is going to impact the willingness of young scholars to make their theses and dissertations open access.

It is also worth noting that ScholarOne is owned by Clarivate Analytics, the parent company of Web of Science, InCites, Journal Citation Reports, and others. On one hand, having a non-publisher act as a third party to the publishing process is probably ideal since it reduces the chances of a conflict of interest. On the other hand, I’m very unhappy with Clarivate Analytics’s product called Kopernio, which provides “fast, one-click access to millions of research papers” and “integrates with Web of Science, Google Scholar, PubMed and 20,000 other sites” (including ResearchGate and Academia.edu, natch). There are prominent links to Kopernio within Web of Science that essentially position the product as a direct competitor to a university library’s link resolver service and, in doing so, remove the library from the scholarly workflow – other than the fact that the library pays for the product’s placement.

The winner takes it all

The genius — sometimes deliberate, sometimes accidental — of the enterprises now on such a steep ascent is that they have found their way through the looking-glass and emerged as something else. Their models are no longer models. The search engine is no longer a model of human knowledge, it is human knowledge. What began as a mapping of human meaning now defines human meaning, and has begun to control, rather than simply catalog or index, human thought. No one is at the controls. If enough drivers subscribe to a real-time map, traffic is controlled, with no central model except the traffic itself. The successful social network is no longer a model of the social graph, it is the social graph. This is why it is a winner-take-all game.

Childhood’s End, Edge, George Dyson [1.1.19]

Blogging again and Never again

It appears that I haven’t written a single post on this blog since July of 2018. Perhaps it is all the talk of resolutions around me but I sincerely would like to write more in this space in 2019. And the best way to do that is to just start.

In December of last year I listened to Episode 7 of Anil Dash’s Function Podcast: Fn 7: Behind the Rising Labor Movement in Tech.

This week on Function, we take a look at the rising labor movement in tech by hearing from those whose advocacy was instrumental in setting the foundation for what we see today around the dissent from tech workers.

Anil talks to Leigh Honeywell, CEO and founder of Tall Poppy and creator of the Never Again pledge, about how her early work, along with others, helped galvanize tech workers to connect the dots between different issues in tech.

Fn 7: Behind the Rising Labor Movement in Tech

I thought I was familiar with most of Leigh’s work but I realized that wasn’t the case because somehow her involvement with the Never Again pledge escaped my attention.

Here’s the pledge’s Introduction:

We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their Constitutionally-protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable.

We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out. We see how IBM collaborated to digitize and streamline the Holocaust, contributing to the deaths of six million Jews and millions of others. We recall the internment of Japanese Americans during the Second World War. We recognize that mass deportations precipitated the very atrocity the word genocide was created to describe: the murder of 1.5 million Armenians in Turkey. We acknowledge that genocides are not merely a relic of the distant past—among others, Tutsi Rwandans and Bosnian Muslims have been victims in our lifetimes.

Today we stand together to say: not on our watch, and never again.

“Our pledge”, Never Again.

The episode reminded me that while I am not an employee in the United States who is directly complicit with the facilitation of deportation, as a Canadian academic librarian I am not entirely free from some degree of complicity, as I am employed at a university that subscribes to WESTLAW.

The Intercept is reporting on Thomson Reuters’ response to Privacy International’s letter to TRI CEO Jim Smith expressing the watchdog group’s “concern” over the company’s involvement with ICE. According to The Intercept article “Thomson Reuters Special Services sells ICE ‘a continuous monitoring and alert service that provides real-time jail booking data to support the identification and location of aliens’ as part of a $6.7 million contract, and West Publishing, another subsidiary, provides ICE’s “Detention Compliance and Removals” office with access to a vast license-plate scanning database, along with agency access to the Consolidated Lead Evaluation and Reporting, or CLEAR, system.” The two contracts together are worth $26 million. The article observes that “the company is ready to defend at least one of those contracts while remaining silent on the rest.”

“Thomson Reuters defends $26 million contracts with ICE”
by Joe Hodnicki (Law Librarian Blog) on June 28, 2018

I also work at a library that subscribes to products that are provided by Elsevier and whose parent company is the RELX Group.

In 2015, Reed Elsevier rebranded itself as RELX and moved further away from traditional academic and professional publishing. This year [2018], the company purchased ThreatMetrix, a cybersecurity company that specializes in tracking and authenticating people’s online activities, which even tech reporters saw as a notable departure from the company’s prior academic publishing role.

“Surveillance and Legal Research Providers: What You Need to Know”, Sarah Lamdan, Medium, July 6, 2018.

Welcome to 2019. There is work to do and it’s time to start.

What ruined the web was the lack of good library software

In some libraries, there are sometimes particular collections in which the objects are organized by the order in which they were acquired (at my place of work, our relatively small collection of movies on DVD is ordered this way). This practice makes it easy for a person to quickly see what has most recently been received or what has been newly published. Such collections are easy to start and maintain as you just have to sort them by ‘acquisition number’.

But you would be hard-pressed to find a good reason to organize a large amount of material this way. Eventually a collection grows too large to browse in its entirety and you have people telling you that they would rather browse the collection by author name, or by publication year, or by subject. But to allow for this means organizing the collection and, let me tell you, my non-library-staff friends, such organization is a lot of bother — it takes time, thought and consistent diligence.

Which is why we are where we are with today’s state of the web.

Early homepages were like little libraries…

A well-organized homepage was a sign of personal and professional pride — even if it was nothing but a collection of fun gifs, or instructions on how to make the best potato guns, or homebrew research on gerbil genetics.

Dates didn’t matter all that much. Content lasted longer; there was less of it. Older content remained in view, too, because the dominant metaphor was table of contents rather than diary entry.

Everyone with a homepage became a de facto amateur reference librarian.

Obviously, it didn’t last.

The above is from a short essay by Amy Hoy about Movable Type – one of the first blogging platforms – and how MT and other blogging platforms that facilitated easy chronological ordering of blog posts may have been the true culprit that ruined the web.

Movable Type didn’t just kill off blog customization.

It (and its competitors) actively killed other forms of web production.

Non-diarists — those folks with the old school librarian-style homepages — wanted those super-cool sidebar calendars just like the bloggers did. They were lured by the siren of easy use. So despite the fact that they weren’t writing daily diaries, they invested time and effort into migrating to this new platform.

They soon learned the chronostream was a decent servant, but a terrible master.

We no longer build sites. We generate streams.

All because building and maintaining a library is hard work.

[The above was first shared in my weekly newsletter University of Winds which provides three links to wonderful and thought-provoking things in the world every Saturday morning].

OK ScholComm – time for some game theory

I have approximate knowledge of when I was first introduced to game theory. It was the late 1980s, I was in a classroom, and we were shown a documentary that featured the Prisoner’s Dilemma (which is best understood through Nicky Case’s The Evolution of Trust).

Some idle googling on my part makes me think that the documentary might have been ‘Nice Guys Finish First’ by not-so-nice guy Richard Dawkins but I am more inclined to think it was a PBS documentary.

What I can say with much more confidence is that whatever documentary I happened to have watched combined with my subscription to The Whole Earth Review to prime me for a future interest in population biology, which I pursued at university until I switched from a degree in biology to one in Geography and Environmental Science.

I have much more specific knowledge of when I first became interested in the theory of games.

Years ago I bought off the newsstand the September 2003 issue of Games Magazine, despite the fact that the magazine was clearly more about puzzles than games. From that issue I discovered that the puzzles it contained were all way above my ability, but there was this one article that caught my attention: Metagaming 101 by W. Eric Martin. The article begins:

Games without change, like War and Chutes & Ladders, are games without choices; they incorporate change only in the smallest, most random ways. Other than choosing to play or quit, players of these games can do nothing more than follow fate’s fickle finger until a winner emerges. Only children have patience for such games; more experienced players yearn for a higher level of change and the choices that accompany it.

At the other end of the change continuum lies chaos, a swirling mass of rules and playing pieces that survive only on whim. The perfect example: Calvinball. Again, only children can tolerate such games; other players require a structured set of rules for change that they can refer to as needed.

But there are game designers who encourage rule-breaking via the concept of *meta-rules* — that is, rules within a game that change the rules of the game itself. With meta-rules, players can explore any point they wish on a change continuum simply by altering the rules of a game.

from Metagaming 101

Game theory is not the same as the theory of games. Game theory is “the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.” This means that you can choose to employ a variety of different types of game theory in certain games.

Since September 2003, I have read several books of the theory of games including A Theory of Fun for Game Design, The Art of Game Design: A Book of Lenses, Rules of Play: Game Design Fundamentals, How to Do Things with Videogames, What Video Games Have to Teach Us about Learning and Literacy, Play Anything: The Pleasure of Limits, the Uses of Boredom, and the Secret of Games, and Minds on Fire: How Role-Immersion Games Transform College.

Now, the reading of books does not make one an expert and I don’t consider myself an expert on the theory of games. I have approximate knowledge of the theory of games.

 

 

I sometimes joke that the true purpose of metrics within scholarly communication is to avoid reading.

This is an allusion to the common practice of many tenure and promotion committees whose members don’t read the research of the scholar who they are assessing. Instead, they tally up the number of prominent journals that the scholar has published in. The perceived quality of the journal is transmuted into the perceived quality of the work that the scholar has produced.

And so, as the smallest gesture against this state of affairs, I have decided to celebrate the reading of scholarship. Well, I’m going to try to read more of it.

Last week I read Calvinball: User’s Rights, Public Choice Theory and Rules Mutable Games by Bob Tarantino in The Windsor Yearbook of Access to Justice. Its abstract:

This article proposes the “rules mutable game” as a metaphor for understanding the operation of copyright reform. Using the game of Calvinball (created by artist  Bill Watterson in his long-running comic strip Calvin & Hobbes) as an illustrative device, and drawing on public choice theory’s account of how political change is effected by privileged interests, the article explores how the notion of a game in which players can modify the rules of the game while it is being played accounts for how users are often disadvantaged in copyright reform processes. The game metaphor also introduces a normative metric of fairness into the heart of the assessment of the copyright reform process from the standpoint of the user. The notion of a rules mutable game tells us something important about the kinds of stories we should be telling about copyright and copyright reform. The narrative power of the “fair play” norm embedded in the concept of the game can facilitate rhetoric which does not just doom users to dwell on their political losses, but empowers them to strategize for future victories.

I enjoyed the article but I would like to spend a little time on Tarantino’s assertion that a “game metaphor contains an inherent ethical vision.” While I take his point that most of us assume that all games are fair, I don’t think Calvinball is the game metaphor that one should first reach for, especially as law itself is already a rules-mutable system.

I would suggest instead to consider the concept of the infinite game.

Here’s the blurb from Finite and Infinite Games:

Finite games are the familiar contests of everyday life; they are played in order to be won, which is when they end. But infinite games are more mysterious. Their object is not winning, but ensuring the continuation of play. The rules may change, the boundaries may change, even the participants may change—as long as the game is never allowed to come to an end.

From Kevin Kelly:

The goal of the infinite game is to keep playing — to explore every way to play the game, to include all games, all possible players, to widen what is meant by playing, to spend all, to hoard nothing, to seed the universe with improbable plays, and if possible to surpass everything that has come before.

Game rules, incidentally, are uncopyrightable, and this holds true for video game rules as well.

 

From Metagaming 101:

THE KING OF CHANGE

Nearly every game discussed thus far, no matter how successful on its own, owes a debt to Nomic, a rule-changing game that has spawned hundreds of variations over the past two decades.

Nomic was created in 1982 by Peter Suber, a professor of philosophy at Earlham College, as an appendix to his book The Paradox of Self-Amendment. This book explored the possible complications of a government system (such as that of the U.S.) in which a constitution includes rules for self-amendment. As Suber wrote, “While self-amendment appears to be an esoteric feature of law, capturing it in a game creates a remarkably complete microcosm of a functional legal system.”

As created, Nomic consists of a two-tiered system of 16 “immutable” and 13 “mutable” rules. Players take turns proposing rule changes and new amendments, and earn points by voting and throwing a die. The first player to achieve 100 points wins.

As dry as this sounds, games of Nomic can quickly explode in unimaginable directions. Perhaps the winner must now achieve 1,000 points — make that 1,000 points and the title “Supreme Overlord.” How does a player become titled? Propose a rule. On second thought, forget points; let’s give every rule a color and now someone wins by passing proposals that are colored green, red, and brown. “The ability of Nomic to change itself is a wonderful thing,” says Kevan Davis. “If the game ever starts to become boring, it changes to whatever people think is less boring. If it’s going too fast, it can be slowed down; if it’s going too slowly, it can be speeded up. If people think it could use fewer dice and more rubber-band firing, then it gets fewer dice and more rubber-band firing.”

Is it a coincidence that the King of Change is the same Peter Suber who helped define and promote Open Access in academia?


Here’s a book that I haven’t read: The Glass Bead Game by Hermann Hesse. I am going to trust that Wikipedia’s description of the book is accurate:

The Glass Bead Game takes place at an unspecified date centuries into the future. Hesse suggested that he imagined the book’s narrator writing around the start of the 25th century. The setting is a fictional province of central Europe called Castalia, which was reserved by political decision for the life of the mind; technology and economic life are kept to a strict minimum. Castalia is home to an austere order of intellectuals with a twofold mission: to run boarding schools for boys, and to cultivate and play the Glass Bead Game, whose exact nature remains elusive and whose devotees occupy a special school within Castalia known as Waldzell. The rules of the game are only alluded to—they are so sophisticated that they are not easy to imagine. Playing the game well requires years of hard study of music, mathematics, and cultural history. The game is essentially an abstract synthesis of all arts and sciences. It proceeds by players making deep connections between seemingly unrelated topics… The plot chronicles Knecht’s education as a youth, his decision to join the order, his mastery of the Game, and his advancement in the order’s hierarchy to eventually become Magister Ludi, the executive officer of the Castalian Order’s game administrators.

This is not the only time I have witnessed academia being understood as a game.

I read Scott Nicholson’s delightful Quest for Tenure: A Choose-Your-Own Adventure when I visited the Rare Books Room of the Stephen A. Schwarzman Building of the New York Public Library. Scott was one of many contributors to a book written in a single night called 100 Ways to Make History.

And earlier this week I learned from this video about the concept of chmess, which was coined by philosopher Daniel C. Dennett in the article Higher-order truths about chmess [pdf].

What is chmess you might ask?

Chess is a deep and important human artifact, about which much of value has been written. But some philosophical research projects are more like working out the truths of chmess. Chmess is just like chess except that the king can move two squares in any direction, not one. I just invented it—though no doubt others have explored it in depth to see if it is worth playing. Probably it isn’t. It probably has other names. I didn’t bother investigating these questions because although they have true answers, they just aren’t worth my time and energy to discover. Or so I think. There are just as many a priori truths of chmess as there are of chess (an infinity), and they are just as hard to discover. And that means that if people actually did get involved in investigating the truths of chmess, they would make mistakes, which would need to be corrected, and this opens up a whole new field of a priori investigation, the higher-order truths of chmess, such as the following:
1. Jones’ (1989) proof that p is a truth of chmess is flawed: he overlooks the following possibility …
2. Smith’s (2002) claim that Jones’ (1989) proof is flawed presupposes the truth of Brown’s lemma (1975), which has recently been challenged by Garfinkle (2002)

Dennett holds that the playing of chmess is much more a concern of philosophy than of other disciplines because:

Philosophy is an a priori discipline, like mathematics, or at least it has an a priori methodology at its core, and this fact cuts two ways. On the one hand, it excuses philosophers from spending tedious hours in the lab or the field, and from learning data-gathering techniques, statistical methods, geography, history, foreign languages …, empirical science, so they have plenty of time for honing their philosophical skills. On the other hand, as is often noted, you can make philosophy out of just about anything, and this is not always a blessing.

Knowing this, is it surprising that philosophy journals have some of the lowest acceptance rates in all of scholarship? (ht Ryan Reiger).

There is another written work that really got me thinking about the University not necessarily as a game but as an institution of productive leisure but I cannot cite it or quote from it.

The reasons for this might have something to do with citation counts.


Please allow me to make a sweeping generalization: reputation is the coin of the realm of academia. Not citation counts.

And yet there are many software platforms currently being sold that present the number of citations as some sort of scoring system.

Who has the high score at your institution? Just check Google Scholar.

I think we should be more mindful of the types of behaviours we are implicitly and explicitly encouraging by choosing to rank scholars, research labs, and institutions by number of citations alone.

If we want to develop better scoring systems, I think we could learn from game designers:


The following is an excerpt from my contribution to “Librarian Origin Story” in Schroeder, R., Deitering, A.M., & Stoddart, R., The Self as Subject: Autoethnographic Research into Identity, Culture, and Academic Librarianship, Association of College and Research Libraries, 2017.

In 2010, Jane McGonigal had a public conversation with Stewart Brand as part of an event called The Long Conversation that was put on by The Long Now Foundation. Jane McGonigal started the conversation by bringing up Stewart Brand’s past experience with game design as part of the “New Games Movement” in the late 1970s. McGonigal asked Brand if the New Games movement was designed to “change the world” and Brand said yes, and told her of his game-design origin story.

During the late 70s, he and his friends were talking about how the Cold War was being played out by “rules” that would only result in bad endings for everyone, and as such, the rules of the Cold War needed to change. And Brand thought about when he was a kid, when he and his friends changed the rules all the time. For example, kids would change the rules of the game of stickball that they were playing to accommodate any new kids who arrived to play. And so he and his friends started creating New Games for adults to explore and play in a world that they would rather live in.

Also in 2010, I was invited to be a participant in the Evoke Summit held at the World Bank headquarters in Washington DC, where I had the chance to meet and thank Jane McGonigal in person. The summit was a reward for the winners of the game, who had come up with winning proposals for social entrepreneurial projects, and the two days were filled with activities geared to making those proposals a reality. One of the activities was to work on a short, memorable tagline for one’s work that would distill the essence of who you are and what you want to achieve. Eventually I came up with this phrase for myself that I still feature on my professional portfolio: Changing the rules so more can win.

Bret Victor’s Bookshelf

A couple of posts ago, I wrote a somewhat unorthodox introduction to the work of Bret Victor. In it, I brought the reader’s attention to a recent article from The Atlantic called The Scientific Paper is Obsolete.


I know that this article had already made the rounds among some library people because I saw the piece being recommended and retweeted online. Chris Bourg, Director of Libraries at MIT, chose not to read this article.

Not to be presumptuous, but I like to think that I understand her reasons and her reaction. I say this because whenever I read a list – especially a list that promises some form of universal canon (oh, say for a manual for civilization) and there are few to no women or non-white people (or non-white women), more often than not that list registers to me as deficient.

You cannot be well-read until you read the work of women.

So what are we to make of the gender balance of the works on Bret Victor’s esteemed bookshelf?

Are Bret’s reading choices any of our business? Maybe not.

Although… they might be if they are the same books that are being used to form the canon of DynamicLand.


Enough with my moral reproach, scolding and lecturing! Let me tell you about DynamicLand! Because, the gender representation of its bookshelf notwithstanding, I think it’s an absolutely remarkable endeavour.

Seriously, go to the Dynamicland website, take it in, and consider it. Scroll through the videos on their Twitter stream. And then, when you can, go deeper and watch Bret Victor’s videos, The Humane Representation of Thought and Seeing Spaces.

Speaking of seeing spaces

If I could offer one additional book for the shelves of Bret Victor and Dynamicland, it would be The Science Studies Reader, only because I know it contains Donna Haraway’s Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective.

I have found the idea of situated knowledge very useful as both a feminist and someone who has a degree in science. This work has helped me reconcile these two selves. I also have found the concept useful in some of my own thinking in librarianship (see post: The Observer or Seeing What You Mean).

I like this definition of Situated Knowledge from The Oxford Dictionary of Human Geography:

The idea that all forms of knowledge reflect the particular conditions in which they are produced, and at some level reflect the social identities and social locations of knowledge producers. The term was coined by historian of science Donna Haraway in Simians, Cyborgs, and Women: the Reinvention of Nature (1991) to question what she regarded as two dangerous myths in Western societies. The first was that it is possible to be epistemologically objective, to somehow be a neutral mouthpiece for the world’s truths if one adopts the ‘right’ method of inquiry. The second myth was that science and scientists are uniquely and exclusively equipped to be objective. Haraway was not advocating relativism. Instead, she was calling for all knowledge producers to take full responsibility for their epistemic claims rather than pretending that ‘reality’ has definitively grounded these claims.

We can and should take full responsibility for what we see and recognize that what we see is what we choose to see.

To not read the works of people who are unlike ourselves is a choice. We can do better. I know I can do better. I’ve made first steps, but I know I could do more. I am considering following the lead of Ed Yong, who actively pursued a better gender balance in his reporting:

We can’t see if we don’t even try to look.

Chasing Shadows

Last Monday when Dr. Rajiv Jhangiani opened his keynote at the 2018 Open Education Summit, one of the first things he did was place his work in the context of bell hooks and Jesse Stommel. And after hearing this my internal voice said to itself, “O.K., now I know where he’s coming from.”

It’s an admitted generalization, but let me suggest that when academics compose a scholarly article, they tend to introduce their work with a positioning statement that expresses the tradition of thought that their work extends. This might be done explicitly, as Dr. Jhangiani did in his keynote, or quietly, through the careful choice of whose definitions are used to set the table for the work.

The adjective ‘scientific’ is not attributed to isolated texts that are able to oppose the opinion of the multitude by virtue of some mysterious faculty. A document becomes scientific when its claims stop being isolated and when the number of people engaged in publishing it are many and explicitly indicated in the text. When reading it, it is on the contrary the reader who becomes isolated. The careful marking of the allies’ presence is the first sign that the controversy is now heated enough to generate technical documents.

Latour B. Science in action: how to follow scientists and engineers through society. Cambridge: Harvard University Press; 2005. p. 33.

If scholarly communication is a conversation, then we can think of journals as parlors where you can expect that certain conversations are taking place. If your work becomes a frequent touchpoint of these conversations, you get… a high h-index?
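Since I keep gesturing at the h-index, here is a minimal sketch of how that number is actually computed: the largest h such that h of your papers have been cited at least h times each. The function and the citation counts below are invented for illustration; this is not any particular platform’s implementation.

```python
# A minimal sketch of the h-index calculation (illustrative only):
# the h-index is the largest h such that h papers have at least h citations each.

def h_index(citation_counts):
    """Return the h-index for a list of per-paper citation counts."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for a hypothetical scholar:
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # prints 3: three papers have at least 3 citations each
```

Notice that nothing in the calculation knows or cares who any of those citing authors are; it only counts them.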

As someone who is only five months into my position as Scholarly Communications Librarian, I’ve been particularly mindful of how people talk about scholarship and the various measures and analytics we use to describe scholarly work.

I try to keep in mind that metrics are just a shadow of an object. You can make a shadow larger by trying to angle yourself in various ways towards the sun but you shouldn’t forget that when you make your shadow larger this way, the object casting the shadow does not change.

I was approached recently by a peer who had a faculty member tell them that they are hesitant to add their work to the university’s repository because they are afraid that it would take away from the links to their work on SSRN and thus diminish their Google Scholar ranking.

What should be the response to these concerns? One thing we could do is reassure them that we are doing all we can [ethically] do to maximize the SEO of our IR.

But I believe that it would be better to express our work not in terms of links and citation counts but rather in terms of potential readership.

We could try to reframe the conversation so it doesn’t seem so much like a zero-sum game. There is a set of readers who will discover work as a pre-print on SSRN, and there will be another set of readers who will be interested in the work that they discover in an institutional repository. These interested readers could include a potential graduate student who is looking for an advisor to work with. It could be someone who has discovered the work in the IR because we allow other subject-specific sites to index our institutional repository. It could be the local press. And, if the fears of SSRN link-cannibalization are still strong, we can always offer to place the work in the IR under a short-term embargo.

When we only think of metrics, we end up chasing shadows.

When a faculty member assesses the quality of a peer’s work, they take the publication source as a measure of the quality of that work. The unsaid rule is that each scholar, if they could, would always publish in the highest-ranked journal in their field, and any choice to publish anywhere else must be because the work in question was not good enough. Any article published in a higher-ranked journal is better than any article in a lower-ranked journal.

And yet it’s easy to forget that the rankings behind ‘highly ranked journals’ are calculated using formulas that process the collected sum and speed of citations. In the end, journal ranking can also be reconsidered as a measure of readership.
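To make that concrete, here is a hedged, back-of-the-envelope sketch of the arithmetic behind one common ranking formula, the two-year journal impact factor: citations received this year to items published in the previous two years, divided by the number of citable items published in those two years. All of the numbers below are invented.

```python
# Illustrative arithmetic for a two-year journal impact factor (numbers are invented).
citations_this_year_to_last_two_years = 450   # hypothetical: citations in 2018 to 2016-2017 items
citable_items_last_two_years = 150            # hypothetical: articles and reviews published 2016-2017

impact_factor = citations_this_year_to_last_two_years / citable_items_last_two_years
print(impact_factor)  # 3.0: an average rate of citation, i.e. a rough proxy for readership
```

Seen this way, the formula is less a verdict on any individual article than a crude average of how often a journal’s recent articles get read and cited.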

Instead of positioning our work as ‘how to increase your h-index’, we should not forget that each citation is an author whom we can also consider (perhaps charitably) a reader.

When I was the lead of Open Data Windsor Essex, we hosted a wonderful talk from Detroiter Alex Hill called Giving Data Empathy.  What he reminded us in his talk was that behind each data point in his work was a person and that it was essential to remember how diminished that person is when they are reduced to a ‘count’.

Let’s remember this as well.

Every data point, a reader.