The Provenance of Facts

Brian Feldman has a newsletter called BNet, and on May 30th he published an insightful and whimsical take on facts and Wikipedia called mysteries of the scatman.

The essay is an excellent reminder that if a fact without proper provenance makes its way into Wikipedia and is then published in a reputable source, it is nearly impossible to remove said fact from Wikipedia.

Both the Scatman John and “Maps” issues, however, point to a looming vulnerability in the system. What happens when facts added early on in Wikipedia’s life remain, and take on a life of their own? Neither of these supposed truths outlined above can be traced to any source outside of Wikipedia, and yet, because they initially appeared on Wikipedia and have been repeated elsewhere, they are now, for all intents and purposes, accepted as truth on Wikipedia. It’s twisty.

mysteries of the scatman

This is not a problem unique to Wikipedia. Last year I addressed a similar issue in an Information Literacy class for 4th-year Political Science students, when I encouraged students to follow the citation pathways of the data they planned to cite. I warned them not to fall for academic urban legends:

Spinach is not an exceptional nutritional source of iron. The leafy green has iron, yes, but not much more than you’d find in other green vegetables. And the plant contains oxalic acid, which inhibits iron absorption.

Why, then, do so many people believe spinach boasts such high iron levels? Scholars committed to unmasking spinach’s myths have long offered a story of academic sloppiness. German chemists in the 1930s misplaced a decimal point, the story goes. They thus overestimated the plant’s iron content tenfold.

But this story, it turns out, is apocryphal. It’s another myth, perpetuated by academic sloppiness of another kind. The German scientists never existed. Nor did the decimal point error occur. At least, we have no evidence of either. Because, you see, although academics often see themselves as debunkers, in skewering one myth they may fall victim to another.

In his article “Academic Urban Legends,” Ole Bjorn Rekdal, an associate professor of health and social sciences at Bergen University College in Norway, narrates the story of these twinned myths. His piece, published in the journal Social Studies of Science, argues that through chains of sloppy citations, “academic urban legends” are born. Following a line of lazily or fraudulently employed references, Rekdal shows how rumor can become acknowledged scientific truth, and how falsehood can become common knowledge.

“Academic Urban Legends”, Charlie Tyson, Inside Higher Ed, August 6, 2014

I’m in the process of working on an H5P learning object dedicated to how to calculate one’s h-index (a short sketch of the calculation appears after the quotes below), and yet I’m conflicted about doing so. There are many reasons why using citations as a measure of an academic’s value is problematic, far beyond the occasional academic urban legend:

To weed out academic urban legends, Rekdal says editors “should crack down violently on every kind of abuse of academic citations, such as ornamental but meaningless citations to the classics, or exchanges in citation clubs where the members pump up each other’s impact factors and h-indexes.”

Yet even Rekdal – who debunks the debunkers – says his citation record isn’t flawless.

“I have to admit that I published an article two decades ago where I included an academically completely meaningless reference (without page numbers of course) to a paper written by a woman I was extremely in love with,” he said. “I am still a little ashamed of what I did. But on the other hand, the author of that paper has now been my wife for more than 20 years.”

“Academic Urban Legends”, Charlie Tyson, Inside Higher Ed, August 6, 2014
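Since I mention calculating one’s h-index above, here is a minimal sketch of the calculation itself, the kind of step the learning object will walk through. The function name and the sample citation counts below are mine, purely for illustration: an author’s h-index is the largest number h such that h of their papers have been cited at least h times each.

    def h_index(citation_counts):
        """Return the largest h such that h papers have at least h citations each."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, cites in enumerate(counts, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # An author whose papers have been cited 10, 8, 5, 4, 3, and 0 times has an
    # h-index of 4: four papers have at least four citations each, but there are
    # not five papers with at least five citations each.
    print(h_index([10, 8, 5, 4, 3, 0]))  # prints 4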

Considering dark deposit

I have a slight feeling of dread.

In the inbox of the email address associated with MPOW’s institutional repository are more than a dozen notifications that a faculty member has deposited their research work for inclusion. I should be happy about this. I should be delighted that a liaison librarian spoke highly enough of the importance of the institutional repository at a faculty departmental meeting and inspired a researcher to fill in a multitude of forms so their work can be made freely available to readers.

But I don’t feel good about this because a cursory look at the journals this faculty member has published in suggests that we can include none of the material in our IR due to restrictive publisher terms.

This is not a post about the larger challenges of Open Access in the current scholarly landscape. This post is a consideration of a change of practice regarding IR deposit, partly inspired by the article, Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records.

Institutional repository managers are continuously looking for new ways to demonstrate the value of their repositories. One way to do this is to create a more inclusive repository that provides reliable information about the research output produced by faculty affiliated with the institution.

Bjork, K., Cummings-Sauls, R., & Otto, R. (2019). Opening Up Open Access Institutional Repositories to Demonstrate Value: Two Universities’ Pilots on Including Metadata-Only Records. Journal of Librarianship and Scholarly Communication, 7(1). DOI: http://doi.org/10.7710/2162-3309.2220

I read the Opening Up… article with interest because a couple of years ago, when I was the liaison librarian for biology, I ran an informal pilot in which I tried to capture the corpus of the biology department. During this time, for articles whose publishers did not allow deposit of the publisher PDF and whose authors were not interested in depositing a manuscript version, I published the metadata of these works instead.

But part way through this pilot, I abandoned the practice. I did so for a number of reasons. One reason was that the addition of their work to the Institutional Repository did not seem to prompt faculty to start depositing their research of their own volition. This was not surprising, as BePress doesn’t allow for the integration of author profiles directly into its platform (one must purchase a separate product for author profiles and the ability to generate RSS feeds at the author level). So I was not particularly disappointed with this result. While administrators are increasingly interested in demonstrating research outputs at the department and institutional level, faculty, generally speaking, remain more invested in subject-based repositories.

But during this trial I uncovered a more troubling reason to think that uploading citations might be problematic. I came to understand that most harvesting protocols and guidelines – such as OAI-PMH and OpenAIRE – do not provide any means by which one can differentiate between metadata-only records and full-text records. Our library system harvests our IR and assumes that every item in the IR has a full-text object associated with it. Other services that harvest our IR do the same. To visit the IR is to expect the full text of a work.
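To make that concrete, here is a rough sketch of what a harvester sees when it asks a repository’s OAI-PMH endpoint for Dublin Core records. The base URL below is a placeholder, not our actual repository, and the point of the sketch is simply that nothing in a standard oai_dc response is required to indicate whether a full-text file sits behind the metadata, so a metadata-only record and a full-text record can look identical to the harvester.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical OAI-PMH endpoint; substitute a real repository's base URL.
    BASE_URL = "https://ir.example.edu/do/oai/"
    VERB = "?verb=ListRecords&metadataPrefix=oai_dc"

    with urllib.request.urlopen(BASE_URL + VERB) as response:
        tree = ET.parse(response)

    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }

    for record in tree.iterfind(".//oai:record", ns):
        title = record.findtext(".//dc:title", default="(no title)", namespaces=ns)
        identifiers = [e.text for e in record.findall(".//dc:identifier", ns)]
        # The record exposes titles, creators, dates, and identifiers (usually a
        # landing-page URL), but no standard element that distinguishes a
        # metadata-only record from one with an attached full-text object.
        print(title, identifiers)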

But the reason that made me stop the experiment pretty much immediately was reading this little bit of hearsay on Twitter:

Google and Google Scholar are responsible for the vast majority of our IR’s traffic and use. In many disciplines the percentage of Green OA articles as a share of total faculty output is easily less than 25%. To publish citations when the full text of a pre-print manuscript is not made available to the librarian is ultimately going to test whether Google Scholar really does have a full-text threshold. And then what do we do when we find our work suddenly gone from search results?

Yet, the motivation to try to capture the whole of a faculty’s work still remains. An institutional repository should be a reflection of all the research and creative work of the institution that hosts it.

If an IR is not able to do this work, an institution is more likely to invest in a CRIS – a Current Research Information System – to represent the research outputs of the organization.

Remember when I wrote this in my post from March of this year?

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

– a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)

– a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)

– a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)

– a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

If the map becomes the territory then we will be lost

The marketplace for CRIS is no different:

But the investment in two separate products – a CRIS to capture the citations of a faculty’s research and creative output and an IR to capture the full text of the same – still seems to me a shame to pursue. Rather than invest a large sum of money for the quick win of a CRIS, we should invest those funds into an IR that can support data re-use, institutionally.

(What is the open version of the CRIS? To be honest, I don’t know this space very well. From what I know at the moment, I would suggest it might be the institutional repository + ORCiD and/or VIVO.)

I am imagining a scenario in which every article-level work that a faculty member of an institution has produced is captured in the institutional repository. Articles that are not allowed to be made open access are embargoed until they are in the public domain.

But to be honest, I’m a little spooked because I don’t see many other institutions engaging in this practice. Dark deposit does exist in the literature, but it largely appears in the early years of the conversations around scholarly communications practice. The most widely cited article about the topic (from my reading, not from a proper literature review) is this 2011 article called The importance of dark deposit from Stuart Shieber. His blog is licensed as CC-BY, so I’m going to take advantage of this generosity and re-print the seven reasons why dark is better than missing:

  1. Posterity: Repositories have a role in providing access to scholarly articles of course. But an important part of the purpose of a repository is to collect the research output of the institution as broadly as possible. Consider the mission of a university archives, well described in this Harvard statement: “The Harvard University Archives (HUA) supports the University’s dual mission of education and research by striving to preserve and provide access to Harvard’s historical records; to gather an accurate, authentic, and complete record of the life of the University; and to promote the highest standards of management for Harvard’s current records.” Although the role of the university archives and the repository are different, that part about “gather[ing] an accurate, authentic, and complete record of the life of the University” reflects this role of the repository as well. Since at any given time some of the articles that make up that output will not be distributable, the broadest collection requires some portion of the collection to be dark.
  2. Change: The rights situation for any given article can change over time — especially over long time scales, librarian time scales — and having materials in the repository dark allows them to be distributed if and when the rights situation allows. An obvious case is articles under a publisher embargo. In that case, the date of the change is known, and repository software can typically handle the distributability change automatically. There are also changes that are more difficult to predict. For instance, if a publisher changes its distribution policies, or releases backfiles as part of a corporate change, this might allow distribution where not previously allowed. Having the materials dark means that the institution can take advantage of such changes in the rights situation without having to hunt down the articles at that (perhaps much) later date.
  3. Preservation: Dark materials can still be preserved. Preservation of digital objects is by and large an unknown prospect, but one thing we know is that the more venues and methods available for preservation, the more likely the materials will be preserved. Repositories provide yet another venue for preservation of their contents, including the dark part.
  4. Discoverability: Although the articles themselves can’t be distributed, their contents can be indexed to allow for the items in the repository to be more easily and accurately located. Articles deposited dark can be found based on searches that hit not only the title and abstract but the full text of the article. And it can be technologically possible to pass on this indexing power to other services indexing the repository, such as search engines.
  5. Messaging: When repositories allow both open and dark materials, the message to faculty and researchers can be made very simple: Always deposit. Everything can go in; the distribution decision can be made separately. If authors have to worry about rights when making the decision whether to deposit in the first place, the cognitive load may well lead them to just not deposit. Since the hardest part about running a successful repository is getting a hold of the articles themselves, anything that lowers that load is a good thing. This point has been made forcefully by Stevan Harnad. It is much easier to get faculty in the habit of depositing everything than in the habit of depositing articles subject to the exigencies of their rights situations.
  6. Availability: There are times when an author has distribution rights only to unavailable versions of an article. For instance, an author may have rights to distribute the author’s final manuscript, but not the publisher’s version. Or an art historian may not have cleared rights for online distribution of the figures in an article and may not be willing to distribute a redacted version of the article without the figures. The ability to deposit dark enables depositing in these cases too. The publisher’s version or unredacted version can be deposited dark.
  7. Education: Every time an author deposits an article dark is a learning moment reminding the author that distribution is important and distribution limitations are problematic.

There is an additional reason for pursuing a change of practice to dark deposit that I believe is very significant:

There are at least six types of university OA policy. Here we organize them by their methods for avoiding copyright troubles…

3. The policy seeks no rights at all, but requires deposit in the repository. If the institution already has permission to make the work OA, then it makes it OA from the moment of deposit. Otherwise the deposit will be “dark” (non-OA) (See p. 24) until the institution can obtain permission to make it OA. During the period of dark deposit, at least the metadata will be OA.

Good Practices For University Open-Access Policies, Stuart Shieber and Peter Suber, 2013

“At least the metadata will be OA” is a very good reason to do dark deposit. It might be reason enough. I share much of Ryan Regier’s enthusiasm for Open Citations, which he explains in his post, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open:

Having a more complete picture of how much an article has been cited by other articles is an immediate clear benefit of Open Citations. Right now you can get a piece of that via the above tools I’ve listed and, maybe, a piece is all you need. If you’ve got an article that’s been cited 100s of times, likely you aren’t going to look through each of those citing articles. However, if you’ve got an article or a work that only been cited a handful of times, likely you will be much more aware of what those citing articles are saying about your article and how they are using your information.

Ryan Regier, The longer Elsevier refuses to make their citations open, the clearer it becomes that their high profit model makes them anti-open

Regier takes Elsevier to task, because Elsevier is one of the few major publishers remaining that refuses to make their citations OA.

I4OC requests that all scholarly publishers make references openly available by providing access to the reference lists they submit to Crossref. At present, most of the large publishers—including the American Physical Society, Cambridge University Press, PLOS, SAGE, Springer Nature, and Wiley—have opened their reference lists. As a result, half of the references deposited in Crossref are now freely available. We urge all publishers who have not yet opened their reference lists to do so now. This includes the American Chemical Society, Elsevier, IEEE, and Wolters Kluwer Health. By far the largest number of closed references can be found in journals published by Elsevier: of the approximately half a billion closed references stored in Crossref, 65% are from Elsevier journals. Opening these references would place the proportion of open references at nearly 83%.

Open citations: A letter from the scientometric community to scholarly publishers

There would be so much value unleashed if we could release the citations to our faculty’s research as open access.

Open Citations could lead to new ways of exploring and understanding the scholarly ecosystem. Some of these potential tools were explored by Aaron Tay in his post, More about open citations — Citation Gecko, Citation extraction from PDF & LOC-DB.

Furthermore, releasing citations as OA would enable them to be added to platforms such as Wikidata and made available for visualization using the Scholia tool, pictured above.
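As a sketch of what that would unlock: once citation links live in Wikidata as open statements, anyone can ask for the works that cite a given article with a few lines of SPARQL, which is roughly what Scholia does under the hood. The query below uses Wikidata’s “cites work” property (P2860); the item ID is a placeholder to be swapped for the article you actually care about.

    import json
    import urllib.parse
    import urllib.request

    # Placeholder Wikidata item ID for an article; substitute the item you care about.
    article_item = "Q21090025"

    query = """
    SELECT ?citing ?citingLabel WHERE {
      ?citing wdt:P2860 wd:%s .   # P2860 = "cites work"
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 25
    """ % article_item

    url = "https://query.wikidata.org/sparql?format=json&query=" + urllib.parse.quote(query)
    request = urllib.request.Request(url, headers={"User-Agent": "open-citations-sketch/0.1"})

    with urllib.request.urlopen(request) as response:
        data = json.load(response)

    # Print the labels of the citing works returned by the endpoint.
    for row in data["results"]["bindings"]:
        print(row["citingLabel"]["value"])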

So that’s where I’m at.

I want to change the practice at MPOW to include all published faculty research, scholarship, and creative work in the Institutional Repository, and if we are unable to publish these works as open access in our IR, we will include them as embargoed, dark deposits until they are confidently in the public domain. I want the Institutional Repository to live up to its name and hold all the published work of the Institution.

Is this a good idea, or no? Are there pitfalls that I have not foreseen? Is my reasoning shaky? Please let me know.

If the map becomes the territory then we will be lost

That which computation sets out to map and model it eventually takes over. Google sets out to index all human knowledge and becomes the source and the arbiter of that knowledge: it became what people think. Facebook set out to map the connections between people – the social graph – and became the platform for those connections, irrevocably reshaping societal relationships. Like an air control system mistaking a flock of birds for a fleet of bombers, software is unable to distinguish between the model of the world and reality – and, once conditioned, neither are we.

James Bridle, New Dark Age, p.39.

I am here to bring your attention to two developments that are making me worried:

  1. The Social Graph of Scholarly Communications is becoming more tightly bound into institutional metrics that have an increasing influence on institutional funding
  2. The publishers of the Social Graph of Scholarship are beginning to enclose the Social Graph, excluding the infrastructure of libraries and other independent, non-profit organizations

Normally, I would try to separate these ideas into two dedicated posts, but in this case I want to bring them together in writing because if these two trends converge, things will become very bad, very quickly.

Let me start with the first trend:

1. The social graph that binds

When I am asked to explain how to achieve a particular result within scholarly communication, more often than not, I find myself describing four potential options:

  1. a workflow of Elsevier products (BePress, SSRN, Scopus, SciVal, Pure)
  2. a workflow of Clarivate products (Web of Science, InCites, Endnote, Journal Citation Reports)
  3. a workflow of Springer-Nature products (Dimensions, Figshare, Altmetrics)
  4. a DIY workflow from a variety of independent sources (the library’s institutional repository, ORCiD, Open Science Framework)

Workflow is the new content.

That line – workflow is the new content – is from Lorcan Dempsey, and it was brought to my attention by Roger Schonfeld. For Open Access Week, I gave a presentation on this idea of being mindful of workflow and tool choices, entitled A Field Guide to Scholarly Communications Ecosystems. The slides are below.

(My apologies for not sharing the text that goes with the slides. Since January of this year, I have been the Head of the Information Services Department at my place of work. In addition to this responsibility, much of my time this year has been spent covering the work of colleagues currently on leave. Finding time to write has been a challenge.)

In Ontario, each institution of higher education must negotiate a ‘Strategic Mandate Agreement’ with its largest funding body, the provincial government. Universities are currently in the second iteration of these agreements and are preparing for the third round. These agreements are considered fraught by many, including Marc Spooner, a professor in the faculty of education at the University of Regina, who wrote the following in an opinion piece in University Affairs:

The agreement is designed to collect quantitative information grouped under the following broad themes: a) student experience; b) innovation in teaching and learning excellence; c) access and equity; d) research excellence and impact; and e) innovation, economic development and community engagement. The collection of system-wide data is not a bad idea on its own. For example, looking at metrics like student retention data between years one and two, proportion of expenditures on student services, graduation rates, data on the number and proportion of Indigenous students, first-generation students and students with disabilities, and graduate employment rates, all can be helpful.

Where the plan goes off-track is with the system-wide metrics used to assess research excellence and impact: 1) Tri-council funding (total and share by council); 2) number of papers (total and per full-time faculty); and 3) number of citations (total and per paper). A tabulation of our worth as scholars is simply not possible through narrowly conceived, quantified metrics that merely total up research grants, peer-reviewed publications and citations. Such an approach perversely de-incentivises time-consuming research, community-based research, Indigenous research, innovative lines of inquiry and alternative forms of scholarship. It effectively displaces research that “matters” with research that “counts” and puts a premium on doing simply what counts as fast as possible…

Even more alarming – and what is hardly being discussed – is how these damaging and limited terms of reference will be amplified when the agreement enters its third phase, SMA3, from 2020 to 2023. In this third phase, the actual funding allotments to universities will be tied to their performance on the agreement’s extremely deficient metrics.

“Ontario university strategic mandate agreements: a train wreck waiting to happen”, Marc Spooner, University Affairs, Jan 23 2018

The measure by which citation counts for each institution are going to be assessed has already been decided. The Ontario government has already stated that it is going to use Elsevier’s Scopus (although I presume they really meant SciVal).

What could possibly go wrong? To answer that question, let’s look at the second trend: enclosure.

2. Enclosing the social graph

The law locks up the man or woman
Who steals the goose from off the common
But leaves the greater villain loose
Who steals the common from off the goose.

Anonymous, “The Goose and the Commons”

As someone who spends a great deal of time ensuring that the scholarship of the University of Windsor’s Institutional Repository meets the stringent restrictions set by publishers, it’s hard not to feel a slap in the face when reading Springer Nature Syndicates Content to ResearchGate.

ResearchGate has been accused of “massive infringement of peer-reviewed, published journal articles.”

They say that the networking site is illegally obtaining and distributing research papers protected by copyright law. They also suggest that the site is deliberately tricking researchers into uploading protected content.

Who is the “they” of the above quote? Why, they are the publishers: the American Chemical Society and Elsevier.

It is not uncommon to find selective enforcement of copyright within the scholarly communication landscape. Publishers have turned a blind eye to the copyright infringement of ResearchGate and Academia.edu for years, while targeting course reserve systems set up by libraries.

Any commercial system that is part of the scholarly communication workflow can be acquired for strategic purposes.

As I noted in my contribution to Grinding the Gears: Academic Librarians and Civic Responsibility, sometimes companies purchase competing companies as a means to control their development and even to shut their products down.

One of the least understood and thus least appreciated functions of calibre is that it uses the Open Publication Distribution System (OPDS) standard (opds-spec.org) to allow one to easily share e-books (at least those without Digital Rights Management software installed) to e-readers on the same local network. For example, on my iPod Touch, I have the e-reader program Stanza (itunes.apple.com/us/app/stanza/id284956128) installed and from it, I can access the calibre library catalogue on my laptop from within my house, since both are on the same local WiFi network. And so can anyone else in my family from their own mobile device. It’s worth noting that Stanza was bought by Amazon in 2011 and according to those who follow the digital e-reader market, it appears that Amazon may have done so solely for the purpose of stunting its development and sunsetting the software (Hoffelder,2013)

“Grinding the Gears: Academic Librarians and Civic Responsibility”, Lisa Sloniowski, Mita Williams, Patti Ryan, Urban Library Journal, Vol. 19, No. 1 (2013). Special Issue: Libraries, Information and the Right to the City: Proceedings of the 2013 LACUNY Institute.

And sometimes companies acquire products to provide a tightly integrated suite of services and seamless workflow.

And, indeed, whatever model the university may select, if individual researchers determine that seamlessness is valuable to them, will they in turn license access to a complete end-to-end service for themselves or on behalf of their lab?  So, the university’s efforts to ensure a more competitive overall marketplace through componentization may ultimately serve only to marginalize it.

“Big Deal: Should Universities Outsource More Core Research Infrastructure?”, Roger C. Schonfeld, January 4, 2018

Elsevier bought BePress in August of 2017. In May of 2016, Elsevier acquired SSRN. Bepress and SSRN are currently exploring further “potential areas of integration, including developing a single upload experience, conducting expanded research into rankings and download integration, as well as sending content from Digital Commons to SSRN.”

Now, let’s get to the recent development that has me nervous.

10.2 Requirements for Plan S compliant Open Access repositories

The repository must be registered in the Directory of Open Access Repositories (OpenDOAR) or in the process of being registered.

In addition, the following criteria for repositories are required:

  • Automated manuscript ingest facility
  • Full text stored in XML in JATS standard (or equivalent)
  • Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.
  • Open API to allow others (including machines) to access the content
  • QA process to integrate full text with core abstract and indexing services (for example PubMed)
  • Continuous availability

“Automated manuscript ingest facility” probably gives me the most pause. Automated means a direct pipeline from publisher to institutional repository that could be based on a publisher’s interpretation of fair use/fair dealing, and we don’t know what the ramifications of that decision-making might be. I’m feeling trepidation because I believe we are already experiencing the effects of a tighter integration between manuscript services and the IR.

Many publishers – including Wiley, Taylor and Francis, IEEE, and IOP – already use a third-party manuscript service called ScholarOne. ScholarOne integrates the iThenticate service, which produces reports of what percentage of a manuscript has already been published. Journal editors have the option to set to what extent a paper can make use of a researcher’s prior work, including their thesis. Manuscripts that exceed these thresholds can be automatically rejected without human intervention from the editor. We are only just starting to understand how this workflow is going to affect the willingness of young scholars to make their theses and dissertations open access.

It is also worth noting that ScholarOne is owned by Clarivate Analytics, the parent company of Web of Science, InCites, Journal Citation Reports, and others. On one hand, having a non-publisher act as a third party to the publishing process is probably ideal since it reduces the chances of a conflict of interest. On the other hand, I’m very unhappy with Clarivate Analytics’ product Kopernio, which provides “fast, one-click access to millions of research papers” and “integrates with Web of Science, Google Scholar, PubMed” and 20,000 other sites (including ResearchGate and Academia.edu, natch). There are prominent links to Kopernio within Web of Science that essentially position the product as a direct competitor to a university library’s link resolver service and, in doing so, remove the library from the scholarly workflow – other than the fact that the library pays for the product’s placement.

The winner takes it all

The genius — sometimes deliberate, sometimes accidental — of the enterprises now on such a steep ascent is that they have found their way through the looking-glass and emerged as something else. Their models are no longer models. The search engine is no longer a model of human knowledge, it is human knowledge. What began as a mapping of human meaning now defines human meaning, and has begun to control, rather than simply catalog or index, human thought. No one is at the controls. If enough drivers subscribe to a real-time map, traffic is controlled, with no central model except the traffic itself. The successful social network is no longer a model of the social graph, it is the social graph. This is why it is a winner-take-all game.

Childhood’s End, Edge, George Dyson [1.1.19]

Bret Victor, Bruno Latour, the citations that bring them together, and the networks that keep them apart

Occasionally I have the opportunity to give high school students an introduction to research in a university context. During this introduction I show them an example of a ‘scholarly paper’ so they can take in the visual cues that might help them recognize other scholarly papers in their future.

 

After I point out the important features, I take the time to highlight this piece of dynamic text on the page:

I know these citation counts come from CrossRef because I have an old screen capture that shows that the citation count section used to look like this:

I tell the students that this article has a unique identifier number called a DOI and that there is a system called CrossRef that tracks how many bibliographies this number appears in.

And then I scan the faces of the room and if I don’t see sufficient awe, I inform the class that a paper’s ability to express its own impact outside of itself is forking amazing.
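For the curious, here is roughly what such a lookup looks like. The endpoint and field names below reflect my reading of Crossref’s public REST API and are offered as a sketch rather than a recipe; as I understand it, the richer Cited-by service that powers embedded counts like the one above is a separate, member-facing offering, which is the subject of the next paragraph.

    import json
    import urllib.request

    # Placeholder DOI; substitute the DOI printed on the article you are showing.
    doi = "10.1000/xyz123"
    url = "https://api.crossref.org/works/" + doi

    with urllib.request.urlopen(url) as response:
        work = json.load(response)["message"]

    # The response carries the bibliographic metadata registered with Crossref,
    # along with a count of how many registered reference lists point at this DOI.
    print(work.get("title"))
    print(work.get("is-referenced-by-count"))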

The ability to make use of the CrossRef API is reserved for CrossRef members with paid memberships or those who pay for access.

This means that individual researchers cannot make use of the CrossRef API and embed their own citation counts without paying CrossRef.

Not even Bret Victor:

 

The image above is from the end of Bret Victor’s CV.

The image below is from the top of Bret Victor’s CV, which describes him through the words of two notable others:

 

I like to think that the library is a humane medium that helps thinkers see, understand, and create systems. As such, I think librarians have much to learn from Bret Victor.

Bret Victor designs interfaces, and his thinking has been very influential to many. How can I express the extent of his influence to you?

Bret Victor chooses not to publish in academic journals but rather opts to publish his essays on his website worrydream.com. The videos of some of his talks are available on Vimeo.

Here are the citation counts to these works, according to Google Scholar:

 

 

It is an accepted notion that the normative view of science expounded by Merton, provided a sociological interpretation of citation analysis in the late 1960s and 70s. According to his theory, a recognition of the previous work of scientists and of the originality of their work is an institutional form of awarding rewards for efforts. Citations are a means of providing such recognition and reward.

The above is the opening paragraph of “Why has Latour’s Theory of Citations Been Ignored by the Bibliometric Community? Discussion of Sociological Interpretation of Citation Analysis” by Terttu Luukkonen.

Latour’s views of citations are part of his research on the social construction of scientific facts and laboratories, science in the making as contrasted with ready made science, that is beliefs which are treated as scientific facts and are not questioned… In this phase, according to Latour, references in articles are among the resources that are under author’s command in their efforts at trying to “make their point firm” and to lend support to their knowledge claims. Other “allies” or resources are, for example, the editors of the journals which publish the articles, the referees of the journals, and the research funds which finance the pieces of research…

Latour’s theory has an advantage over that of Merton’s in that it can explain many of the findings made in the so-called citation content and context studies mentioned. These findings relate to the contents of citations, which are vastly different and vary from one situation to another; also the fact that the surrounding textual contexts in which they are used differ greatly. Such differences include whether citations are positive or negational, essential to the references text or perfunctory, whether they concern concepts or techniques or neither, whether they provide background reading, alert readers to new work, provide leads, etc.

The above passage is from page 29 of the article.

On page 31, you can find this passage:

The Latourian views have been largely ignored by the bibliometric community in their discussions about citations. The reasons why this is so are intriguing. An important conceptual reason is presumably the fact that in Latourian theory, the major function of references is to support the knowledge claims of the citing author. This explanation does not legitimate major uses of citation indexing, its use as a performance measure – as in the use of citation counts, which presupposes that references indicate a positive assessment of the cited document – or as an indication of the development of specialties – as in co-citation analysis.

You may have heard of Bret Victor just earlier this week. His work is described in an article from The Atlantic called The Scientific Paper is Obsolete. Here’s What’s Next.

 

The article contains this passage:

What would you get if you designed the scientific paper from scratch today? A little while ago I spoke to Bret Victor, a researcher who worked at Apple on early user-interface prototypes for the iPad and now runs his own lab in Oakland, California, that studies the future of computing. Victor has long been convinced that scientists haven’t yet taken full advantage of the computer. “It’s not that different than looking at the printing press, and the evolution of the book,” he said. After Gutenberg, the printing press was mostly used to mimic the calligraphy in bibles. It took nearly 100 years of technical and conceptual improvements to invent the modern book. “There was this entire period where they had the new technology of printing, but they were just using it to emulate the old media.”

Victor gestured at what might be possible when he redesigned a journal article by Duncan Watts and Steven Strogatz, “Collective dynamics of ‘small-world’ networks.” He chose it both because it’s one of the most highly cited papers in all of science and because it’s a model of clear exposition. (Strogatz is best known for writing the beloved “Elements of Math” column for The New York Times.)

The Watts-Strogatz paper described its key findings the way most papers do, with text, pictures, and mathematical symbols. And like most papers, these findings were still hard to swallow, despite the lucid prose. The hardest parts were the ones that described procedures or algorithms, because these required the reader to “play computer” in their head, as Victor put it, that is, to strain to maintain a fragile mental picture of what was happening with each step of the algorithm.

Victor’s redesign interleaved the explanatory text with little interactive diagrams that illustrated each step. In his version, you could see the algorithm at work on an example. You could even control it yourself.

The article goes on to present two software-driven alternatives to the paper-mimicking PDF practices of academia: notebooks from the proprietary Mathematica platform and open source Jupyter Notebooks.

Perhaps it was for length or other editorial reasons, but the article doesn’t go into Bret Victor’s own work on reactive documents, which are best introduced by his self-published essay called ‘Explorable Explanations‘. There is a website from Nicky Case dedicated to collecting dynamic works inspired by Bret’s essay; Case has created some remarkable examples, including Parable of the Polygons and The Evolution of Trust.

Or maybe it’s not odd that his work wasn’t mentioned. From T. Luukkonen’s article on Latour’s theory of citations:

The more people believe in a statement and use it as an unquestioned fact, as a black box, the more it undergoes transformations. It may even undergo a process which Latour calls stylisation or erosion, but which Garfield called obliteration by incorporation, that is, a scientist’s work becomes so generic to the field, so integrated into its body of knowledge that people neglect to cite it explicitly.

At the end of 2013, Bret Victor published a page of things that ‘Bret fell in love with this year’. The first item on his list was the paper Visualization and Cognition: Drawing Things Together [pdf] from French philosopher, anthropologist and sociologist, Bruno Latour.

On page five of this paper is this passage, which I came across again and again during my sabbatical when I was doing a lot of reading about maps:

One example will illustrate what I mean. La Pérouse travels through the Pacific for Louis XVI with the explicit mission of bringing back a better map. One day, landing on what he calls Sakhalin he meets with Chinese and tries to learn from them whether Sakhalin is an island or a peninsula. To his great surprise the Chinese understand geography quite well. An older man stands up and draws a map of his island on the sand with the scale and the details needed by La Pérouse. Another, who is younger, sees that the rising tide will soon erase the map and picks up one of La Pérouse’s notebooks to draw the map again with a pencil . . .

What are the differences between the savage geography and the civilized one? There is no need to bring a prescientific mind into the picture, nor any distinction between the close and open predicaments (Horton, 1977), nor primary and secondary theories (Horton, 1982), nor divisions between implicit and explicit, or concrete and abstract geography. The Chinese are quite able to think in terms of a map but also to talk about navigation on an equal footing with La Pérouse. Strictly speaking, the ability to draw and to visualize does not really make a difference either, since they all draw maps more or less based on the same principle of projection, first on sand, then on paper. So perhaps there is no difference after all and, geographies being equal, relativism is right. This, however, cannot be, because La Pérouse does something that is going to create an enormous difference between the Chinese and the European. What is, for the former, a drawing of no importance that the tide may erase, is for the latter the single object of his mission. What should be brought into the picture is how the picture is brought back. The Chinese does not have to keep track, since he can generate many maps at will, being born on this island and fated to die on it. La Pérouse is not going to stay for more than a night; he is not born here and will die far away. What is he doing, then? He is passing through all these places, in order to take something back to Versailles where many people expect his map to determine who was right and wrong about whether Sakhalin was an island, who will own this and that part of the world, and along which routes the next ships should sail.

Science requires a paper to be brought back from our endeavours.

I thought of Latour when I read this particular passage from The Atlantic article:

Pérez told me stories of scientists who sacrificed their academic careers to build software, because building software counted for so little in their field: The creator of matplotlib, probably the most widely used tool for generating plots in scientific papers, was a postdoc in neuroscience but had to leave academia for industry. The same thing happened to the creator of NumPy, a now-ubiquitous tool for numerical computing. Pérez himself said, “I did get straight-out blunt comments from many, many colleagues, and from senior people and mentors who said: Stop doing this, you’re wasting your career, you’re wasting your talent.” Unabashedly, he said, they’d tell him to “go back to physics and mathematics and writing papers.”

What else is software but writing on sand?

I wanted to highlight Bret Victor’s work to my fellow library workers for what I thought were several reasons. But the more I thought about it, the more reasons came to mind. I don’t want to try your patience any longer, so consider this the potential beginning of a short series of blog posts.

I’ll end this section with why I wrote about Bret Victor, Bruno Latour, and citations. It has to do with this website, Northwestern University’s Faculty Directory, powered by Pure:

 

More and more of our academic institutions are making use of Pure and other similar CRISes that create profiles of people that are generated from the texts we write and the citations we make.

Despite Latour, we are still using citations as a performative measurement.

 

I think we need a more humane medium that helps thinkers see and understand the systems we work in.