Open and Sustainable Innovation Systems (OASIS) Lab working notes


Q: Do scholarly synthesis infrastructures already exist?


The history of attempts at synthesis infrastructures is vast.

We have a thriving space of standards for underpinning infrastructure for synthesis

The property of compression resembles the core concerns of standards like std/Micropublications @clarkMicropublicationsSemanticModel2014 and std/Nanopublications @grothAnatomyNanopublication2010, which were developed in part to enable reasoning over more granular units of knowledge.

The property of context resembles the goal of representing evidence @brushSEPIOSemanticModel2016, uncertainty @dewaardFormalisingUncertaintyOntology2012, and provenance of publications @grothAnatomyNanopublication2010.

The ideas of formal semantics and composability connect well to the core vision of enabling machine-assisted reasoning and higher-level synthesis in the Semantic Web @kuhnGenuineSemanticPublishing2017, @berners-leePublishingSemanticWeb2001.

The history of *successful* attempts is... unclear? It's certainly not mainstream yet, at least in my experience. And mainstream adoption is what we mean by infrastructure: no one "inherits" a synthesis infrastructure yet, at least to my knowledge.

Let's look at some examples.

The earliest ones in the computing era:

Within the sciences / scholarship, led by bioinformatics, the best examples fall under the rubric of "genuine" semantic publishing (a term I've seen from @kuhnGenuineSemanticPublishing2017).

Other more recent ones

What we see now is not infrastructure for synthesis. Instead, we see people either resort to all sorts of "hacks" and workarounds, or put in a substantial amount of work to "mine" publications for what they need (for an evocative example of this, see @knightEnslavedTrappedData2019). A whole cottage industry is dedicated to fueling workarounds of this sort for systematic reviewing. While these hacks often work well enough for the task at hand, they are rarely transferred in systematic ways across projects and people, violating the dimensions of "reach or scope" and "embodiment of standards" laid out by @star1996steps.

Conversely, we see the infrastructure that people do rely on (e.g., Google Scholar, Web of Science, and so on) consistently breaking down, and thereby becoming visible, when people try to use it for difficult synthesis tasks, especially across disciplines. People also often cannot transfer their bricolage solutions from previous tasks or projects to the new domains they have to navigate.

It's not that we have made no progress! The semantic publishing revolution is indeed underway: we see encouraging developments in bioinformatics and archeology, for example.

Yet, on the whole, we're not seeing nearly as much of this transformation as we'd like. Uptake of semantic publishing is low and restricted to a small set of power users.


One count reports around 10 million std/Nanopublications published at the time of writing, albeit almost all within bioinformatics, and overwhelmingly dominated by a small (N=41!) set of authors.

@schrimlCOVID19PandemicReveals2020 report that:

in the International Nucleotide Sequence Database Collaboration (INSDC), there are 2.1 million Sequence Read Archive (SRA) experiments listed under the taxonomy term “metagenomes”, less than 33% of which are tagged with environment metadata. Although published descriptions of metagenomic datasets are generally associated with enriched metadata describing the environment, source material, and sequencing technology, and in theory it is possible for one to read the manuscripts (including figures, tables and supplementary information) and gather that information, this is an onerous task when dealing with multiple studies. It also means multiple researchers potentially repeating the same work of trawling for metadata, resulting in significant researcher-hours that could be better spent actually interrogating the data.

@kuhnGenuineSemanticPublishing2017 also argues that much of what is being published under the heading of semantic publishing may fall short of the original vision of "genuine" semantic publishing.

Referenced in

Z: The central bottleneck to synthesis infrastructures is authoring

This note is a proposed explanation for a negative answer to the question Q: Do scholarly synthesis infrastructures already exist?. I am still developing that claim, though. A deeper diagnosis of why I'm not seeing mainstream adoption of these synthesis infrastructures yet could help ground our project.

February 5th, 2021

Then there's the Anita De Waard stuff and others (see Q: Do scholarly synthesis infrastructures already exist?) - AFAIK, none of these have been studied in actual usage either!
