The unBalanced ecoLOGist: Preprints, Open Science, and the Muddy Road Ahead (Part 1)

Monthly submissions of preprints to bioRxiv. Figure from Le Monde, March 20, 2018.

While browsing my Twitter feed this morning, I came across a tweet from ScienceOpen about a new article, “In praise of preprints,” by Norman Fry, Helina Marshall, and Tasha Mellins-Cohen (published open access in Microbial Genomics; doi: 10.1099/mgen.0.000259). The article reasserts the familiar advantages of preprints (credit, visibility, pre-submission and prepublication review) and their disadvantages (no peer review, novelty not required, financially unsustainable, and obscuring priority). It ends with a reinforced commitment to preprints (and a smoother workflow via bioRxiv) and a position statement from the Microbiology Society in support of preprints.

At about 5 a.m. EDT, there was already one response to this tweet, from Lonni Besançon, who wrote:

can only be helpful in the dissemination of research results. I wonder whether someone has looked at whether they could actually be enough. could happen directly on preprint platforms and could be done massively and in a totally open fashion

which prompted me to wonder (and reply) whether any of the preprints I’ve posted have received any comments…

The answer: No. Not a one. In five years of posting preprints.

And, as today is a holiday here in the US, I had plenty of time to burrow further down this rabbit-hole…

First, a bit of background

I’m a scientist (ecologist) who’s never left the comforting confines of academia. I’ve been, variously, a college professor, academic administrator, or full-time researcher for 30 years. In that time, I’ve published scholarly and trade books, technical papers and popular articles, book reviews, essays, op-eds, and science fiction. I’ve been an Associate Editor for five journals (American Journal of Botany, Ecology, Ecological Monographs, Ecology Letters, and PeerJ), Associate Editor-in-Chief of Ecology, Editor-in-Chief of Ecological Monographs, and I currently am one of three Senior Editors of Methods in Ecology and Evolution.

Throughout my career, I have been a strong advocate for open science in all its forms, including open-access publication (I helped launch Ecosphere in 2009); open archiving of data, software, and field samples (I was the founding editor in 1998 of Ecological Archives, the first online data repository and data “journal” for ecologists); and preprint servers.

Since preprint servers began accepting ecological research (ca. 2015), I’ve been posting papers on them and, for all the advantages asserted by Fry et al. (2019), encouraging my students, post-docs, and collaborators world-wide to do the same. Today, prompted by Lonni’s tweet, I looked at these preprints to see if any of these advantages had borne fruit.

The basic data

Sample size: Since 2015, I have been the lead author (first position), senior author (last position), or contributing co-author (somewhere in between) on 58 peer-reviewed technical articles (it helps that I don’t teach anymore). Of these 58 articles, 17 were posted on a preprint server; an additional nine manuscripts also have been posted on a preprint server but have not yet been submitted to, accepted by, or published in a journal. All of these are findable through Google Scholar.

Preprints: Most (18) of “my” preprints are available on bioRxiv. The remainder are divided among arXiv (3), PeerJ Preprints (3), PhilSci Archive (1), and SocArXiv (1). Many papers on which I have been a co-author have not been submitted to preprint servers because the lead or senior authors felt that the disadvantages of posting identified by Fry et al. (2019) outweighed the advantages. In some cases, lead or senior authors also worried that posting a preprint could lead to the work being stolen or scooped. My impression from today’s casual assessment is that they needn’t have worried.

Metrics: bioRxiv provides the most comprehensive set of metrics and alt-metrics for each posting, including the overall alt-metric score; whether the abstract, full-text html, or full-text PDF has been viewed or downloaded; and how many times the preprint has been tweeted about, read on Mendeley, posted on someone’s Facebook page, blogged about, or picked up by a news or media outlet. PeerJ Preprints identifies tweets. Comments can be viewed on both bioRxiv and PeerJ Preprints. arXiv, SocArXiv, and PhilSci Archive provide no metrics, and any comments must be sent directly to the author (poster) and so are not publicly available. Google Scholar records the number of citations to the preprints posted on all of these servers.

Publications: For comparison with preprints, I looked at both the actual articles-as-published for which preprints had been posted (n=17) and those published articles for which preprints had not been posted (n=41). These articles were published in 42 different journals, including the open-access journals Animals, BMC Genomics, Chilean Journal of Statistics, Ecosphere, Forests, Geosciences, Insects, iScience, Nature Scientific Data, PeerJ, and Science Advances, and the subscription journals Ecological Applications, Ecological Monographs, Ecology, Ecology Letters, Frontiers in Ecology and the Environment, Journal of Animal Ecology, Philosophical Transactions of the Royal Society, Proceedings of the National Academy of Sciences, and Trends in Ecology & Evolution. All told, 26 papers were published in open-access journals and 35 in subscription journals. For these, I tabulated citations listed on Google Scholar.


The good

  • People seem to look at preprints:
    • Abstract views ranged from 487 to 12,090 (median = 946);
    • Full-text html views ranged from 1 to 79 (median = 7);
    • PDF downloads ranged from 94 to 998 (median = 354).
    • Some even made it into readers’ Mendeley libraries (range 0 to 25, median = 1), and these included a few that haven’t been published yet.

The bad

  • None of “my” 26 posted preprints received any comments.
  • One of these preprints was cited 4 times, 2 were cited twice, 6 were cited once, and 16 were never cited.
  • After they were published (17 of the 26 preprints), citations ranged from 0 to 47, with a median = 4:
    • The most-cited paper was by Daru et al. (2018) published in New Phytologist;
    • Next, at 31 citations, was Li et al. (2017) published in Science Bulletin;
    • The “higher-profile” papers, published in Nature Ecology & Evolution (2018), BioScience (2016), and Ecological Monographs (2019), garnered 10, 4, and 1 citations, respectively.
  • Alt-metrics fared little better:
    • Total alt-metric score ranged from 2 to 44 (the latter being the paper by Milcu et al. (2018), published in Nature Ecology & Evolution two years after posting on bioRxiv), with a median of 3;
    • Tweets did poorly, ranging from 0 to 40 (Daru et al., followed by Milcu et al. at 33), with a median of 7;
    • Only Daru et al. and Milcu et al. made it onto Facebook;
    • Only Milcu et al. made it into a blog or the media.
  • Published papers for which preprints had not been posted were equally unread: citations ranged from 0 to 127, with a median of 5.

The ugly

  • Papers published in subscription journals garnered, on average, three times more citations than papers published in open-access journals:
    • t = 2.29, P = 0.026 (two-sample t-test on log-transformed data).
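
For readers who want to run the same kind of comparison on their own citation counts, here is a minimal Python sketch of a pooled-variance two-sample t-test on log-transformed data. The citation counts are made up for illustration, the function name is hypothetical, and the log(x + 1) transform is an assumption: some papers have zero citations, and the exact transform behind the numbers above is not stated.

```python
import math
import statistics


def log_t_test(group_a, group_b):
    """Pooled-variance two-sample t statistic on log(x + 1) transformed counts.

    Returns (t, degrees of freedom, ratio of geometric means of x + 1).
    The log(x + 1) transform is an assumption made so that zero counts
    can be handled; it is not necessarily the transform used in the post.
    """
    a = [math.log1p(x) for x in group_a]
    b = [math.log1p(x) for x in group_b]
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / std_err
    # Back-transforming the difference in log means gives a ratio of
    # geometric means (of x + 1), i.e. a "times more citations" figure.
    ratio = math.exp(mean_a - mean_b)
    return t, df, ratio


if __name__ == "__main__":
    # Hypothetical citation counts, NOT the data summarized in this post.
    subscription = [0, 3, 5, 8, 12, 20, 47, 127]
    open_access = [0, 0, 1, 2, 2, 4, 6, 10]
    t, df, ratio = log_t_test(subscription, open_access)
    print(f"t = {t:.2f}, df = {df}, geometric-mean ratio = {ratio:.2f}")
```

Turning t and the degrees of freedom into a P-value needs the t distribution, which Python's standard library does not provide; calling `scipy.stats.ttest_ind` on the transformed values is one option.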

Take-home messages

I have my ideas… what are yours?

…to be continued


