The OA advantage is greater for the more citable articles, not because of a quality bias from authors self-selecting what to make OA, but because of a quality advantage from users self-selecting what to use and cite, freed by OA from the constraints of selective accessibility to subscribers only. Articles whose authors have supplemented subscription-based access to the publisher’s version by self-archiving their own final draft, making it freely accessible to all on the web, are cited significantly more than articles in the same journal and year that have not been made OA (Gargouri et al., 2010). This was demonstrated in our previous sample of articles published between 2002 and 2006. The sample is now extended to 63,518 articles (13,425 mandated and 50,093 nonmandated) published between 2002 and 2009 in 5,992 journals. It is not a random sample of articles: it was built by first determining which articles were deposited in the mandated repositories and then randomly selecting 10 keyword-matched controls from the same journal and year as each such article.
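The control-selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`id`, `journal`, `year`, `keywords`), the function name `pick_controls`, and the rule that "keyword-matched" means sharing at least one keyword are all hypothetical, since the text does not specify the matching criterion.

```python
import random


def pick_controls(target, pool, n_controls=10, seed=0):
    """Randomly pick up to n_controls articles from the same journal and year
    as `target` that share at least one keyword with it.

    Assumption: each article is a dict with 'id', 'journal', 'year' and
    'keywords' keys; the shared-keyword rule is a stand-in for whatever
    keyword matching was actually used.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    target_keywords = set(target["keywords"])
    candidates = [
        article
        for article in pool
        if article["id"] != target["id"]
        and article["journal"] == target["journal"]
        and article["year"] == target["year"]
        and target_keywords & set(article["keywords"])
    ]
    # Sample without replacement; return fewer if not enough candidates exist.
    return rng.sample(candidates, min(n_controls, len(candidates)))
```

Sampling without replacement keeps the controls for a given article distinct, and returning fewer than ten when the journal/year pool is small is one possible design choice, not something the text states.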
In this 2009 sample, an average of 64% of each of the four mandated institutions’ total yearly article output was self-archived and hence made OA, as mandated. The corresponding percentage OA among the control articles published in the same journal/year (but originating from other, presumably nonmandated institutions) was 21%, close to the frequently reported global spontaneous baseline rate of about 15-20% for self-selected (nonmandated) self-archiving (Table 1). In other words, about 21% of these papers were self-selectively self-archived when self-archiving was not mandated, whereas an average of 64% were self-archived when it was mandated (Figure 1).
Table 1: Counts of mandated and nonmandated articles
2. Citation Ratios Comparing the Yearly OA Impact Advantage for Self-Selected vs Mandatory
Averages across the sample of four institutions with self-archiving mandates confirm the significantly higher citation counts for OA articles (symbolized here as “O”) compared to matched control non-OA articles (symbolized here as “Ø”) published in the same journal and year. The two groups are compared as O/Ø log ratios in seven comparisons (Figure 2). (The first comparison, O/Ø, for example, is the arithmetic mean of the log ratios O/Ø across the 8 years.) OA articles are more highly cited irrespective of whether the OA is Self-Selected (S) or Mandated (M). The O/Ø Advantage is present for mandated OA (OM/ØS) and is of about the same magnitude irrespective of whether we compare the S ratios with the M ratios for the entire control sample (OS/Ø vs OM/Ø) or just compare S alone with M alone (OS/ØS vs OM/ØM).
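As a minimal sketch of how one such comparison could be computed, the function below averages per-year log ratios. It assumes that each year is summarized by one mean citation count for the OA group and one for the non-OA group, and that the natural logarithm is used; neither detail is stated above, and the function name is hypothetical.

```python
import math


def mean_log_ratio(oa_means, non_oa_means):
    """Arithmetic mean of the per-year log(O/Ø) ratios.

    Assumptions: one positive mean citation count per year and group,
    and natural logarithms. A positive result indicates an OA advantage.
    """
    if not oa_means or len(oa_means) != len(non_oa_means):
        raise ValueError("need one O and one Ø value for each year")
    log_ratios = [math.log(o / n) for o, n in zip(oa_means, non_oa_means)]
    return sum(log_ratios) / len(log_ratios)
```

Averaging the logs rather than the raw ratios treats a doubling and a halving symmetrically, so a result of zero means no advantage in either direction.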
The number of citations an article receives is correlated with, and may be influenced by, a variety of variables (Age, Journal Impact Factor, Number of Authors, Number of References, Number of Pages, Science, Review, USA Author, OA, Mandatedness, CERN, Southampton, Minho, Queensland and the interaction Age*OA). Logistic regressions are used to test four models, each analyzing a different comparison range (Figure 3). For each comparison (e.g., 1-4 citations (lo) vs. 5-9 citations (med-lo)), an article is assigned zero if its citation count is in the lower of the two ranges and one if it is in the upper range. The model then assigns the best-fitting weights to each of the fifteen predictor variables in their joint prediction of the citation range. The weights are proportional to the independent contribution of each variable. (Only statistically significant weights are shown.)
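The binary coding of the outcome, and the Exp(β)-1 quantity reported in Figure 3, can be illustrated with the sketch below. It uses only the lo and med-lo ranges given as an example above; the function names are hypothetical, and dropping articles that fall outside both ranges is an assumption about how each comparison is restricted.

```python
import math

LO = (1, 4)      # "lo" range from the example above
MED_LO = (5, 9)  # "med-lo" range from the example above


def code_outcome(citations, lower=LO, upper=MED_LO):
    """Return 0 for the lower range, 1 for the upper range, and None
    otherwise (assumption: such articles are left out of the comparison)."""
    if lower[0] <= citations <= lower[1]:
        return 0
    if upper[0] <= citations <= upper[1]:
        return 1
    return None


def odds_change(beta):
    """Exp(β)-1: proportional change in the odds of being in the upper
    range for a one-unit increase in the predictor with weight beta."""
    return math.exp(beta) - 1.0
```

Under this reading, a weight of zero corresponds to no change in the odds, a positive Exp(β)-1 to higher odds of falling in the upper citation range, and a negative value to lower odds.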
Figure 3 shows that citations are positively correlated with Age, Journal Impact Factor, Authors, References, Science and USA, as well as with OA. Citations are slightly negatively correlated with Pages and Review. (Note the anomalous effect of the “Review” variable; this is probably because it is confounded with the Reference count variable; when Review was removed in further analyses, the pattern of the other variables, and in particular OA, was unchanged.) The logistic regression analysis shows that the OA advantage is independent of other correlates of citations and highest for the most highly cited articles. The analysis was also done for 2002-2008 without CERN, and also without both CERN and Southampton, and the outcome was the same.
Figure 3: Exp(β)-1 values for logistic regressions
Gargouri Y, Hajjem C, Larivière V, Gingras Y, Carr L, Brody T, Harnad S (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLoS ONE 5(10): e13636. doi:10.1371/journal.pone.0013636.