To make research more inclusive, we must rethink citation ‘relevance’

Bibliographic databases’ default ‘sort by relevance’ listings perpetuate bias towards white, Western men, says Katy Jordan

April 19, 2023

It may seem like a trivial detail, but by making a small change in our online search habits, academics could help to address some well-known problems with under-representation in education and research.

I’m referring to that setting in the corner of most bibliographical databases marked “sort by relevance”. In all likelihood, the last time you trawled a scholarly database, that was your default setting – and it probably made sense to use it. But there are good reasons to think again.

This function risks perpetuating biases in academic publishing that over-represent scholars in high-income countries. The beneficiaries tend to be researchers who are white, Western and male, while other contributions are overlooked.

Many of us are aware of this as a wider problem; after all, the evidence has been around for years. A 2013 study, for example, found that articles with a female first author tended to receive significantly fewer citations than those with a male first author. The world’s most-cited research still comes disproportionately from Europe and the US. According to one analysis, take any “international” peer-reviewed journal, and chances are that just 1 per cent of its contents will be from anywhere in sub-Saharan Africa.

It is easy to forget that when scholarly databases sort by “relevance”, the algorithms pre-selecting content are built on this uneven ground.

Take, for example, Google Scholar, which is by far the most popular literature search platform. Its publicly available explanation of how it ranks content tells us that it does this “the way researchers do”, taking into account factors such as the content, journal of publication, the author and recency of citations “in other scholarly literature”.

Put simply, the first few pages of returns will probably represent the greatest hits of established researchers in whichever field you’re searching. Based on what we know about pre-existing biases, work by women, scholars of colour, early career researchers or those from the Global South is much more likely to be buried.

To what extent are academics aware of this – and what, if anything, are they doing about it? In a recent study, a colleague and I surveyed 100 academics about how they use search platforms and the assumptions they make about them. We also analysed how “relevance” is defined by 14 of the largest academic bibliographic databases, including Academia.edu, JSTOR, PubMed, Scopus and Semantic Scholar.

Encouragingly, most researchers were wary of Google Scholar. They were frequently uncertain how it determined relevance and often described this as a sort of “algorithmic magic”. As one participant put it: “It’s a total black box”.

Most researchers, however, told us that their main strategy in response to this opacity was to use some of the other, more specialised databases that we also looked at in the study. When we asked about how these sort by relevance, algorithmic magic never came up. This is a problem because in reality their algorithms are just as opaque.

In fact, of the 14 databases we looked at, “sort by relevance” was the default setting in all but two – and seven provided no information about how this was determined. The remainder offered sketchy details. In many cases, they appeared to rely heavily, once again, on citations and reputation metrics. Well-meaning academics who look to these sources to avoid biases may, therefore, be inadvertently reproducing them.

What can be done to fix this? For a start, the database providers should be more transparent. A brief definition of relevance should be a bare minimum; it is shocking that in some cases no definition is provided at all. Going deeper, developers ought to consider a radical rethink of their ranking algorithms, given what we know about pre-existing biases in citation practices especially.

In recent times, various resources and guidelines for “positive citation practices” have been developed to help researchers ensure that the literature they are citing draws on an appropriately wide range of potential sources. These are only being used on a piecemeal basis right now, but they could be more standard. Universities could set their use as an expectation for research staff, and journals could adopt them as a submission requirement.

It’s also up to us, however, to make ourselves and others aware of the risks of citation bias. Staff development programmes, especially for new researchers, could easily incorporate information highlighting the problematic nature of ranking by relevance.

And the simplest measures we can take? Diversify our searches and tweak our search settings. Most platforms allow users to customise how they receive information, so the next time you do a literature search, switch “sort by relevance” off. You might be surprised by the results. And they will almost certainly be fairer for everyone.

Katy Jordan is senior research associate in the Faculty of Education at the University of Cambridge.
