Monday, January 22, 2024

Sleeping Beauties Of Science

Many scientific papers receive little attention initially but become highly cited years later. What groundbreaking discoveries might have already been made, and how can we uncover them faster?

The scientific literature is vast. No individual human can fully know all the published research findings, even within a single field of science. Regardless of how much time a scientist spends reading the literature, there’ll always be what the information scientist Don Swanson called ‘undiscovered public knowledge’: knowledge that exists and is published somewhere, but still remains largely unknown.

Some scientific papers receive very little attention after their publication – some, indeed, receive no attention whatsoever. Others, though, can languish with few citations for years or decades, but are eventually rediscovered and become highly cited. These are the so-called ‘sleeping beauties’ of science.

The reasons for their hibernation vary. Sometimes it is because contemporaneous scientists lack the tools or practical technology to test the idea. Other times, the scientific community does not understand or appreciate what has been discovered, perhaps because of a lack of theory. Yet other times it’s a more sublunary reason: the paper is simply published somewhere obscure and it never makes its way to the right readers.

What can sleeping beauties tell us about how science works? How do we rediscover information the scientific body of knowledge already contains but that is not widely known? Is it possible that, if we could understand sleeping beauties in a more systematic way, we might be able to accelerate scientific progress? 

[---]

Indeed, one of the most famous physics papers, ‘Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?’ (1935) by Albert Einstein, Boris Podolsky, and Nathan Rosen (EPR), is a classic example of a sleeping beauty. It’s number 14 on one list that quantifies sleeping beauties by how long they slept and how many citations they suddenly accrued.

The EPR paper questioned whether quantum mechanics could truly describe physical reality. The stumbling block was the phenomenon of ‘quantum entanglement’, where two quantum particles that have previously interacted remain connected in such a way that any measurement of a property of one of them influences that property in the other, regardless of how far apart they are.
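
For a sense of how a list like the one mentioned above can score papers by 'how long they slept and how many citations they suddenly accrued', here is a rough sketch of one published measure of this kind, the 'beauty coefficient' of Ke and colleagues (2015). Whether that is the measure behind the ranking cited here is an assumption, and the function name and yearly-count input format are purely illustrative. The idea is to draw a straight line from the citation count in the publication year to the count in the peak year, then add up how far the real curve sits below that line.

```python
def beauty_coefficient(citations):
    """Score how strongly a citation history resembles a 'sleeping beauty'.

    citations[t] is the number of citations received t years after
    publication (t = 0 is the publication year). Input format is an
    assumption made for this sketch.
    """
    if not citations:
        return 0.0

    # Year of peak citations (the first one, if the peak is tied).
    t_peak = max(range(len(citations)), key=lambda t: citations[t])
    if t_peak == 0:
        # The paper peaked immediately, so it never "slept".
        return 0.0

    c_start, c_peak = citations[0], citations[t_peak]
    slope = (c_peak - c_start) / t_peak

    total = 0.0
    for t in range(t_peak + 1):
        reference = slope * t + c_start  # straight line from start to peak
        # Gap between the line and the real count, scaled by the real count
        # (with a floor of 1 so that zero-citation years do not divide by zero).
        total += (reference - citations[t]) / max(1, citations[t])
    return total


# A paper ignored for ten years and then cited 100 times in one year...
sleeper = [0] * 10 + [100]
# ...versus a paper whose citations grow steadily to the same peak.
steady = [10 * t for t in range(11)]

print(beauty_coefficient(sleeper))  # 450.0
print(beauty_coefficient(steady))   # 0.0
```

A long, flat sleep followed by a sharp rise gives a large score, while steady growth to the same peak gives a score of zero.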

[---]

The first is Karl Pearson’s 1901 paper ‘On Lines and Planes of Closest Fit to Systems of Points in Space’. It looks like a classic case of a sleeping beauty: it was published in a primarily philosophical outlet with the rather unwieldy name of The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, and seems to have slept soundly for a whole century, only being fully awakened in 2002 with a huge surge of citations.

It’s certainly true that the twenty-first century brought with it many more ways to use Pearson’s 1901 insights. What he had described was what eventually became the statistical workhorse known as principal components analysis (PCA) – which became particularly useful after the advent of digital ‘big data’ to discover patterns and summarize large, unwieldy datasets in a smaller number of variables. But even without those datasets, the technique of PCA itself was well used across the entire twentieth century, from psychology to palaeontology.

It’s hard to say why the 1901 paper suddenly started being cited around 2002 – the explanation could be pure luck and social dynamics, with one study happening to cite it and others following suit – but it wasn’t because PCA, which by that point was taught in every basic statistics course, had been ‘rediscovered’.
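
For readers who haven't met the technique, the simplest case of what Pearson's title describes, a 'line of closest fit' through points in a plane, can be sketched in a few lines. This is a minimal illustration and not Pearson's own derivation: the function name and sample data are made up, and it uses the standard closed-form angle for the main axis of a 2×2 covariance matrix.

```python
import math


def line_of_closest_fit(points):
    """Return (centre, direction, share_of_variance) for 2-D points.

    The line passes through the centre of the points and runs along the
    direction in which they vary most: the first principal component.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n

    # Angle of the main axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Larger eigenvalue = variance captured along that direction.
    total = sxx + syy
    if total == 0:
        raise ValueError("all points are identical; no direction to find")
    largest = total / 2 + math.hypot((sxx - syy) / 2, sxy)
    return (mean_x, mean_y), direction, largest / total


# Made-up data: four points lying exactly on the line y = 2x.
centre, direction, share = line_of_closest_fit([(0, 0), (1, 2), (2, 4), (3, 6)])
print(centre)                       # (1.5, 3.0)
print(direction[1] / direction[0])  # slope of the fitted line, close to 2
print(share)                        # close to 1: one variable summarises both
```

Here two variables collapse into one with nothing lost, because the points fall exactly on a line. With real data the share is below 1, and the same idea extends to many variables, which is what makes PCA useful for summarizing large datasets.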

- Lost Science

