CEU Network Scientist and Colleagues Highlight Pitfalls of Automated Predictors of Scientific Success

Can we make predictions in science? Can we predict where to search for a cure for cancer? And how and when will this breakthrough be made and by whom? CEU Assistant Professor of Network Science Roberta Sinatra and colleagues from several other institutions have published a Prospective in Science on the science of science, or “an interdisciplinary effort to scientifically understand the social processes that lead to scientific discoveries.” It turns out that mining massive digital databases for elements that can help us predict scientific success or breakthroughs has great benefits, but also some pitfalls.

“The more predictable we can make the process of scientific discovery, the more efficiently those resources can be used to support worthwhile technological, biomedical, and scientific advances,” the authors write.

Last fall, Sinatra, CEU Professor Albert-Laszlo Barabasi and other colleagues published research in Science on predicting the academic success of individual researchers in certain fields using a specific formula. Looking at, for example, a block of ten papers authored by a scientist, the team applied their formula and could predict the future success of the academic. Importantly, they noted that there is no specific time at which a researcher is more likely to be successful – e.g. at the beginning, middle or end of her career. “So, if you have an early outstanding discovery, it's not necessarily the promise of taking off,” Sinatra said. “On the other hand, if you haven't had a big discovery yet but you are systematically having good impact, you will have your big work in the future.”

Access to resources is key here, as proof of merit – or prediction of achievement – can result in organizational and government scientific funding. There are, however, limits to data-driven prediction of discoveries. The current Science paper points out important weaknesses in the system of scientific publication that can lead to significant inequality. The authors cite previous papers and articles that highlight the fact that “what discoveries are made is determined in part by who is working to make them and how they were trained as scientists.” The scientific workforce, the authors say, is the product of a small number of prestigious institutions and programs that “tend to drive the scientific preferences and workforce composition of the entire ecosystem.”

In the same vein of exclusion, the authors note that existing research on patterns in productivity and impact, as well as evidence of biases in the evaluation of research proposals, “raise troubling questions about our current approach to funding most scientific research.” Previous studies have, in fact, shown that grant proposals led by female or non-white investigators, or focused on interdisciplinary research, are less likely to receive funding.

In addition, possibly erroneous notions of a scientist's most productive years have led some agencies to shift funding from older to younger researchers, which could deny older scientists the opportunity to make their big discovery. “Citations and publications in particular are measures of past success that exhibit a feedback loop that creates a rich-gets-richer dynamic,” the authors write. They suggest that controlled experiments could be used alongside the mining of large databases, combining quantitative and qualitative analysis when, for example, gauging how well citation counts reflect perceived scientific impact. The authors also suggest that there could be other, better data for predicting success, ranging from scientific workshops to rejected manuscripts and grant proposals to social media.

Finally, the authors caution the scientific community against broad funding decisions based solely on systems that would automatically evaluate the future success of a paper, proposal or even a scientist herself.

“We have a responsibility to ensure that the use of prediction tools does not inhibit future discovery, marginalize underrepresented groups, exclude novel ideas, or discourage interdisciplinary work and the development of new fields.”

Photo caption/credits for collage image: from left, top row: Jonas Salk (image: public domain); John Dabiri (image: Stanford); Sir Alexander Fleming plaque (image: public domain); from left, bottom row: Shinya Yamanaka (image: public domain); Marie Curie (image: public domain); Maryam Mirzakhani (image: Stanford)