Writing Betrays Scientists Who Falsify Data

Scientists who falsify data had better watch out, because analysis of their writing can help tell good science from bad.

Two researchers at Stanford University have come up with a way to see if a scientific article contains fraudulent data. How? By measuring the degree of obfuscation in the writing.

Communication professor Jeff Hancock and graduate student David Markowitz examined a PubMed corpus of research published between 1973 and 2013. (PubMed is a database of articles from life sciences journals.)

Liars lie in predictable ways. Amateur poker players have certain “tells” that indicate when they’re bluffing. And bogus financial reports contain linguistic “fog.” This language keeps readers from detecting bad information either by distracting them or by hiding reality.

Do all scientists who falsify data look deranged? No, but they do seem ethically challenged.

Scientists Who Falsify Data Also Obfuscate

Hancock and Markowitz wanted to test whether the same holds for science: if liars leave linguistic fog behind, could fraudulent research be detected by evaluating the text of journal articles?

As it turns out, the answer is “yes.” The two researchers first found over 250 journal articles that had been retracted for fraud. They then compared the writing in those articles to the writing in articles that had not been retracted.

Using an “obfuscation index” to measure the language in each, Hancock and Markowitz came up with a score for each article. The score combined the incidence of causal language, abstract language, jargon, and positive words and phrases with a readability measure.
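For the curious, here is a rough idea of what scoring text this way could look like. This is only a toy sketch, not the researchers' index: the word lists, the weights and their signs, and the use of words per sentence as a stand-in for readability are all assumptions made up for illustration.

```python
import re

# Tiny illustrative word lists. These are hypothetical placeholders,
# not the lexicons used in the study.
COMMON_WORDS = {
    "the", "a", "of", "and", "we", "in", "to", "is", "that", "this",
    "results", "show", "found", "because", "good", "clear", "data",
    "our", "were", "was", "are",
}
POSITIVE_WORDS = {"good", "clear", "strong", "robust"}
CAUSAL_WORDS = {"because", "therefore", "cause", "effect", "hence"}

# Assumed weights and signs, chosen only for this sketch.
DEFAULT_WEIGHTS = {
    "jargon": 1.0,
    "causal": 1.0,
    "positive": -1.0,
    "words_per_sentence": 0.01,
}


def features(text):
    """Return simple per-text language rates."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        raise ValueError("text contains no words")
    return {
        # Share of words that are not on the common-word list.
        "jargon": sum(w not in COMMON_WORDS for w in words) / n,
        "positive": sum(w in POSITIVE_WORDS for w in words) / n,
        "causal": sum(w in CAUSAL_WORDS for w in words) / n,
        # Crude readability proxy: longer sentences are harder to read.
        "words_per_sentence": n / len(sentences),
    }


def obfuscation_score(text, weights=DEFAULT_WEIGHTS):
    """Combine the features into one number using the assumed weights."""
    f = features(text)
    return sum(weights[name] * f[name] for name in weights)


plain = "We found that the data are clear. The results show a good effect."
foggy = ("Heterogeneous immunomodulatory perturbations precipitated "
         "multifactorial dysregulation of the transcriptomic milieu "
         "because of confounded stochasticity.")
print(obfuscation_score(plain), obfuscation_score(foggy))
```

With these made-up lists, the jargon-heavy sentence scores higher than the plain one. A real tool would need validated lexicons and a proper readability formula.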

Among the researchers’ findings were the following:

  • Fraudulent writing contains about 1.5 percent more jargon than non-fraudulent writing. (Just one more reason to avoid the overuse of jargon in your writing…)
  • Fraudulent researchers may “downplay” their findings by using more negative terms. Why? They may fear that too much positive language will arouse suspicion.
  • As obfuscation in an article increases, so does the number of references it cites. Each added reference takes time and effort to evaluate thoroughly, so piling them on makes it harder for ethical colleagues or editors to “out” the fraud.

While a perfect detection system is not yet in place, software that uses the two researchers’ methodology may one day help scientific editors flag falsified research.

So scientists who falsify data, you’d better watch out. The linguists are on to you.

Sources and additional reading:
“Stanford researchers uncover patterns in how scientists lie about their data,” Stanford University.
“Linguistic Obfuscation in Fraudulent Science,” Journal of Language and Social Psychology, November 8, 2015.

Special thanks go to Bryan Shelly of Advanced Education Measurement for sharing this story with RedLine.