Last January, MIT computer science graduate student (and former Wall Street Journal reporter) Keith Winstein reported on the dramatic divergence between Google’s flu data and the official CDC flu numbers: Is Google Flu Trends Prescient Or Wrong?
“This could be a cautionary tale about the perils of relying on these ‘Big Data’ predictive models in situations where accuracy is important,” Winstein said in an interview with CommonHealth.
Bingo. A paper by Northeastern University researchers and others, just out in the journal Science, looks at where Google Flu Trends went wrong and presents the errors as exactly that sort of cautionary tale. And one of the morals of “The Parable of Google Flu: Traps in Big Data Analysis” is that Google needs to share its workings better with other research outfits. From news@Northeastern:
By incorporating lagged data from the Centers for Disease Control and Prevention as well as making a few simple statistical tweaks to the model, Lazer said, the GFT [Google Flu Trends] engineers could have significantly improved their results. But in a companion report also released Thursday on the Social Science Research Network—an online repository of scholarly research and related materials—Lazer and his colleagues show that an updated version of GFT, which came about in response to a 2013 Nature article revealing GFT’s limitations, does little better than its predecessor.
While Big Data certainly holds great promise for research, Lazer said, it will only be successful if the methods and data are made—at least partially—accessible to the community. But that so far has not been the case with Google.
“Google wants to contribute to science but at the same time does not follow scientific praxis and the principles of reproducibility and data availability that are crucial for progress,” Vespignani said. “In other words they want to contribute to science with a black box, which we cannot fully scrutinize and understand.”
If scientists are to “stand on the shoulders of giants,” as the old adage requires for moving knowledge forward, they will need some help from the giants, Lazer said. Otherwise failures like that with Google Flu Trends will be rampant, with the potential to tarnish our understanding of anything from stock market trends to the spread of disease.
Read the full Northeastern story here: http://www.northeastern.edu/news/2014/03/does-big-data-have-the-flu/