Big Data Hubris? Where Google Flu Trends Went Wrong

Flu graph (Keith Winstein, MIT)

Last January, MIT computer science graduate student (and former Wall Street Journal reporter) Keith Winstein reported on the dramatic divergence between Google’s flu data and the official CDC flu numbers: “Is Google Flu Trends Prescient Or Wrong?”

“This could be a cautionary tale about the perils of relying on these ‘Big Data’ predictive models in situations where accuracy is important,” Winstein said in an interview with CommonHealth.

Bingo. A paper by Northeastern University researchers and others, just out in the journal Science, looks at where Google Flu Trends went wrong and presents the errors as exactly that sort of cautionary tale. And one of the morals of “The Parable of Google Flu: Traps in Big Data Analysis” is that Google needs to share its workings better with other research outfits. From news@Northeastern:

By incorporating lagged data from the Centers for Disease Control and Prevention as well as making a few simple statistical tweaks to the model, Lazer said, the GFT [Google Flu Trends] engineers could have significantly improved their results. But in a companion report also released Thursday on the Social Science Research Network, an online repository of scholarly research and related materials, Lazer and his colleagues show that an updated version of GFT, which came about in response to a 2013 Nature article revealing GFT’s limitations, does little better than its predecessor.

While Big Data certainly holds great promise for research, Lazer said, it will only be successful if the methods and data are made, at least partially, accessible to the community. But that so far has not been the case with Google.

“Google wants to contribute to science but at the same time does not follow scientific praxis and the principles of reproducibility and data availability that are crucial for progress,” Vespignani said. “In other words they want to contribute to science with a black box, which we cannot fully scrutinize and understand.”

If scientists are to “stand on the shoulders of giants,” as the old adage requires for moving knowledge forward, they will need some help from the giants, Lazer said. Otherwise failures like that with Google Flu Trends will be rampant, with the potential to tarnish our understanding of anything from stock market trends to the spread of disease.

Read the full Northeastern story here: http://www.northeastern.edu/news/2014/03/does-big-data-have-the-flu/
