From Kaiser Fung, Harvard Business Review Blog
In their best-selling 2013 book Big Data: A Revolution That Will Transform How We Live, Work and Think, authors Viktor Mayer-Schönberger and Kenneth Cukier selected Google Flu Trends (GFT) as the lede of chapter one. They explained how Google’s algorithm mined five years of web logs, containing hundreds of billions of searches, and created a predictive model utilizing 45 search terms that “proved to be a more useful and timely indicator [of flu] than government statistics with their natural reporting lags.”
Unfortunately, no. The first sign of trouble emerged in 2009, shortly after GFT launched, when it completely missed the swine flu pandemic. Last year, Nature reported that Flu Trends overestimated the peak of the 2012 Christmas-season flu by 50%. Last week came the most damning evaluation yet. In Science, a team of Harvard-affiliated researchers published their findings that GFT has overestimated the prevalence of flu in 100 of the last 108 weeks; it's been wrong since August 2011. The Science article further points out that a simplistic forecasting model, one as basic as predicting the temperature by looking at recent-past temperatures, would have forecast flu better than GFT.
In short, you wouldn’t have needed big data at all to do better than Google Flu Trends. Ouch.
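The "recent-past" baseline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Harvard team's actual model: it assumes a plain persistence forecast (predict this week's flu level as the value observed `lag` weeks earlier, as if official figures arrive late) and scores estimates by mean absolute error and by counting the weeks in which an estimate exceeded the observed value, the kind of tally behind the "100 out of 108 weeks" figure. The function names, the `lag` parameter and the numbers in the usage example are all hypothetical and are not real CDC or GFT data.

```python
from typing import Sequence


def persistence_forecast(observed: Sequence[float], lag: int = 1) -> list[float]:
    """Hypothetical baseline: predict week t as the value observed at week t - lag.

    The first `lag` weeks have no earlier observation, so they are skipped;
    the returned list lines up with observed[lag:].
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [observed[t - lag] for t in range(lag, len(observed))]


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between paired predictions and observations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def weeks_overestimated(predicted: Sequence[float], actual: Sequence[float]) -> int:
    """Count the weeks in which the estimate exceeded the observed value."""
    return sum(1 for p, a in zip(predicted, actual) if p > a)


if __name__ == "__main__":
    # Made-up weekly flu levels; NOT real CDC or GFT data.
    observed = [1.0, 1.2, 1.5, 2.1, 2.8, 3.0, 2.6, 2.0]
    # A made-up estimator that runs consistently high.
    biased_estimate = [1.4, 1.7, 2.2, 3.0, 4.1, 4.4, 3.8, 2.9]

    lag = 2  # assumption: official figures arrive two weeks late
    actual = observed[lag:]
    baseline = persistence_forecast(observed, lag)
    biased = biased_estimate[lag:]

    print("baseline MAE:", round(mean_absolute_error(baseline, actual), 3))
    print("biased MAE:  ", round(mean_absolute_error(biased, actual), 3))
    print("biased weeks over:", weeks_overestimated(biased, actual), "of", len(actual))
```

A persistence forecast uses no search data at all, which is the point of the comparison: a model built on hundreds of billions of searches should at least beat a baseline this cheap before the extra data gets credit for any improvement.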
In fact, GFT’s poor track record is hardly a secret to big data and GFT followers like me, and it points to a little bit of a big problem in the big data business that many of us have been discussing: Data validity is being consistently overstated. As the Harvard researchers warn: “The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”
With advances in data science, big data is going to change entire fields, such as marketing, sales, business, finance and healthcare, to name a few. The sheer amount of data available to companies with large memberships or visitor numbers (such as Facebook, Google, WhatsApp and Apple) means that these corporations can find previously unseen trends and correlations and act on those statistical relationships. For example, Google was reported to have predicted and measured the real-time geographical spread of the 2009 swine flu better than the US Centers for Disease Control and Prevention, by analysing the search terms used by Americans around the country. Already used to spot fraud and track consumer behaviour, big data has countless applications still waiting to be discovered. Data is the new currency of the Information Age, and that currency is appreciating.