Thanks in part to vigorous efforts by vendors (led by IBM) to bring the idea to a wider public, analytics is coming closer to the mainstream. Whether in ESPN ads for fantasy football, or election-night slicing and dicing of vote and poll data, or the ever-broadening influence of quantitative models for stock trading and portfolio development, numbers-driven decisions are no longer the exclusive province of people with hard-core quantitative skills.
Not surprisingly, the term resists a tidy definition. At the simple end of the spectrum, one Australian firm asserts that “Analytics is basically using existing business data or statistics to make informed decisions.” At the other end of a broad continuum, TechTarget distinguishes, not entirely convincingly, between data mining and data analytics:
“Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organizations to make better business decisions and in the sciences to verify or disprove existing models or theories. Data analytics is distinguished from data mining by the scope, purpose and focus of the analysis. Data miners sort through huge data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships.”
To avoid a terminological quagmire, let us merely assert that analytics uses statistical and other methods of processing to tease out business insights and decision cues from masses of data. In order to see the reach of these concepts and methods, consider a few examples drawn at random:
-The “flash crash” of May 2010 focused attention on the many forms and roles of algorithmic trading of equities. While firm numbers on the practice are difficult to find, it is telling that the regulated New York Stock Exchange has fallen from executing 80% of trades in its listed stocks to only 26% in 2010, according to Bloomberg. The majority occur in other trading venues, many of them essentially “lights-out” data centers; high-frequency trading firms, employing a tiny percentage of the people associated with the stock markets, generate 60% of daily U.S. trading volume of roughly 10 billion shares.
-In part because of the broad influence of Michael Lewis’s bestselling book Moneyball, quantitative analysis has moved from its formerly geeky niche at the periphery to become a central facet of many sports. MIT holds an annual conference on sports analytics that draws both sell-out crowds and A-list speakers. Statistics-driven fantasy sports continue to rise in popularity all over the world as soccer, cricket, and rugby join the more familiar U.S. staples of football and baseball.
-Social network analysis, a lightly practiced subspecialty of sociology only two decades ago, has surged in popularity within the intelligence, marketing, and technology industries. Physics, biology, economics, and other disciplines all are contributing to the rapid growth of knowledge in this domain. Facebook, Al Qaeda, and countless startups all require new ways of understanding cell phone, GPS, and friend/kin-related traffic.
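To make the flavor of such analysis concrete, here is a minimal sketch, assuming the Python networkx library and an invented friendship graph; it computes betweenness centrality, one of the standard measures for spotting the "brokers" in a network.

```python
# Illustrative sketch only: a toy social-network-analysis example with
# made-up names and edges, showing the kind of centrality measure that
# intelligence, marketing, and technology analysts apply at far larger scale.
import networkx as nx

# Hypothetical call/friendship records: (person_a, person_b)
edges = [
    ("ana", "bo"), ("ana", "cy"), ("bo", "cy"),
    ("cy", "dee"), ("dee", "ed"), ("ed", "fay"),
]

g = nx.Graph(edges)

# Betweenness centrality flags people who sit on many shortest paths --
# the "brokers" a marketer or investigator often cares about most.
for person, score in sorted(
    nx.betweenness_centrality(g).items(), key=lambda kv: -kv[1]
):
    print(f"{person}: {score:.2f}")
```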
Why now?
Perhaps as interesting as the range of applications are the many converging reasons for the rise of interest in analytics. Here are ten among many.
1) Total quality management and six-sigma programs trained a generation of production managers to value rigorous application of data. There can be little doubt that six-sigma has been misapplied and misinterpreted, but the successes derived from a data-driven approach to decisions are, I believe, informing today’s wider interest in statistically sophisticated forms of analysis within the enterprise.
2) Quantitative finance applied ideas from operations research, physics, biology, supply chain management, and elsewhere to problems of money and markets. In a bit of turnabout, many data-intensive techniques, such as portfolio theory, are now migrating out of formal finance into day-to-day management.
3) As Eric Schmidt said in August, we now create in two days as much information as humanity did from the beginning of recorded history until 2003. That’s measuring in bits, obviously, and as such Google’s estimate is skewed by the rise of high-resolution video, but the overall point is valid: people and organizations can create data far faster than any human being or process can assemble, digest, or act on it. Cell phones, seen as both sensor and communications platforms, are a major contributor, as are enterprise systems and image generation. More of the world is instrumented, in increasingly standardized ways, than ever before: Facebook status updates, GPS, ZigBee and other “Internet of things” efforts, and barcodes and RFID on more and more items merely begin a list.
4) Even as we as a species generate more data points than ever before, Moore’s law and its corollaries (such as Kryder’s law of hard disks) are creating a computational fabric which enables that data to be processed more cost-effectively than ever before. That processing, of course, creates still more data, compounding the glut.
5) After the reengineering/ERP push, the Internet boom, and the largely failed effort to make services-oriented architectures a business development theme, vendors are putting major weight behind analytics. It sells services, hardware, and software; it can be used in every vertical segment; it applies to every size of business; and it connects to other macro-level phenomena: smart grids, carbon footprints, healthcare cost containment, e-government, marketing efficiency, lean manufacturing, and so on. In short, many vendors have good reasons to emphasize analytics in their go-to-market efforts. Investments reinforce the commitment: SAP’s purchase of Business Objects was its biggest acquisition ever, while IBM, Oracle, Microsoft, and Google have also spent billions buying capability in this area.
6) Despite all the money spent on ERP, on data warehousing, and on “real-time” systems, most managers still cannot fully trust their data. Multiple spreadsheets document the same phenomena through different organizational lenses, data quality in enterprise systems rarely inspires confidence, and timeliness of results can vary widely, particularly in multinationals. I speak to executives across industries who share the same lament: for all of our systems and numbers, we often don’t have a firm sense of what’s going on in our company and our markets.
7) Related to this lack of confidence in enterprise data, risk awareness is on the rise in many sectors. Whether in product provenance (Mattel), recall management (Toyota, Safeway, or CVS), exposure to natural disasters (Allstate, Chubb), credit and default risk (anyone), malpractice (any hospital), counterparty risk (Goldman Sachs), disaster management, or fraud (Enron, Satyam, Société Générale), events of the past decade have sensitized executives and managers to the need for rigorous, data-driven monitoring of complex situations.
8) Data from across domains can be correlated through such ready identifiers as GPS location, credit reporting, cell phone number, or even Facebook identity. The “like” button, by itself, serves as a massive spur to inter-organizational data analysis of consumer behavior at a scale never before available to sampling-driven marketing analytics. What happens when a “sample” population includes 100 million individuals? (A minimal sketch of such identifier-based joins appears after this list.)
9) Visualization is improving. While the spreadsheet is ubiquitous in every organization and will remain so, the quality of information visualization has improved over the past decade. This may result primarily from the law of large numbers (1% of a boatload is bigger than 1% of a handful), or it may reflect the growing influence of a generation of skilled information designers, or it may be that such tools as Mathematica and Adobe Flex are empowering better number pictures, but in any event, the increasing quality of both the tools and the outputs of information visualization reinforces the larger trend toward sophisticated quantitative analysis.
10) Software as a service puts analytics into the hands of people who lack the data sets, the computational processing power, and the rich technical training formerly required for hard-core number-crunching. Some examples follow.
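As a small illustration of point 8, here is a hedged sketch, using pandas and invented records, of how two unrelated data sets can be linked through a shared identifier such as a phone number; real matching pipelines add hashing, consent, and data-quality steps omitted here.

```python
# A minimal sketch (assumed field names, toy records) of correlating data
# across domains through a shared identifier using pandas.
import pandas as pd

purchases = pd.DataFrame({
    "phone": ["555-0101", "555-0102", "555-0103"],
    "basket_total": [42.50, 18.00, 97.25],
})

ad_clicks = pd.DataFrame({
    "phone": ["555-0101", "555-0103", "555-0104"],
    "campaign": ["spring_sale", "loyalty", "spring_sale"],
})

# An inner join on the shared identifier links behavior across domains:
# which ad campaigns the purchasers were exposed to.
linked = purchases.merge(ad_clicks, on="phone", how="inner")
print(linked)
```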
Successes, many available as SaaS
-Financial charting and modeling continue to migrate down-market: retail investors can now use Monte Carlo simulations and other tools well beyond the reach of individuals at the dawn of online investing in 1995 or thereabouts. (A minimal simulation sketch appears after this list.)
-Airline ticket prices at Microsoft’s Bing search engine are rated against a historical database, so purchasers of a particular route and date are told whether to buy now or wait.
-Wolfram Alpha is taking a search-engine approach to calculated results: a stock’s price/earnings ratio is readily presented on a historical chart, for example. Scientific calculations are currently handled more readily than natural-language queries, but the tool’s potential is considerable.
-Google Analytics brings marketing tools formerly unavailable anywhere to the owner of the smallest business: anyone can slice and dice ad- and revenue-related data from dozens of angles, as long as it relates to the search engine in some way.
-Fraud detection through automated, quantitative tools holds great appeal because of both labor savings and rapid payback. Health and auto insurers, telecom carriers, and financial institutions are investing heavily in these technologies.
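The Monte Carlo simulations mentioned in the first item above are no longer exotic. The following minimal sketch, assuming NumPy and purely illustrative inputs (a 7% mean annual return and 15% volatility, not advice), shows the kind of retirement-outcome simulation a retail tool can now run behind the scenes.

```python
# Minimal Monte Carlo sketch: simulate many possible 30-year paths for a
# portfolio with annual contributions, then summarize the spread of outcomes.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_years = 10_000, 30
start_balance = 10_000.0
annual_contribution = 5_000.0

balances = np.full(n_paths, start_balance)
for _ in range(n_years):
    # Assumed normally distributed annual returns (illustrative parameters).
    returns = rng.normal(loc=0.07, scale=0.15, size=n_paths)
    balances = (balances + annual_contribution) * (1.0 + returns)

# Percentiles summarize the range of simulated futures.
p10, p50, p90 = np.percentile(balances, [10, 50, 90])
print(f"10th pct: ${p10:,.0f}  median: ${p50:,.0f}  90th pct: ${p90:,.0f}")
```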
Practical considerations: Why analytics is still hard
For all the tools, all the data, and all the computing power, getting numbers to tell stories is still difficult. Several factors explain the current state of affairs.
First, organizational realities mean that different entities collect data for their own purposes, label and format it in often non-standard ways, and hold it locally, usually in Excel but also in e-mails, PDFs, or production systems. Data synchronization efforts can be among the most difficult of a CIO’s tasks, with uncertain payback. Managers in separate but related silos may ask the same question using different terminology, or see a cross-functional issue through only one lens.
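A toy example of this reconciliation work, with invented field names, values, and exchange rate, might look like the following pandas sketch: the same revenue figures arrive from two silos with different labels, date formats, and units, and must be mapped onto one schema before any analysis.

```python
# Hypothetical illustration: two departments report revenue with different
# field names, date formats, and units; normalize both onto one schema.
import pandas as pd

sales_na = pd.DataFrame({
    "Order Date": ["01/15/2010", "02/03/2010"],
    "Rev ($K)": [125, 310],
})
sales_eu = pd.DataFrame({
    "order_dt": ["2010-01-22", "2010-02-11"],
    "revenue_eur": [98_000, 214_000],
})

# Map each silo's labels and units onto a common schema.
na = pd.DataFrame({
    "order_date": pd.to_datetime(sales_na["Order Date"], format="%m/%d/%Y"),
    "revenue_usd": sales_na["Rev ($K)"] * 1_000,
})
eu = pd.DataFrame({
    "order_date": pd.to_datetime(sales_eu["order_dt"]),
    "revenue_usd": sales_eu["revenue_eur"] * 1.35,  # assumed FX rate
})

combined = pd.concat([na, eu], ignore_index=True).sort_values("order_date")
print(combined)
```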
Second, skills are not yet adequately distributed. Database analysts can type SQL queries but usually don’t have the managerial instincts or experience to probe the root cause of a business phenomenon. Statistical numeracy, often at a high level, remains a requirement for many analytics efforts; knowing the right tool for a given data type, business event, or time scale takes experience, even assuming a clean data set. For example, correlation does not imply causation, as every first-year statistics student knows, yet temptations to let it do so abound, especially as scenarios outrun human understanding of ground truths.
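A synthetic example of the trap: two series that merely share an upward trend correlate strongly even though neither drives the other.

```python
# Toy illustration of spurious correlation: both series trend upward with
# noise, so they correlate highly despite having no causal connection.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(2000, 2011)

ice_cream_sales = 100 + 5 * (years - 2000) + rng.normal(0, 3, len(years))
software_bugs = 40 + 2 * (years - 2000) + rng.normal(0, 2, len(years))

r = np.corrcoef(ice_cream_sales, software_bugs)[0, 1]
print(f"correlation: {r:.2f}")  # high, yet there is no causal link
```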
Third, odd as it sounds in an age of assumed infoglut, getting the right data can be a challenge. Especially in extended enterprises but also in extra-functional processes, measures are rarely sufficiently consistent, sufficiently rich, or sufficiently current to support robust analytics. Importing data to explain outside factors adds layers of cost, complexity, and uncertainty: weather, credit, customer behavior, and other exogenous factors can be critically important to either long-term success or day-to-day operations, yet representing these phenomena in a data-driven model can pose substantial challenges. Finally, many forms of data do not readily plug into the available processing tools: unstructured data is growing at a rapid rate, adding to the complexity of analysis.
In short, getting numbers to tell stories requires the ability to ask the right question of the data, assuming the data is clean and trustworthy in the first place. This skill requires a blend of process knowledge, statistical numeracy, time, narrative facility, and both rigor and creativity in proper proportion. Not surprisingly, people with this blend are managers rather than technicians, and they are difficult to find in many workplaces. For analytics to deliver on its promise, the biggest breakthroughs will likely come in education and training rather than in algorithms or database technology.