Big Data and Real-time Structured Data Analytics – O’Reilly Radar
The emergence of sensors as sources of Big Data highlights the need for real-time analytic tools. Popular web apps like Twitter, Facebook, and blogs are also faced with having to analyze (mostly unstructured) data in near real-time. But as Truviso founder and UC Berkeley CS Professor Michael Franklin recently noted, there are mountains of structured data generated by web apps that lend themselves to real-time analysis: The information stream driving the data analytics challenge is orders of magnitude larger than the streams of tweets, blog posts, etc. that are driving interest in searching the real-time web. Most tweets, for example, are created manually by people at keyboards or touchscreens, 140 characters at a time. Multiply that by the millions of active users and the result is indeed an impressive amount of information. The data driving the data analytics tsunami, on the other hand, is automatically generated. Every page view, ad impression, ad click, video view, etc. done by every user on the web generates thousands of bytes of log information. Add in the data automatically generated by the underlying …
Big Data and Real-time Structured Data Analytics – O’Reilly Radar
The emergence of sensors as sources of Big Data highlights the need for real-time analytic tools. Popular web apps like Twitter, Facebook, and blogs are also faced with having to analyze (mostly unstructured) data in near real-time. But as Truviso founder and UC Berkeley CS Professor Michael Franklin recently noted, there are mountains of structured data generated by web apps that lend themselves to real-time analysis: The information stream driving the data analytics challenge is orders of magnitude larger than the streams of tweets, blog posts, etc. that are driving interest in searching the real-time web. Most tweets, for example, are created manually by people at keyboards or touchscreens, 140 characters at a time. Multiply that by the millions of active users and the result is indeed an impressive amount of information. The data driving the data analytics tsunami, on the other hand, is automatically generated. Every page view, ad impression, ad click, video view, etc. done by every user on the web generates thousands of bytes of log information. Add in the data automatically generated by the underlying infrastructure (CDNs, servers, gateways, etc.) and you can quickly find yourself dealing with petabytes of data.
The Smarter Planet tumblelog is an outgrowth of IBM’s strategic initiative to help a world of smart systems emerge.