I spoke at Defrag 2010 earlier today and introduced what I am describing as the new physics of big data.
Having designed and deployed a number of multi-billion-row context-accumulating systems over the last 14 years, I cannot help but notice some very interesting, very exciting phenomenology. Not research. Not theory. Real.
1. Better Prediction. Simultaneously lower false positives and lower false negatives. A bit more about this here: Prediction: Channel Consolidation and Puzzling: How Observations Are Accumulated Into Context.
2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.
3. More data faster. Less compute effort as the database gets bigger. A bit more about this most exciting phenomenon here: The Fast Last Puzzle Piece.
4. Selective attention and curiosity. A better sense of when and where to place one’s attention (apply compute effort) including: (1) very smart observation filters and (2) fully automated ability to determine very specific, very relevant questions – answers to which it may decide to fetch itself. A system that Googles itself? Unfortunately, I have not blogged about this thinking yet, but will hopefully get to it one of these days.
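To make point 2 a bit more concrete, here is a toy sketch in Python of why keeping "bad" data can help. It is an illustration under loudly stated assumptions, not how G2 or any of my production systems actually work. The class `ContextAccumulator`, the helper `resolve`, and the `retain_variants` switch are hypothetical names made up for this example. Matching on an exact shared name or phone is a deliberate oversimplification. The point: if a misspelled name is retained as a variant rather than cleaned away, a later record carrying the same misspelling can still be recognized as the same entity.

```python
class ContextAccumulator:
    """Toy entity accumulator (hypothetical, for illustration only).

    Each entity is a dict of sets holding every name and phone it has been
    observed with. A new observation joins the first entity that shares an
    exact name or phone; otherwise it becomes a new entity.
    """

    def __init__(self, retain_variants=True):
        # When False, an entity keeps only the values from its first record,
        # which mimics "cleaning" away later variants such as typos.
        self.retain_variants = retain_variants
        self.entities = []

    def observe(self, name, phone):
        """Return the index of the entity this observation resolves to."""
        for idx, ent in enumerate(self.entities):
            if name in ent["names"] or phone in ent["phones"]:
                if self.retain_variants:
                    # Keep the variant, misspellings included.
                    ent["names"].add(name)
                    ent["phones"].add(phone)
                return idx
        self.entities.append({"names": {name}, "phones": {phone}})
        return len(self.entities) - 1


def resolve(records, retain_variants):
    """Feed (name, phone) records in order; return the entity index for each."""
    acc = ContextAccumulator(retain_variants=retain_variants)
    return [acc.observe(name, phone) for name, phone in records]


records = [
    ("Jon Smith", "555-0100"),   # first sighting
    ("Jonh Smith", "555-0100"),  # typo in name, same phone
    ("Jonh Smith", "555-0199"),  # same typo, new phone
]

print(resolve(records, retain_variants=True))   # [0, 0, 0]
print(resolve(records, retain_variants=False))  # [0, 0, 1]
```

With variants retained, the third record resolves to the existing entity only because the earlier typo "Jonh Smith" was remembered. With variants discarded, the same person is split into two entities, which is a false negative. A real system would need far more careful matching than "any one shared value," because that rule would over-merge quickly.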
Anyway, imagine that: As the database grows, fewer CPU cycles are needed for better predictions and you never really wanted to clean all that data up in the first place.
I also took this keynote opportunity to share my latest skunk works project – a project my team and I have been working on for almost two years now. Yes, it’s true, I am building something – a sensemaking engine, designed to fully harness this big data phenomenon. Among other exciting properties, this system will also have an unprecedented number of privacy-enhancing features baked into it. Internally I have been calling this little skunk works effort “G2.” And when this little girl grows up I have big hopes for her. For example, maybe she will help cancer researchers find a cure.
My Defrag 2010 MS PowerPoint presentation is available here.