R Programming Language
7 Big Data Trends That Will Impact Your Business
Big data continues to generate intense interest in the market, as shown by the wide variety of data innovations emerging daily and by the talented professionals building and using big data solutions. So what trends might we see emerge in the Big Data ecosystem?
Lots of Data Does Not Equal "Big Data"
Lots of data does not necessarily equate to “Big Data.” To my way of thinking, the single most important capability to implement in any large-scale data platform that is going to support sophisticated analytics is the ability to quickly construct high-quality random samples.
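Reservoir sampling is one well-known technique for drawing a uniform random sample in a single pass over data too large to hold in memory. The Python sketch below illustrates that idea only; it is not the platform capability the post describes, and the function name `reservoir_sample` is a hypothetical choice.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: return k items drawn uniformly at random, without
    replacement, from a stream whose length need not be known in advance.

    Implements Algorithm R: one pass over the data, O(k) memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 5 records from a stream of 100,000 integers
sample = reservoir_sample(range(100_000), k=5, seed=42)
```

Because each record is examined once and only k records are retained, memory use stays constant no matter how large the input grows.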
NCAA Data Visualizer for March Madness Face-Offs
If you're laying down a friendly bet on the March Madness games or just tweaking your fantasy roster, this NCAA Data Visualizer by Rodrigo Zamith will be a boon. Just choose two teams to compare head-to-head, and choose an attribute to compare them on.
Open Data App for the Paris Métro
Back when my friends and I lived in different parts of Paris, it was tricky to find a mutually agreeable place to meet, so that we'd all be taking an approximately equally long Métro ride. If only we'd had Jean-Robert's Metro Meeting Point app, the decision would have been an easy one.
Data Science Education Gets Personal
This year, with both Udacity and the Harvard- and MIT-backed edX offering interesting and challenging courses, the growth in MOOC enrollment must be astounding indeed. Then again, while MOOCs are “free,” for a working professional they are not without opportunity costs.
R Script Creates a Map of Worldwide Email Traffic
The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently.
R Script Tracks Bookies' Favorites for the Next Pope
Tired of manually running a Python script to scrape the latest bookmaker odds on the next Pope, R user AJ (an analytical research manager at a large healthcare company) instead created an R script to track the odds on the Papal successor.
Revolution Analytics CEO: Big Data Is a New Management Discipline
Dave Rich has an interesting theory explaining the rapidly growing interest in predictive analytics. “When the 2008 recession hit,” he told me, “the question was how come we weren’t better prepared with all the money we’ve spent in the last decade on information systems?”
Resampling Data in Hadoop with RHadoop
Uri Laserson has created an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to resampling with RHadoop.
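The resampling step in bagging is the bootstrap: each model in the ensemble is fit to a sample of the same size as the data, drawn with replacement. The in-memory Python sketch below illustrates only that idea; it does not reproduce Uri's RHadoop approach, and `bootstrap_samples` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

def bootstrap_samples(
    data: Sequence[T], n_models: int, seed: Optional[int] = None
) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: draw one bootstrap replicate per ensemble member.

    Each replicate holds len(data) rows drawn with replacement (the "in-bag"
    rows); the rows never drawn form that member's out-of-bag (OOB) set.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        chosen = set(idx)
        in_bag = [data[i] for i in idx]
        out_of_bag = [data[i] for i in range(n) if i not in chosen]
        replicates.append((in_bag, out_of_bag))
    return replicates

# Usage: three bootstrap replicates of a 1,000-row data set
reps = bootstrap_samples(list(range(1000)), n_models=3, seed=0)
```

For large n, roughly 1/e (about 36.8%) of the rows are left out of each replicate, which is why out-of-bag rows serve as a convenient built-in validation set for random forests.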
Political Revolutions on Twitter, Visualized with R
Esteban Moro Egido, a mathematics professor at Universidad Carlos III in Madrid, has produced a video depicting Twitter activity around Spain's general strike in March this year. He used the R language to analyze all of the tweets, retweets and mentions related to the strike.
Big Data Trees with Hadoop HDFS
Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets (using a new parallel external memory algorithm included in the RevoScaleR package).
Real-Time Predictive Analytics with Big Data and R
Can R be used for real-time applications? Absolutely! The key is in setting up a technology stack that can support real-time interactions with models developed in R ... and a clear understanding of what "real-time" really means, and its implications in the context of Big Data.
Simple Inter-row Computation: esProc Keeps It Simple!
Inter-row computations, such as cumulative aggregates, comparisons with the same period of a previous year, and link relative ratios (comparison with the immediately preceding period), are common in business statistics and analytics. Both the R language and esProc handle inter-row computation well, with slight differences between them.
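The three inter-row computations named above can be sketched over a plain list of period values. This Python version is an illustration under stated assumptions, not esProc or R code: the function names are hypothetical, the sample figures are made up, and a ratio is returned as None when there is no earlier row to compare against (or that row is zero).

```python
from itertools import accumulate
from typing import List, Optional, Sequence

def cumulative(values: Sequence[float]) -> List[float]:
    """Cumulative aggregate: row i holds the sum of rows 0..i."""
    return list(accumulate(values))

def link_relative(values: Sequence[float]) -> List[Optional[float]]:
    """Link relative ratio: each row divided by the row immediately before it."""
    if not values:
        return []
    ratios: List[Optional[float]] = [None]  # the first row has no predecessor
    for prev, cur in zip(values, values[1:]):
        ratios.append(cur / prev if prev else None)
    return ratios

def same_period_ratio(values: Sequence[float], lag: int = 12) -> List[Optional[float]]:
    """Compare each row with the row `lag` periods earlier
    (lag=12 on monthly data gives a year-over-year comparison)."""
    return [
        values[i] / values[i - lag] if i >= lag and values[i - lag] else None
        for i in range(len(values))
    ]

sales = [100, 120, 90, 135]  # hypothetical figures for four consecutive periods
print(cumulative(sales))                # → [100, 220, 310, 445]
print(link_relative(sales))             # → [None, 1.2, 0.75, 1.5]
print(same_period_ratio(sales, lag=2))  # → [None, None, 0.9, 1.125]
```

Each function reads a row together with an earlier row, which is what distinguishes inter-row computation from ordinary column-wise arithmetic.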
Vector Computing: Which Is More Powerful, R or esProc?
Do you find vector computing tiresome in statistical computing tools? Here is a vector computing comparison: R vs. esProc. To me, one of the most attractive features of both R and esProc is that their code is concise: only a few lines are needed to implement a wide range of functionality.
Benchmarking bigglm
It would be nice to know whether bigglm scales linearly with the number of records. If it does, bigglm would take about 4.5 hours to process the entire data set, considerably longer than the 54 minutes it took to process the large data set with RevoScaleR on [the author's] PC.
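The projection here is proportional scaling: if run time grows linearly with the number of records, the time for the full data set is the measured time multiplied by the ratio of record counts. A minimal Python sketch, assuming strictly linear scaling; the helper name `extrapolate_linear` is hypothetical, and since the excerpt gives neither the record counts nor the measured time, none are filled in.

```python
def extrapolate_linear(measured_seconds: float, measured_rows: int, total_rows: int) -> float:
    """Hypothetical helper: estimate run time on total_rows, assuming run time
    is directly proportional to the number of rows processed."""
    if measured_rows <= 0:
        raise ValueError("measured_rows must be positive")
    return measured_seconds * total_rows / measured_rows

# Comparing the two figures quoted in the excerpt: 4.5 hours versus 54 minutes
bigglm_minutes = 4.5 * 60  # projected bigglm time, converted to minutes
revoscaler_minutes = 54
print(bigglm_minutes / revoscaler_minutes)  # → 5.0
```

The final ratio shows that, under the linearity assumption, the projected bigglm run is about five times longer than the RevoScaleR run.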
The moderated business community for business intelligence, predictive analytics, and data professionals.
SmartData Collective
