
R Language

Lots of Data Does Not Equal "Big Data"

March 29, 2013 by David Smith

Lots of data does not necessarily equate to “Big Data.” To my way of thinking, the single most important capability to implement in any large-scale data platform that is going to support sophisticated analytics is the ability to quickly construct high-quality random samples.
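
One standard way to draw a uniform random sample in a single pass over data too large to hold in memory is reservoir sampling (Algorithm R). The Python sketch below illustrates that general technique. It is not the sampling method the post itself describes, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length.

    Uses Algorithm R: one pass and O(k) memory, so the data never has to fit in RAM.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a million-element stream in one pass.
    print(reservoir_sample(range(1_000_000), k=5, seed=42))
```

Because each element is examined once and only k items are ever stored, the same idea carries over to samples drawn from files or database cursors that are far larger than memory.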

NCAA Data Visualizer for March Madness Face-Offs

March 24, 2013 by David Smith


If you're laying down a friendly bet on the March Madness games or just tweaking your fantasy roster, this NCAA Data Visualizer by Rodrigo Zamith will be a boon. Just pick two teams to compare head-to-head, then choose an attribute to compare them on.

Open Data App for the Paris Métro

March 22, 2013 by David Smith

Back when my friends and I lived in different parts of Paris, it was tricky to find a mutually agreeable place to meet, so that we'd all be taking an approximately equally long Métro ride. If only we'd had Jean-Robert's Metro Meeting Point app, the decision would have been an easy one.
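
The post doesn't say how the app picks a station. As a hypothetical sketch: compute each friend's shortest ride time to every station with Dijkstra's algorithm, then choose the station that minimizes the longest ride, breaking ties by the gap between the longest and shortest rides. The network, station names, and ride times below are made up for illustration and are not real Métro data; the function names are also hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: station -> [(neighboring station, ride time in minutes), ...]
Graph = Dict[str, List[Tuple[str, int]]]


def undirected(edges: List[Tuple[str, str, int]]) -> Graph:
    """Build a symmetric adjacency list from (station, station, minutes) edges."""
    graph: Graph = {}
    for a, b, minutes in edges:
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    return graph


def travel_times(graph: Graph, source: str) -> Dict[str, int]:
    """Dijkstra's algorithm: shortest ride time from source to every reachable station."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, station = heapq.heappop(heap)
        if d > dist[station]:
            continue  # stale heap entry; a shorter route was already found
        for neighbor, minutes in graph.get(station, []):
            nd = d + minutes
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def meeting_point(graph: Graph, origins: List[str]) -> str:
    """Station reachable by everyone that minimizes the longest ride,
    with ties broken by the spread between the longest and shortest rides."""
    tables = [travel_times(graph, origin) for origin in origins]
    candidates = set(tables[0]).intersection(*tables[1:])

    def cost(station: str) -> Tuple[int, int, str]:
        rides = [table[station] for table in tables]
        return (max(rides), max(rides) - min(rides), station)

    return min(candidates, key=cost)


if __name__ == "__main__":
    # Made-up toy network (NOT real Métro data): stations A..E, minutes per hop.
    toy = undirected([("A", "B", 4), ("B", "C", 3), ("C", "D", 5), ("B", "E", 6), ("E", "D", 2)])
    print(meeting_point(toy, ["A", "D"]))  # prints C (rides of 7 and 5 minutes)
```

Minimizing the longest ride keeps anyone from a very long trip; a different objective, such as total minutes, would favor stations near whoever lives in the middle.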

SAS Innovates into the Big Data Analytics Era

March 18, 2013 by Tony Cosentino

SAS Institute held its 24th annual analyst summit last week in Steamboat Springs, Colorado. The 37-year-old privately held company is a key player in big data analytics, and company executives showed off their latest developments and product roadmaps.

Data Science Education Gets Personal

March 15, 2013 by David Smith

This year, with both Udacity and the Harvard- and MIT-backed edX offering interesting and challenging courses, the growth of MOOC enrollment must be astounding indeed. Then again, while MOOC courses are “free,” for a working professional they are not without opportunity costs.

R Script Creates a Map of Worldwide Email Traffic

March 14, 2013 by David Smith


The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently.
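
As a hypothetical illustration of the aggregation step behind such a map (not the researchers' actual pipeline), the Python sketch below reduces (sender country, recipient country) records to counts per unordered country pair. The records are invented, and dropping domestic mail is an assumption. These are raw counts, not normalized for population or number of users.

```python
from collections import Counter
from typing import Iterable, Tuple


def country_pair_counts(emails: Iterable[Tuple[str, str]]) -> Counter:
    """Count emails per unordered country pair, so US->CA and CA->US add to one link."""
    counts: Counter = Counter()
    for sender, recipient in emails:
        if sender == recipient:
            continue  # assumption: domestic mail is dropped, since the map shows links between countries
        pair = tuple(sorted((sender, recipient)))
        counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Invented (sender country, recipient country) records, one per email.
    records = [("US", "CA"), ("CA", "US"), ("FR", "BE"), ("US", "US"), ("BE", "FR"), ("US", "CA")]
    print(country_pair_counts(records).most_common(2))
    # prints [(('CA', 'US'), 3), (('BE', 'FR'), 2)]
```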

SnapLogic: Making Big Data Integration as a Service a Hadoop Reality

March 5, 2013 by Mark Smith

SnapLogic, a provider of data integration in the cloud, last week announced Big Data-as-a-Service to address businesses’ needs to integrate and process data across Hadoop big data environments. I look forward to seeing SnapLogic’s 2013 technology advancements.

R Script Tracks Bookies' Favorites for the Next Pope

March 5, 2013 by David Smith

Tired of manually running a Python script to scrape the latest bookmaker odds on the next Pope, R user AJ (an analytical research manager at a large healthcare company) instead created an R script to track the odds on the Papal successor.
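
Bookmakers usually quote fractional odds, and turning them into probabilities is a small calculation. Odds of a/b imply a probability of b/(a + b). Because a bookmaker's implied probabilities typically sum to more than 1 (the overround), one simple approach rescales them proportionally so they sum to 1. The sketch below is Python rather than R and is not AJ's script; the candidate names and odds are placeholders, and proportional rescaling is only one of several ways to remove the margin.

```python
from fractions import Fraction
from typing import Dict


def implied_probability(fractional_odds: str) -> Fraction:
    """Odds of 'a/b' pay a for every b staked, so they imply probability b / (a + b)."""
    a, b = (int(part) for part in fractional_odds.split("/"))
    return Fraction(b, a + b)


def normalized_probabilities(odds: Dict[str, str]) -> Dict[str, float]:
    """Rescale implied probabilities to sum to 1, stripping out the bookmaker's
    margin (overround) proportionally."""
    raw = {name: implied_probability(o) for name, o in odds.items()}
    total = sum(raw.values())
    return {name: float(p / total) for name, p in raw.items()}


if __name__ == "__main__":
    # Placeholder names and odds for illustration only, not real 2013 quotes.
    odds = {"Candidate A": "5/2", "Candidate B": "3/1", "Candidate C": "4/1", "Field": "1/1"}
    for name, p in normalized_probabilities(odds).items():
        print(f"{name}: {p:.3f}")
```

Running this on each fresh scrape and appending the results with a timestamp would give the kind of time series that makes tracking the favorites easy.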

Resampling Data in Hadoop with RHadoop

February 28, 2013 by David Smith

Uri Laserson has created an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to resampling with RHadoop.
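
One way to fit bootstrap resampling into the map step of a MapReduce job is the Poisson bootstrap. Rather than drawing n records with replacement, which requires knowing the data set size up front, each record is emitted Poisson(1) times for every replicate, so records can be processed independently. The sketch below is plain Python, not RHadoop, and is not claimed to be the approach in Uri's guide; the function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def poisson1(rng: random.Random) -> int:
    """Draw from a Poisson distribution with mean 1 (Knuth's multiplication method)."""
    limit = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def poisson_bootstrap(records: Iterable[T], n_replicates: int, seed: int = 0) -> Iterator[Tuple[int, T]]:
    """Yield (replicate_id, record) pairs in which each record appears Poisson(1)
    times per replicate. Records are handled one at a time, as a map step would,
    and the total data set size never needs to be known. Replicate sizes vary
    around n, so this approximates the classic bootstrap rather than matching it exactly."""
    rng = random.Random(seed)
    for record in records:
        for replicate in range(n_replicates):
            for _ in range(poisson1(rng)):
                yield replicate, record


if __name__ == "__main__":
    # "Reduce" step: group emitted pairs by replicate and compute each replicate's mean.
    data = range(1, 101)
    sums: Counter = Counter()
    sizes: Counter = Counter()
    for replicate, x in poisson_bootstrap(data, n_replicates=3, seed=1):
        sums[replicate] += x
        sizes[replicate] += 1
    for replicate in sorted(sizes):
        print(replicate, sizes[replicate], sums[replicate] / sizes[replicate])
```

In a real job each replicate's records would be routed to its own reducer, which could fit one tree of a bagged ensemble on its resample.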

A Data Scientist Investigates the Belgian Municipal Elections

February 21, 2013 by Istvan Hajnal

After the 14 October elections in Belgium, the media reported several cases of candidates who had received more preference votes than normal. Using my data science skills and tools, I tested whether there was a faulty "Touch Screen Effect."

Adventures in MOOC: Back to School

February 21, 2013 by Doug Lautzenheiser

I am now in the homestretch of a Coursera MOOC that lasts eight weeks. At the five-week milestone of a Data Analysis course using the R statistical programming language, I have survived four weekly quizzes and the first written assignment.

10 R Packages Every Data Scientist Should Know About

February 19, 2013 by David Smith

The yhat blog lists 10 R packages they wish they'd known about earlier. Drew Conway calls them "10 reasons to always start your analysis in R." They're all very useful R packages that every data scientist should be aware of.

Personal Data Mining: Solving a Mystery

February 19, 2013 by Themos Kalafatis

If people could collect data on a daily basis (see Quantified Self) and then analyze it on a massive scale, previously unknown patterns that call for closer investigation could emerge. I used a smartphone to capture data about my health, and then used analytics to process that data.

Video: Data Mining with R

February 18, 2013 by David Smith

Our recent "Introduction to R for Data Mining" webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, here's the video replay.

The First Data Scientist on the Evolution of Data Science

February 18, 2013 by Gil Press

Norman Nie was not surprised by the accurate predictions of the presidential election results from Nate Silver and others. “A lot of it,” he told me recently, “is good statistics and good science and good statistical programming packages.”