This post from Sherry LaMonica is the first in a series from members of the Revolution Analytics Engineering team — ed.
Do you know about the big data capabilities in the RevoScaleR package, included with every Revolution R Enterprise installation?
RevoScaleR provides a framework for fast and efficient multi-core processing of large data sets. You can visualize and model data sets with millions of records on your local machine using syntax like:
myLinMod <- rxLinMod(y ~ x + z, data=myData)
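To make that one-liner concrete, here is a minimal sketch that fits the same formula to a small simulated data frame. The column values and coefficients are invented for illustration, and the sketch assumes `rxLinMod()` accepts an in-memory data frame as `data`. When RevoScaleR is not installed it falls back to base R's `lm()` so the snippet still runs.

```r
# Simulated data; the coefficients (1, 2, -0.5) are illustrative only
set.seed(42)
n <- 1000
myData <- data.frame(x = rnorm(n), z = rnorm(n))
myData$y <- 1 + 2 * myData$x - 0.5 * myData$z + rnorm(n, sd = 0.1)

if (suppressWarnings(require("RevoScaleR", quietly = TRUE))) {
  # Same call as in the post
  myLinMod <- rxLinMod(y ~ x + z, data = myData)
} else {
  # Base R fallback so the sketch runs without RevoScaleR
  myLinMod <- lm(y ~ x + z, data = myData)
}

# Both model objects expose a `coefficients` component
fittedCoefs <- as.vector(myLinMod$coefficients)
print(myLinMod)
```

With data this clean, the estimated coefficients should land close to the values used in the simulation.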
Some highlights of the RevoScaleR package include:
- The XDF file format, a binary file format with an R interface that optimizes row and column processing and analysis.
- Data transformation tools for exploring and preparing large data sets for analysis.
- Statistical algorithms optimized for large data sets.
Most users will want to proceed from data import to data analysis in a three-step process. Below are some of the frequently used RevoScaleR functions in each of these steps:
Step 1: Import the data you want to analyze from an external file:
rxTextToXdf() – Import data to .xdf format from a delimited text file.
rxImportToXdf() – Import data from a data source, such as fixed-format text or SAS data (use together with the RxTextData() and RxSasData() functions).
rxDataStepXdf() – Transform your data and select subsets of variables and/or rows for data exploration and analysis.
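As a rough sketch of Step 1, the snippet below writes a tiny CSV file and then, if RevoScaleR is available, imports it to .xdf and keeps a subset of rows and columns. The file names and the `people` data are hypothetical, and the argument names (`inFile`, `outFile`, `varsToKeep`, `rowSelection`, `overwrite`) are the ones I would expect for these functions; check `?rxTextToXdf` and `?rxDataStepXdf` in your installed version before relying on them.

```r
# Hypothetical sample data written to a temporary CSV file
csvFile <- file.path(tempdir(), "people.csv")
people <- data.frame(
  id     = 1:5,
  age    = c(23, 35, 41, 29, 52),
  income = c(31000, 48000, 60500, 39000, 72000)
)
write.csv(people, csvFile, row.names = FALSE)

# The RevoScaleR calls run only when the package is installed
if (suppressWarnings(require("RevoScaleR", quietly = TRUE))) {
  xdfFile    <- file.path(tempdir(), "people.xdf")
  subsetFile <- file.path(tempdir(), "peopleOver30.xdf")

  # Import the delimited text file into .xdf format
  rxTextToXdf(inFile = csvFile, outFile = xdfFile, overwrite = TRUE)

  # Keep two variables and only the rows where age exceeds 30
  rxDataStepXdf(inFile = xdfFile, outFile = subsetFile,
                varsToKeep = c("age", "income"),
                rowSelection = age > 30, overwrite = TRUE)

  # Quick check of what ended up in the subset
  print(rxSummary(~ age + income, data = subsetFile))
}
```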
Step 2: Explore and transform the data:
rxSummary(), rxCube(), rxCrossTabs() – Obtain summary statistics and compute cross-tabulations.
rxHistogram() – Plot a histogram of a variable in an .xdf file.
rxLinePlot() – Create a line plot or scatterplot from data in an .xdf file or from the results of rxCube().
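The exploration step might look something like the sketch below. The `survey` data frame is simulated purely for illustration. I pass a data frame as `data` to keep the example self-contained, on the assumption that these functions accept one in place of an .xdf path. Base R counterparts are used when RevoScaleR is not installed.

```r
# Simulated survey data; variable names are illustrative only
set.seed(7)
survey <- data.frame(
  score  = rnorm(200, mean = 50, sd = 10),
  group  = factor(sample(c("A", "B"), 200, replace = TRUE)),
  passed = factor(sample(c("no", "yes"), 200, replace = TRUE))
)

if (suppressWarnings(require("RevoScaleR", quietly = TRUE))) {
  # Summary statistics for a numeric and a factor variable
  print(rxSummary(~ score + group, data = survey))
  # Cross-tabulation of two factors
  print(rxCrossTabs(~ group:passed, data = survey))
  # Histogram of a single variable
  rxHistogram(~ score, data = survey)
} else {
  # Base R counterparts so the sketch runs without RevoScaleR
  print(summary(survey))
  print(table(survey$group, survey$passed))
  hist(survey$score)
}
```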
Step 3: Perform model fitting and additional statistical analysis on the data:
rxLinMod() – Fit a linear regression model to data in an .xdf file.
rxLogit() – Fit a logistic regression model to data in an .xdf file.
rxPredict() – Compute predictions and residuals from a linear or logistic regression fit.
rxCovCor() – Compute the covariance or correlation matrix for a set of variables in an .xdf file.
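Here is a hedged sketch of the modeling step using a logistic regression. The data are simulated and the coefficients are made up. It assumes that `rxPredict()` returns a data frame of predictions when given a data frame, and that `rxCovCor()` takes `type = "Cor"` to request correlations. Both are worth confirming against the help pages for your version. Without RevoScaleR, the base R functions `glm()`, `predict()` and `cor()` stand in.

```r
# Simulated binary outcome from a logistic model (illustrative coefficients)
set.seed(1)
n <- 500
simData <- data.frame(x = rnorm(n), z = rnorm(n))
simData$y <- rbinom(n, size = 1,
                    prob = plogis(0.5 + 1.5 * simData$x - simData$z))

if (suppressWarnings(require("RevoScaleR", quietly = TRUE))) {
  logitFit <- rxLogit(y ~ x + z, data = simData)
  # Assumed: the first column of the result holds the predicted probabilities
  predProb <- rxPredict(logitFit, data = simData)[[1]]
  corMat   <- rxCovCor(~ x + z, data = simData, type = "Cor")
} else {
  # Base R stand-ins so the sketch runs without RevoScaleR
  logitFit <- glm(y ~ x + z, data = simData, family = binomial())
  predProb <- as.vector(predict(logitFit, type = "response"))
  corMat   <- cor(simData[, c("x", "z")])
}

print(summary(predProb))
print(corMat)
```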
The RevoScaleR ‘Getting Started Guide’ contains several examples of how to analyze your data with the RevoScaleR package. You can open it from within Revolution R Enterprise for Windows by going to the ‘Help’ menu and selecting ‘R Manuals(PDF)’. This opens the PDF portfolio, where the third document listed is ‘RevoScaleRGetStart.pdf’.