Why would you want to go and do that?

Maybe it is not so much that you want to be a data scientist… maybe it is that you either need to be a data scientist or you need to hire one. No matter the field in which you’re working, the recent explosion of data is revolutionizing the way that people perform their work. If you are a corporate manager, you need data-driven intelligence to help you make informed decisions about internal operations or about sales and marketing campaigns. If you are a government manager, you need data insights to make the decisions that will get the most out of taxpayers’ dollars. If you are an engineer, you can use data insights to help you create the most energy-efficient or the safest design. If you are a crisis response coordinator, you can use data insights to gather open source intel about what is happening on the ground during a crisis and how you can best target your relief efforts. Heck, these days, even non-profits are benefiting from harnessing big data with data science insight.

As someone who is a huge fan of all things data-driven, I may be a little biased. It is my humble opinion, however, that sharp data skills are going to become more and more necessary as time goes on. While, yes, you could hire an outside data scientist to handle the analyses, a data science technician can only go so far. A generic data scientist will know how to manipulate the data and extract insightful patterns, but unless he or she is knowledgeable about the nuances of the data’s particular domain, those insights might not take you very far. To draw great insights from data, you have to know the data, know the business, and know the contextual relationships that are built into the business.

I will illustrate by using myself as an example. I am an environmental engineer… let’s say I wanted to use large volumes of pressure sensor data, temperature sensor data, and image files from a local plant in order to optimize a redesign of one of the plant’s components. If I don’t know how to assimilate this data, run statistical analyses on the large datasets, and generate meaningful insights, then I will have to hire someone to do this. At this time, there aren’t any environmental engineering data scientists, so I would have to hire the data scientist who seems to have the best background for the job.

Could a data scientist successfully perform this analysis? Yes… but as an environmental engineer, I understand the nuances of plant operations and design to a far greater extent than any generic data scientist. If I could do the analysis myself, the results would be ten times more useful, not to mention that I would be able to perform ad hoc testing on the data to get immediate decision support at any phase of the design process. And, thus, you can see the importance of building data science skills regardless of your industry.

So, You Want To Be A Data Scientist?

So, you want to be a data scientist? Well, you are not alone… and you are in luck. There are many, many free resources out there that can help you get started. Here are just a few of the skills that you should have if you want to do data science, along with resources to help you learn, for free, the programming skills that data science will require.


Big Data University
Hadoop Fundamentals - self-paced

Big Data University
Hadoop Reporting and Analysis & Java Fundamentals – Videos Available

MapR Academy Technologies
MapReduce - self-paced


JavaScript - self-paced
HTML / CSS - self-paced

HTML/CSS  - self-paced


Java Compiled Language - self-paced


Python Scripting Language
Google’s Python Class – self-paced

Python - self-paced


R @ Code School - self-paced

Data Analysis – Starts Jan 22


Natural Language Processing – Starts Feb 11

SAAS – Starts Feb 15

Intro to Data Science – Starts April 1

Machine Learning - self-paced

Educational Big Data and Learning Analytics - self-paced