What is a data scientist? Why should you consider becoming one if you have a talent for analytical mathematics? What does it take to pin the Data Scientist badge on your pleated shirt? I will answer these questions in a surprisingly manageable number of steps.
Let’s take a step back and think about big data. Do you wear a smartwatch? Do you own some technology that tells you how much exercise you’ve done? Do you use Siri, or Cortana, or Amazon Echo, or Google Home? Do you live in a smart city? Do you purchase things online? Do you use a railcard to commute? All of these things harvest data, frequently and on a massive scale. The way you use everyday technology and move through your day creates a pattern. This pattern can be matched up with other users’ patterns and analysed. Who analyses the data to recognise those patterns? The data scientist.
The field of data science emerged from the abundance of data and the hunger for people who can use that data meaningfully to create algorithms. The algorithms needed to recognise and act on these patterns are still being developed, and they were needed yesterday. Nor will they ever be finished, at least for as long as data keeps pouring in.
The stage on which the dance of the data happens is large and intimidating, and the implications for businesses, corporations, and governments are big, too. This, together with the growing hunger for data scientists, has created a job description with an attractive salary figure attached at the bottom: the data scientist, Glassdoor’s best job to have in 2017 (a US-centric ranking).
But does the reality of the data scientist match up with the grandeur of big data?
How difficult is it to get your foot in the proverbial door?
According to a study recently published by 365 Data Science, not very. Especially if you are a man who speaks at least one foreign language and holds a second-cycle academic degree. And even less so if you have also been in the workforce for about four years and can code in either R or Python. Hint: that last part is where the money is.
Let’s break that down.
First, the data comes from 1,001 LinkedIn profiles of people currently working as data scientists. The 365 Data Science team wanted to know how they got there and where exactly “there” is (in terms of skills and abilities).
So, do you need to be male to be a data scientist?
Not really. But the data tells us that 70% of the cohort is male. Perhaps we can read this partly as a reflection of stereotypes learned and internalised in early childhood, which lead many girls to turn away from STEM subjects altogether, rather than crying “wolf” (or gender discrimination) just yet. Nonetheless, the numbers are what they are.
Female (and female-identifying) folk, do not despair. If anything, the research shows that there is plenty of room for more female data scientists. Many fields used to be male-dominated, and a lot of progress towards balance is taking place.
Speaking of places, don’t be intimidated out of claiming your place in the data workforce of the future. According to a study conducted by IBM, by 2020, there will be 2,720,000 data science positions in the US alone!
With gender out of the way, what else is not so intimidating about becoming a data scientist?
How many languages does the data scientist know?
Two. On average, data scientists speak one foreign language in addition to their native one. This makes sense: both programming and mathematics are language-independent, so to speak. But lucrative job opportunities are still largely concentrated in the US, the UK, Western Europe, and India, where English is the main professional language of the field. So if your first or second (or third) language is English, you’re good to go!
Now let’s zoom in on the first three big data science players: the US, the UK, and Western Europe. Care to guess what else is found in those parts of the world? That’s right: the best higher education institutions.
So, do you need a degree from a Times Higher Education Top 50 university? And if you have one, are you more likely to show up in the data?
No, not really. Data science is not a snobby discipline catering only to the most prestigious universities and their graduates. Although 28% of the people in the sample did graduate from a Times Top 50 university, a quarter graduated from a university that does not appear on the Times chart at all (beyond even the 1,001+ band), and the numbers in between are fairly evenly distributed.
Since the data was collected to identify what it takes to become a data scientist at a Fortune 500 giant, we can also split it by university ranking and by whether each person made the cut to a Fortune 500 company or not.
You will notice that, among the people in the sample who work for a Fortune 500 company, the two largest groups, which together make up just over half, are graduates of a Top 50 university (28%) and graduates of a university that did not make the cut for the Times chart at all (23%).
The two largest groups among non-Fortune 500 employees mirror these figures. In addition, if you look only at the data scientists who graduated from a top-tier university, the split is effectively even: half work for Fortune 500 companies, and half steer clear of them, securing positions at smaller companies and start-ups.
We can argue that none of this is coincidental. Sure, a diploma from a well-ranked institution acts as a guarantee of skill and qualification, and signals to a future employer that you have been “selected” from among your peers. It also usually comes with some serious student debt. But the high employability of graduates from less prestigious universities suggests that there are other ways to showcase your credibility and drive. You can still signal that you are knowledgeable and skilled through self-preparation.
If I prepare by myself, am I more likely to get the job?
There is no unequivocal answer to this question, but the data offer some hints. At least 40% of the data scientists in the sample report on their LinkedIn profiles that they have completed at least one online course on a topic related to their field. And those are only the people who listed it; the true figure may well be higher.
To return to the question of degrees from prestigious universities: the data suggest that every cohort engages in self-learning of some kind, but the people who take the lead in external preparation are graduates of less prestigious universities. Go figure!
It bears pointing out that graduates of the Times Top 50 schools don’t lag too far behind: about 35% of them report having taken online courses, compared with roughly 50% of graduates from lower-ranked universities who go the extra mile. Self-preparation is clearly a serious contender when it comes to qualifying as a data scientist.
And now that we know where data scientists graduate from, we need to ask what they graduated in!
Do you need to study Data Science to land the data scientist title?
Absolutely not. First of all, data science is a very new discipline. It has only been around for a grand total of ten or so years, and most universities have yet to incorporate a holistic data science track into their curricula.
Current data scientists come from a wide range of academic backgrounds. The sample, for example, is richly populated with people from Computer Science (20%), Statistics and Mathematics (19%), and Economics and Social Sciences (19%). These fields are heavy on the quantitative side but otherwise diverse, which should give you some peace of mind: if you are doing a quantitative degree, you are on the right track.
It is also not unheard of to move into data science from fields like Physics, Chemistry, Biology, or Engineering. As long as you’re a quant enthusiast, the (data science) world is your oyster!
Finally, technical qualifications.
A number of mobile app developers are also starting to use data more effectively. IT Rate, an app development resource, has said that big data has simplified its responsibilities and will shape its professionals’ roles in the years to come.
Is a Bachelor’s degree sufficient for the data scientist title?
Yes and no. The majority of the professionals in the sample hold at least a Master’s degree: 48% have a Master’s, and a further 27% hold a PhD. Going into the field without either of these is not without precedent, as 15% of the data scientists in the sample have done just that. But this group is clearly underrepresented.
Do you need to be proficient in Java, C/C++, Matlab, SQL, SAS, R, Scala, LaTeX, and the gang?
No, no, and no. While such a skill set would certainly be impressive, it is not a must in today’s data science world. Things differ a little by country of employment and industry, of course, but the consensus is as follows.
If you know R and/or Python, you will have a reasonable fighting chance. SQL is also worth considering, although for the time being (and according to the sample) it lags behind in popularity: 40% usage across the board, compared with 53% for R and for Python alike. The complete breakdown of programming languages and industry usage is beyond the scope of this article (but you can find it here).
Key takeaways for the future data scientists?
One conclusion to draw from the stats so far is that you don’t really need a PhD, a degree from a fancy university, or a Swiss Army knife of programming languages (or of spoken languages, for that matter) to become a data scientist.
An aptitude for quantitative analysis, a good head on your shoulders, and a driving sense of curiosity ought to be enough.
After all, didn’t we mention that by 2020 there will be about 2,720,000 data science positions in the US alone?