According to a recent blog post from the SoundCloud team, the company has restructured the way its data scientists and analysts work.
The goal is to help them become more effective, happier and more productive members of the team, and in turn improve many internal processes and operations.
If you want to dive into the process in more detail, you can read about it here.
Essentially, SoundCloud pared its entire strategy down into smaller steps: beginning with problem definition, then moving through preparation, solution development and production-ready deployment, and finishing with validation and maintenance. It’s a process that helps iron out the kinks and optimize the solutions being deployed.
Looking at this, it’s natural to wonder what it means for the big data world as a whole. How can this approach be adapted and reproduced elsewhere?
To make sense of it, you first need to understand the foundation and processes that keep SoundCloud up and running.
What SoundCloud’s Data System Requires
SoundCloud operates on a global scale: its users upload about 12 hours of audio content every minute. That is an enormous amount of data to store and process on its servers, and each uploaded audio file must be transcoded and stored in several formats.
This allows listeners to download content in the format and on the device they prefer, such as an iPhone or iPod rather than a standard MP3 player.
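As a rough sketch of what that transcoding step can look like in practice (this is an illustration, not SoundCloud’s actual pipeline, and it assumes ffmpeg is installed along with made-up file names), an uploaded file might be converted into several delivery formats like so:

```python
# Illustrative transcoding step (not SoundCloud's actual pipeline):
# convert one uploaded file into several delivery formats with ffmpeg.
import subprocess
from pathlib import Path

# The source file and target formats here are assumptions for the example.
source = Path("upload.wav")
targets = {
    "mp3": ["-codec:a", "libmp3lame", "-b:a", "128k"],
    "m4a": ["-codec:a", "aac", "-b:a", "128k"],
    "ogg": ["-codec:a", "libvorbis", "-q:a", "5"],
}

for extension, codec_args in targets.items():
    output = source.with_suffix(f".{extension}")
    # -y overwrites any existing output file without prompting.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(source), *codec_args, str(output)],
        check=True,
    )
```

Multiply that by every upload, every minute, and the storage and compute demands become clear.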
Because SoundCloud is a hub for thousands, if not millions, of artists, bands, podcasters and audio creators, it needs incredibly vast storage capacity to deliver its service.
Furthermore, all content uploaded through the service can and will be shared via blogs, websites, social networks, mobile apps and chat services.
The company’s audience is active 24 hours a day, seven days a week, which means traffic and performance requirements fluctuate. When multiple regions are heavily active at the same time, such as the U.S. and U.K., the platform sees extremely high load spikes.
That means the team behind the platform needs to be able to collect all this user and performance data and put it to work.
Alexander Grosse, vice president of engineering at SoundCloud, says, “If our storage crashed, that would be the end of SoundCloud; we [must] focus on our platform’s core functionality.”
SoundCloud needs the data its scientists are collecting to improve, enhance and support the product. Furthermore, that data needs to be converted and made available to nearly everyone, including those with little to no analytics experience.
1. First Comes Understanding
Before organizing or making sense of data, you first have to understand the problem and what solutions can be used to solve it. This means you must understand the business needs, identify issues via metrics and narrow the scope so it’s manageable.
For example, one problem SoundCloud might face is a performance issue. At the outset, the team won’t know exactly where the issue is coming from or what’s causing it. Before sorting through collected data and putting together a new solution or strategy, it needs to understand the problem.
That would mean collecting additional information about traffic spikes, performance and power requirements, and more. These are the preparation and problem-definition stages of SoundCloud’s multistep process.
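As a rough illustration of that kind of exploratory work (the log file, column names and threshold below are assumptions for the example, not SoundCloud’s actual tooling), you might aggregate request logs by hour and flag the windows where traffic spikes:

```python
# Hypothetical exploratory check for the problem-definition stage.
# Assumes a CSV of request logs with "timestamp" and "latency_ms" columns.
import pandas as pd

logs = pd.read_csv("request_logs.csv", parse_dates=["timestamp"])

# Aggregate by hour: request volume and 95th-percentile latency.
hourly = logs.groupby(pd.Grouper(key="timestamp", freq="h")).agg(
    requests=("latency_ms", "size"),
    p95_latency_ms=("latency_ms", lambda s: s.quantile(0.95)),
)

# Flag hours whose traffic is far above the norm (a simple spike heuristic).
threshold = hourly["requests"].mean() + 2 * hourly["requests"].std()
spikes = hourly[hourly["requests"] > threshold]

print(spikes.sort_values("requests", ascending=False).head(10))
```

A quick pass like this narrows the scope of the problem before any real solution work begins.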
Every second of every day, we are creating new data. Google alone handles over 40,000 search queries every second, which works out to roughly 3.5 billion searches a day and 1.2 trillion searches a year. With that much to review, you need to be able to select the data that will actually help solve your problem.
2. Solution Development Or Fixing the Problem
After you understand the problem and solutions that can be deployed, you need to put your plans into action.
This means outlining a solution if you haven’t already, brainstorming the steps involved, and running trials to see whether it will actually work. Prototyping belongs here as well, as it’s a vital part of the development and testing phase.
Additionally, peer feedback and customer responses can be used to work through complicated or drawn-out issues. Some problems take repeated trials and regular maintenance before a solution emerges. This also requires not only sorting through relevant data, but finding and organizing it in the first place. If you did a good job in the preparation stage, this should already be taken care of.
3. Validation And/Or Deployment
Before a solution can be deemed successful, you need to deploy it and validate its progress. More importantly, you need to stay on top of maintenance and future updates to ensure it’s clean, efficient and working.
SoundCloud uses A/B testing to monitor the solutions it puts in place and to ensure that anything it comes up with meets expectations and keeps customers satisfied. It’s entirely possible to deploy a new system or solution that works, yet still displeases your audience.
Building a validation and deployment process that collects and analyzes this data, as SoundCloud has done, is necessary. It improves not only your problem-solving, but also the accuracy of the solutions you put in place.
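To make the A/B testing step above concrete, here is a minimal sketch of one common way such a comparison is checked (the metric and counts are hypothetical, and this is a standard two-proportion z-test rather than SoundCloud’s own tooling):

```python
# Hypothetical A/B validation: compare a success rate (e.g. listens that
# finish) between the control (A) and the new solution (B) using a
# two-proportion z-test. The counts below are made up for illustration.
from math import sqrt
from statistics import NormalDist

successes_a, trials_a = 4_120, 50_000   # control group
successes_b, trials_b = 4_410, 50_000   # variant with the new solution

p_a = successes_a / trials_a
p_b = successes_b / trials_b
pooled = (successes_a + successes_b) / (trials_a + trials_b)

standard_error = sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
z = (p_b - p_a) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test

print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p = {p_value:.4f}")
```

If the difference is statistically significant and moves the metric in the right direction, the new solution can be rolled out more widely; otherwise it goes back to the development stage.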
4. Broadcasting And Sharing
Finally, the data, solutions and progress all need to be shared and broadcast within the team or company.
Data scientists need to make the information available to everyone, in a way people will understand. This is crucial, because a data scientist might look at a set of data and not immediately know how and where it can be used. Sharing data within an organization is a necessary element of the process.
A marketing team may have the wherewithal to take data about customer purchases and turn it into something they can use for future promotions.
Data scientists, on the other hand, may see the data simply for what it is, with only a rough idea of where it could be applied. It isn’t their job to understand and put all of the data into action across the organization; their job is to organize it and make it more convenient and readable.
By studying SoundCloud’s optimization strategy, data scientists can streamline their own operations and make the most of a smoother process.
Image by Kevin Ku