How do you become a data scientist?

"Today's world is drowning in data and starving for insights." - Anon

I thought that was quite profound.

There are roughly 2.9 billion internet users in the world, close to half of its population. For an internet company, that means generating a lot of data. And every bit of that data carries information that can be used to make a business decision.

But the sheer volume of data, and the velocity with which it is generated, make it impossible to analyse with conventional tools. This gave rise to the data scientist. A data scientist forms a hypothesis about a certain phenomenon and validates it with data, just as a physicist or a chemist forms a hypothesis and runs experiments to test it.
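As a minimal sketch of that hypothesis-then-data loop, assuming a simple two-group comparison, the Python below runs a permutation test: if a new page layout made no difference, shuffling the group labels should often produce a gap as large as the observed one. The function name `permutation_test`, the page-layout scenario and the numbers are all hypothetical illustrations, not data from any real study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(b) - mean(a) and a p-value: the
    fraction of random relabellings whose absolute difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical data: minutes spent on site under an old and a new page layout.
old_layout = [3.1, 2.8, 3.5, 2.9, 3.0, 3.3, 2.7, 3.2]
new_layout = [3.6, 3.9, 3.4, 4.1, 3.8, 3.5, 4.0, 3.7]

diff, p_value = permutation_test(old_layout, new_layout)
print(f"observed difference: {diff:.2f} minutes, p-value: {p_value:.4f}")
```

A small p-value means the observed gap would rarely arise by chance if the hypothesis of "no effect" were true; it supports the hypothesis that the layout matters, but does not by itself prove why.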

A data scientist needs three main skills. First, a data scientist needs to be fluent in programming languages and algorithms. Second, the data scientist must be an adept mathematician and statistician. And finally, a good data scientist should have domain knowledge.

I will explain why each of these skill sets matters in the following sections.

A software engineer

Algorithmic programming is an important skill for a data scientist. The ability to understand how algorithms and data structures are used to solve problems, and the ability to implement them, is a prerequisite. So if you have been exposed to algorithmic programming and practise it regularly, you already have a third of the skill set required to become a data scientist.

There is no particular preference for one language, but a good data scientist is expected to be well versed in more than one. It is also important to understand that being able to run Hadoop, use other big data tools, or build backend systems does not by itself make someone a data scientist. Software engineering is only one part of the skill set required, and the part of it that matters is strong algorithmic skill.

A mathematician/statistician

A lot of a data scientist's work goes into making sense of large amounts of data. Most of that time is spent organising data in an understandable manner and then visualising it with various charting libraries. Once this is done, the data scientist looks for patterns and trends to draw an inference or find a solution to a problem.

It is also important for a data scientist to be fluent in advanced statistical concepts such as regression analysis, cluster analysis and optimisation techniques. More importantly, a data scientist needs to know which technique to apply where, and to be able to communicate mathematical inferences in a way that leads to actionable business decisions.
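To make "regression analysis" concrete, here is a minimal sketch of ordinary least squares for a single predictor, written with only the Python standard library. The function name `fit_simple_linear_regression` and the ad-spend data are hypothetical; it assumes the relationship is roughly linear, which is exactly the kind of "what to apply where" judgement the paragraph describes.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through the means
    return intercept, slope


# Hypothetical data: weekly ad spend (in thousands) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 15.0, 21.0, 24.0, 28.0]

intercept, slope = fit_simple_linear_regression(ad_spend, units_sold)
print(f"units_sold ≈ {intercept:.2f} + {slope:.2f} * ad_spend")
```

On this made-up data the fit is roughly 7.70 + 4.10 × ad_spend. The communication step is turning that slope into a business statement, such as "each extra thousand spent on ads is associated with about four more units sold", while being clear that association is not causation.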

Business/domain knowledge

There is no such thing as a general data scientist. The same data sets mean different things in different domains. A good data scientist should know the inner workings of a particular domain, so as to form the right hypotheses and understand the business significance of problems. This knowledge can come from personal interest or past experience.

Domain knowledge is also important for communicating the value of data science to the rest of the business. Much of data science rests on advanced academic concepts, which not everyone in the business value chain will find easy to understand. Domain knowledge helps a data scientist explain how a decision derived from data science will improve operations, marketing, sales and so on.

If you've got these three key skill sets, then you're ready to be a data scientist. But why should you be one?

A demand-supply gap

The field of data science is seeing a big demand-supply gap. Every company worth working for is generating a lot of data, and the insights from that data are critical to the company's future. As a result, employers are willing to break the bank to hire the best talent.

It's also a really fun and intellectually stimulating job. It's almost like a treasure hunt, looking for insights in a vast ocean of data. But then again, as with anything else, you also have to be genuinely interested in it.

