The ONLY Data Science Roadmap You Need to Get a Job

Are you trying to become a data scientist and don’t know where to begin?

In this article, I want to give you a simple, no-nonsense learning roadmap that you can follow to break into the industry.

By the end, you’ll have a clear understanding of what’s required and the best resources to use, which should hopefully reduce any overwhelm you may have and help you land that data science job quicker!

A hill that I’m willing to die on is that, in my opinion, statistics is the most important area you should know as a data scientist.

The latest machine learning trends come and go, technologies often get replaced, but statistics has stood the test of time for hundreds of years.

According to Wikipedia:

Given the title is “data” scientist, I think it’s obvious how important statistics is to our field.

Fortunately, you don’t need a PhD in causal inference or stochastic calculus to have the required statistics knowledge. The basics are the most important and literally 90% of the job.

What To Learn

The areas you need to grasp firmly are:

Summary Statistics — Mean, median, mode, variance, correlations, anything that allows you to summarise data and draw interesting conclusions.
Visualisations — Learn to plot data with graphs like bar charts, line graphs, pie charts, etc. After all, a picture speaks a thousand words.
Probability Distributions — Learn the most common ones like Normal, Poisson, Binomial and Gamma. These are the ones I use most often.
Probability Theory — This area is quite big, but the main things to learn are: random variables, the central limit theorem, sampling and maximum likelihood estimation.
Hypothesis Testing — If you are going to work on any experiments, you need to understand how they’re statistically run. This involves learning about confidence intervals, significance levels, the z-test, the t-test, and test statistics. You just need to know how to run a hypothesis test.
Bayesian Statistics — It’s well worth knowing some Bayesian statistics, as I find people throw this term around loosely in the field all the time without really understanding it. It’s a large area, but as always, learn the basics, such as Bayes’ theorem, conjugate priors, credible intervals, and Bayesian regression.
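
To make a few of these ideas concrete, here is a minimal sketch using only Python’s standard library. The page-load times are made-up numbers, and the function names are my own for illustration. It computes summary statistics, a normal-approximation confidence interval, and a two-sample z-test. With samples this small, a t-test would usually be the better choice; the z-test is used here only because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, median, variance

# Hypothetical page-load times in seconds (made-up numbers for illustration).
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
variant = [11.6, 11.9, 11.5, 11.8, 11.4, 11.7, 12.0, 11.3, 11.6, 11.8]


def summarise(sample):
    """Return a few summary statistics for a list of numbers."""
    return {
        "n": len(sample),
        "mean": mean(sample),
        "median": median(sample),
        "variance": variance(sample),  # sample variance (n - 1 denominator)
    }


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    z_critical = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_critical * sqrt(variance(sample) / len(sample))
    return mean(sample) - margin, mean(sample) + margin


def two_sample_z_test(a, b):
    """Two-sided z-test for a difference in means (normal approximation)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


control_summary = summarise(control)
lower, upper = mean_confidence_interval(control)
z_statistic, p_value = two_sample_z_test(control, variant)

print(control_summary)
print(f"95% CI for control mean: ({lower:.2f}, {upper:.2f})")
print(f"z = {z_statistic:.2f}, p = {p_value:.5f}")
```

If the p-value is below your chosen significance level (0.05 is a common default), you would reject the null hypothesis that both versions have the same mean load time.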

How To Learn

As I mentioned at the start, I want this roadmap to be simple and prevent any analysis paralysis you may experience, so to learn the majority of the above, I recommend getting the Practical Statistics for Data Science (affiliate link) textbook.

However, it doesn’t cover Bayesian statistics, and for that, I recommend the Think Bayes (affiliate link) textbook.

These two books are all you need, and they are specifically designed for data scientists and are in Python.
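
To give a flavour of the Bayesian basics mentioned earlier (Bayes’ theorem, conjugate priors and credible intervals), here is a small sketch of a Beta-Binomial update. It is not taken from either book. The conversion numbers are invented, the function names are hypothetical, and the credible interval is approximated by sampling rather than computed exactly.

```python
import random


def update_beta(prior_alpha, prior_beta, successes, failures):
    """Conjugate update: Beta prior + Binomial data gives a Beta posterior."""
    return prior_alpha + successes, prior_beta + failures


def credible_interval(alpha, beta, level=0.95, draws=20_000, seed=0):
    """Approximate equal-tailed credible interval by sampling the posterior."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper


# Uniform Beta(1, 1) prior; hypothetical data: 12 conversions from 100 visitors.
alpha, beta = update_beta(1, 1, successes=12, failures=88)
posterior_mean = alpha / (alpha + beta)
ci_lower, ci_upper = credible_interval(alpha, beta)

print(f"Posterior: Beta({alpha}, {beta}), mean = {posterior_mean:.3f}")
print(f"Approximate 95% credible interval: ({ci_lower:.3f}, {ci_upper:.3f})")
```

The appeal of a conjugate prior is visible in update_beta: the posterior is obtained by simple addition, with no integration required.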

Statistics, by nature, is a fairly applied field, and some of the concepts require pure maths knowledge to fully understand.

Moreover, when it comes to areas like machine learning, you need a good understanding of linear algebra and calculus to fully grasp what is going on under the hood.

What To Learn

Calculus

Calculus is how machine learning algorithms actually “learn.” Their “learning” is done through numerical continuous optimisation, and the areas you should learn are:

What is a derivative, and what is it measuring?
Learn the derivatives of standard functions like sine, cosine, exponential, tan, etc.
What are turning points, maxima and minima?
Chain and product rules are the reason neural networks work so well, as they’re the core process behind backpropagation.
Understand partial derivatives and their use in multivariable calculus.
What is integration, and what is it doing?
Integration by parts and substitution.
The integrals of standard functions like sine, natural log and polynomials.
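
As a rough illustration of how derivatives drive “learning”, the sketch below checks the chain rule numerically and then runs plain gradient descent on a toy loss function. The loss, the learning rate and the function names are all assumptions chosen for the example, not a recipe from any particular library.

```python
from math import cos, sin


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of the derivative f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Chain rule check: d/dx sin(x^2) = cos(x^2) * 2x
def composite(x):
    return sin(x ** 2)


def composite_derivative(x):
    return cos(x ** 2) * 2 * x


# Toy loss with its minimum at w = 3, and its analytical gradient.
def loss(w):
    return (w - 3) ** 2


def loss_gradient(w):
    return 2 * (w - 3)


def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to move towards a minimum."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


chain_rule_error = abs(numerical_derivative(composite, 1.5) - composite_derivative(1.5))
w_min = gradient_descent(loss_gradient, start=0.0)

print(f"Chain rule error: {chain_rule_error:.2e}")
print(f"Minimum found near w = {w_min:.6f}")
```

Training a neural network follows the same loop, except the gradient is taken with respect to many weights at once and is computed with backpropagation, which is the chain rule applied layer by layer.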

Linear Algebra

Linear algebra is a mathematical field that deals with vectors, matrices, and their transformations.

You should learn:

Vectors, their magnitude, direction and components. Additionally, operations such as the dot and cross products.
Matrices and their operations, including trace, inverse, transpose, dot product, and cross product rules.
Learn how to solve systems of linear equations through techniques like elimination, row reduction, and Cramer’s rule.
Gain an understanding of eigenvalues and eigenvectors. These are the foundation of techniques like Principal Component Analysis, which helps reduce dimensionality in datasets.
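
Here is a small pure-Python sketch of some of these operations for the 2x2 case: the dot product, vector magnitude, Cramer’s rule and eigenvalues from the characteristic polynomial. The helper names are my own, and in practice you would use a library such as NumPy; the point is only to show what the operations compute.

```python
from math import sqrt


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def magnitude(v):
    """Euclidean length of a vector."""
    return sqrt(dot(v, v))


def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def solve_2x2_cramer(m, rhs):
    """Solve m @ [x, y] = rhs using Cramer's rule."""
    (a, b), (c, d) = m
    e, f = rhs
    determinant = det2(m)
    if determinant == 0:
        raise ValueError("Matrix is singular, so there is no unique solution.")
    x = det2([[e, b], [f, d]]) / determinant
    y = det2([[a, e], [c, f]]) / determinant
    return x, y


def eigenvalues_2x2(m):
    """Real eigenvalues from lambda^2 - trace * lambda + det = 0."""
    (a, _), (_, d) = m
    trace = a + d
    discriminant = trace ** 2 - 4 * det2(m)
    if discriminant < 0:
        raise ValueError("Eigenvalues are complex for this matrix.")
    root = sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2


# 2x + y = 5 and x + 3y = 10
solution = solve_2x2_cramer([[2, 1], [1, 3]], [5, 10])
eigenvalues = eigenvalues_2x2([[2, 1], [1, 2]])

print(dot([1, 2, 3], [4, 5, 6]), magnitude([3, 4]))
print(solution, eigenvalues)
```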

How To Learn

In previous videos, I recommended some textbooks which, while useful, were quite dense and not practical for most people to get through in just a few months.

That’s why I now suggest taking the Mathematics for Machine Learning and Data Science Specialization on Coursera.

This course is tailored specifically for data science with exercises in Python. It skips the unnecessary theory and focuses on what you really need for real-world work.

There are two, and only two, programming languages you need: Python and SQL.

What To Learn

Python

Keep it simple and learn the basics:

Variables and data types
Boolean and comparison operators
Control flow and conditionals
For and while loops
Functions and classes
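
As a quick illustration, the short script below touches each of the basics listed above. The temperature readings and all the names are invented for the example.

```python
# Variables and data types
name = "temperature"                 # str
readings = [18.5, 21.0, 25.5, 30.0]  # list of floats
has_data = len(readings) > 0         # bool produced by a comparison


# A function using conditionals and a for loop
def label_readings(values, warm_threshold=25.0):
    labels = []
    for value in values:
        if value >= warm_threshold:
            labels.append("warm")
        elif value >= 20.0:
            labels.append("mild")
        else:
            labels.append("cool")
    return labels


# A function using a while loop
def count_halvings(value, floor=1.0):
    """Count how many times value can be halved before it reaches floor."""
    steps = 0
    while value > floor:
        value /= 2
        steps += 1
    return steps


# A small class bundling data with behaviour
class Series:
    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


labels = label_readings(readings)
series = Series(name, readings)

print(has_data, labels)
print(count_halvings(8.0), series.mean())
```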

You also want to learn specific scientific computing libraries:

SQL

You should learn all the fundamental functions needed for analysis in SQL. It’s quite a small language, so there aren’t many things to learn.

SELECT * FROM (standard query)
ALTER, INSERT, CREATE (modify tables)
GROUP BY, ORDER BY
WHERE, AND, OR, BETWEEN, IN, HAVING (filter tables)
AVG, COUNT, MIN, MAX, SUM (aggregate functions)
FULL JOIN, LEFT JOIN, RIGHT JOIN, INNER JOIN, UNION
CASE (if statements)
DATEADD, DATEDIFF, DATEPART (date and time functions)
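
To try several of these keywords without installing a database, you can run SQL from Python using the built-in sqlite3 module. The sketch below uses two invented tables. Note that COALESCE is not in the list above (it is used here to turn a NULL sum into 0), and that DATEADD, DATEDIFF and DATEPART are dialect-specific and not available in SQLite, so they are left out.

```python
import sqlite3

# In-memory database with two small, made-up tables.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cursor.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cursor.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "UK"), (2, "Grace", "US"), (3, "Linus", "FI")],
)
cursor.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 150.0), (3, 2, 20.0)],
)

# LEFT JOIN keeps customers with no orders; CASE buckets them by spend.
spend_query = """
SELECT
    c.name,
    COUNT(o.id) AS n_orders,
    COALESCE(SUM(o.amount), 0) AS total_spent,
    CASE WHEN COALESCE(SUM(o.amount), 0) >= 100 THEN 'high' ELSE 'low' END AS segment
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
WHERE c.country IN ('UK', 'US', 'FI')
GROUP BY c.id, c.name
ORDER BY total_spent DESC
"""
spend_rows = cursor.execute(spend_query).fetchall()

# HAVING filters on an aggregate, after GROUP BY has been applied.
repeat_query = """
SELECT customer_id, AVG(amount) AS average_amount
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 2
"""
repeat_rows = cursor.execute(repeat_query).fetchall()

for row in spend_rows:
    print(row)
print(repeat_rows)

connection.close()
```

The difference between WHERE and HAVING is worth noticing here: WHERE filters individual rows before grouping, while HAVING filters the groups afterwards.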

How To Learn

There are many introductory Python and SQL courses, and they all teach the same material. So, pick one and get going with it. You literally can’t go wrong here.

If you want a recommendation, then check out W3Schools or the freeCodeCamp videos. I have used both and found them very good.

As well as Python and SQL, you need to invest some time learning other technologies that are used on the job.

What To Learn

There are so many tools, and every company is different, but these are the ones that remain consistent throughout:

Git and GitHub — Virtually every company uses these for version control, so you need to learn them; there’s no way around it, I’m afraid.
Bash/Zsh — You’ll work in the terminal a lot, and the majority of companies rely on UNIX-like systems, so you need to be comfortable operating in the command line.
Poetry / PyEnv / UV — Managing packages and Python versions is crucial in any real-world application, so it’s well worth getting familiar with these tools.

How To Learn

For git, I recommend this crash course from freeCodeCamp:

For learning the terminal and bash shell scripting, I also recommend this video from freeCodeCamp.

And for learning PyEnv, Poetry and UV, check out these articles:

Right, time for the fun stuff!

Machine learning is a huge field, and we can’t learn everything, even if we tried our whole lives.

To be a data scientist, like I always say, we only need to know the basics and a little bit of deep learning.

Forget learning LLMs, transformers, diffusion models, etc. They are not necessary for the majority of entry-level positions, and to be honest, for many jobs in general.

Focus on nailing the fundamentals, as they carry over into everything else. To this day, I still use basic regression models, as do many senior machine learning engineers I work with.

It’s all about the application and understanding your problem, rather than trying to be flashy by using the latest state-of-the-art technology when it is not needed.

What To Learn

The key algorithms and concepts you should learn are:

Linear, logistic and polynomial regression.
Decision trees, random forests and gradient-boosted trees.
Support vector machines.
Regular neural networks.
K-means clustering and K-nearest neighbours.
Regularisation, bias vs variance tradeoff and cross-validation.
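
To show how the first and last items fit together, here is a minimal sketch of one-variable linear regression fitted with the closed-form least-squares solution, evaluated with k-fold cross-validation. The data is synthetic (generated from an assumed true line y = 2 + 3x plus noise), and the function names are my own. On the job you would normally reach for scikit-learn instead.

```python
import random


def fit_linear_regression(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / x_variance
    intercept = y_mean - slope * x_mean
    return intercept, slope


def mean_squared_error(model, xs, ys):
    """Average squared prediction error of a fitted (intercept, slope) pair."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cross_validation(xs, ys, k=5):
    """Average held-out error across k folds."""
    indices = list(range(len(xs)))
    fold_errors = []
    for fold in range(k):
        test_indices = indices[fold::k]  # every k-th point forms this fold
        held_out = set(test_indices)
        train_indices = [i for i in indices if i not in held_out]
        model = fit_linear_regression(
            [xs[i] for i in train_indices], [ys[i] for i in train_indices]
        )
        fold_errors.append(
            mean_squared_error(
                model, [xs[i] for i in test_indices], [ys[i] for i in test_indices]
            )
        )
    return sum(fold_errors) / k


# Synthetic data: assumed true relationship y = 2 + 3x with Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]

intercept, slope = fit_linear_regression(xs, ys)
cv_mse = k_fold_cross_validation(xs, ys, k=5)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"5-fold cross-validated MSE = {cv_mse:.3f}")
```

The cross-validated error is measured on points the model never saw during fitting, which makes it a fairer estimate of real-world performance than the training error.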

How To Learn

The following two resources are all you need. So, work through them iteratively, and your machine learning knowledge will surpass that of most practitioners in the industry. Trust me.

The first ML course I took was the Machine Learning Specialization by Andrew Ng, and I think it is probably the best one out there. You could get away with just doing this one on its own, because it’s that good.

The second is probably the best machine learning book ever written: Hands-On ML with Scikit-Learn, Keras, and TensorFlow (affiliate link). If I had to recommend just one book to learn machine learning, this would be it!

In my opinion, this is optional, but I know many of you are interested in deep learning, so I have included it here for completeness.

I personally wouldn’t spend too much time here, as it can be easy to get lost in all the latest developments.

What To Learn

These deep learning concepts have stood the test of time, so they are well worth investing your learning in:

How To Learn

These are the resources I have used to learn deep learning, and they are all you need.

Deep Learning Specialization by Andrew Ng — This is the follow-on course from the Machine Learning Specialization and will teach you all you need to know about deep learning, CNNs, and RNNs.

Again, the Hands-On ML with Scikit-Learn, Keras, and TensorFlow (affiliate link) textbook has an excellent deep learning section from chapter 14 onwards.

Finally, some of you may have heard of Andrej Karpathy; if you haven’t, he is probably one of the best AI researchers at the moment and has worked at Tesla and OpenAI.

Anyway, his Neural Networks: Zero to Hero YouTube course is phenomenal and teaches you how to build your own Generative Pre-trained Transformer (GPT) from scratch.

If you go through everything in this article, you’ll have excellent knowledge to enter the data science field.

However, having this knowledge is not enough; you need to build a solid portfolio to land a job.

That’s why I recommend checking out my previous article, where I explain the exact projects you need to build to secure a job as soon as possible.

See you there!

STOP Building Useless ML Projects – What Actually Works | Towards Data Science
towardsdatascience.com

I offer 1:1 coaching calls where we can chat about whatever you need — whether it’s projects, career advice, or just figuring out the next step. I’m here to help you move forward!

1:1 Mentoring Call with Egor Howell
topmate.io
