A Proven Method to Remember Data Science Concepts for as Long as You Need


Image by me. Via my good pal, Midjourney

The issue with self-learning data science

Every time I want to install a library with Anaconda, the -c part of the command slips my mind. So, like most people, I google it, sometimes 3-4 times a day:

conda install -c conda-forge library_name

Sound familiar?

This little example points to a fundamental flaw in the way most of us learn data science and machine learning today: data science knowledge is cheaper than air, so we don't take learning it as seriously as we should.

We see university students racking their brains to remember huge amounts of information to pass exams and tests. If they don't do well, they get kicked out of the institution they paid so much for.

As self-taught data scientists, we have none of that pressure. All we have is our own self-discipline, which keeps assuring us we're doing a great job while we watch a YouTube course on the couch.

Our learning processes are haphazard. We learn something new and jump to the next shiny thing before the first one has really sunk in.

We leave information retention up to chance.

When we actually sit down to practice what we "learned" (air quotes), we realize we have already forgotten 80% of the new knowledge in the time it took to turn on our computers.

So, we start googling. And once this behavior becomes the norm, we brag in our little tweets about how good we are at googling. What we are actually doing is subtly signaling that we have no reliable system whatsoever to learn and retain the overwhelming amount of information in data science.

Through no fault of our own, we have become the worst kind of learners.

The solution

Without effective methods and tools to learn and retain new knowledge, it is hard to become a data scientist.

There's just so much to learn: math, statistics, machine learning theory, the functions and methods in dozens of Python libraries, and so on. It is difficult to keep track of it all.

Image from Wikipedia (Wikimedia Commons).

The Ebbinghaus forgetting curve above shows the rate at which new information leaks from memory.

The graph suggests that, without any review, most of a new piece of information is gone within about six days. And when the information was learned in our haphazard, careless ways, that window is likely even shorter.
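If you want to play with the shape of that curve, a common textbook approximation is R = e^(-t/S). Here R is the fraction you still remember, t is the time since learning, and S is a "stability" value that grows as a memory gets stronger. This is a simplification, not Ebbinghaus's own fitted formula. The stability of 3.7 days below is an assumption I picked only so that retention falls to about 20% by day six; it is not a measured value.

```python
import math


def retention(days: float, stability: float) -> float:
    """Estimated fraction of material still remembered after `days` days.

    Uses the common exponential approximation R = exp(-t / S).
    `stability` (S, in days) is an illustrative, assumed parameter:
    larger values mean slower forgetting.
    """
    if stability <= 0:
        raise ValueError("stability must be positive")
    return math.exp(-days / stability)


if __name__ == "__main__":
    # Assumed stability of 3.7 days, chosen only so the curve drops
    # to roughly 20% by day 6, similar in shape to the plot above.
    s = 3.7
    for day in range(7):
        print(f"day {day}: {retention(day, s):.0%} retained")
```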

But when you make a serious effort to put new knowledge into a reliable repetition system, you consciously choose to remember it for the rest of your life, or for as long as you need it.

Could I possibly be talking about rote learning (🤒)? No, of course not. I'm talking about spaced repetition!

Spaced repetition is a powerful memory technique that takes full advantage of the Ebbinghaus forgetting curve:

Image from Wikipedia (Wikimedia Commons).

Spaced repetition re-exposes you to new information at increasingly longer, optimally timed intervals, each one arriving just as a memory leak is about to occur.

Each review resets your memory and lengthens the interval before you have to review the material again.
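Here is a toy simulation of that idea, using the common exponential approximation of the forgetting curve, R = e^(-t/S). The assumptions are mine, not SuperMemo's or Anki's:

- you review exactly when retention drops to a chosen threshold;
- each review resets retention to 100%;
- each review multiplies the memory's stability by a fixed growth factor.

Under those assumptions, the gaps between reviews grow geometrically.

```python
import math


def review_schedule(
    stability: float, growth: float, threshold: float, reviews: int
) -> list[float]:
    """Days between successive reviews under a toy spaced-repetition model.

    Assumptions (illustrative only, not taken from SuperMemo or Anki):
    - retention follows R = exp(-t / S);
    - you review exactly when R falls to `threshold`;
    - each review resets R to 1.0 and multiplies S by `growth`.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be between 0 and 1")
    intervals = []
    s = stability
    for _ in range(reviews):
        # Solve exp(-t / s) = threshold for t: t = -s * ln(threshold).
        intervals.append(-s * math.log(threshold))
        s *= growth  # the memory becomes more stable after each review
    return intervals


if __name__ == "__main__":
    day = 0.0
    for i, gap in enumerate(review_schedule(3.7, 2.0, 0.5, 6), start=1):
        day += gap
        print(f"review {i}: after {gap:.1f} more days (day {day:.1f})")
```

With a stability of 3.7 days, a 50% threshold, and a growth factor of 2, each gap is twice as long as the previous one. That is the "increasingly longer intervals" pattern in miniature. Real algorithms such as SuperMemo adjust the growth per card based on your ratings instead of fixing it up front.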

What are the advantages of SR?

Perhaps the most useful thing about spaced repetition is the way it moves knowledge from short-term to long-term memory.

Apart from the efficient use of time and improved retention, studies show the following advantages of the system:

  • Personalization: It can be tailored to your preferences, because it adapts to your pace and your level of mastery of the material.
  • Improved comprehension: By reinforcing concepts and connections continually over time, it becomes easier for you to build a network of knowledge and understand complex topics more deeply.
  • Increased motivation: Spaced repetition gives me an amazing sense of progress and achievement as my repetition intervals get longer.

These benefits are probably why many medical students swear by this method: they use it to memorize the names of bones, blood vessels, nerve branches, and all the other exhausting details of the human body.

Data science may not be as complicated, but we still have a lot to remember.

Spaced repetition algorithms

There are many algorithms that implement spaced repetition in practice, the most popular of which is SuperMemo.

SuperMemo is a family of SR algorithms that have been released steadily since 1982. Their creator, Dr. Piotr Wozniak, was described by Wired magazine in 2008 as the "inventor of a method to turn people into geniuses".

So, how do you turn into a genius with this method?

Once you have properly learned the underlying concepts and facts, you break the material down into chunks using flashcards (yes, I know this is a big hassle, but bear with me until the end).

Once you have a database of cards, you start reviewing them in sessions. The first session shows the cards either in the order they were added or shuffled, depending on your preferences. Then you rate each card by how well you recalled it.

In SuperMemo-2, there are six options:

  • 0: I have no clue whatsoever
  • 1: Incorrect, but after seeing the answer, it rings a bell
  • 2: Incorrect, but after seeing the answer, it came rushing back to me
  • 3: Correct response, but I had to dig deep and make an effort to remember
  • 4: Correct response, after some hesitation
  • 5: I remember it as if it was minutes ago

Then, the chosen rating is plugged into a series of calculations involving the number of times the card has been recalled successfully, the card's easiness factor (don't ask), and the current inter-repetition interval. The result determines when the card must be shown again.
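Below is a minimal Python sketch of the SM-2 update as it is commonly summarized from Wozniak's original description:

- the first two successful reviews get intervals of 1 and 6 days;
- after that, the previous interval is multiplied by the easiness factor (EF) and rounded up to whole days;
- EF is adjusted after every review and never drops below 1.3;
- a grade below 3 restarts the repetition sequence.

The class and function names are my own. Descriptions also differ on some details, such as whether EF should change after a failed recall, so treat this as a sketch rather than SuperMemo's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0   # consecutive successful recalls (n)
    easiness: float = 2.5  # easiness factor (EF); 2.5 is the usual starting value
    interval: int = 0      # days until the next review (I)


def sm2_update(state: CardState, grade: int) -> CardState:
    """Return a card's new state after a review graded 0-5 (SM-2 sketch)."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")

    if grade >= 3:  # correct response
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            # Multiply the previous interval by EF and round up to whole days.
            interval = math.ceil(state.interval * state.easiness)
        repetitions = state.repetitions + 1
    else:  # incorrect response: start the repetition sequence over
        repetitions = 0
        interval = 1

    # Adjust EF: grade 5 raises it by 0.1, grade 4 leaves it unchanged,
    # lower grades reduce it. EF is never allowed below 1.3.
    q = 5 - grade
    easiness = max(1.3, state.easiness + (0.1 - q * (0.08 + q * 0.02)))
    return CardState(repetitions, easiness, interval)
```

Starting from a fresh card with the usual EF of 2.5, three perfect (grade 5) reviews in a row produce intervals of 1, 6, and 17 days. A later grade below 3 sends the card back to a 1-day interval.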

For cards rated below 4, SuperMemo asks you to review the card again during the current session, as many times as needed, until it scores at least 4.
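That within-session rule can be sketched as a simple queue. Cards rated below 4 go to the back and come up again until they score at least 4. The function name and the `grade` callback are hypothetical stand-ins for the app's review screen, not SuperMemo's actual code. Like the rule it models, the loop only ends once every card has reached 4.

```python
from collections import deque
from typing import Callable, Iterable


def run_session(
    cards: Iterable[str], grade: Callable[[str], int]
) -> list[tuple[str, int]]:
    """Review cards until every one scores at least 4 (hypothetical helper).

    `grade` stands in for the learner: it receives a card and returns a 0-5
    rating. Cards rated below 4 go to the back of the queue and are shown
    again later in the same session.
    """
    queue = deque(cards)
    log = []
    while queue:
        card = queue.popleft()
        score = grade(card)
        log.append((card, score))
        if score < 4:
            queue.append(card)  # not solid yet: see it again this session
    return log


if __name__ == "__main__":
    # Simulated answers: the learner needs three tries on "fromtimestamp".
    answers = {"conda -c": iter([5]), "fromtimestamp": iter([2, 3, 4])}
    for card, score in run_session(answers, lambda c: next(answers[c])):
        print(card, score)
```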

Each correctly recalled card is shown after increasingly long intervals. For example, if you memorize that the function to convert a Unix timestamp into a datetime is datetime.datetime.fromtimestamp, you may only need to review that card 4–5 times over a month to remember it for the next six months.
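For reference, this is what that flashcard fact looks like in code. The timestamp value is arbitrary. Passing a time zone through the `tz` argument makes the result independent of your machine's local time zone.

```python
import datetime

# A Unix timestamp counts seconds since 1970-01-01 00:00:00 UTC.
ts = 1_700_000_000

# Without tz, fromtimestamp returns a naive datetime in the machine's local time.
local_dt = datetime.datetime.fromtimestamp(ts)

# With tz, you get an aware datetime in that zone, regardless of where the code runs.
utc_dt = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(utc_dt.isoformat())  # 2023-11-14T22:13:20+00:00
```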

As you can imagine, this is a far better repetition system than rote learning, fixed-interval repetition, or, worst of all, repetition whenever the mood strikes you.

Spaced repetition tools

There are many SR tools powered by SuperMemo-like algorithms.

The first (and the king) is Anki. It's open-source and implements a modified version of SuperMemo-2. Instead of six recall ratings, it offers four (Again, Hard, Good, and Easy):

Anki being used to memorize Russian vocabulary. Image from Wikipedia (Wikimedia Commons).

Like a lot of open-source software, it looks rather dated, but it is a free, cross-platform application (except for the iOS version). Its GitHub repository has over 13k stars, which suggests massive support from the community.

Anki has been in development for over ten years, and the current version has the following features:

  1. Available everywhere: Windows, macOS, Linux, Android, and iOS (the iOS app is paid)
  2. Fully customizable: create your own flashcards, organize them into decks, and set your own parameters for the spaced repetition algorithm
  3. Sync across devices: the desktop version of Anki is the main app, and the mobile and web versions are companions that stay in sync with it.
  4. Multimedia support: add images, audio, video, text formatting, and LaTeX to make flashcards memorable and engaging. There is also support for image occlusion to memorize visual information.
  5. Add-ons: you can write (in Python) or install add-ons that extend the software with features like custom keyboard shortcuts, themes, and advanced statistics.
  6. Pre-built decks: the community constantly shares decks of ready-made cards on popular topics. These include hundreds of thousands of cards on language learning, virtually every university exam subject, and plenty of other great/cool/weird topics.

One obvious pain point we haven't stressed yet is creating flashcards for material that isn't already available in a shared deck.

I know that data science is a relatively young field when it comes to spaced repetition. Anyone would have an endless amount of information to convert into flashcards, which sounds tedious and sickening. But it is a necessary evil.
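One way to take some of the tedium out is to keep cards as plain data and let Anki's text importer (File > Import) turn a tab-separated file into notes. Each line becomes one note, and you map the columns to the Front and Back fields in the import dialog. The sketch below assumes that workflow. The cards and the file name `ds_cards.txt` are hypothetical examples.

```python
import csv
import io
from pathlib import Path

# Hypothetical cards: (front, back) pairs for a "data science tools" deck.
CARDS = [
    ("conda: install a package from the conda-forge channel",
     "conda install -c conda-forge library_name"),
    ("Python: convert a Unix timestamp to a datetime",
     "datetime.datetime.fromtimestamp(ts)"),
]


def cards_to_tsv(cards) -> str:
    """Render (front, back) pairs as tab-separated lines, one card per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return buf.getvalue()


if __name__ == "__main__":
    # Write a UTF-8 text file that can then be imported into Anki.
    Path("ds_cards.txt").write_text(cards_to_tsv(CARDS), encoding="utf-8")
```

Keeping the cards in a Python list (or a spreadsheet) makes it easy to add a batch at a time after each study session, then re-import.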

I firmly believe that the overall time it takes to create flashcards for a topic and fully master it with spaced repetition is much less than the hours of googling, or the dozens of vicious cycles of forgetting and relearning, that it replaces.

Besides, we're lucky to be living in the golden age of AI (we are, aren't we?). There are already affordable AI-powered flashcard tools like Monic.ai.

I have already tried Monic.ai, and it looks great. You upload a screenshot or a PDF file, and it automatically converts the text inside into flashcards in mere seconds. It's powered by spaced repetition as well.

If you decide to give it a go, consider installing the GoFullPage Chrome extension to take full-page screenshots, or learn how to save web pages as PDFs. That way, you can turn any online article, tutorial, or Python framework documentation page into flashcards with Monic.ai.


It's time to change how we approach learning data science. We should ditch our careless, haphazard habits of watching YouTube videos just for the sake of watching, or taking courses back-to-back in search of yet another worthless e-certificate.

We should stop learning something once and hoping for the best that it sticks. We should stop the wishful thinking.

We should stop leaving memory up to chance.

Instead, we should take deliberate action to memorize every important fact, piece of theory, concept, terminal command, Python function, or function argument for as long as we need it.

Yes, it will take some getting used to, but once we are used to it, we can significantly shorten the time it takes to go from "learning data science online" to "doing data science in a job that pays six figures".

Thanks for reading!
