began using Pandas, I thought I was doing pretty well.
I could clean datasets, run groupby, merge tables, and build quick analyses in a Jupyter notebook. Most tutorials made it feel straightforward: load...
you're analyzing a small dataset:
]
You need to calculate some summary statistics to get a sense of how this data is distributed, so you use numpy to compute the mean and variance.
import numpy as np
X...
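As a rough illustration of that step, here is a minimal sketch of computing a mean and variance with numpy. The `sample` values are hypothetical stand-ins, since the dataset itself is not shown here.

```python
import numpy as np

# Hypothetical sample values, used only for illustration.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(sample)

# np.var defaults to the population variance (ddof=0);
# pass ddof=1 to get the unbiased sample variance instead.
population_variance = np.var(sample)
sample_variance = np.var(sample, ddof=1)

print(mean, population_variance, sample_variance)
```

Which variance you want depends on whether the data is the whole population or a sample from it, so it is worth setting `ddof` explicitly rather than relying on the default.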
: when I first started using Pandas, I wrote loops like this all the time:
for i in range(len(df)):
    # "amount" and "level" are placeholder column names
    if df.loc[i, "amount"] > 1000:
        df.loc[i, "level"] = "high"
    else:
        df.loc[i, "level"] = "low"
It worked. And I thought, "Seems… not bad."
I...
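For comparison, the row-by-row labeling in the loop above (a value compared against 1000, then tagged "high" or "low") can usually be written as a single column-wise operation. This is only a sketch of one common alternative, using `np.where`. The DataFrame and the column names `amount` and `level` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "amount" and "level" are placeholder column names.
df = pd.DataFrame({"amount": [250, 1000, 1800, 4200]})

# np.where evaluates the condition for the whole column at once:
# "high" where amount > 1000, "low" everywhere else.
df["level"] = np.where(df["amount"] > 1000, "high", "low")

print(df)
```

The result matches what the loop produces, but the comparison runs over the entire column instead of one row at a time.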
a real issue when dealing with very large datasets. By “very large,” I mean data that exceeds the capacity of a single machine’s RAM.
A few of the key friction points Pandas...
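One common workaround when a file is too large to load in one go is to process it in chunks. The sketch below assumes the goal is a simple aggregate (a mean) and uses an in-memory CSV as a stand-in for a large file. The `value` column and the chunk size are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk; a real file path would work the same way.
csv_source = io.StringIO("value\n" + "\n".join(str(n) for n in range(10)))

total = 0
row_count = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(csv_source, chunksize=4):
    total += chunk["value"].sum()
    row_count += len(chunk)

overall_mean = total / row_count
print(overall_mean)
```

This only helps for computations that can be combined across chunks, such as sums and counts. It does not make arbitrary operations fit in memory.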
with pandas, you’ve probably run into this classic confusion: should you use loc or iloc to extract data? At first glance, they appear almost identical. Both are used to slice, filter, and retrieve rows or columns from...
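To make the difference concrete, here is a small sketch with a hypothetical DataFrame (the column names and index labels are made up for illustration). `loc` selects by label, while `iloc` selects by integer position.

```python
import pandas as pd

# Hypothetical data with string index labels so labels and positions differ.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "score": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Same cell, reached two ways: by label and by integer position.
by_label = df.loc["b", "name"]
by_position = df.iloc[1, 0]

# Slicing differs: loc includes the end label, iloc excludes the stop position.
label_slice = df.loc["a":"b"]
position_slice = df.iloc[0:1]

print(by_label, by_position, len(label_slice), len(position_slice))
```

The slicing behavior is the part that most often causes surprises: a label slice returns both endpoints, while a positional slice follows the usual Python half-open convention.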
, I discussed how to create your first DataFrame using Pandas. I mentioned that the first thing you need to master is data structures and arrays before moving on to data analysis with...
! If you’ve been following along, we’ve come a long way. In Part 1, we did the “dirty work” of cleaning and prepping.
In Part 2, we zoomed out to a high-altitude view of...
! Welcome back to the “EDA in Public” series! This is Part 2 of the series; if you haven’t seen Part 1 yet, read it here. Here’s a recap of what we conquered.
In Part...