Pandas

A Tale of Two Variances: Why NumPy and Pandas Give Different Answers

...you're analyzing a small dataset: [...] You want some summary statistics to get a sense of how the data is distributed, so you use NumPy to calculate the mean and variance. import numpy as np X...
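The excerpt is cut off before the comparison, but the mismatch named in the title most likely comes from the two libraries' default degrees of freedom: `np.var` defaults to `ddof=0` (population variance), while `Series.var` defaults to `ddof=1` (sample variance). A minimal sketch, with made-up values standing in for the dataset that is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the dataset in the excerpt is not shown.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# NumPy default: ddof=0, so the sum of squared deviations is divided by n.
population_var = np.var(values)

# pandas default: ddof=1, so the same sum is divided by n - 1.
sample_var = pd.Series(values).var()

# Passing ddof explicitly makes the two libraries agree.
numpy_sample_var = np.var(values, ddof=1)

print(population_var, sample_var, numpy_sample_var)
```

Stating `ddof` explicitly in both libraries is a simple way to avoid the ambiguity.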

Why You Should Stop Writing Loops in Pandas 

...when I first began using Pandas, I regularly wrote loops like this: for i in range(len(df)): if df.loc[...] > 1000: df.loc[...] = "high" else: df.loc[...] = "low" It worked. And I assumed, Seems… not a lot. I...
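As a rough illustration of the pattern in the excerpt and one common vectorized replacement, here is a sketch that assumes a hypothetical numeric column `amount` and a hypothetical output column `label`. The original indexers were lost from the excerpt, so these names are placeholders, and `np.where` is one option rather than necessarily the article's own.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "amount" and "label" are illustrative column names.
df = pd.DataFrame({"amount": [250, 1800, 1000, 4200]})

# Row-by-row version, in the spirit of the loop from the excerpt.
looped = df.copy()
looped["label"] = ""
for i in range(len(looped)):
    if looped.loc[i, "amount"] > 1000:
        looped.loc[i, "label"] = "high"
    else:
        looped.loc[i, "label"] = "low"

# Vectorized version: one comparison over the whole column at once.
df["label"] = np.where(df["amount"] > 1000, "high", "low")

print(df)
```

Both versions give the same labels. The vectorized one avoids a Python-level loop, which usually matters more as the number of rows grows.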

PySpark for Pandas Users

...a real issue when dealing with very large datasets. By “very large” I mean data that exceeds the capacity of a single machine’s RAM. Some of the key friction points Pandas...

The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas

...with pandas, you’ve probably run into this classic confusion: should you use loc or iloc to extract data? At first glance, they look almost identical. Both are used to slice, filter, and retrieve rows or columns from...
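The core distinction can be sketched briefly: `loc` selects by label and `iloc` selects by integer position. The frame below is hypothetical and only meant to make the difference visible.

```python
import pandas as pd

# Hypothetical frame with string labels so labels and positions cannot be confused.
df = pd.DataFrame({"score": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_label = df.loc["b", "score"]    # row labelled "b", column labelled "score"
by_position = df.iloc[1, 0]        # second row, first column, by position

label_slice = df.loc["a":"c"]      # label slices include both endpoints
position_slice = df.iloc[0:2]      # position slices exclude the stop, like Python lists

print(by_label, by_position, len(label_slice), len(position_slice))
```

The slicing behaviour is the detail that tends to surprise people: the label slice returns three rows, while the position slice returns two.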

Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames

..., I discussed how to create your first DataFrame using Pandas. I mentioned that the first thing you need to master is data structures and arrays before moving on to data analysis with...

EDA in Public (Part 3): RFM Analysis for Customer Segmentation in Pandas

...! If you’ve been following along, we’ve come a long way. In Part 1, we did the “dirty work” of cleaning and prepping. In Part 2, we zoomed out to a high-altitude view of...

EDA in Public (Part 2): Product Deep Dive & Time-Series Analysis in Pandas

...! Welcome back to the “EDA in Public” series! This is Part 2 of the series; if you haven’t seen Part 1 yet, read it here. Here’s a recap of what we conquered. In Part...

7 Pandas Performance Tricks Every Data Scientist Should Know

...an article where I walked through some of the newer DataFrame tools in Python, such as Polars and DuckDB. I explored how they can enhance the data science workflow and perform more efficiently when handling large...
