The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas

If you work with pandas, you’ve probably run into this classic confusion: should you use loc or iloc to extract data? At first glance, they look almost identical. Both are used to slice, filter, and retrieve rows or columns from a DataFrame, yet one tiny difference in how they work can completely change your results (or throw an error that leaves you scratching your head).

I remember the first time I tried selecting a row with df.loc[0] and wondered why it didn’t work. The reason? Pandas doesn’t always “think” in terms of positions; sometimes it uses labels. That’s where the loc vs iloc distinction comes in.

In this article, I’ll walk through a simple mini project using a small student performance dataset. By the end, you’ll not only understand the difference between loc and iloc, but also know exactly when to use each in your own data analysis.

Introducing the dataset

The dataset comes from ChatGPT. It contains some basic student exam score records. Here’s a snapshot of our dataset:

import pandas as pd
df = pd.read_csv('student_scores.csv')
df

Output:
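If you don’t have the CSV file at hand, a small stand-in DataFrame is enough to follow along. This is only a sketch: the column names, Bob’s scores and the IDs 101, 103 and 105 (Alice, Charlie and Edward) come from the examples in this article, while the fourth student (Diana) and all the other scores are made up for illustration.

```python
import pandas as pd

# Stand-in for student_scores.csv.
# Only Bob's scores and the IDs of Alice, Charlie and Edward are taken from
# the article; Diana and every other score are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
        "english": [78, 64, 88, 81, 59],
        "science": [90, 70, 95, 78, 85],
    }
)

print(df)
```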

I’ll perform some data extraction tasks using loc and iloc, such as:

  • Extracting a single row from the DataFrame
  • Extracting a single value
  • Extracting multiple rows
  • Slicing a range of rows
  • Extracting specific columns
  • Boolean filtering

First, let me briefly explain what loc and iloc are in Pandas.

What are loc and iloc?

loc and iloc are data extraction techniques in Pandas. They’re quite helpful for selecting data from records.

loc uses labels to retrieve records from a DataFrame, so I find it easier to use. iloc, however, is helpful for more precise retrieval of records, because it selects data based on the integer positions of the rows and columns, much like how you’d index a Python list or array.
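Here’s a minimal sketch of that difference, using a hypothetical three-row frame whose row labels are student IDs rather than 0, 1, 2 (Alice’s and Charlie’s scores are made up). It also shows why something like df.loc[0] fails on such a frame: there is no row labelled 0.

```python
import pandas as pd

# Hypothetical mini-frame; the row labels are student IDs, not 0, 1, 2.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "math": [85, 58, 92]},
    index=[101, 102, 103],
)

first_by_position = df.iloc[0]  # first row, whatever its label is
first_by_label = df.loc[101]    # the row whose label is 101

# Label 0 does not exist in this index, so loc raises a KeyError.
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(first_by_position.equals(first_by_label))  # True
print(label_zero_found)                          # False
```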

But if you’re like me, you might be wondering: if loc is clearly easier because of row labels, why bother using iloc? Why bother trying to figure out row positions, especially if you’re dealing with large datasets? Here are a few reasons.

  • A lot of the time, datasets don’t come with neat row labels (like 101, 102, …). Instead, you have a plain index (0, 1, 2, …), or you might misspell a row label when retrieving records. In this case, you’re better off using iloc. We’ll come back to this later in the article.
  • In some scenarios, like machine learning preprocessing, labels don’t really matter. You just care about a snapshot of the data, for instance the first or last three records. iloc is really helpful here: it makes the code shorter and less fragile, especially if labels change, which could otherwise break your machine learning pipeline.
  • A lot of datasets have duplicate row labels. In this case, iloc always works, since positions are unique.
  • The bottom line is, use loc when your dataset has clear, meaningful labels and you want your code to be readable.
  • Use iloc when you need position-based control, or when labels are missing or messy.
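To make the duplicate-label and “first or last three records” points concrete, here is a small sketch. The frame is hypothetical: the repeated label 102 and the “Bob (retake)” row are invented purely to show the behaviour.

```python
import pandas as pd

# Hypothetical frame in which the row label 102 appears twice.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Bob (retake)", "Charlie"],
        "math": [85, 58, 71, 92],
    },
    index=[101, 102, 102, 103],
)

by_label = df.loc[102]     # DataFrame: every row labelled 102
by_position = df.iloc[1]   # Series: exactly one row, the second one

first_three = df.iloc[:3]  # first three rows, regardless of their labels
last_three = df.iloc[-3:]  # last three rows

print(df.index.is_unique)  # False
print(len(by_label))       # 2
print(by_position["name"]) # Bob
```

With loc, a duplicated label quietly returns several rows; with iloc, a single integer always refers to exactly one row.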

Now that I’ve cleared the air, here’s the basic syntax for loc and iloc:

df.loc[rows, columns]
df.iloc[rows, columns]

The syntax is pretty much the same. With this syntax, let’s try to retrieve some records using loc and iloc.

Extracting a single row from the DataFrame

To make a proper demonstration, let’s first change the index and make it student_id. Currently, pandas is auto-indexing the rows (0, 1, 2, …):

# setting student_id as index
df.set_index('student_id', inplace=True)

Here’s the output:

Looks better. Now, let’s try to retrieve all of Bob’s records. Here’s how you can approach that using loc:

df.loc[102]

All I’m doing here is specifying the row label. This should retrieve all of Bob’s records.

Here’s the output:

name   Bob
math    58
english 64
science 70
Name: 102, dtype: object

The cool thing about this is that I can drill down, kind of like a hierarchy. For instance, let’s try to retrieve specific info about Bob, like his score in math.

df.loc[102, 'math']

The output would be 58.

Now let’s do this using iloc. If you’re familiar with lists and arrays, you know that indexing always starts at 0. So if I want to retrieve the first record in the DataFrame, I’ll need to specify position 0. In this case, I’m trying to retrieve Bob, who is in the second row of our DataFrame, so the position is 1.

df.iloc[1]

We’d get the same output as above:

name   Bob
math    58
english 64
science 70
Name: 102, dtype: object

And if I want to drill down and retrieve Bob’s math score, the column position would also be 1, given that math is the second column (name is at position 0):

df.iloc[1, 1]

The output would be 58.
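The label-based and position-based lookups above can be checked against each other. This is a sketch on a stand-in frame: Bob’s row matches the article, while Alice’s and Charlie’s scores are made up. It also uses df.columns.get_loc to look up a column’s position instead of counting columns by hand.

```python
import pandas as pd

# Stand-in data: Bob's row is from the article, the other scores are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103],
        "name": ["Alice", "Bob", "Charlie"],
        "math": [85, 58, 92],
        "english": [78, 64, 88],
        "science": [90, 70, 95],
    }
).set_index("student_id")

bob_by_label = df.loc[102]    # row labelled 102
bob_by_position = df.iloc[1]  # second row

math_by_label = df.loc[102, "math"]
math_by_position = df.iloc[1, 1]

# Ask pandas for the column position instead of hard-coding it.
math_pos = df.columns.get_loc("math")

print(bob_by_label.equals(bob_by_position))  # True
print(math_by_label, math_by_position)       # 58 58
print(math_pos)                              # 1
```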

Alright, I could wrap this article up here, but loc and iloc offer some more impressive features. Let’s speed-run through a few of them.

Extract Multiple Rows (Specific Students)

Pandas lets you retrieve multiple rows using loc and iloc. I’ll demonstrate by retrieving the records of multiple students. In this case, instead of passing a single value to loc/iloc, we pass a list. Here’s how you can do this with loc:

# Alice, Charlie and Edward's records
df.loc[[101, 103, 105]]

Here’s the output:

And here’s how you can do this with iloc:

df.iloc[[0, 2, 4]]

We’d get the same output:

I hope you’re getting the hang of it.
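As a quick check that the two list-based selections agree, here is a sketch on stand-in data (Diana and every score other than Bob’s are invented). It also shows a detail that is easy to miss: a one-element list still returns a DataFrame, while a bare label returns a Series.

```python
import pandas as pd

# Stand-in data: Diana and all scores except Bob's are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
    }
).set_index("student_id")

by_labels = df.loc[[101, 103, 105]]  # rows labelled 101, 103, 105
by_positions = df.iloc[[0, 2, 4]]    # first, third and fifth rows

one_row_frame = df.loc[[102]]  # list with one label -> DataFrame
one_row_series = df.loc[102]   # bare label -> Series

print(by_labels.equals(by_positions))  # True
print(by_labels["name"].tolist())      # ['Alice', 'Charlie', 'Edward']
print(type(one_row_frame).__name__)    # DataFrame
print(type(one_row_series).__name__)   # Series
```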

Slice a Range of Rows

Another helpful feature Pandas offers is the ability to slice a range of rows. Here, you can specify your start and end position. Here’s the syntax for loc/iloc slicing:

df.loc[start_label:end_label]

In loc, however, the end label is included in the output, which is quite different from default Python slicing.

The syntax is the same for iloc, except that the end position is excluded from the output (just like default Python slicing).

Let’s walk through an example:

I’m trying to retrieve a range of students’ records. Let’s try that using loc:

df.loc[101:103]

Output:

As you can see above, the end label is included in the result. Now, let’s try that using iloc. If you recall, the first row’s position is 0, which means the third row’s position is 2.

df.iloc[0:3]

Output:
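To confirm that the two slices pick the same rows, here is a sketch on stand-in data (Diana and all names’ scores other than Bob’s are invented). It assumes the student IDs are sorted and consecutive, as they are in this example.

```python
import pandas as pd

# Stand-in data: Diana and all scores except Bob's are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
    }
).set_index("student_id")

label_slice = df.loc[101:103]  # labels 101, 102, 103: end label included
position_slice = df.iloc[0:3]  # positions 0, 1, 2: position 3 excluded

print(label_slice.index.tolist())          # [101, 102, 103]
print(position_slice.index.tolist())       # [101, 102, 103]
print(label_slice.equals(position_slice))  # True
```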

Here, the row at position 3 (the fourth row) is excluded, so we get the same three rows as with loc. But if you’re like me (someone who questions things a lot), you might be wondering: why would you want the end position to be excluded? In what scenarios would that be helpful? What if I told you it actually makes your life easier? Let’s clear that up real quick.

Suppose you want to process your DataFrame in chunks of 100 rows each.

If slicing were inclusive, you’d need to do some awkward math to avoid repeating the last row.

But because slicing is exclusive at the end, you can do that quite easily, like so:

df.iloc[0:100] # first 100 rows
df.iloc[100:200] # next 100 rows
df.iloc[200:300] # next 100 rows

Here, there are no overlaps, and the chunk sizes are consistent. Another reason is that it’s similar to how ranges work in Python: range() also starts at 0 by default and doesn’t include the end value. Having this same logic in iloc slicing is really helpful, especially when you’re working on some web scraping or looping through a range of rows.
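The chunking idea can be written as a loop rather than three hand-typed slices. This is a sketch on a hypothetical 250-row frame (the value column and chunk_size are placeholders), pairing Python’s range() with iloc so that each chunk starts exactly where the previous one stopped.

```python
import pandas as pd

# Hypothetical frame with 250 rows, just to have something to chunk.
df = pd.DataFrame({"value": range(250)})

chunk_size = 100
chunks = [
    df.iloc[start:start + chunk_size]
    for start in range(0, len(df), chunk_size)
]

sizes = [len(chunk) for chunk in chunks]
print(sizes)       # [100, 100, 50]
print(sum(sizes))  # 250, so no row is repeated or dropped
```

Note that the last slice asks for rows 200 to 300 even though only 250 exist; iloc slices simply stop at the end of the frame instead of raising an error.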

Extract Specific Columns (Subjects)

I’d also like to introduce you to the colon (:). With loc, it lets you retrieve all records in your DataFrame, much like the * in SQL. The cool thing about this is that you can then filter and extract a subset of columns.

This is usually where I find myself starting. I use it to get an overview of a particular dataset. From there, I can start to filter and drill down. Let me show you what I mean.

Let’s retrieve all records:

df.loc[:]

Output:

From here, I can extract specific columns like so. With loc:

df.loc[:, ['math', 'science']]

Output:

With iloc:

df.iloc[:, [1, 3]]

The output would be the same.

I love this feature because it’s so flexible. Let’s say I want to retrieve the names and the math and science scores of Alice, Bob, and Charlie. It’ll go something like this: I just specify the range of records and the columns I want.

With loc:

df.loc[101:103, ['name', 'math', 'science']]

Output:

With iloc:

df.iloc[0:3, [0, 1, 3]]

We’d get the same output.
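Counting column positions by hand is easy to get wrong, so here is a sketch that derives them from the column names with df.columns.get_indexer and then checks that loc and iloc return the same selection. The data is a stand-in: Diana and every score other than Bob’s are invented.

```python
import pandas as pd

# Stand-in data: Diana and all scores except Bob's are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
        "english": [78, 64, 88, 81, 59],
        "science": [90, 70, 95, 78, 85],
    }
).set_index("student_id")

wanted = ["name", "math", "science"]

by_label = df.loc[101:103, wanted]

# Translate column names into integer positions for iloc.
col_positions = df.columns.get_indexer(wanted).tolist()
by_position = df.iloc[0:3, col_positions]

print(col_positions)                 # [0, 1, 3]
print(by_label.equals(by_position))  # True
```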

Boolean Filtering (Who scored above 80 in Math?)

The final feature I want to share with you is Boolean filtering. This allows for more flexible extraction. Let’s say I want to retrieve the records of students who scored above 80 in math. In SQL, you’d typically use a WHERE or HAVING clause. Python makes this really easy.

# Students with Math > 80.
df.loc[df['math'] > 80]

Output:

You can also filter on multiple conditions using the AND (&), OR (|), and NOT (~) operators. For instance:

# Math > 70 and Science > 80
df.loc[(df['math'] > 70) & (df['science'] > 80)]

Output:
P.S. I wrote an article on filtering with Pandas. You can read it here.
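The three operators can be tried side by side. This is a sketch on stand-in data (Diana and every score other than Bob’s are invented), and the variable names high_math and high_science are just illustrative. Each comparison needs its own parentheses when written inline, because & and | bind more tightly than > in Python.

```python
import pandas as pd

# Stand-in data: Diana and all scores except Bob's are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
        "science": [90, 70, 95, 78, 85],
    }
).set_index("student_id")

# Naming the masks keeps the combined filters readable.
high_math = df["math"] > 70
high_science = df["science"] > 80

both = df.loc[(df["math"] > 70) & (df["science"] > 80)]  # AND
either = df.loc[high_math | high_science]                # OR
not_high_math = df.loc[~high_math]                       # NOT

print(both["name"].tolist())           # ['Alice', 'Charlie']
print(either["name"].tolist())         # ['Alice', 'Charlie', 'Diana', 'Edward']
print(not_high_math["name"].tolist())  # ['Bob', 'Edward']
```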

Usually, you’ll find yourself using this feature with loc. It can get a bit complicated with iloc, because iloc doesn’t accept a Boolean Series as a mask. To do this with iloc, you’ll need to convert the Boolean Series into a plain list, like so:

# Students with Math > 80.
df.iloc[list(df['math'] > 80)]

To avoid the headache, just go with loc.
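If you do need a mask with iloc, here is a sketch of what happens, again on stand-in data (Diana and every score other than Bob’s are invented). Converting the mask with to_numpy() is an alternative to list() that I’m swapping in here; both hand iloc plain Boolean values without an index attached.

```python
import pandas as pd

# Stand-in data: Diana and all scores except Bob's are invented.
df = pd.DataFrame(
    {
        "student_id": [101, 102, 103, 104, 105],
        "name": ["Alice", "Bob", "Charlie", "Diana", "Edward"],
        "math": [85, 58, 92, 74, 66],
    }
).set_index("student_id")

mask = df["math"] > 80  # Boolean Series, indexed by student_id

with_loc = df.loc[mask]               # loc accepts the Series directly
with_iloc = df.iloc[mask.to_numpy()]  # iloc needs a plain array or list

# Passing the Series itself to iloc is rejected with an exception.
try:
    df.iloc[mask]
    series_mask_accepted = True
except (NotImplementedError, ValueError):
    series_mask_accepted = False

print(with_loc.equals(with_iloc))  # True
print(with_loc["name"].tolist())   # ['Alice', 'Charlie']
print(series_mask_accepted)        # False
```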

Conclusion

You’ll probably use loc and iloc a lot when you’re working on a dataset, so it’s crucial to know how they work and how to tell the two apart. I love how easy and flexible it is to extract records with them. Whenever you’re confused, just remember: loc is all about labels, while iloc is about positions.

I hope you found this article helpful. Try running these examples on your own dataset to see the difference in action.

 

Feel free to say hi on any of these platforms:

Medium

LinkedIn

Twitter

YouTube
