Goodbye os.path: 15 Pathlib Tricks to Quickly Master The File System in Python

A robot pal. — Via Midjourney

Pathlib might be my favorite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that’s saying something. Pathlib helps me turn code like this, written with os.path:

import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")
]

into this:

from pathlib import Path

dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))

Pathlib came out in Python 3.4 as a replacement for the nightmare that was os.path. It also marked an important milestone for the Python language as a whole: they finally turned every single thing into an object (even nothing).

The biggest drawback of os.path was treating system paths as strings, which led to unreadable, messy code and a steep learning curve.

By representing paths as fully-fledged objects, Pathlib solves all these issues and introduces elegance, consistency, and a breath of fresh air into path handling.

And this long-overdue article of mine will outline some of the best functions, features, and tricks of pathlib to perform tasks that would have been truly horrible experiences in os.path.

Learning these features of Pathlib will make everything related to paths and files easier for you as a data professional, especially during data processing workflows where you have to move around thousands of images, CSVs, or audio files.

Let’s start!

Working with paths

Almost all features of pathlib are accessible through its Path class, which you can use to create paths to files and directories.

There are a few ways you can create paths with Path. First, there are class methods like cwd and home for the current working directory and the user’s home directory:

from pathlib import Path

Path.cwd()

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib')
Path.home()
PosixPath('/home/bexgboost')

You can also create paths from strings:

p = Path("documents")

p

PosixPath('documents')

Joining paths is a breeze in Pathlib with the forward slash operator:

data_dir = Path(".") / "data"
csv_file = data_dir / "file.csv"

print(data_dir)
print(csv_file)

data
data/file.csv

Please, don’t let anyone ever catch you using os.path.join after this.
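If you want to convince yourself that the slash operator really replaces os.path.join, here is a minimal sketch. It uses PurePosixPath and posixpath (the POSIX flavor of os.path) so the result looks the same on every operating system; the "raw" folder name is made up for illustration.

```python
import posixpath
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on every operating system.
base = PurePosixPath("data")

# Three equivalent ways to build the same path.
with_slash = base / "raw" / "file.csv"
with_joinpath = base.joinpath("raw", "file.csv")
with_os_path = PurePosixPath(posixpath.join("data", "raw", "file.csv"))

print(with_slash)                                   # data/raw/file.csv
print(with_slash == with_joinpath == with_os_path)  # True
```

The joinpath method is handy when the pieces arrive as a list, since you can unpack them with a star.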

To check whether a path exists, you can use the boolean method exists:

data_dir.exists()
True
csv_file.exists()
True

Sometimes, the entire Path object won’t be visible, and you have to check whether it is a directory or a file. For that, you can use the is_dir or is_file methods:

data_dir.is_dir()
True
csv_file.is_file()
True
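The checks above depend on what is actually on disk, so here is a small sketch you can run anywhere. It works inside a temporary directory, and touch (a Path method not covered above) is used only to create an empty file for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    csv_file = data_dir / "file.csv"

    # Nothing has been created yet, so both checks return False.
    before = (data_dir.exists(), csv_file.exists())

    data_dir.mkdir()
    csv_file.touch()  # create an empty file

    after = (data_dir.is_dir(), csv_file.is_file(), csv_file.is_dir())

print(before)  # (False, False)
print(after)   # (True, True, False)
```

Note that creating a Path object never touches the file system; only methods like exists, mkdir, or touch do.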

Most paths you work with will be relative to your current directory. But there are cases where you have to provide the exact location of a file or a directory to make it accessible from any Python script. That’s when you use absolute paths:

csv_file.absolute()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/data/file.csv')
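As a rough illustration of what absolute does, the sketch below shows that it simply anchors the relative path at the current working directory. It does not check whether the file exists, so this runs even without a data folder.

```python
from pathlib import Path

relative = Path("data") / "file.csv"
absolute = relative.absolute()

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# absolute() is the current working directory joined with the relative path.
print(absolute == Path.cwd() / relative)  # True
```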

Lastly, if you have the misfortune of working with libraries that still require string paths, you can call str(path):

str(Path.home())
'/home/bexgboost'

Most libraries in the data stack have long supported Path objects, including sklearn, pandas, matplotlib, seaborn, etc.

Path objects have many useful attributes. Let’s see some examples using this path object that points to an image file.

image_file = Path("images/midjourney.png").absolute()

image_file

PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png')

Let’s start with the parent. It returns a path object that is one level up from the path itself.

image_file.parent
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images')

Sometimes, you may want only the file name instead of the whole path. There’s an attribute for that:

image_file.name
'midjourney.png'

which returns only the file name with the extension.

There’s also stem for the file name without the suffix:

image_file.stem
'midjourney'

Or the suffix itself with the dot for the file extension:

image_file.suffix
'.png'
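File names with more than one extension are where these attributes get interesting. Here is a quick sketch with a made-up archive path; suffixes and with_suffix are real Path members that the examples above did not need.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("/home/user/backups/archive.tar.gz")

print(archive.name)      # archive.tar.gz
print(archive.stem)      # archive.tar
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']

# with_suffix swaps only the last suffix.
print(archive.with_suffix(".zip"))  # /home/user/backups/archive.tar.zip
```

So stem and suffix only ever look at the last dot, which is worth remembering before you rename a pile of .tar.gz files.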

If you want to divide a path into its components, you can use parts instead of str.split('/'):

image_file.parts
('/',
'home',
'bexgboost',
'articles',
'2023',
'4_april',
'1_pathlib',
'images',
'midjourney.png')

If you want those components to be Path objects themselves, you can use the parents attribute, which gives you a sequence of the path’s ancestors that you can loop over:

for i in image_file.parents:
    print(i)
/home/bexgboost/articles/2023/4_april/1_pathlib/images
/home/bexgboost/articles/2023/4_april/1_pathlib
/home/bexgboost/articles/2023/4_april
/home/bexgboost/articles/2023
/home/bexgboost/articles
/home/bexgboost
/home
/
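Since parents behaves like a sequence, you can also index it instead of looping. A small sketch with a shortened, made-up version of the path above:

```python
from pathlib import PurePosixPath

image_file = PurePosixPath("/home/user/articles/images/midjourney.png")
ancestors = image_file.parents

print(len(ancestors))       # 5
print(ancestors[0])         # /home/user/articles/images
print(ancestors[1])         # /home/user/articles
print(list(ancestors)[-1])  # /

# The first ancestor is the same thing as .parent.
print(ancestors[0] == image_file.parent)  # True
```

Converting to a list before taking the last element keeps the sketch working on older Python versions, where parents does not accept negative indexes.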

Working with files

Classified files. — Midjourney

To create files and write to them, you don’t have to use the open function anymore. Just create a Path object and call write_text or write_bytes on it:

markdown = data_dir / "file.md"

# Create (overwrite) and write text
markdown.write_text("# This can be a test markdown")

Or, if you already have a file, you can read it with read_text or read_bytes:

markdown.read_text()
'# This can be a test markdown'
len(image_file.read_bytes())
1962148

However, note that write_text and write_bytes overwrite the existing contents of a file.

# Write new text to the existing file
markdown.write_text("## This can be a recent line")
# The file is overwritten
markdown.read_text()
'## This can be a recent line'

To append new information to existing files, you should use the open method of Path objects in append (a) mode:

# Append text
with markdown.open(mode="a") as file:
    file.write("\n### That is the second line")

markdown.read_text()

'## This can be a recent line\n### That is the second line'
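To see overwriting and appending side by side without touching your real files, here is a self-contained sketch in a temporary directory. The file contents are placeholders, and passing encoding explicitly is my own habit rather than a requirement.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    markdown = Path(tmp) / "file.md"

    # The second write_text call replaces whatever the first one wrote.
    markdown.write_text("# Title\n", encoding="utf-8")
    markdown.write_text("## Replaced\n", encoding="utf-8")

    # Opening in append mode keeps the existing contents.
    with markdown.open(mode="a", encoding="utf-8") as file:
        file.write("### Appended\n")

    content = markdown.read_text(encoding="utf-8")

print(content.splitlines())  # ['## Replaced', '### Appended']
```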

It is also common to rename files. The rename method accepts the destination path for the renamed file.

To create the destination path in the current directory, i.e., rename the file in place, you can use with_stem (available since Python 3.9) on the existing path, which replaces the stem of the original file:

renamed_md = markdown.with_stem("new_markdown")

markdown.rename(renamed_md)

PosixPath('data/new_markdown.md')

Above, file.md was turned into new_markdown.md.
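Here is a runnable sketch of the same rename inside a temporary directory, so you can check what happens to the old and new paths. It assumes Python 3.9 or newer because of with_stem.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    markdown = Path(tmp) / "file.md"
    markdown.write_text("# Notes", encoding="utf-8")

    target = markdown.with_stem("new_markdown")  # same folder, same suffix
    renamed = markdown.rename(target)            # returns the new path

    old_exists = markdown.exists()
    new_exists = renamed.exists()
    new_name = renamed.name

print(old_exists, new_exists, new_name)  # False True new_markdown.md
```

The original markdown object still points at the old location after the rename, so keep using the returned path from then on.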

Let’s have a look at the file size through stat().st_size:

# Display file size
renamed_md.stat().st_size
49 # in bytes

or the last time the file was modified, which was a couple of seconds ago:

from datetime import datetime

modified_timestamp = renamed_md.stat().st_mtime

datetime.fromtimestamp(modified_timestamp)

datetime.datetime(2023, 4, 3, 13, 32, 45, 542693)

st_mtime returns a timestamp, which is the number of seconds since January 1, 1970. To make it readable, you can use the fromtimestamp method of datetime.
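If you need both the size and the modification time, one stat call is enough. A minimal sketch with a throwaway file (report.txt is a made-up name):

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"
    report.write_bytes(b"hello pathlib")  # 13 bytes

    info = report.stat()  # one call, several attributes
    size_in_bytes = info.st_size
    modified_at = datetime.fromtimestamp(info.st_mtime)

print(size_in_bytes)  # 13
print(modified_at.isoformat(timespec="seconds"))
```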

To remove unwanted files, you can unlink them:

renamed_md.unlink(missing_ok=True)

Setting missing_ok to True won’t raise any alarms if the file doesn’t exist.
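The difference missing_ok makes is easiest to see by deleting the same file three times. A short sketch, assuming Python 3.8 or newer (which is when missing_ok was added):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "scratch.txt"
    scratch.touch()

    scratch.unlink()                 # file exists, so it is deleted
    scratch.unlink(missing_ok=True)  # already gone, silently ignored

    try:
        scratch.unlink()             # default is missing_ok=False
    except FileNotFoundError:
        raised = True
    else:
        raised = False

print(raised)  # True
```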

Working with directories

Folders in an office. — Midjourney

There are a few neat tricks for working with directories in Pathlib. First, let’s see how to create directories recursively.

new_dir = (
    Path.cwd()
    / "new_dir"
    / "child_dir"
    / "grandchild_dir"
)

new_dir.exists()

False

The new_dir doesn’t exist, so let’s create it with all its children:

new_dir.mkdir(parents=True, exist_ok=True)

By default, mkdir creates only the last child of the given path. If the intermediate parents don’t exist, you have to set parents to True.

To remove empty directories, you can use rmdir. If the given path object is nested, only the last child directory is deleted:

# Removes the last child directory
new_dir.rmdir()
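Here is the whole create-and-remove cycle as one runnable sketch. It builds the nested folders inside a temporary directory instead of the current working directory, which is my substitution to keep the demo from leaving anything behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "new_dir" / "child_dir" / "grandchild_dir"

    new_dir.mkdir(parents=True, exist_ok=True)
    new_dir.mkdir(parents=True, exist_ok=True)  # safe to repeat with exist_ok

    new_dir.rmdir()  # removes grandchild_dir only
    grandchild_left = new_dir.exists()
    child_left = new_dir.parent.exists()

print(grandchild_left)  # False
print(child_left)       # True
```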

To list the contents of a directory like ls on the terminal, you can use iterdir. The result will be a generator object, yielding directory contents as separate path objects one at a time:

for p in Path.home().iterdir():
    print(p)
/home/bexgboost/.python_history
/home/bexgboost/word_counter.py
/home/bexgboost/.azure
/home/bexgboost/.npm
/home/bexgboost/.nv
/home/bexgboost/.julia
...

To capture all files with a specific extension or a name pattern, you can use the glob method with a shell-style wildcard pattern (similar to a regular expression, but much simpler).

For example, below, we will find all text files inside my home directory with glob("*.txt"):

home = Path.home()
text_files = list(home.glob("*.txt"))

len(text_files)

3 # Only three

To search for text files recursively, meaning inside all child directories as well, you can use recursive glob with rglob:

all_text_files = [p for p in home.rglob("*.txt")]

len(all_text_files)

5116 # Now many more
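Your home directory will give different counts than mine, so here is a sketch on a tiny made-up folder tree where the difference between glob and rglob is easy to verify by eye.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested" / "deeper").mkdir(parents=True)
    for relative in ["a.txt", "b.csv", "nested/c.txt", "nested/deeper/d.txt"]:
        (root / relative).touch()

    # glob looks only at the top level, rglob descends into every subfolder.
    top_level = sorted(p.name for p in root.glob("*.txt"))
    recursive = sorted(p.name for p in root.rglob("*.txt"))

print(top_level)  # ['a.txt']
print(recursive)  # ['a.txt', 'c.txt', 'd.txt']
```

Both methods return generators, and the order of results is not guaranteed, which is why the sketch sorts them.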

Find out about regular expressions here.

You can also use rglob('*') to list directory contents recursively. It’s like the supercharged version of iterdir().

One of the use cases of this is counting the number of file formats that appear inside a directory.

To do that, we import the Counter class from collections and feed it the suffixes of everything inside the articles folder of home:

from collections import Counter

file_counts = Counter(
    path.suffix for path in (home / "articles").rglob("*")
)

file_counts

Counter({'.py': 12,
'': 1293,
'.md': 1,
'.txt': 7,
'.ipynb': 222,
'.png': 90,
'.mp4': 39})
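The big count for the empty suffix '' is likely inflated by directories, because rglob("*") yields folders too and a folder name usually has no suffix. A hedged sketch on a made-up tree shows how filtering with is_file changes the picture:

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notebooks").mkdir()
    for relative in ["README", "main.py", "notebooks/eda.ipynb", "notebooks/utils.py"]:
        (root / relative).touch()

    # Counts the "notebooks" folder itself under the empty suffix.
    everything = Counter(p.suffix for p in root.rglob("*"))
    # Counts files only.
    files_only = Counter(p.suffix for p in root.rglob("*") if p.is_file())

print(everything[""], files_only[""])  # 2 1
print(files_only[".py"])               # 2
```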

Operating system differences

Sorry, but we have to talk about this nightmare of an issue.

Up until now, we have been dealing with PosixPath objects, which are the default for UNIX-like systems:

type(Path.home())
pathlib.PosixPath

If you were on Windows, you’d get a WindowsPath object:

from pathlib import WindowsPath

# Use raw strings that start with r to write Windows paths
path = WindowsPath(r"C:\users")
path

NotImplementedError: cannot instantiate 'WindowsPath' on your system

Instantiating another system’s path raises an error like the one above.

But what if you were forced to work with paths from another system, like code written by coworkers who use Windows?

As a solution, pathlib offers pure path objects like PureWindowsPath or PurePosixPath:

from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"C:\users")
path

PureWindowsPath('C:/users')

These are primitive path objects. You have access to some path methods and attributes, but essentially, the path object remains a string and never touches the file system:

path / "bexgboost"
PureWindowsPath('C:/users/bexgboost')
path.parent
PureWindowsPath('C:/')
path.stem
'users'
path.rename(r"C:\losers") # Unsupported
AttributeError: 'PureWindowsPath' object has no attribute 'rename'
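One practical use of pure paths is translating a coworker’s Windows path into something usable on your machine. The sketch below is only an illustration: the /mnt/c prefix is a hypothetical mount point I picked, so swap in wherever the drive actually lives on your system.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_path = PureWindowsPath(r"C:\users\bexgboost\data\file.csv")

print(windows_path.drive)       # C:
print(windows_path.name)        # file.csv
print(windows_path.as_posix())  # C:/users/bexgboost/data/file.csv

# Drop the drive and root (the first part), then rebuild the rest
# under a hypothetical mount point as a POSIX path.
relative_parts = windows_path.parts[1:]
posix_path = PurePosixPath("/mnt/c").joinpath(*relative_parts)
print(posix_path)  # /mnt/c/users/bexgboost/data/file.csv
```

Because pure paths never touch the disk, this runs the same way on Linux, macOS, and Windows.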

Conclusion

If you have noticed, I lied in the title of the article. Instead of 15, I think the count of new tricks and functions was 30ish.

I didn’t want to scare you off.

But I hope I’ve convinced you enough to ditch os.path and start using pathlib for much easier and more readable path operations.

Forge a new path, if you will 🙂

Path. — Midjourney
