No headaches and unreadable code from os.path
Pathlib may be my favorite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that's saying something. Pathlib helps me turn code like this written in os.path:
import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)
         if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")]
into this:
from pathlib import Path

dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))
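To see that the two versions really agree, here is a minimal sketch that runs both against a throwaway directory created with tempfile (the file names are made up for the demo):

```python
import os
import tempfile
from pathlib import Path

# Build a throwaway directory with a mix of files and a subdirectory
tmp = Path(tempfile.mkdtemp())
for name in ["a.txt", "b.txt", "notes.md"]:
    (tmp / name).write_text("hello")
(tmp / "sub").mkdir()

# os.path version: strings everywhere
dir_path = str(tmp)
os_files = sorted(
    os.path.join(dir_path, f)
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")
)

# pathlib version: one glob call, results are Path objects
pl_files = sorted(tmp.glob("*.txt"))

print([p.name for p in pl_files])              # ['a.txt', 'b.txt']
print(os_files == [str(p) for p in pl_files])  # True
```

Strictly speaking, glob also matches a directory whose name ends in .txt, so add `if p.is_file()` to the pathlib version if that edge case matters to you.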
Pathlib came out in Python 3.4 as a replacement for the nightmare that was os.path. It also marked an important milestone for the Python language as a whole: they finally turned everything into an object (even nothing).
The most important drawback of os.path was treating system paths as strings, which led to unreadable, messy code and a steep learning curve.
By representing paths as fully-fledged objects, Pathlib solves all these issues and introduces elegance, consistency, and a breath of fresh air into path handling.
And this long-overdue article of mine will outline some of the best functions/features and tricks of pathlib to perform tasks that would have been truly horrible experiences in os.path.
Learning these features of Pathlib will make everything related to paths and files easier for you as a data professional, especially during data processing workflows where you have to move around hundreds of images, CSVs, or audio files.
Let’s start!
Working with paths
Almost all features of pathlib are accessible through its Path class, which you can use to create paths to files and directories.
There are a few ways you can create paths with Path. First, there are class methods like cwd and home for the current working directory and the user's home directory:
from pathlib import Path

Path.cwd()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib')
Path.home()
PosixPath('/home/bexgboost')
You can also create paths from string paths:
p = Path("documents")
p
PosixPath('documents')
Joining paths is a breeze in Pathlib with the forward slash operator:
data_dir = Path(".") / "data"
csv_file = data_dir / "file.csv"

print(data_dir)
print(csv_file)
data
data/file.csv
Please, don’t let anyone ever catch you using os.path.join after this.
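Here is a small sketch of how joining behaves. It uses PurePosixPath, a Path variant that never touches the disk, only so the printed output is identical on every OS; the same expressions work with Path:

```python
from pathlib import PurePosixPath

base = PurePosixPath("data")

# The / operator and joinpath() build the same path
print(base / "raw" / "file.csv")          # data/raw/file.csv
print(base.joinpath("raw", "file.csv"))   # data/raw/file.csv

# An absolute right-hand side discards everything to its left,
# mirroring os.path.join's behavior
print(base / "/etc" / "hosts")            # /etc/hosts
```

That last rule is worth remembering: if a user-supplied piece starts with a slash, your carefully built prefix silently disappears.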
To check whether a path exists, you can use the boolean method exists:
data_dir.exists()
True
csv_file.exists()
True
Sometimes, the full Path object won't be visible, and you have to check whether it's a directory or a file. So, you can use the is_dir or is_file methods to do it:
data_dir.is_dir()
True
csv_file.is_file()
True
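One detail the True outputs above hide: a path that doesn't exist is neither a file nor a directory, so both checks return False rather than raising. A minimal sketch, assuming a temporary directory stands in for your project folder:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
data_dir = root / "data"
data_dir.mkdir()
csv_file = data_dir / "file.csv"
csv_file.write_text("a,b\n1,2\n")
missing = data_dir / "missing.csv"

for p in (data_dir, csv_file, missing):
    # A path that doesn't exist is neither a file nor a directory
    print(p.name, p.exists(), p.is_dir(), p.is_file())
# data True True False
# file.csv True False True
# missing.csv False False False
```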
Most paths you work with will be relative to your current directory. But there are cases where you have to supply the exact location of a file or a directory to make it accessible from any Python script. That is when you use absolute paths:
csv_file.absolute()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/data/file.csv')
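Note that absolute() only prepends the current working directory; it does not clean up ".." segments. If you want a normalized path, resolve() is the usual choice. A short sketch (the relative path here is hypothetical and doesn't need to exist):

```python
from pathlib import Path

rel = Path("data") / ".." / "data" / "file.csv"

# absolute() only prepends the current working directory
print(rel.absolute())
# resolve() also normalizes ".." (and follows symlinks)
print(rel.resolve())

print(".." in rel.absolute().parts)  # True
print(".." in rel.resolve().parts)   # False
```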
Lastly, if you have the misfortune of working with libraries that still require string paths, you can call str(path):
str(Path.home())
'/home/bexgboost'
Most libraries in the data stack have long supported Path objects, including sklearn, pandas, matplotlib, seaborn, etc.
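The reason this works is the os.PathLike protocol: anything that accepts a path in the standard library, starting with open(), takes a Path directly. The sketch below uses the stdlib csv module as a stand-in for the data libraries (so it runs without pandas installed), and os.fspath() to get the string form explicitly:

```python
import csv
import os
import tempfile
from pathlib import Path

csv_path = Path(tempfile.mkdtemp()) / "scores.csv"

# Built-ins such as open() accept Path objects directly (os.PathLike)
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows([["name", "score"], ["ada", "95"]])

with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))

print(rows)                                  # [['name', 'score'], ['ada', '95']]
# os.fspath() returns the string form, same as str() for a Path
print(os.fspath(csv_path) == str(csv_path))  # True
```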
Path objects have many useful attributes. Let's see some examples using this path object that points to an image file.
image_file = Path("images/midjourney.png").absolute()
image_file
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png')
Let's start with the parent. It returns a path object that's one level up from the given path:
image_file.parent
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images')
Sometimes, you may want only the file name instead of the whole path. There's an attribute for that:
image_file.name
'midjourney.png'
which returns only the file name with the extension.
There’s also stem for the file name without the suffix:
image_file.stem
'midjourney'
Or the suffix itself with the dot for the file extension:
image_file.suffix
'.png'
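Files with more than one extension are where suffix can surprise you, since it only returns the last one. This sketch uses a made-up archive name to show suffixes, plus with_suffix and with_name for swapping parts of the name without string slicing (PurePosixPath keeps the output OS-independent):

```python
from pathlib import PurePosixPath

archive = PurePosixPath("data/backup.tar.gz")

print(archive.suffix)     # .gz  (only the last extension)
print(archive.suffixes)   # ['.tar', '.gz']
print(archive.stem)       # backup.tar

# Swap parts of the name without any string slicing
print(archive.with_suffix(".zip"))      # data/backup.tar.zip
print(archive.with_name("latest.csv"))  # data/latest.csv
```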
If you want to split a path into its components, you can use parts instead of str.split('/'):
image_file.parts
('/',
'home',
'bexgboost',
'articles',
'2023',
'4_april',
'1_pathlib',
'images',
'midjourney.png')
If you want every ancestor of the path as a Path object in its own right, you can use the parents attribute, which returns an immutable sequence running from the immediate parent up to the root:
for i in image_file.parents:
    print(i)
/home/bexgboost/articles/2023/4_april/1_pathlib/images
/home/bexgboost/articles/2023/4_april/1_pathlib
/home/bexgboost/articles/2023/4_april
/home/bexgboost/articles/2023
/home/bexgboost/articles
/home/bexgboost
/home
/
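Because parents is a sequence, you can also index it directly, and relative_to() does the opposite job of stripping a known prefix. A sketch reusing the article's image path as a PurePosixPath (so it works even where that path doesn't exist):

```python
from pathlib import PurePosixPath

image_file = PurePosixPath(
    "/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png"
)

# parents supports indexing: 0 is the immediate parent, 1 the grandparent, ...
print(image_file.parents[0])  # /home/bexgboost/articles/2023/4_april/1_pathlib/images
print(image_file.parents[1])  # /home/bexgboost/articles/2023/4_april/1_pathlib

# relative_to() strips a known prefix and keeps the rest
project = image_file.parents[1]
print(image_file.relative_to(project))  # images/midjourney.png
```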
Working with files
To create files and write to them, you don't have to use the open function anymore. Just create a Path object and call write_text or write_bytes on it:
markdown = data_dir / "file.md"

# Create (overwrite) and write text
markdown.write_text("# This is a test markdown")
Or, if you already have a file, you can use read_text or read_bytes:
markdown.read_text()
'# This is a test markdown'
len(image_file.read_bytes())
1962148
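A self-contained round trip, assuming a temporary directory in place of data_dir. Passing encoding explicitly is optional, but it avoids depending on the platform's default encoding; write_text also returns the number of characters written:

```python
import tempfile
from pathlib import Path

data_dir = Path(tempfile.mkdtemp())
markdown = data_dir / "file.md"

# write_text returns the number of characters written
n_chars = markdown.write_text("# This is a test markdown", encoding="utf-8")
print(n_chars)                               # 25
print(markdown.read_text(encoding="utf-8"))  # # This is a test markdown

# Binary data goes through write_bytes/read_bytes
blob = bytes(range(5))
(data_dir / "blob.bin").write_bytes(blob)
print((data_dir / "blob.bin").read_bytes() == blob)  # True
```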
However, note that write_text and write_bytes overwrite the existing contents of a file.
# Write new text to the existing file
markdown.write_text("## This is a new line")
# The file is overwritten
markdown.read_text()
'## This is a new line'
To append new information to existing files, you should use the open method of Path objects in "a" (append) mode:
# Append text
with markdown.open(mode="a") as file:
    file.write("\n### This is the second line")

markdown.read_text()
'## This is a new line\n### This is the second line'
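Since each open in "a" mode starts writing at the end of the file, repeated appends simply accumulate. A quick sketch with a hypothetical log file in a temporary directory:

```python
import tempfile
from pathlib import Path

log = Path(tempfile.mkdtemp()) / "log.md"
log.write_text("## This is a new line", encoding="utf-8")

# Mode "a" keeps the existing content and writes at the end of the file
for i in range(2, 4):
    with log.open(mode="a", encoding="utf-8") as file:
        file.write(f"\nline {i}")

print(log.read_text(encoding="utf-8").splitlines())
# ['## This is a new line', 'line 2', 'line 3']
```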
It is also common to rename files. The rename method accepts the destination path for the renamed file.
To build the destination path in the same directory, i.e. to rename the file in place, you can use with_stem (available since Python 3.9) on the existing path, which replaces the stem of the original file:
renamed_md = markdown.with_stem("new_markdown")
markdown.rename(renamed_md)
PosixPath('data/new_markdown.md')
Above, file.md is renamed to new_markdown.md.
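A sketch of the rename flow in a temporary directory. One caveat from the docs: if the target already exists, rename silently replaces it on Unix but raises on Windows, whereas replace() overwrites on every platform. Both return the new path:

```python
import tempfile
from pathlib import Path

data_dir = Path(tempfile.mkdtemp())
markdown = data_dir / "file.md"
markdown.write_text("draft")

# with_stem() (Python 3.9+) builds the target; rename() returns the new path
renamed_md = markdown.rename(markdown.with_stem("new_markdown"))
print(renamed_md.name, markdown.exists())  # new_markdown.md False

# with_suffix() changes the extension; replace() overwrites an existing
# target on every platform
(data_dir / "new_markdown.txt").write_text("old")
final = renamed_md.replace(renamed_md.with_suffix(".txt"))
print(final.read_text())  # draft
```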
Let’s have a look at the file size through stat().st_size:
# Display file size
renamed_md.stat().st_size
49 # in bytes
or the last time the file was modified, which was a couple of seconds ago:
from datetime import datetime

modified_timestamp = renamed_md.stat().st_mtime
datetime.fromtimestamp(modified_timestamp)
datetime.datetime(2023, 4, 3, 13, 32, 45, 542693)
st_mtime returns a timestamp, which is the number of seconds since January 1, 1970 (the Unix epoch). To make it readable, you can use the fromtimestamp method of datetime.
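Each stat() call hits the file system, so when you need several fields it's tidier to call it once and reuse the result. A sketch that recreates the 49-byte file from above (write_bytes is used so no newline translation changes the size on Windows):

```python
import tempfile
import time
from datetime import datetime
from pathlib import Path

f = Path(tempfile.mkdtemp()) / "new_markdown.md"
f.write_bytes(b"## This is a new line\n### This is the second line")

info = f.stat()  # one system call, reuse the result
print(info.st_size)  # 49 (bytes)

# st_mtime is seconds since the Unix epoch; fromtimestamp gives local time
modified = datetime.fromtimestamp(info.st_mtime)
print(modified.isoformat(timespec="seconds"))
print(abs(time.time() - info.st_mtime) < 60)  # True: just written
```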
To remove unwanted files, you can unlink them:
renamed_md.unlink(missing_ok=True)
Setting missing_ok to True (available since Python 3.8) means no error is raised if the file doesn't exist.
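Without missing_ok, unlinking a missing file raises FileNotFoundError. A minimal sketch with a deliberately nonexistent file:

```python
import tempfile
from pathlib import Path

gone = Path(tempfile.mkdtemp()) / "does_not_exist.md"

# Without missing_ok, unlinking a missing file raises FileNotFoundError
raised = False
try:
    gone.unlink()
except FileNotFoundError:
    raised = True
print(raised)  # True

# missing_ok=True turns the same call into a no-op
gone.unlink(missing_ok=True)
print(gone.exists())  # False
```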
Working with directories
There are a few neat tricks to work with directories in Pathlib. First, let's see how to create directories recursively.
new_dir = (
    Path.cwd()
    / "new_dir"
    / "child_dir"
    / "grandchild_dir"
)

new_dir.exists()
False
The new_dir doesn’t exist, so let’s create it with all its children:
new_dir.mkdir(parents=True, exist_ok=True)
By default, mkdir creates only the last child of the given path. If the intermediate parents don't exist, you have to set parents to True. Setting exist_ok to True also prevents an error if the directory already exists.
To remove empty directories, you can use rmdir. If the given path object is nested, only the last child directory is deleted:
# Removes the last child directory
new_dir.rmdir()
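A sketch of the whole lifecycle in a temporary directory. rmdir refuses to delete a directory that still has contents; for a whole tree, the stdlib's shutil.rmtree is the usual tool (pathlib itself has no recursive delete in the versions this article covers):

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
new_dir = root / "new_dir" / "child_dir" / "grandchild_dir"

new_dir.mkdir(parents=True, exist_ok=True)
new_dir.mkdir(parents=True, exist_ok=True)  # second call is a no-op

new_dir.rmdir()                 # removes only grandchild_dir
print(new_dir.parent.exists())  # True: child_dir is still there

# rmdir() refuses non-empty directories; shutil.rmtree removes a whole tree
(new_dir.parent / "keep.txt").write_text("x")
try:
    new_dir.parent.rmdir()
except OSError:
    print("child_dir is not empty")
shutil.rmtree(root / "new_dir")
print((root / "new_dir").exists())  # False
```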
To list the contents of a directory like ls on the terminal, you can use iterdir. The result is a generator object, yielding directory contents as separate path objects one at a time:
for p in Path.home().iterdir():
    print(p)
/home/bexgboost/.python_history
/home/bexgboost/word_counter.py
/home/bexgboost/.azure
/home/bexgboost/.npm
/home/bexgboost/.nv
/home/bexgboost/.julia
...
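iterdir() is not recursive and yields entries in arbitrary order, so if you need a stable listing or only files, sort and filter yourself. A sketch with a temporary directory standing in for Path.home():

```python
import tempfile
from pathlib import Path

home = Path(tempfile.mkdtemp())  # stand-in for Path.home()
for name in ["word_counter.py", ".python_history"]:
    (home / name).write_text("")
(home / ".azure").mkdir()

# Sort for a stable order, then split entries into files and directories
entries = sorted(home.iterdir())
files = [p.name for p in entries if p.is_file()]
dirs = [p.name for p in entries if p.is_dir()]
print(files)  # ['.python_history', 'word_counter.py']
print(dirs)   # ['.azure']
```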
To capture all files with a specific extension or name pattern, you can use the glob method with a glob pattern. Note that these are shell-style wildcards like * and ?, not regular expressions.
For example, below we find all text files inside my home directory with glob("*.txt"):
home = Path.home()
text_files = list(home.glob("*.txt"))
len(text_files)
3 # Only three
To search for text files recursively, meaning inside all child directories as well, you can use recursive glob with rglob:
all_text_files = [p for p in home.rglob("*.txt")]
len(all_text_files)
5116 # Now many more
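A reproducible version of the glob vs. rglob comparison, built on a tiny hypothetical directory tree. It also shows that rglob("*.txt") matches the same paths as glob("**/*.txt"), where ** stands for "this directory and all subdirectories":

```python
import tempfile
from pathlib import Path

home = Path(tempfile.mkdtemp())
(home / "notes.txt").write_text("")
(home / "projects" / "demo").mkdir(parents=True)
(home / "projects" / "todo.txt").write_text("")
(home / "projects" / "demo" / "readme.txt").write_text("")
(home / "projects" / "demo" / "main.py").write_text("")

top_level = list(home.glob("*.txt"))          # this directory only
recursive = list(home.rglob("*.txt"))         # every subdirectory too
also_recursive = list(home.glob("**/*.txt"))  # same matches as rglob

print(len(top_level), len(recursive))               # 1 3
print(sorted(recursive) == sorted(also_recursive))  # True
```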
You can also use rglob('*') to list directory contents recursively. It's like a supercharged version of iterdir().
One of its use cases is counting how often each file format appears inside a directory.
To do this, we import the Counter class from collections and pass it the suffixes of all paths inside the articles folder of home:
from collections import Counter

file_counts = Counter(
    path.suffix for path in (home / "articles").rglob("*")
)
file_counts
Counter({'.py': 12,
'': 1293,
'.md': 1,
'.txt': 7,
'.ipynb': 222,
'.png': 90,
'.mp4': 39})
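The large '' entry above is worth a second look: directories have an empty suffix too, so they get counted alongside extensionless files. A sketch, assuming a small temporary tree in place of the articles folder, that filters with is_file() to count only real files:

```python
import tempfile
from collections import Counter
from pathlib import Path

articles = Path(tempfile.mkdtemp())  # stand-in for home / "articles"
(articles / "2023" / "4_april").mkdir(parents=True)
for name in ["a.py", "b.py", "c.ipynb", "README"]:
    (articles / "2023" / "4_april" / name).write_text("")

# Directories (and extensionless files) have an empty suffix,
# so keep only files to count real file formats
all_paths = Counter(p.suffix for p in articles.rglob("*"))
files_only = Counter(p.suffix for p in articles.rglob("*") if p.is_file())

print(all_paths[""])  # 3: two directories plus README
print(files_only[".py"], files_only[".ipynb"], files_only[""])  # 2 1 1
```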
Operating system differences
Sorry, but we have to discuss this nightmare of a problem.
Up until now, we have been dealing with PosixPath objects, which are the default on UNIX-like systems:
type(Path.home())
pathlib.PosixPath
If you were on Windows, you would get a WindowsPath object:
from pathlib import WindowsPath

# Use raw strings that start with r to write Windows paths
path = WindowsPath(r"C:\users")
path
NotImplementedError: cannot instantiate 'WindowsPath' on your system
Instantiating another system's concrete path class raises an error like the one above.
But what if you were forced to work with paths from another system, like code written by coworkers who use Windows?
As a solution, pathlib offers pure path objects like PureWindowsPath or PurePosixPath:
from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"C:\users")
path
PureWindowsPath('C:/users')
These are primitive path objects. You have access to some path methods and attributes, but essentially, the path object stays a string: pure paths never touch the file system, so methods that need it, like rename, aren't available:
path / "bexgboost"
PureWindowsPath('C:/users/bexgboost')
path.parent
PureWindowsPath('C:/')
path.stem
'users'
path.rename(r"C:\losers")  # Unsupported
AttributeError: 'PureWindowsPath' object has no attribute 'rename'
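Pure paths are also handy for translating a coworker's relative Windows path into a POSIX one. A sketch with a made-up path; note this only makes sense for relative paths, since a drive letter like C: has no POSIX equivalent:

```python
from pathlib import PurePosixPath, PureWindowsPath

# A relative path as it might appear in a Windows coworker's script
win_path = PureWindowsPath(r"data\raw\file.csv")

# Rebuild it from its parts as a POSIX path...
posix_path = PurePosixPath(*win_path.parts)
print(posix_path)           # data/raw/file.csv

# ...or just get a forward-slash string
print(win_path.as_posix())  # data/raw/file.csv

# Comparisons follow each flavor's rules: case-insensitive on Windows
print(PureWindowsPath("C:/Users") == PureWindowsPath("c:/users"))  # True
print(PurePosixPath("/Users") == PurePosixPath("/users"))          # False
```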
Conclusion
If you have noticed, I lied in the title of the article. Instead of 15, I think the count of new tricks and functions was 30ish.
I didn't want to scare you off.
But I hope I've convinced you enough to ditch os.path and start using pathlib for much easier and more readable path operations.
Forge a new path, if you will 🙂
If you enjoyed this article and, let's face it, its bizarre writing style, consider supporting me by signing up to become a Medium member. Membership costs $4.99 a month and gives you unlimited access to all my stories and hundreds of thousands of articles written by more experienced folk.
