No headaches and unreadable code from os.path
Pathlib may be my favorite library (after Sklearn, obviously). And given there are over 130 thousand libraries, that's saying something. Pathlib helps me turn code like this, written with os.path:
import os

dir_path = "/home/user/documents"

# Find all text files inside a directory
files = [os.path.join(dir_path, f) for f in os.listdir(dir_path)
         if os.path.isfile(os.path.join(dir_path, f)) and f.endswith(".txt")]
into this:
from pathlib import Path

dir_path = Path("/home/user/documents")

# Find all text files inside a directory
files = list(dir_path.glob("*.txt"))
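To check that the two versions really do the same job, here is a minimal self-contained sketch that builds a throwaway directory with the standard-library tempfile module (the file names are made up for illustration). One subtlety: glob("*.txt") alone would also match a directory whose name ends in .txt, so an is_file() filter is added to mirror the os.path version exactly.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample content: two text files, one CSV, one folder
    for name in ["a.txt", "b.txt", "c.csv"]:
        Path(tmp, name).write_text("sample")
    Path(tmp, "folder.txt").mkdir()  # a directory that only *looks* like a text file

    # os.path version: string juggling everywhere
    old_style = sorted(
        os.path.join(tmp, f) for f in os.listdir(tmp)
        if os.path.isfile(os.path.join(tmp, f)) and f.endswith(".txt")
    )

    # pathlib version: glob, then keep only regular files
    new_style = sorted(str(p) for p in Path(tmp).glob("*.txt") if p.is_file())

print(old_style == new_style)               # True
print([Path(p).name for p in new_style])    # ['a.txt', 'b.txt']
```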
Pathlib came out in Python 3.4 as a replacement for the nightmare that was os.path. It also marked a crucial milestone for the Python language as a whole: they finally turned every single thing into an object (even nothing).
The biggest drawback of os.path was treating system paths as strings, which led to unreadable, messy code and a steep learning curve.
By representing paths as fully-fledged objects, Pathlib solves all these issues and brings elegance, consistency, and a breath of fresh air to path handling.
This long-overdue article of mine outlines some of the best functions, features, and tricks of pathlib for tasks that would have been truly horrible experiences with os.path.
Learning these features of Pathlib will make everything related to paths and files easier for you as a data professional, especially in data processing workflows where you have to move around hundreds of images, CSVs, or audio files.
Let’s start!
Working with paths
Almost all features of pathlib are accessible through its Path class, which you can use to create paths to files and directories.
There are a few ways to create paths with Path. First, there are class methods like cwd and home for the current working directory and the user's home directory:
from pathlib import Path

Path.cwd()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib')
Path.home()
PosixPath('/home/bexgboost')
You can also create paths from strings:
p = Path("documents")
p
PosixPath('documents')
Joining paths is a breeze in Pathlib with the forward slash operator:
data_dir = Path(".") / "data"
csv_file = data_dir / "file.csv"

print(data_dir)
print(csv_file)
data
data/file.csv
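The / operator is not the only way to join paths. As a small sketch, the Path constructor also accepts several segments, and joinpath does the same thing as a method; all three spellings below build equal Path objects. One gotcha worth knowing: joining an absolute segment discards everything before it (PurePosixPath is used for that part so the result doesn't depend on your OS).

```python
from pathlib import Path, PurePosixPath

# Three equivalent ways to build the same relative path
a = Path(".") / "data" / "file.csv"   # the leading "." is dropped
b = Path("data", "file.csv")
c = Path("data").joinpath("file.csv")
print(a == b == c)  # True

# An absolute right-hand segment replaces the left-hand side
print(PurePosixPath("data") / "/etc" / "hosts")  # /etc/hosts
```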
Please, don't let anyone ever catch you using os.path.join after this.
To check whether a path exists, you can use the boolean method exists:
data_dir.exists()
True
csv_file.exists()
True
Sometimes, the Path object alone won't tell you what it points to, and you have to check whether it's a directory or a file. For that, you can use the is_dir or is_file methods:
data_dir.is_dir()
True
csv_file.is_file()
True
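Here is a small runnable sketch of the three checks on a temporary directory (the names are illustrative). One detail the snippets above don't show: for a path that doesn't exist, is_dir() and is_file() simply return False rather than raising an error.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_file = data_dir / "file.csv"
    csv_file.write_text("a,b\n1,2\n")
    missing = data_dir / "nope.csv"  # never created

    # (exists, is_dir, is_file) for each path
    checks = {
        p.name: (p.exists(), p.is_dir(), p.is_file())
        for p in (data_dir, csv_file, missing)
    }

print(checks)
```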
Most paths you work with will be relative to your current directory. But there are cases where you have to provide the exact location of a file or a directory to make it accessible from any Python script. That is when you use absolute paths:
csv_file.absolute()
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/data/file.csv')
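A quick sketch of how relative and absolute paths relate: is_absolute() tells you which kind you have, and absolute() essentially prefixes the current working directory without resolving symlinks or ".." segments (Path.resolve() is the method for that). The file name here is hypothetical and doesn't need to exist:

```python
from pathlib import Path

rel = Path("data") / "file.csv"  # hypothetical relative path
abs_path = rel.absolute()

print(rel.is_absolute())             # False
print(abs_path.is_absolute())        # True
print(abs_path == Path.cwd() / rel)  # True: absolute() just prepends the cwd
```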
Lastly, if you have the misfortune of working with libraries that still require string paths, you can call str(path):
str(Path.home())
'/home/bexgboost'
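Besides str(path), the standard library also offers os.fspath(), which works on any path-like object. Below is a minimal sketch: legacy_reader is a hypothetical stand-in for a library that insists on plain strings, and the read_any wrapper (also hypothetical) converts whatever it receives before passing it on.

```python
import os
from pathlib import Path
from typing import Union


def legacy_reader(path: str) -> str:
    """Hypothetical old-style API that only accepts str paths."""
    if not isinstance(path, str):
        raise TypeError("expected a str path")
    return f"reading {path}"


def read_any(path: Union[str, os.PathLike]) -> str:
    # os.fspath returns a str for both str and Path inputs
    return legacy_reader(os.fspath(path))


print(read_any(Path("data") / "file.csv"))
print(read_any("data/file.csv"))
```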
Most libraries in the data stack have long supported Path objects, including sklearn, pandas, matplotlib, seaborn, etc.
Path objects have many useful attributes. Let's see some examples using this path object that points to an image file:
image_file = Path("images/midjourney.png").absolute()
image_file
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png')
Let's start with parent. It returns a path object one level up from the given path, in this case the directory that contains the image:
image_file.parent
PosixPath('/home/bexgboost/articles/2023/4_april/1_pathlib/images')
Sometimes, you may want only the file name instead of the whole path. There's an attribute for that:
image_file.name
'midjourney.png'
which returns only the file name with the extension.
There's also stem for the file name without the suffix:
image_file.stem
'midjourney'
Or the suffix itself, which is the file extension including the dot:
image_file.suffix
'.png'
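name, stem, and suffix can surprise you with multi-dot file names. As a small sketch (the archive name is made up), suffix holds only the last extension, suffixes lists all of them, and with_suffix swaps the extension to build a new path. PurePosixPath is used so the output is the same on every OS:

```python
from pathlib import PurePosixPath

archive = PurePosixPath("backups/archive.tar.gz")  # hypothetical file

print(archive.name)      # archive.tar.gz
print(archive.stem)      # archive.tar  (only the last suffix is stripped)
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']

image = PurePosixPath("images/midjourney.png")
print(image.with_suffix(".jpg"))  # images/midjourney.jpg
```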
If you want to split a path into its components, you can use parts instead of str.split('/'):
image_file.parts
('/',
'home',
'bexgboost',
'articles',
'2023',
'4_april',
'1_pathlib',
'images',
'midjourney.png')
If you want the ancestors of a path as Path objects themselves, you can use the parents attribute, an immutable sequence that you can iterate over:
for i in image_file.parents:
    print(i)
/home/bexgboost/articles/2023/4_april/1_pathlib/images
/home/bexgboost/articles/2023/4_april/1_pathlib
/home/bexgboost/articles/2023/4_april
/home/bexgboost/articles/2023
/home/bexgboost/articles
/home/bexgboost
/home
/
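parents also supports indexing, so you can jump a fixed number of levels up without chaining .parent calls, and parts can be reassembled into the original path. Here is a sketch using PurePosixPath with the same image path, so no file access is involved:

```python
from pathlib import PurePosixPath

image_file = PurePosixPath(
    "/home/bexgboost/articles/2023/4_april/1_pathlib/images/midjourney.png"
)

print(image_file.parents[0] == image_file.parent)  # True: index 0 is the parent
print(image_file.parents[2])     # three levels up: /home/bexgboost/articles/2023/4_april
print(len(image_file.parents))   # 8 ancestors, from .../images up to /

# parts round-trips back to the same path
print(PurePosixPath(*image_file.parts) == image_file)  # True
```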
Working with files
To create files and write to them, you don't have to use the open function anymore. Just create a Path object and call write_text or write_bytes on it:
markdown = data_dir / "file.md"

# Create (overwrite) and write text
markdown.write_text("# This is a test markdown")
Or, if you already have a file, you can use read_text or read_bytes:
markdown.read_text()
'# This is a test markdown'
len(image_file.read_bytes())
1962148
However, note that write_text and write_bytes overwrite the existing contents of a file.
# Write new text to existing file
markdown.write_text("## This is a new line")
# The file is overwritten
markdown.read_text()
'## This is a new line'
To append new information to existing files, you should use the open method of Path objects in a (append) mode:
# Append text
with markdown.open(mode="a") as file:
    file.write("\n### This is the second line")

markdown.read_text()
'## This is a new line\n### This is the second line'
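The whole write, overwrite, and append cycle above can be run end to end in a temporary folder. In this sketch, the explicit encoding="utf-8" argument is my addition, not something the snippets above use; passing it avoids depending on the platform's default encoding.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    markdown = Path(tmp) / "file.md"

    # write_text creates the file, or overwrites it if it already exists
    markdown.write_text("# This is a test markdown", encoding="utf-8")
    markdown.write_text("## This is a new line", encoding="utf-8")

    # Open in append mode ("a") to add to the end instead of overwriting
    with markdown.open(mode="a", encoding="utf-8") as file:
        file.write("\n### This is the second line")

    content = markdown.read_text(encoding="utf-8")

print(repr(content))  # '## This is a new line\n### This is the second line'
```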
It is also common to rename files. The rename method accepts the destination path for the renamed file. To create the destination path in the same directory, i.e., to rename the file in place, you can use with_stem (Python 3.9+) on the existing path, which replaces the stem of the original file:
renamed_md = markdown.with_stem("new_markdown")
markdown.rename(renamed_md)
PosixPath('data/new_markdown.md')
Above, file.md is renamed to new_markdown.md.
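Here is a runnable sketch of the rename step (with_stem needs Python 3.9 or later). It also shows with_suffix, which builds a path with a different extension in the same way. Keep in mind that with_stem and with_suffix only compute a new path; nothing changes on disk until you call rename.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    markdown = Path(tmp) / "file.md"
    markdown.write_text("# notes")

    renamed_md = markdown.with_stem("new_markdown")  # just a new Path object
    print(renamed_md.name, renamed_md.exists())      # new_markdown.md False

    result = markdown.rename(renamed_md)             # now the file actually moves
    print(markdown.exists(), result.exists())        # False True

    as_txt = result.with_suffix(".txt")              # again, only a new Path
    print(as_txt.name)                               # new_markdown.txt
    names = sorted(p.name for p in Path(tmp).iterdir())

print(names)  # ['new_markdown.md']
```

If the destination might already exist, note that rename raises an error on Windows in that case, while Path.replace overwrites the target on every platform.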
Let's check the file size through stat().st_size:
# Display file size
renamed_md.stat().st_size
49 # in bytes
Or the last time the file was modified, which was a few seconds ago:
from datetime import datetime

modified_timestamp = renamed_md.stat().st_mtime
datetime.fromtimestamp(modified_timestamp)
datetime.datetime(2023, 4, 3, 13, 32, 45, 542693)
st_mtime returns a timestamp, which is the number of seconds since January 1, 1970 (the Unix epoch). To make it readable, you can use the fromtimestamp method of datetime.
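As a sketch, here are both stat() fields for a freshly written file. The timezone-aware conversion via timezone.utc is my addition; plain datetime.fromtimestamp, as used above, returns a naive local time.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"  # hypothetical file
    report.write_bytes(b"0123456789")  # exactly 10 bytes

    info = report.stat()               # call stat() once, reuse the result
    size_in_bytes = info.st_size
    modified_local = datetime.fromtimestamp(info.st_mtime)                   # naive, local time
    modified_utc = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)    # aware, UTC

print(size_in_bytes)        # 10
print(modified_local)
print(modified_utc.tzinfo)  # UTC
```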
To remove unwanted files, you can unlink them:
renamed_md.unlink(missing_ok=True)
Setting missing_ok to True won't raise a FileNotFoundError if the file doesn't exist.
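A small sketch of what missing_ok changes (the parameter exists since Python 3.8): without it, unlinking a file that is already gone raises FileNotFoundError.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.md"  # hypothetical file
    old_file.write_text("bye")

    old_file.unlink()                 # first call deletes it
    old_file.unlink(missing_ok=True)  # second call: silently does nothing

    try:
        old_file.unlink()             # default missing_ok=False
        outcome = "no error"
    except FileNotFoundError:
        outcome = "FileNotFoundError"

print(outcome)  # FileNotFoundError
```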
Working with directories
There are a few neat tricks for working with directories in Pathlib. First, let's see how to create directories recursively.
new_dir = (
    Path.cwd()
    / "new_dir"
    / "child_dir"
    / "grandchild_dir"
)

new_dir.exists()
False
new_dir doesn't exist, so let's create it along with all its missing parents:
new_dir.mkdir(parents=True, exist_ok=True)
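By default, mkdir creates only the last component of the given path and raises a FileNotFoundError if the intermediate parents don't exist; to create them as well, set parents to True. Setting exist_ok to True also keeps mkdir from raising an error if the directory already exists.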
To remove empty directories, you can use rmdir. If the given path object is nested, only the last child directory is deleted:
# Removes the last child directory
new_dir.rmdir()
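The mkdir and rmdir behavior described above can be checked in a temporary folder. In this sketch, exist_ok=True makes a second mkdir call harmless, rmdir removes only grandchild_dir, and rmdir refuses to delete a directory that still has contents (it raises an OSError).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "new_dir" / "child_dir" / "grandchild_dir"

    new_dir.mkdir(parents=True, exist_ok=True)
    new_dir.mkdir(parents=True, exist_ok=True)  # no error the second time
    created = new_dir.is_dir()

    new_dir.rmdir()                              # removes only grandchild_dir
    removed = not new_dir.exists()
    child_still_there = new_dir.parent.is_dir()  # child_dir survives

    # rmdir only works on empty directories
    (new_dir.parent / "note.txt").write_text("keep me")
    try:
        new_dir.parent.rmdir()
        refused = False
    except OSError:
        refused = True

print(created, removed, child_still_there, refused)  # True True True True
```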
To list the contents of a directory like ls in the terminal, you can use iterdir. The result is a generator that yields the directory contents as separate path objects, one at a time:
for p in Path.home().iterdir():
    print(p)
/home/bexgboost/.python_history
/home/bexgboost/word_counter.py
/home/bexgboost/.azure
/home/bexgboost/.npm
/home/bexgboost/.nv
/home/bexgboost/.julia
...
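Since iterdir yields Path objects, you can filter them right away. A minimal sketch on a temporary folder (names made up) that splits entries into files and subdirectories; the results are sorted because iterdir makes no promise about order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("hi")
    (root / "word_counter.py").write_text("print('hi')")
    (root / ".azure").mkdir()

    entries = list(root.iterdir())
    files = sorted(p.name for p in entries if p.is_file())
    folders = sorted(p.name for p in entries if p.is_dir())

print(files)    # ['notes.txt', 'word_counter.py']
print(folders)  # ['.azure']
```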
To capture all files with a specific extension or name pattern, you can use the glob method with a glob pattern (shell-style wildcards such as *, not a regular expression).
For example, below we find all text files directly inside my home directory with glob("*.txt"):
home = Path.home()
text_files = list(home.glob("*.txt"))
len(text_files)
3 # Only three
To search for text files recursively, meaning inside all child directories as well, you can use the recursive glob, rglob:
all_text_files = [p for p in home.rglob("*.txt")]
len(all_text_files)
5116 # Now many more
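To see the difference between glob and rglob on a small, known tree, here is a sketch using a temporary directory (the layout is invented). It also shows that rglob(pattern) matches the same paths as glob("**/" + pattern), and that glob patterns are shell-style wildcards (* for any characters, ? for exactly one), not regular expressions.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "sub" / "deeper").mkdir(parents=True)
    for rel in ["a.txt", "b.txt", "sub/c.txt", "sub/deeper/d.txt", "sub/e.csv"]:
        (home / rel).write_text("x")

    top_level = sorted(p.name for p in home.glob("*.txt"))
    recursive = sorted(p.name for p in home.rglob("*.txt"))
    same_as_rglob = set(home.glob("**/*.txt")) == set(home.rglob("*.txt"))
    single_char = sorted(p.name for p in home.glob("?.txt"))

print(top_level)      # ['a.txt', 'b.txt']
print(recursive)      # ['a.txt', 'b.txt', 'c.txt', 'd.txt']
print(same_as_rglob)  # True
print(single_char)    # ['a.txt', 'b.txt']
```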
You can also use rglob('*') to list directory contents recursively. It's like a supercharged version of iterdir().
One use case for this is counting how often each file format appears inside a directory.
To do that, we import the Counter class from collections and pass it the suffixes of everything inside the articles folder of home:
from collections import Counter

file_counts = Counter(
    path.suffix for path in (home / "articles").rglob("*")
)
file_counts
Counter({'.py': 12,
         '': 1293,
         '.md': 1,
         '.txt': 7,
         '.ipynb': 222,
         '.png': 90,
         '.mp4': 39})
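rglob("*") yields directories as well as files, and a directory name without a dot has an empty suffix, so folders end up counted under ''. If you only want file formats, here is a sketch that filters with is_file() first, on a small invented tree:

```python
import tempfile
from collections import Counter
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    articles = Path(tmp) / "articles"
    (articles / "2023" / "images").mkdir(parents=True)
    for rel in ["2023/draft.md", "2023/analysis.ipynb", "2023/images/cover.png",
                "2023/images/chart.png", "README"]:
        (articles / rel).write_text("x")

    # Everything rglob finds, folders included
    all_entries = Counter(p.suffix for p in articles.rglob("*"))
    # Regular files only
    files_only = Counter(p.suffix for p in articles.rglob("*") if p.is_file())

print(all_entries[""])  # 3: two folders plus the extension-less README
print(files_only)
```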
Operating system differences
Sorry, but we have to talk about this nightmare of a problem.
Up until now, we have been dealing with PosixPath objects, which are the default on UNIX-like systems:
type(Path.home())
pathlib.PosixPath
If you were on Windows, you'd get a WindowsPath object:
from pathlib import WindowsPath

# Use raw strings (prefixed with r) to write Windows paths
path = WindowsPath(r"C:\users")
path
NotImplementedError: cannot instantiate 'WindowsPath' on your system
Instantiating another system's concrete path class raises an error like the one above.
But what if you are forced to work with paths from another system, like code written by coworkers who use Windows?
As a solution, pathlib offers pure path objects like PureWindowsPath or PurePosixPath:
from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"C:\users")
path
PureWindowsPath('C:/users')
These are primitive path objects. You have access to the methods and attributes that manipulate the path itself, but not to anything that touches the file system, so essentially, the path object stays a string:
path / "bexgboost"
PureWindowsPath('C:/users/bexgboost')
path.parent
PureWindowsPath('C:/')
path.stem
'users'
path.rename(r"C:\losers") # Unsupported
AttributeError: 'PureWindowsPath' object has no attribute 'rename'
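Pure paths are handy for converting a coworker's Windows path strings into something you can use on Linux or macOS. A minimal sketch (the path is made up): as_posix() swaps backslashes for forward slashes, parts lets you rebuild the path as a PurePosixPath, and PureWindowsPath compares case-insensitively, just like Windows file names.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A relative path as it might appear in a Windows coworker's code
win_path = PureWindowsPath(r"data\images\cat.png")

print(win_path.parts)       # ('data', 'images', 'cat.png')
print(win_path.as_posix())  # data/images/cat.png

# Rebuild it as a POSIX path from its components
posix_path = PurePosixPath(*win_path.parts)
print(posix_path)           # data/images/cat.png

# Windows paths ignore case when compared; POSIX paths do not
print(PureWindowsPath("C:/Users") == PureWindowsPath("c:/users"))  # True
print(PurePosixPath("/Users") == PurePosixPath("/users"))          # False
```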
Conclusion
If you've noticed, I lied in the title of the article. Instead of 15, I think the count of new tricks and functions was 30ish. I didn't want to scare you off.
But I hope I've convinced you enough to ditch os.path and start using pathlib for much easier and more readable path operations.
Forge a new path, if you will 🙂