Don’t Let Conda Eat Your Hard Drive

If you’re an Anaconda user, you know that conda environments make it easier to manage package dependencies, avoid compatibility conflicts, and share your projects with others. Unfortunately, they can also take over your computer’s hard drive.

I write lots of computer tutorials and, to keep them organized, each has a dedicated folder structure complete with a conda environment. This worked great at first, but soon my computer’s performance degraded, and I noticed that my SSD was filling up. At one point I had only 13 GB free.

Conda helps manage this problem by storing downloaded package files in a single “cache” (pkgs_dirs). When you install a package, conda checks for it in the package cache before downloading. If it isn’t found, conda downloads and extracts the package and links the files to the active environment. Because the cache is “shared,” different environments can use the same downloaded files without duplication.

Because conda caches downloaded packages, pkgs_dirs can grow to many gigabytes. And while conda links to shared packages in the cache, there’s still a need to store some packages in the environment folder. This is mainly to avoid version conflicts, where different environments need different versions of the same dependency (a package required to run another package).

In addition, large, compiled binaries like OpenCV may require full copies in the environment’s directory, and each environment requires a copy of the Python interpreter (at 100–200 MB). All these issues can bloat conda environments to several gigabytes.
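To see how much room an environment or the package cache actually takes, you can total the file sizes under a folder. Here’s a minimal Python sketch that counts hard-linked files only once, since files linked from the cache would otherwise be counted again in every environment. The function name tree_size is hypothetical, and the sketch assumes your file system reports file IDs (st_ino); where it doesn’t, each file is simply counted on its own.

```python
import os

def tree_size(root, seen=None):
    """Return the total bytes under root, counting each hard-linked file once.

    Pass the same `seen` set to several calls to avoid double counting
    files that are shared between folders.
    """
    seen = set() if seen is None else seen
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_ino != 0:
                # (device, inode) identifies the underlying file on disk
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
            total += st.st_size
    return total
```

Calling tree_size on the package cache first, and then on an environment folder with the same seen set, gives a rough idea of how much space the environment uses beyond what it shares with the cache.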

In this project, we’ll look at some techniques for reducing the storage requirements for conda environments, including those stored in default locations and dedicated folders.


Memory Management Techniques

Below are some techniques that can help you reduce conda’s storage footprint on your machine. We’ll discuss each in turn.

  1. Cleaning the package cache
  2. Sharing task-based environments
  3. Archiving with environment and specification files
  4. Archiving environments with conda-pack
  5. Storing environments on an external drive
  6. Relocating the package cache
  7. Using virtual environments (venv)

1. Cleaning the Package Cache

Cleaning the package cache is the first and easiest step for freeing up disk space. Even after you delete environments, conda keeps the related package files in the cache. You can free up space by removing these unused packages and their associated tarballs (compressed package files), logs, index caches (metadata stored in conda), and temporary files.

Conda offers an optional “dry run” to see how much space will be reclaimed. You’ll want to run this from either the terminal or Anaconda Prompt in your base environment:

conda clean --all --dry-run

To commit, run:

conda clean --all

Here’s how this looks on my machine:

Conda dry run and clean command in Anaconda Prompt (by author)

This process trimmed a healthy 6.28 GB and took several minutes to run.
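If you’d rather run the cleanup from a script, say as part of a monthly housekeeping routine, the two commands above can be wrapped with Python’s subprocess module. This is a rough sketch under a few assumptions: the function names are hypothetical, conda must be on your PATH, and the real cleanup passes --yes so conda doesn’t stop to ask for confirmation.

```python
import shutil
import subprocess

def build_clean_command(dry_run=True):
    """Return the conda clean command as an argument list."""
    cmd = ["conda", "clean", "--all"]
    # --dry-run only reports what would be removed; --yes skips the prompt
    cmd.append("--dry-run" if dry_run else "--yes")
    return cmd

def run_clean(dry_run=True):
    """Run conda clean if conda is on PATH; return its output, else None."""
    conda = shutil.which("conda")
    if conda is None:
        return None  # conda was not found, so there is nothing to run
    cmd = [conda] + build_clean_command(dry_run)[1:]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling run_clean() with the default dry_run=True is the safe way to start, since it only reports what would be removed.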

2. Sharing Task-based Environments

Creating a few environments for specialized tasks, such as computer vision or geospatial work, is more storage-efficient than using a dedicated environment for every project. These environments would include basic packages plus ones for the specific task (such as OpenCV, scikit-image, and PIL for computer vision).

A bonus of this approach is that you can easily keep all the packages up to date and link the environments to multiple projects. However, this won’t work if some projects require different versions of the shared packages.

3. Archiving with Environment and Specification Files

If you don’t have enough storage space or want to preserve legacy projects efficiently, consider using environment or specification files. These small files record an environment’s contents, allowing you to rebuild it later.

Saving conda environments this way reduces their size on disk from gigabytes to a few kilobytes. Of course, you’ll have to recreate the environment to use it, so avoid this technique if you frequently revisit projects that link to the archived environments.

NOTE: Consider using Mamba, a drop-in replacement for conda, for faster rebuilds. As the docs say, “If you know conda, you know Mamba!”

Using Environment Files: An environment file is a small file that lists all the packages and versions installed in an environment, including those installed using Python’s package installer (pip). This helps you both restore an environment and share it with others.

The environment file is written in YAML (.yml), a human-readable data-serialization format. To generate an environment file, you need to activate and then export the environment. Here’s how to make a file for an environment named my_env:

 conda activate my_env
 conda env export > my_env.yml

You can give the file any valid filename, but be careful: an existing file with the same name will be overwritten.

By default, the environment file is written to the current working directory. Here’s a truncated example of the file’s contents:

name: C:\Users\hanna\quick_success\fed_hikes\fed_env
channels:
  - defaults
  - conda-forge
dependencies:
  - asttokens=2.0.5=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - blas=1.0=mkl
  - bottleneck=1.3.4=py310h9128911_0
  - brotli=1.0.9=ha925a31_2
  - bzip2=1.0.8=he774522_0
  - ca-certificates=2022.4.26=haa95532_0
  - certifi=2022.5.18.1=py310haa95532_0
  - colorama=0.4.4=pyhd3eb1b0_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - debugpy=1.5.1=py310hd77b12b_0
  - decorator=5.1.1=pyhd3eb1b0_0
  - entrypoints=0.4=py310haa95532_0

  ------SNIP------

You can now remove your conda environment and reproduce it later with this file. To remove an environment, first deactivate it and then run the remove command (where ENVNAME is the name of your environment):

conda deactivate
conda remove -n ENVNAME --all

If the conda environment exists outside of Anaconda’s default envs folder, then include the directory path to the environment, like so:

conda remove -p PATH\ENVNAME --all

Note that this archiving technique will only work perfectly if you continue to use the same operating system, such as Windows or macOS. This is because solving for dependencies can introduce packages that might not be compatible across platforms.
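One common way to loosen this restriction is to drop the platform-specific build strings (the third field in entries like blas=1.0=mkl). Conda can do this at export time with conda env export --no-builds. For a file you’ve already exported, here’s a small Python sketch that does the same thing. The function name is hypothetical, and removing build strings reduces, but doesn’t eliminate, cross-platform problems, since some packages exist on only one platform.

```python
import re

# Matches a conda dependency line such as "  - blas=1.0=mkl" and captures
# everything up to, but not including, the trailing "=build" field.
# pip entries like "    - requests==2.31.0" do not match and are left alone.
_DEP_WITH_BUILD = re.compile(r"^(\s*-\s+[^=\s]+=[^=\s]+)=[^=\s]+\s*$")

def strip_build_strings(yaml_text):
    """Return the environment file text with build strings removed."""
    cleaned = []
    for line in yaml_text.splitlines():
        match = _DEP_WITH_BUILD.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned) + "\n"
```

You could read my_env.yml, pass its text through strip_build_strings, and write the result to a second file, keeping the original as the exact record of the Windows build.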

To restore a conda environment using a file, run the following, where my_env represents your conda environment name and environment.yml represents your environment file:

 conda env create -n my_env -f \directory\path\to\environment.yml

You can also use the environment file to recreate the environment on your D: drive. Just provide the new path when using the file. Here’s an example:

conda env create --prefix D:\my_envs\my_new_env --file environment.yml

For more on environment files, including how to manually produce them, visit the docs.

Using Specification Files: If you haven’t installed any packages using pip, you can use a specification file to reproduce a conda environment on the same operating system. To create a specification file, activate an environment, such as my_env, and enter the following command:

 conda list --explicit > exp_spec_list.txt

This produces the following output, truncated for brevity:

 # This file may be used to create an environment using:
 # $ conda create --name <env> --file <this file>
 # platform: win-64
 @EXPLICIT
 https://conda.anaconda.org/conda-forge/win-64/ca-certificates-202x.xx.x-h5b45459_0.tar.bz2
 https://conda.anaconda.org/conda-forge/noarch/tzdata-202xx-he74cb21_0.tar.bz2

------snip------

Note that the --explicit flag ensures that the targeted platform is annotated in the file, in this case, win-64 in the third line.
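Because the specification file is plain text, it’s easy to check which platform an archived file targets before you try to rebuild from it. The sketch below is one way to do that in Python. The function name is hypothetical, and it assumes the layout shown above: comment lines starting with #, an @EXPLICIT marker, and then one package URL per line.

```python
def summarize_spec_file(text):
    """Return (platform, package_file_names) from an explicit spec file."""
    platform = None
    packages = []
    in_explicit = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # the platform is recorded in a comment like "# platform: win-64"
            if line.lower().startswith("# platform:"):
                platform = line.split(":", 1)[1].strip()
            continue
        if line == "@EXPLICIT":
            in_explicit = True
            continue
        if in_explicit:
            # keep only the file name at the end of the URL,
            # dropping any "#<hash>" suffix
            packages.append(line.split("#", 1)[0].rsplit("/", 1)[-1])
    return platform, packages
```

Comparing the returned platform against the machine you’re on tells you whether the file is likely to rebuild cleanly.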

You can now remove the environment as described in the previous section.

To re-create my_env using this text file, run the following with a proper directory path:

conda create -n my_env --file \directory\path\to\exp_spec_list.txt

4. Archiving Environments with conda-pack

The conda-pack command lets you archive a conda environment before removing it. It packs the entire environment into a compressed archive with the extension .tar.gz. It’s handy for backing up, sharing, and moving environments without the need to reinstall packages.

The following commands install conda-pack and archive an environment (where my_env represents the name of your environment). Once the archive is written, you can remove the environment from your system as described in Section 3:

conda install -c conda-forge conda-pack
conda pack -n my_env -o my_env.tar.gz

To restore the environment later, run this command:

mkdir my_env && tar -xzf my_env.tar.gz -C my_env

This technique won’t save as much disk space as the text file option. However, you won’t need to re-download packages when restoring an environment, which means it can be used without internet access.
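If you’d like to script the restore step, the same extraction can be done with Python’s built-in tarfile module. Treat this as a sketch: the function name is hypothetical, and it should only be used on archives you created yourself, since it extracts whatever the archive contains. After extraction, the conda-pack docs describe activating the unpacked environment and running the conda-unpack script it contains to fix up the stored paths.

```python
import tarfile
from pathlib import Path

def unpack_env(archive, target_dir):
    """Extract a conda-pack .tar.gz archive into target_dir and return it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # same role as mkdir my_env
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)  # same role as tar -xzf ... -C my_env
    return target
```

For the example above, this would be called as unpack_env("my_env.tar.gz", "my_env").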

5. Storing Environments on an External Drive

By default, conda stores all environments in a default location. On Windows, this is the envs folder under your Anaconda installation directory. You can see these environments by running the command conda info --envs in a prompt window or terminal. Here’s how it looks on my C: drive (this is a truncated view):

Truncated view of conda environments on my C: drive (by author)

Using a Single Environments Folder: If your system supports an external or secondary drive, you can configure conda to store environments there to free up space on your primary disk. Because envs_dirs holds a list of folders, you add to it with --add rather than --set. Here’s the command; you’ll need to substitute your specific path:

conda config --add envs_dirs /path/to/external/drive

If you enter a path on your D: drive, conda will create new environments at this location.

This technique works well when your external drive is a fast SSD and when you’re storing packages with large dependencies, like TensorFlow. The downside is slower performance. If your OS and notebooks remain on the primary drive, you may experience some read/write latency when running Python.

In addition, some OS settings may power down idle external drives, adding a delay when they spin back up. Tools like Jupyter may struggle to locate conda environments if the drive letter changes, so you’ll want to use a fixed drive letter and ensure that the correct kernel paths are set.
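Before pointing envs_dirs at another drive, it can help to confirm how much room each candidate actually has. Here’s a short Python sketch using the standard library’s shutil.disk_usage. The function names are hypothetical, and the paths in the usage note are only examples.

```python
import os
import shutil

def free_gb(path):
    """Return the free space, in gigabytes, on the drive that holds path."""
    return shutil.disk_usage(path).free / 1e9

def roomiest(paths):
    """Return the existing path with the most free space, or None."""
    usable = [p for p in paths if os.path.exists(p)]
    if not usable:
        return None  # none of the candidate drives is connected
    return max(usable, key=free_gb)
```

On a Windows machine with two drives, roomiest(["C:\\", "D:\\"]) would return whichever drive currently has more free space, skipping any drive that isn’t connected.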

Using Multiple Environment Folders: Instead of using a single envs_dirs directory for all environments, you can store each environment inside its respective project folder. This lets you store everything related to a project in one place.

Example project file structure with embedded (1.7 GB) conda environment (opencv_env) (by author)

For example, suppose you have a project on your Windows D: drive in a folder called geospatial. To place the project’s conda environment in this folder, loaded with ipykernel for JupyterLab, you’d run:

conda create -p D:\projects\geospatial\env ipykernel

Of course, you can give env a more descriptive name.

As with the previous example, environments stored on a different disk can cause performance issues.

Special Note on JupyterLab: Depending on how you launch JupyterLab, its default behavior may be to open in your user directory. Since its file browser is restricted to the directory from which it’s launched, you won’t see directories on other drives like D:. There are many ways to handle this, but one of the simplest is to launch JupyterLab from the D: drive.

For instance, in Anaconda Prompt, type:

D:

followed by:

jupyter lab

Now, you’ll be able to pick from kernels on the D: drive.

For more options on changing JupyterLab’s working directory, ask an AI about “how to change Jupyter’s default working directory” or “how to create a symlink to D: in your user folder.”

Moving Existing Environments: You should never manually move a conda environment, such as by cutting and pasting it to a new location. This is because conda relies on internal paths and metadata that can become invalid with location changes.

Instead, you should clone existing environments to another drive. This will duplicate the environment, so you’ll need to manually remove it from its original location.

In the following example, we use the --clone flag to produce an exact copy of a C: drive environment (called my_env) on the D: drive:

conda create -p D:\new_envs\my_env --clone C:\path\to\old\env

NOTE: Consider exporting your environment to a YAML file (as described in Section 3 above) before cloning. This allows you to recreate the environment if something goes wrong with the clone procedure.

Now, when you run conda env list, you’ll see the environment listed on both the C: and D: drives. You can remove the old environment by running the following command from the base environment:

conda remove --name my_env --all -y

Again, latency issues may affect these setups if you’re working across two disks.

You may be wondering, is it better to move a conda environment using an environment (YAML) file or to use --clone? The short answer is that --clone is the best and fastest option for moving an environment to a different drive on the same machine. An environment file is best for recreating the same environment on a different machine. While the file guarantees a consistent environment across different systems, it can take much longer to run, especially with large environments.

6. Relocating the Package Cache

If your primary drive is low on space, you can move the package cache to a larger external or secondary drive. As with envs_dirs, pkgs_dirs holds a list of folders, so you add to it with --add:

conda config --add pkgs_dirs D:\conda_pkgs

In this example, packages are now stored on the D: drive (D:\conda_pkgs) instead of the default location.

If you’re working on your primary drive and both drives are SSDs, then latency issues should not be significant. However, if one of the drives is a slower HDD, you can experience slowdowns when creating or updating environments. If D: is an external drive connected by USB, you may see significant slowdowns for large environments.

You can mitigate some of these issues by keeping the package cache (pkgs_dirs) and frequently used environments on the faster SSD, and other environments on the slower HDD.

One last thing to consider is backups. Primary drives may have routine backups scheduled, but secondary or external drives may not. This puts you at risk of losing all your environments.

7. Using Virtual Environments

If your project doesn’t require conda’s extensive package management system for handling heavy dependencies (like TensorFlow or GDAL), you can significantly reduce disk usage with a Python virtual environment (venv). This represents a lightweight alternative to a conda environment.

To create a venv, run the following command, where ENVNAME is the name of your environment:

python -m venv ENVNAME

This type of environment has a small base installation. A minimal conda environment takes up about 200 MB and includes multiple utilities, such as conda, pip, setuptools, and so on. A venv is much lighter, with a minimum install size of only 5–10 MB.
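You can check the size of a bare venv on your own system with a few lines of Python, using the standard library’s venv module. This is a minimal sketch with hypothetical function names. It creates the environment without pip (with_pip=False), so the result is smaller than a typical working venv, and the exact figure depends on your platform and Python version.

```python
import venv
from pathlib import Path

def create_venv(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return its path."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)

def folder_size_mb(root):
    """Return the total size of the regular files under root, in megabytes."""
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return sum(p.stat().st_size for p in files) / 1e6
```

Running folder_size_mb(create_venv("my_venv")) in a scratch folder gives you a number to compare against the size of a minimal conda environment on the same machine.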

Conda also caches package tarballs in pkgs_dirs. These tarballs can grow to several GBs over time. Because venv installs packages directly into the environment, no extra copies are preserved.

In general, you’ll want to consider venv when you only need basic Python packages like NumPy, pandas, or scikit-learn. Packages for which conda is strongly recommended, like GeoPandas, should still be placed in a conda environment. If you use lots of environments, you’ll probably want to stick with conda and benefit from its package linking.

You can find details on how to activate and use Python virtual environments in the venv docs.


Recap

High-impact, low-disruption storage management techniques for conda environments include cleaning the package cache and storing little-used environments as YAML or text files. These methods can save many gigabytes of disk space while retaining Anaconda’s default directory structure.

Other high-impact methods include moving the package cache and/or conda environments to a secondary or external drive. This will resolve storage problems but may introduce latency issues, especially if the new drive is a slow HDD or uses a USB connection.

For simple environments, you can use a Python virtual environment (venv) as a lightweight alternative to conda.
