Accelerating AI-Powered Chemistry and Materials Science Simulations with NVIDIA ALCHEMI Toolkit-Ops

Machine learning interatomic potentials (MLIPs) are transforming the landscape of computational chemistry and materials science. MLIPs enable atomistic simulations that combine the fidelity of computationally expensive quantum chemistry with the scaling power of AI.

Yet developers working at this intersection face a persistent challenge: the lack of a robust, Pythonic toolbox for GPU-accelerated atomistic simulation. For use cases such as running many simultaneous GPU-accelerated simulations, robust and well-supported tools are either missing from the current software ecosystem or fragmented across several open source projects.

Over the past few years, the available software for running atomistic simulations with MLIPs has been CPU-centric. Core operations such as neighbor identification, dispersion corrections, long-range interactions, and their associated gradient calculations have traditionally supported only CPU computation, which often struggles to deliver the speed that contemporary research demands. High-throughput simulations of small- to medium-sized atomic systems quickly become bottlenecked by inefficient GPU usage in hybrid workflows, where the model is GPU-accelerated in PyTorch but the simulation tooling is serial and CPU-based.

While developers have attempted to implement these operations directly in PyTorch over the years, the general-purpose design of PyTorch leaves performance on the table for the specialized spatial and force calculation operations required in atomistic simulation. This fundamental mismatch between PyTorch capabilities and the demands of atomistic modeling raises a crucial question: What’s needed to bridge this gap?

NVIDIA ALCHEMI (AI Lab for Chemistry and Materials Innovation), announced at Supercomputing 2024, provides chemistry and materials science developers and researchers with domain-specialized toolkits and NVIDIA NIM microservices optimized on NVIDIA accelerated computing platforms. It is a collection of high-performance, batched, GPU-accelerated tools built specifically to enable atomistic simulations in chemistry and materials science research at the machine learning framework level.

NVIDIA ALCHEMI delivers capabilities across three integrated layers:

  • ALCHEMI Toolkit-Ops: A collection of GPU-accelerated, batched common operations for AI-enabled atomistic simulation tasks, such as neighbor list construction, DFT-D3 dispersion corrections, and long-range electrostatics.
  • ALCHEMI Toolkit: A collection of GPU-accelerated simulation building blocks, including geometry optimizers, integrators, and data structures to enable large-scale, batched simulations leveraging AI.
  • ALCHEMI NIM microservices: A scalable layer of cloud‑ready, domain‑specific microservices for chemistry and materials science, enabling deployment and orchestration on NVIDIA‑accelerated platforms.

This post introduces NVIDIA ALCHEMI Toolkit-Ops, the accelerated batched common operations layer of ALCHEMI. ALCHEMI Toolkit-Ops uses NVIDIA Warp to accelerate and batch common operations in AI-driven atomistic modeling. These operations are exposed through a modular, PyTorch-accessible API (with a JAX API targeted for a future release) that enables rapid iteration and integration with existing and future atomistic simulation packages.

Figure 1 shows the accelerated batched common operations for atomistic simulations included in this initial release of ALCHEMI Toolkit-Ops. This beta release includes two neighbor list implementations (naive and cell list), the DFT-D3 dispersion correction, and long-range Coulombic functions (Ewald and particle mesh Ewald).

Graphic illustrates ALCHEMI Toolkit-Ops as a key set of features for atomistic simulation made available through a modular plug-and-play API–including GPU-accelerated batched kernels such as neighbor lists, DFT-D3 corrections, and long-range electrostatics—to empower developers, researchers, and ISVs working on AI-driven chemical and materials discovery.
Figure 1. NVIDIA ALCHEMI Toolkit-Ops is a collection of modules developed specifically to provide GPU-accelerated batched operations (one GPU, many systems) for MLIPs and molecular dynamics engines

Figure 2 compares the performance of the accelerated kernels in ALCHEMI Toolkit-Ops against popular kernel-accelerated models such as MACE (cuEquivariance) and TensorNet (Warp), showing fully parallelized performance and scalability. The blue MLIP baseline puts the cost of features such as neighbor lists and dispersion corrections (DFT-D3) in context. Test systems consisted of ammonia clusters of increasing size packed into various cells using Packmol. Timing results were averaged over 20 runs on an NVIDIA H100 80 GB GPU. The DFT-D3 calculation doesn’t include 6 Å because of the long-range nature of D3.

Benchmark graphs of several ALCHEMI Toolkit features compared to MLIPs. Contains two logarithmic plots showing that cell-based algorithms for neighbor lists scale efficiently, with the time per atom decreasing significantly as the system size grows to 128K atoms, effectively outperforming the provided MLIP baseline and naive algorithmic approaches. The DFT-D3 panel shows scalability in the number of atoms also compared to an MLIP baseline. Batched DFT-D3 calculations achieve the same scaling efficiency as running a single, larger system with an equivalent total number of atoms.
Figure 2. Benchmarks showing the speed of the ALCHEMI Toolkit neighbor list (both naive O(N²) and cell list O(N) implementations) and DFT-D3 compared with the computational cost of popular kernel-accelerated MLIPs

ALCHEMI Toolkit-Ops is designed to integrate seamlessly with the broader PyTorch-based atomistic simulation ecosystem. We’re excited to announce in-progress integrations with leading open source tools within the chemistry and materials science community: TorchSim, MatGL, and AIMNet Central.

TorchSim

TorchSim, a next-generation open source atomistic simulation engine, is adopting ALCHEMI Toolkit-Ops kernels to power its GPU-accelerated workflows. TorchSim is a PyTorch-native simulation engine purpose-built for the MLIP era, enabling batched molecular dynamics and structural relaxation across hundreds of systems concurrently on a single GPU. TorchSim will leverage our optimized neighbor lists to drive high-throughput batched operations without sacrificing flexibility or performance.

MatGL

MatGL (Materials Graph Library) is an open source framework for building graph-based machine learning interatomic potentials and foundation potentials for inorganic, molecular, and hybrid materials systems. By integrating ALCHEMI Toolkit-Ops, MatGL significantly accelerates graph-based treatments of long-range interactions, enabling large-scale atomistic simulations that are both faster and more computationally efficient without compromising accuracy.

AIMNet Central

AIMNet Central is a repository for AIMNet2, a general-purpose MLIP capable of modeling neutral, charged, organic, and elemental-organic systems with high fidelity. AIMNet Central is leveraging ALCHEMI Toolkit-Ops to further enhance the performance of its flexible long-range interaction models. Using NVIDIA-accelerated DFT-D3 and neighbor list kernels, AIMNet2 can deliver even faster atomistic simulations for large and periodic systems without compromising accuracy.

Getting started with ALCHEMI Toolkit-Ops is straightforward.

System and package requirements

  • Python 3.11+
  • Operating System: Linux (primary), Windows (WSL2), macOS
  • NVIDIA GPU (A100 or newer recommended), CUDA compute capability ≥ 8.0
  • CUDA Toolkit 12+, NVIDIA driver 570.xx.xx+

Installation 

To install ALCHEMI Toolkit-Ops, use the following snippet:

# Install via pip wheel
pip install nvalchemi-toolkit-ops

# Verify that the package is importable
python -c "import nvalchemiops; print(nvalchemiops.__version__)"

See the ALCHEMI Toolkit-Ops documentation for other installation options. Explore the examples directory in the GitHub repository and run them to test acceleration on your own hardware.

Typical troubleshooting suggestions:

  • Confirm CUDA installation and device availability: nvidia-smi, nvcc --version
  • Ensure compatible Python version: python --version
  • Check installed dependency versions and upgrade as needed: pip list | grep torch and pip list | grep warp
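
The checks above can also be scripted. The following is a minimal sketch, not part of ALCHEMI Toolkit-Ops, that uses only the Python standard library. The function name check_environment is hypothetical, and the module names torch, warp, and nvalchemiops are assumed to be the import names of PyTorch, NVIDIA Warp, and the toolkit (the last one taken from the install check above).

```python
import importlib.util
import sys

# Assumed import names: "torch" (PyTorch), "warp" (NVIDIA Warp), and
# "nvalchemiops" (the name used in the install check above).
REQUIRED_MODULES = ("torch", "warp", "nvalchemiops")
MIN_PYTHON = (3, 11)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each check name to True (ok) or False (problem)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec locates a top-level module without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check:>14}: {'ok' if ok else 'missing or too old'}")
```

This only confirms that the packages can be found. It does not verify the GPU, driver, or CUDA versions, which still need nvidia-smi and nvcc --version.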

Feature highlights

This section covers the three initial features of ALCHEMI Toolkit-Ops: high-performance neighbor lists, DFT-D3 dispersion corrections, and long-range electrostatic interactions.

Neighbor lists

Neighbor list construction is the backbone of atomistic simulations, enabling the calculation of energies and forces with local or semi-local MLIPs. ALCHEMI Toolkit-Ops delivers state-of-the-art GPU performance in PyTorch, scaling to hundreds of thousands of atoms per second for batches of many small to medium atomic systems or for single large atomic systems.

Capabilities

  • Both O(N) (cell list) and O(N²) (naive) algorithms with batched processing
  • Periodic boundary support for triclinic cells with arbitrary cell dimensions and partial periodicity
  • Supports end-to-end compute graph compilation
  • Direct API compatibility with PyTorch
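
To illustrate the difference between the two algorithm families listed above, here is a plain-Python sketch for a single non-periodic system. It is a conceptual illustration only and does not reflect how the ALCHEMI Toolkit-Ops kernels are implemented. The function names naive_pairs and cell_list_pairs are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def naive_pairs(positions, cutoff):
    """O(N^2): test every pair of atoms against the cutoff."""
    pairs = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def cell_list_pairs(positions, cutoff):
    """Roughly O(N) at uniform density: bin atoms into cubes of edge
    `cutoff`, then compare each atom only with the 27 surrounding bins."""
    bins = defaultdict(list)
    for idx, p in enumerate(positions):
        bins[tuple(math.floor(c / cutoff) for c in p)].append(idx)

    pairs = set()
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (bx, by, bz), atoms in bins.items():
        for dx, dy, dz in offsets:
            neighbors = bins.get((bx + dx, by + dy, bz + dz), ())
            for i in atoms:
                for j in neighbors:
                    # i < j keeps each pair once and skips self-pairs
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


# Four atoms on a line; both approaches find the same pairs
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(sorted(naive_pairs(points, 4.5)))      # [(0, 1), (1, 2), (2, 3)]
print(sorted(cell_list_pairs(points, 4.5)))  # [(0, 1), (1, 2), (2, 3)]
```

Because the bin edge equals the cutoff, any two atoms within the cutoff must sit in the same or adjacent bins, so the cell list never misses a pair while skipping most distant ones.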

API example

import torch
from nvalchemiops.neighborlist import neighbor_list


# Water molecule
water_positions = torch.tensor([
   [0.0, 0.0, 0.0],      # O
   [0.96, 0.0, 0.0],     # H
   [-0.24, 0.93, 0.0],   # H
], device="cuda", dtype=torch.float32)
# Ammonia molecule (NH3)
ammonia_positions = torch.tensor([
   [0.0, 0.0, 0.0],      # N
   [1.01, 0.0, 0.0],     # H
   [-0.34, 0.95, 0.0],   # H
   [-0.34, -0.48, 0.82], # H
], device="cuda", dtype=torch.float32)
# Concatenate positions for batch processing
positions = torch.cat([water_positions, ammonia_positions], dim=0)
# Create batch indices (0 for water, 1 for ammonia)
batch_idx = torch.cat([
   torch.zeros(3, dtype=torch.int32, device="cuda"),   # Water
   torch.ones(4, dtype=torch.int32, device="cuda"),    # Ammonia
])
# Define cells for every molecule (large enough to contain them without PBC)
cells = torch.stack([
   torch.eye(3, device="cuda") * 10.0,  # Water cell
   torch.eye(3, device="cuda") * 10.0,  # Ammonia cell
])
# non-periodic molecule case
pbc = torch.tensor([
   [False, False, False],  # Water
   [False, False, False],  # Ammonia
], device="cuda")
# Cutoff distance in Angstroms
cutoff = 4.0
# Compute neighbor list; here we explicitly request a batched cell list algorithm
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
   positions, cutoff, cell=cells, pbc=pbc, batch_idx=batch_idx, method="batch_cell_list"
)
print(f"Neighbor matrix: {neighbor_matrix.cpu()}")  # [7, num_neighbors.max()]
print(f"Neighbors per atom: {num_neighbors.cpu()}")  # [7,]
print(f"Periodic shifts: {shift_matrix.cpu()}")
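
In the non-periodic example above, the periodic shifts carry no information. For periodic systems, neighbor list codes commonly store, for each pair, an integer triple that says which periodic image of the neighbor is meant. The sketch below assumes that convention (lattice vectors stored as rows of the cell matrix, shifts as integer multiples of them). This is an assumption for illustration and should be checked against the ALCHEMI Toolkit-Ops documentation. The helper name displacement is hypothetical.

```python
def displacement(pos_i, pos_j, shift, cell):
    """Vector from atom i to the periodic image of atom j.

    Assumes `cell` stores lattice vectors as rows and `shift` holds three
    integers counting how many times each lattice vector is added.
    """
    image = [
        pos_j[k] + sum(shift[a] * cell[a][k] for a in range(3))
        for k in range(3)
    ]
    return [image[k] - pos_i[k] for k in range(3)]


# Cubic 10 Å cell: atoms near opposite faces are close through the boundary
cell = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
d = displacement((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (-1, 0, 0), cell)
print(d)  # [-1.0, 0.0, 0.0]
```

The two atoms are 9 Å apart inside the cell but only 1 Å apart through the periodic boundary, which is the distance a cutoff-based model needs to see.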

DFT-D3 dispersion corrections

Realistic molecular modeling must account for van der Waals interactions, which standard DFT functionals don’t capture systematically. DFT-D3 adds empirical pairwise corrections, resulting in substantial improvements in binding energies, lattice structures, conformational analysis, and adsorption studies for common DFT functionals.

Capabilities

  • Becke-Johnson (BJ) rational damping variant
  • Supports batched and periodic calculations
  • Supports smoothing at cutoff distance
  • Joint energy, forces, and virial calculation
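
For readers unfamiliar with the Becke-Johnson rational damping variant listed above, the sketch below evaluates the standard two-body D3(BJ) energy expression for a single atom pair. It is a textbook-style illustration, not the toolkit's implementation. The function name d3_bj_pair_energy is hypothetical, and the C6, C8, a1, a2, and s8 values are made-up numbers in arbitrary consistent units. Real coefficients depend on the element pair, the coordination numbers, and the functional.

```python
import math


def d3_bj_pair_energy(r, c6, c8, a1, a2, s6=1.0, s8=1.0):
    """Two-body dispersion energy with Becke-Johnson rational damping.

    E = -[ s6*C6 / (r^6 + f^6) + s8*C8 / (r^8 + f^8) ],  f = a1*R0 + a2,
    with R0 = sqrt(C8 / C6). All quantities share one consistent unit system.
    """
    r0 = math.sqrt(c8 / c6)
    f = a1 * r0 + a2
    e6 = s6 * c6 / (r**6 + f**6)
    e8 = s8 * c8 / (r**8 + f**8)
    return -(e6 + e8)


# Illustrative values only (not real D3 coefficients)
params = dict(c6=10.0, c8=300.0, a1=0.4, a2=4.0, s8=0.8)
for r in (0.0, 3.0, 6.0, 12.0):
    print(f"r = {r:5.1f}  E = {d3_bj_pair_energy(r, **params):.3e}")
```

The damping term f keeps the energy finite as r approaches zero, while at large separations the expression approaches the undamped -C6/r^6 behavior.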

API example

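# Continues from the neighbor list example: reuses torch, positions,
# batch_idx, neighbor_matrix, and shift_matrix
from nvalchemiops.interactions.dispersion import dftd3

# Atom offsets per system: water is atoms 0-2, ammonia is atoms 3-6
batch_ptr = torch.tensor([0, 3, 7], dtype=torch.int32, device="cuda")
# Atomic numbers in the same order as positions: O, H, H (water), N, H, H, H (ammonia)
atomic_numbers = torch.tensor(
    [8, 1, 1, 7, 1, 1, 1], dtype=torch.int32, device="cuda"
)
# For this snippet, assume d3_params is loaded as:
# d3_params = D3Parameters(rcov=..., r4r2=..., c6ab=..., cn_ref=...)
# Consult the documentation to source DFT-D3 parameters
# and understand the expected data structure
d3_params = ...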
# call the DFT-D3 functional interface
energy, forces, coordination_numbers = dftd3(
    positions=positions,
    numbers=atomic_numbers,
    a1=0.3981, a2=4.4211, s8=0.7875,  # PBE parameters
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shift_matrix,
    batch_idx=batch_idx,
    d3_params=d3_params
)
print(f"Energies: {energy.cpu()}")  # [2,]
print(f"Forces: {forces.cpu()}")  # [7, 3]
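
The examples use two ways of describing a batch: batch_idx assigns a system index to every atom, while batch_ptr = [0, 3, 7] looks like a list of cumulative atom offsets. Assuming that interpretation (which matches the 3-atom and 4-atom systems here but is not confirmed by the text), the two forms can be converted into each other. The helper names idx_to_ptr and ptr_to_idx are hypothetical, and plain lists are used instead of tensors to keep the sketch dependency-free.

```python
from itertools import accumulate


def idx_to_ptr(batch_idx, num_systems):
    """Per-atom system indices -> cumulative offsets of length num_systems + 1."""
    counts = [0] * num_systems
    for b in batch_idx:
        counts[b] += 1
    return [0, *accumulate(counts)]


def ptr_to_idx(batch_ptr):
    """Cumulative offsets -> per-atom system indices."""
    return [
        system
        for system in range(len(batch_ptr) - 1)
        for _ in range(batch_ptr[system + 1] - batch_ptr[system])
    ]


# Water (3 atoms) followed by ammonia (4 atoms)
print(idx_to_ptr([0, 0, 0, 1, 1, 1, 1], num_systems=2))  # [0, 3, 7]
print(ptr_to_idx([0, 3, 7]))  # [0, 0, 0, 1, 1, 1, 1]
```

With the offset form, the atoms of system s are the slice batch_ptr[s]:batch_ptr[s + 1] of any per-atom array, which is convenient for per-system reductions such as summing energies.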

Limitations

The current implementation computes two-body terms only (C6 and C8). Three-body Axilrod-Teller-Muto (ATM/C9) contributions are not included, which generally results in some overestimation of dispersion energies.

Long-range electrostatic interactions

Accurate modeling of electrostatic interactions is critical for simulations involving ions or other charged species and polar systems. Currently, the most common approach for MLIPs is to learn Coulomb interactions within the short-ranged model. The resulting systematic underestimation of long-range Coulombic effects leads to a loss of accuracy in binding energies, solvation structures, and interfacial phenomena.

ALCHEMI Toolkit-Ops provides fully GPU-accelerated Ewald summation methods, both standard Ewald and particle mesh Ewald (PME), enabling efficient and accurate treatment of long-range electrostatics in PyTorch.

For large periodic systems, Ewald-based methods separate electrostatic interactions into short-range and long-range components, each computed in the domain best suited to it. ALCHEMI Toolkit-Ops provides a dual-cutoff strategy that dramatically reduces redundant neighbor queries and memory overhead compared with naive all-pairs approaches, making high-throughput simulations of charged systems practical on modern GPUs. Users can choose standard Ewald for smaller systems or PME for larger periodic systems, depending on their specific performance and accuracy needs.
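
The short-range and long-range separation rests on a simple identity: 1/r = erfc(αr)/r + erf(αr)/r, where α is the Ewald splitting parameter. The sketch below checks this identity numerically with the Python standard library. It illustrates the idea behind Ewald methods in general and is not the toolkit's code. The function name ewald_split is hypothetical, and α = 0.3 is simply the value that appears in the PME example later in this section.

```python
import math


def ewald_split(r, alpha):
    """Split the Coulomb kernel 1/r into short- and long-range parts."""
    short_range = math.erfc(alpha * r) / r  # decays quickly: summed in real space
    long_range = math.erf(alpha * r) / r    # smooth: summed in reciprocal space
    return short_range, long_range


alpha = 0.3
for r in (1.0, 5.0, 10.0, 20.0):
    short, long_ = ewald_split(r, alpha)
    print(f"r = {r:5.1f}  short = {short:.3e}  long = {long_:.3e}  "
          f"sum - 1/r = {short + long_ - 1.0 / r:.1e}")
```

The short-range part becomes negligible beyond a modest cutoff, so it can reuse a neighbor list, while the smooth long-range part converges quickly in reciprocal space. A larger α shifts more of the work into reciprocal space.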

Capabilities

  • Ewald summation method
  • Particle Mesh Ewald (PME) using B-splines
  • Supports batched and periodic systems
  • GPU-optimized computation, leveraging cuFFT for fast reciprocal-space evaluation
  • PyTorch integration provides native tensor support for end-to-end differentiable workflows

API example

from nvalchemiops.interactions.electrostatics import particle_mesh_ewald

# Random per-atom charges, for illustration only
# (reuses positions, cells, batch_idx, neighbor_matrix, and shift_matrix from above)
atomic_charges = torch.randn(
    positions.size(0),  dtype=torch.float32, device="cuda"
)
# compute energy and forces with particle mesh ewald
energy, forces = particle_mesh_ewald(
    positions,
    atomic_charges,
    cells,
    alpha=0.3,  # adjust Ewald splitting parameter
    batch_idx=batch_idx,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shift_matrix,
    compute_forces=True
)
print(f"Energy: {energy.cpu()}")  # [2]
print(f"Forces: {forces.cpu()}")  # [7, 3]

ALCHEMI Toolkit-Ops empowers the community with high-performance, accessible atomistic modeling tools on NVIDIA GPUs. To accelerate your chemistry and materials science simulations, visit the NVIDIA/nvalchemi-toolkit-ops GitHub repo and the NVIDIA ALCHEMI Toolkit-Ops documentation. You can also explore the examples gallery. This beta release of ALCHEMI Toolkit-Ops focuses on highly efficient neighbor lists, dispersion corrections, and long-range electrostatics. Stay tuned for new features and performance optimizations in future releases.

Acknowledgments

We’d like to thank Professor Shyue Ping Ong; Professor Olexandr Isayev; and the TorchSim committee members Abhijeet Gangan, Orion Archer Cohen, Will Engler, and Ben Blaiszik for working with us to adopt NVIDIA ALCHEMI Toolkit-Ops into their open source projects. We also thank Wen Jie Ong, Piero Altoe, and Kibibi Moseley from NVIDIA for their help preparing this blog post.


