How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2



Global climate models are good at the big picture, but local climate extremes, like hurricanes and typhoons, often disappear in the details. Those patterns are still there; you just need the right tools to unlock them in high-resolution climate data.

Using NVIDIA Earth-2, this blog post shows you how to downscale coarse climate projections into higher-resolution, bias-corrected fields, revealing local detail not resolved in the raw data.

Why downscaling climate projections is essential to risk assessment

High-resolution projections play a key role in assessing physical climate risk, informing decisions from infrastructure planning to agricultural adaptation. However, running global models at fine resolution is computationally prohibitive, requiring significant compute, storage, and time. Coupled Model Intercomparison Project Phase 6 (CMIP6) provides the most widely used global climate projections, underpinning IPCC reports and sector-specific risk models, but its outputs are often too coarse to capture short-lived weather events, like tropical cyclones or extreme heat.

Small ensemble sizes compound the problem. When only a few simulations are available, rare but extreme events can be missed, reducing the reliability of tail-risk estimates. Many weather workflows build on ERA5, a reanalysis dataset. CMIP6 outputs often differ from it in grid, cadence, pressure levels, and variable definitions, creating a gap between available projections and climate risk assessment needs. For researchers and practitioners, closing this gap calls for accelerated computing and AI-driven tools that can bridge coarse climate projections and analysis-ready data.

NVIDIA Earth-2 provides a full-stack platform to build and deploy AI-powered weather and climate applications. NVIDIA CorrDiff, a model within this ecosystem, replaces compute-intensive and time-consuming high-resolution simulations with a generative downscaling model that transforms coarse CMIP6 output into high-resolution fields. It consists of a regression model that predicts the conditional mean, and a diffusion model that predicts the residuals between the regression output and the ground truth.
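To make the two-stage idea concrete, here is a toy sketch of how a deterministic mean and a sampled residual combine into one ensemble member. Both functions are hypothetical stand-ins written for illustration (an identity mapping and Gaussian noise); they are not the trained UNet or diffusion sampler, and none of these names come from CorrDiff itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_mean(coarse):
    """Stand-in for the regression model: a deterministic conditional-mean estimate."""
    return coarse  # placeholder; the real model is a trained network

def sample_residual(shape, rng):
    """Stand-in for the diffusion model: draws one stochastic residual field."""
    return 0.1 * rng.standard_normal(shape)  # placeholder noise, not a trained sampler

def corrdiff_like_sample(coarse, rng):
    # One ensemble member = conditional mean + one sampled residual
    return regression_mean(coarse) + sample_residual(coarse.shape, rng)

coarse = np.zeros((4, 8))  # toy input field, assumed already on the output grid
ensemble = np.stack([corrdiff_like_sample(coarse, rng) for _ in range(16)])
print(ensemble.shape)
```

Because only the residual is random, repeated sampling from the same input yields an ensemble whose members share the regression mean and differ in fine-scale detail.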

CorrDiff performs core climate data transformations concurrently, including spatial and temporal downscaling, bias correction, and variable synthesis, by learning to map from biased climate model outputs to observation-constrained reanalysis, like ERA5. As a model-to-model translation tool, it can generate large ensembles from a single input sample, providing uncertainty estimates that help assess tail risks.

This blog post walks you through training CorrDiff for CMIP6 downscaling step by step, including dataset selection and configuration.

It also shows how S&P Global Energy, a leader in climate hazard assessment, is developing this workflow with NVIDIA to deliver data grounded in observations for use in risk analytics.

Input and target datasets for CorrDiff training

Training CorrDiff requires paired low-resolution input and high-resolution target datasets covering the same time period and weather evolution. You’ll need an observation-assimilated climate model run for training, because free-running climate simulations don’t align with observed weather patterns.

Using CanESM5 assimilated hindcasts as input

This guide uses data from the CanESM5 climate model. The output is on a 64×128 global grid at roughly 2.8° resolution (~300 km at the equator). This captures large-scale climate patterns but is too coarse to resolve the finer-scale features needed for climate risk assessment.

Select data that balances availability across historical and future scenarios, temporal resolution close to ERA5’s hourly output, and vertical coverage. You’ll want to include pressure-level variables to describe the 3D atmospheric state.

You’ll use daily data from CanESM5’s assimilated hindcast runs (experiment_id: dcppA-assim). These runs use ERA-Interim to keep CanESM5’s large-scale circulation aligned with observed weather. Because CanESM5 provides daily data and ERA5 provides hourly data, use three consecutive daily timesteps as input: the day before, the day of, and the day after the target hour.
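The three-day input window can be sketched in a few lines. This is a minimal illustration using NumPy datetimes; `input_days` is a hypothetical helper name, and how the actual dataloader indexes the CanESM5 files is not shown here.

```python
import numpy as np

def input_days(target_time):
    """Return the three daily timestamps (day before, day of, day after) for a target hour."""
    day = target_time.astype("datetime64[D]")  # drop the hour, keep the calendar day
    one_day = np.timedelta64(1, "D")
    return np.array([day - one_day, day, day + one_day])

# Example: an ERA5 target at 06:00 UTC on 1 March 2010
days = input_days(np.datetime64("2010-03-01T06:00"))
print(days)
```

Every target hour within the same calendar day shares the same three input days; the hour-of-day channel is what lets the model distinguish between them.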

Surface inputs include temperature, humidity, precipitation, radiation fluxes, wind, cloud cover, and cryosphere variables. Pressure-level inputs (ta, ua, va, zg, hus, wap) are provided at 1000, 850, 700, 500, 250, 100, 50, and 10 hPa.

ERA5 reanalysis as the high-resolution target

The target dataset is ERA5, which provides hourly data at 0.25° resolution (~31 km at the equator) on a 721×1440 grid. With comprehensive coverage of variables, it’s a standard target for many AI weather models, including FourCastNet and HENS. Given an input resolution of ~2.8° and a target resolution of ~0.25°, this achieves roughly 11x super-resolution per spatial dimension.
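The 11x figure follows directly from the two grid shapes quoted above. A quick check, using only those numbers:

```python
coarse_shape = (64, 128)    # CanESM5 grid (lat, lon)
fine_shape = (721, 1440)    # ERA5 grid (lat, lon)

lat_factor = fine_shape[0] / coarse_shape[0]
lon_factor = fine_shape[1] / coarse_shape[1]
pixel_factor = lat_factor * lon_factor  # how many output pixels per input pixel

print(f"lat: {lat_factor:.2f}x, lon: {lon_factor:.2f}x, pixels: {pixel_factor:.0f}x")
```

Each coarse grid cell therefore corresponds to well over a hundred output pixels, which is why the fine-scale structure has to be generated rather than interpolated.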

On the output side, use the same variables as HENS plus sea surface temperature (SST). Missing SST values over land are filled using a smoothed nearest-neighbor interpolation to avoid missing values and artificial gradients at coastlines.
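As a rough illustration of the fill step, the sketch below replaces NaN cells with the value of the nearest valid cell. It is a brute-force version suited to small arrays only, it measures distance in index space (no spherical geometry or longitude wraparound), and it omits the smoothing pass; `fill_nearest` is a hypothetical name, not a function from the pipeline.

```python
import numpy as np

def fill_nearest(field):
    """Fill NaN cells with the value of the nearest valid cell (index-space distance)."""
    filled = field.copy()
    valid = ~np.isnan(field)
    valid_idx = np.argwhere(valid)  # (n_valid, 2) array of row/column indices
    for i, j in np.argwhere(~valid):
        d2 = ((valid_idx - (i, j)) ** 2).sum(axis=1)  # squared distance to each valid cell
        ni, nj = valid_idx[d2.argmin()]
        filled[i, j] = field[ni, nj]
    return filled

# Toy SST field in kelvin; NaN marks land
sst = np.array([[280.0, np.nan, np.nan],
                [281.0, np.nan, 290.0]])
sst_filled = fill_nearest(sst)
```

For full 721×1440 fields, a distance-transform-based approach would be far more efficient than this loop.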

Additional inputs and preprocessing 

Beyond CanESM5 variables, it’s important to include temporal and geographic context channels (Table 1).

| Category | Variable | Description |
|---|---|---|
| Time-dependent | Solar zenith angle | Sun position |
| Time-dependent | Hour of the day | Output target hour, 0-23 UTC; guides mixing of temporal context |
| Static | Geopotential height | Orographic effects |
| Static | Distance to ocean | Land-sea contrasts |
| Static | Land-sea mask | Surface type distinction |
| Static | Longitude (sin, cos) | Spherical encoding (2 channels) |
| Static | Latitude (sin, cos) | Spherical encoding (2 channels) |

Table 1. Additional input channels providing temporal and geographic context

Trigonometric encodings provide continuous spherical coordinates. The hour-of-day channel guides temporal weighting. For instance, midnight predictions rely more on “yesterday” while hour 23 draws more from “tomorrow.” 
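A minimal sketch of these context channels on the ERA5 grid follows. The latitude and longitude layout matches the 721×1440 grid described earlier; scaling the hour to the range [0, 1) is an assumption made for illustration, as is the helper name `hour_channel`.

```python
import numpy as np

lat = np.linspace(90.0, -90.0, 721)   # ERA5 latitudes in degrees
lon = np.arange(0.0, 360.0, 0.25)     # ERA5 longitudes in degrees
lat2d, lon2d = np.meshgrid(np.deg2rad(lat), np.deg2rad(lon), indexing="ij")

# Four static channels: sin/cos of latitude and longitude
static_coords = np.stack(
    [np.sin(lat2d), np.cos(lat2d), np.sin(lon2d), np.cos(lon2d)]
)

def hour_channel(hour, shape=(721, 1440)):
    """Constant field holding the target hour, scaled to [0, 1) (scaling is an assumption)."""
    return np.full(shape, hour / 24.0)

print(static_coords.shape)
```

Unlike raw longitude, which jumps from 359.75° back to 0°, the sine and cosine channels vary smoothly across that seam.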

Apply three preprocessing steps:  

  1. Merge snow cover and sea ice concentration into a single variable to avoid missing values over land.  
  2. Normalize each channel using z-scores for stable training.
  3. Upsample bilinearly to the ERA5 grid, as CorrDiff requires matched input and output grids.
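The three steps can be sketched with NumPy as follows. This is an illustrative version under stated assumptions: the merge rule (sea ice where defined, snow cover elsewhere) is a guess at how the two fields are combined, the function names are hypothetical, and the interpolation clamps at the grid edges rather than wrapping in longitude.

```python
import numpy as np

def merge_snow_and_sea_ice(snow_cover, sea_ice):
    """Use sea ice concentration where defined, snow cover where it is missing (land)."""
    return np.where(np.isnan(sea_ice), snow_cover, sea_ice)

def zscore(channel, mean, std):
    """Normalize one channel with precomputed dataset-wide statistics."""
    return (channel - mean) / std

def bilinear_upsample(field, src_lat, src_lon, dst_lat, dst_lon):
    """Separable linear interpolation on a lat-lon grid; coordinates must be increasing."""
    along_lon = np.stack([np.interp(dst_lon, src_lon, row) for row in field])
    return np.stack(
        [np.interp(dst_lat, src_lat, col) for col in along_lon.T], axis=1
    )

# Toy example: a 2x3 field upsampled to 3x5
src_lat, src_lon = np.array([0.0, 10.0]), np.array([0.0, 10.0, 20.0])
dst_lat, dst_lon = np.array([0.0, 5.0, 10.0]), np.array([0.0, 5.0, 10.0, 15.0, 20.0])
coarse = src_lat[:, None] + 2.0 * src_lon[None, :]
fine = bilinear_upsample(coarse, src_lat, src_lon, dst_lat, dst_lon)
```

Upsampling adds no new information; it only places the coarse input on the same grid as the target so that the network sees matched shapes.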

CanESM5 and the ERA5 core dataset have 38 years of overlap. Combined with 10 ensemble members from CanESM5, this yields roughly 138,700 training samples, well above the recommended minimum of 50,000. The final input comprises 231 channels: 222 from CanESM5 (74 variables × 3 timesteps), 2 time-dependent, and 7 static.
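Both totals can be reproduced from the numbers in the paragraph above. The only assumption in this check is one sample per day with 365 days per year.

```python
years, days_per_year, members = 38, 365, 10
samples = years * days_per_year * members  # one sample per day per ensemble member

canesm5_channels = 74 * 3                  # 74 variables x 3 daily timesteps
total_channels = canesm5_channels + 2 + 7  # plus time-dependent and static channels

print(samples, total_channels)
```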

How to train CorrDiff for climate downscaling

Training CorrDiff involves five steps: data loading, model configuration, regression training, regression evaluation, and diffusion training. The complete pipeline takes tens of GPU-hours for small datasets or ~2,000 GPU-hours for larger ones, like the one in this guide.

  1. Data loading
    1. Build your custom dataloader using the NVIDIA PhysicsNeMo DownscalingDataset as a template.
    2. Keep per-sample work lightweight.
    3. Precompute dataset-wide statistics (e.g., climatologies).
    4. If data loading becomes a bottleneck, consider optimizing or switching to a faster loader.
  2. Configure the model and datasets
    1. Edit Hydra-based YAML configuration files as described in the documentation to define datasets, model architecture, training hyperparameters, and data splits.
    2. If required, override settings at runtime using Hydra ++ syntax.
  3. Training the regression model
    1. Run python train.py with the regression config and train a UNet to predict the conditional mean of the mapping.
    2. Track validation loss for early stopping to avoid overfitting.
    3. For scaling suggestions and faster training, see the CorrDiff training performance optimization post.
  4. Evaluating the regression model
    1. Run the trained regression model on the validation set.
    2. Compute the standard deviation of prediction errors per output channel.
    3. Use these to set sigma_data in the diffusion model’s loss function.
    4. This balances optimization across variables with different error scales.
  5. Training the diffusion model
    1. Run python train.py with the diffusion config, using sigma_data from the regression evaluation step.
    2. This stage learns to add realistic fine-scale variation.
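The regression evaluation step (step 4 above) can be sketched as follows. The array layout (sample, channel, lat, lon) and the function name are assumptions for illustration, and the data here is synthetic; how the resulting values are passed into the diffusion config is left to the PhysicsNeMo documentation.

```python
import numpy as np

def per_channel_error_std(prediction, target):
    """Standard deviation of regression errors per channel.

    Both arrays are assumed to be shaped (sample, channel, lat, lon).
    """
    error = prediction - target
    return error.std(axis=(0, 2, 3))

# Synthetic validation set: three channels with different error scales
rng = np.random.default_rng(0)
target = rng.standard_normal((8, 3, 16, 32))
scales = np.array([0.5, 1.0, 2.0]).reshape(1, 3, 1, 1)
prediction = target + scales * rng.standard_normal(target.shape)

sigma_data = per_channel_error_std(prediction, target)
print(sigma_data)
```

Using one value per channel keeps a variable with large residuals from dominating the diffusion loss over a variable with small ones.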

Running inference and evaluating downscaled output for CanESM5 SSP585

Use NVIDIA Earth2Studio, an open-source Python package for running AI weather and climate models, for inference. It features a CorrDiff wrapper and a CMIP6 data source that downloads from ESGF servers automatically. The package is easily extensible with custom sources and wrappers.

Running downscaling for CanESM5 SSP585 is straightforward:

import torch
import numpy as np
import xarray as xr
from earth2studio.models.dx import CorrDiffCMIP6
from earth2studio.data import CMIP6MultiRealm, CMIP6, fetch_data

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CorrDiffCMIP6.load_model(
    CorrDiffCMIP6.load_default_package(), 
    output_lead_times=np.array([np.timedelta64(-12, "h"), np.timedelta64(-6, "h")])
)
model.seed = 1 # Set seed for reproducibility
model.number_of_samples = 1 # Change the number of samples if needed
model = model.to(device)
# Build the CMIP6 multi-realm data source; about 60 GB of data will be fetched
cmip6_kwargs = dict(experiment_id="ssp585", source_id="CanESM5", variant_label="r1i1p2f1", exact_time_match=True)
data = CMIP6MultiRealm([CMIP6(table_id=t, **cmip6_kwargs) for t in ("day", "Eday", "SIday")])
x, coords = fetch_data(
    source=data,
    time=np.array([np.datetime64("2037-09-06T12:00")]), # CMIP daily data is provided at 12:00 UTC
    lead_time=model.input_coords()["lead_time"],
    variable=model.input_coords()["variable"],
    device=device,
)
# Run model forward pass
out, out_coords = model(x, coords)
da = xr.DataArray(data=out.cpu().numpy(), coords=out_coords, dims=list(out_coords.keys()))

This code loads the CorrDiff model and the CMIP6 datasets (atmosphere and sea ice realms), configures the experiment and variant IDs, and runs the downscaling workflow for the desired timestamps. Results are returned as an xarray DataArray.

To learn more about Earth2Studio, sign up for the online course Applying AI Weather Models with NVIDIA Earth-2 from the NVIDIA Deep Learning Institute.

Figure 1. 10 m wind speed on Sept. 3, 2037 (SSP585). The coarse input (left) misses local detail, while AI downscaling to hourly resolution (right) reveals two tropical cyclones

Figure 1 shows the model’s capabilities on a single timestamp. The left panel shows raw CMIP6 10 m wind speed at ~2.8° resolution. The right panel shows CorrDiff output at ~0.25° resolution, revealing a hurricane in the Caribbean and a typhoon in the Pacific, neither of which is resolved in the coarse input.

CorrDiff recovers these systems by detecting subtle large-scale fingerprints that correlate with cyclone presence. While ERA5 has known limitations for tropical cyclones, this result shows how fine-scale signals can be extracted from coarse climate model output.

You can also evaluate the model quantitatively. For the 2010 test set, outputs are downscaled four times daily (00, 06, 12, 18 UTC), averaged into daily means, and compared against bilinearly interpolated CMIP6 as a baseline. Table 2 summarizes results for near-surface air temperature (T2m) and 10 m wind components (U10m, V10m).

CorrDiff outperforms the baseline on Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), with strong temperature improvements. The T2m bias drops from nearly 1 K to -0.11 K. For wind components, the baseline shows a lower bias, but this reflects the minimal wind bias in CanESM5’s assimilated runs rather than a limitation of CorrDiff.

| Variable | Source | Bias | MAE | RMSE |
|---|---|---|---|---|
| T2m | CMIP6 | 0.97 K | 2.06 K | 3.19 K |
| T2m | CorrDiff | -0.11 K | 0.99 K | 1.55 K |
| U10m | CMIP6 | 0.01 m/s | 1.55 m/s | 2.17 m/s |
| U10m | CorrDiff | 0.09 m/s | 0.87 m/s | 1.24 m/s |
| V10m | CMIP6 | -0.01 m/s | 1.55 m/s | 2.15 m/s |
| V10m | CorrDiff | 0.20 m/s | 0.87 m/s | 1.21 m/s |

Table 2. Daily-averaged test metrics for 2010 comparing bilinearly interpolated CMIP6 (baseline) to CorrDiff output. Metrics computed against ERA5
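The metrics in Table 2 can be computed along these lines. This sketch uses plain unweighted means over all grid points; whether the reported numbers use latitude weighting is not stated, so treat that as an open assumption. The function names are illustrative.

```python
import numpy as np

def daily_mean(six_hourly):
    """Average four 6-hourly fields per day; input shaped (day * 4, lat, lon)."""
    n_days = six_hourly.shape[0] // 4
    trimmed = six_hourly[: n_days * 4]
    return trimmed.reshape(n_days, 4, *six_hourly.shape[1:]).mean(axis=1)

def bias(pred, truth):
    return float(np.mean(pred - truth))

def mae(pred, truth):
    return float(np.mean(np.abs(pred - truth)))

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Toy check: a prediction that is uniformly 1 K too warm
truth = np.zeros((2, 2, 2))
pred = truth + 1.0
print(bias(pred, truth), mae(pred, truth), rmse(pred, truth))
```

Bias can be near zero while MAE and RMSE stay large, because positive and negative errors cancel in the bias; that is the pattern the wind rows of the baseline show.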

CorrDiff can generate ensembles of plausible outputs from a single input sample. Table 3 reports hourly metrics using an 8-member ensemble, including the Continuous Ranked Probability Score (CRPS), which generalizes MAE to probabilistic forecasts. CRPS rewards accuracy and well-calibrated spread; lower values indicate better performance. Values below the deterministic MAE show that the ensemble spread reflects uncertainty, not noise.

| Variable | Source | Bias | MAE | RMSE | CRPS |
|---|---|---|---|---|---|
| T2m | CorrDiff | -0.12 K | 1.24 K | 1.88 K | 0.94 K |
| U10m | CorrDiff | 0.09 m/s | 1.24 m/s | 1.78 m/s | 0.94 m/s |
| V10m | CorrDiff | 0.20 m/s | 1.27 m/s | 1.80 m/s | 0.96 m/s |

Table 3. Hourly test metrics for 2010, including ensemble-based CRPS (8 members), computed against ERA5
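For reference, one common empirical CRPS estimator for an ensemble is the mean absolute error of the members minus half the mean absolute difference between member pairs. The sketch below implements that estimator; it is not confirmed to be the exact variant used for Table 3.

```python
import numpy as np

def ensemble_crps(members, truth):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'|, averaged over grid points.

    members is shaped (member, ...) and truth is shaped (...).
    """
    skill = np.abs(members - truth).mean(axis=0)
    spread = np.abs(members[:, None] - members[None, :]).mean(axis=(0, 1))
    return float((skill - 0.5 * spread).mean())

# Two members that bracket the truth: each has an absolute error of 1
members = np.array([[0.0], [2.0]])
truth = np.array([1.0])
print(ensemble_crps(members, truth))
```

With a single member the spread term vanishes and the score reduces to MAE, which is why CRPS is described as generalizing MAE to probabilistic forecasts.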

These quantitative evaluations use historical data where ERA5 provides ground truth. To check generalization, apply the model to SSP585 projections (r1i1p1f1), which extend beyond the training period to 2100. 

Follow the same protocol: downscale four times daily, aggregate to daily means, then compute global annual averages. Figure 2 shows the time series for T2m, U10m, and V10m.

Figure 2. Global annual means for T2m, U10m, and V10m across ERA5, CMIP6, and CorrDiff. The vertical line marks the transition to SSP585; the right panel shows the CorrDiff − CMIP6 correction over time
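On a regular latitude-longitude grid, global averages are usually weighted by the cosine of latitude so that the densely sampled polar rows do not dominate. The sketch below shows that weighting; it is an assumption that the averages in Figure 2 were computed this way, and `global_mean` is an illustrative name.

```python
import numpy as np

def global_mean(field, lat_deg):
    """Area-weighted mean over a regular lat-lon grid; field shaped (..., lat, lon)."""
    weights = np.cos(np.deg2rad(lat_deg))
    zonal = field.mean(axis=-1)  # average over longitude first
    return (zonal * weights).sum(axis=-1) / weights.sum()

# Toy example: three time steps on a 5x4 grid, each with a constant value
lat = np.array([-60.0, -30.0, 0.0, 30.0, 60.0])
field = np.arange(3.0)[:, None, None] * np.ones((3, 5, 4))
print(global_mean(field, lat))
```

Annual means then follow by averaging the resulting daily time series within each calendar year.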

During the historical period, the pattern matches the 2010 findings. CanESM5 shows a warm bias, while CorrDiff output aligns closely with ERA5. In the SSP585 projection, the CorrDiff correction stays stable, suggesting the learned bias adjustment holds in future conditions. However, variability in the correction increases as the projections move farther from the training period, especially for wind components.

While encouraging, the results warrant caution. The model hasn’t been trained on future climate data, and some distribution shift is inevitable. For applications requiring high confidence in future projections, additional validation strategies, such as rolling-window cross-validation, can help quantify extrapolation limits.
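A rolling-window split keeps every test window strictly later than its training window, which mimics the extrapolation the model faces on future projections. The sketch below shows one way to generate such splits; the years and window sizes are hypothetical and chosen only to keep the example small.

```python
def rolling_window_splits(years, train_size, test_size):
    """Return (train_years, test_years) pairs with each test window after its training window."""
    splits = []
    start = 0
    while start + train_size + test_size <= len(years):
        train = years[start : start + train_size]
        test = years[start + train_size : start + train_size + test_size]
        splits.append((train, test))
        start += test_size  # slide both windows forward by one test window
    return splits

# Hypothetical example: 10 years, 6-year training windows, 2-year test windows
splits = rolling_window_splits(list(range(2000, 2010)), train_size=6, test_size=2)
for train, test in splits:
    print(train[0], "-", train[-1], "->", test)
```

Comparing error across successive windows, or across growing gaps between training and test years, gives a rough picture of how quickly skill degrades with distance from the training period.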

Applications, takeaways, and next steps for climate downscaling

The next question is how these high-resolution fields, especially large, correlated ensembles, translate into downstream decision workflows. S&P Global Energy is using CorrDiff and FourCastNet models to generate large sets of climate ensembles for probabilistic portfolio-level impact and resilience evaluation. These large ensembles enable the team to represent ranges and structures of plausible future climate states with a set of initial conditions, including rare but high-impact extremes.

With access to hundreds or thousands of realizations, S&P Global Energy can better define and quantify variability, tail behavior, and joint risks across assets, which is critical for modeling nonlinear, highly correlated climate impacts. They’re developing a scalable capability for probabilistic, portfolio-level risk evaluation using very large ensembles. The ensembles serve as inputs to internal impact functions, translating climate variables into losses such as damage to buildings and infrastructure, disruption of energy systems or transportation networks, and stress on community supply chains supporting essential services.

With this technology, it is possible to improve resilience and turn climate risk into actionable insight. While this work is still in active development, these technologies can help organizations and decision-makers better understand, prepare for, and respond to climate risks.

Get started

This guide covered how CorrDiff downscales coarse climate model output by combining 11x spatial super-resolution, daily-to-hourly temporal downscaling, variable synthesis, and bias correction in a single workflow. 

You can now apply the same approach to your own climate downscaling projects. To get started:

  1. Run inference using Earth2Studio to downscale CMIP6 scenarios with the pretrained CorrDiff model.
  2. Train your own model with PhysicsNeMo to customize CorrDiff for your specific datasets and needs.
  3. Watch the tutorial video Get Started with NVIDIA Earth-2 in 5 Minutes.


