NVIDIA & UW Use 18,000 Years of Synthetic Climate for AI Forecasting

Image Credit: Jacky Lee

A new arXiv preprint (arXiv:2512.22814, submitted 28 Dec 2025) describes a technique called long range distillation, aimed at improving AI forecasting at subseasonal to seasonal time scales, meaning weeks to months ahead.

The authors frame the core challenge as twofold. First, many AI weather systems are trained as short step predictors, so errors can accumulate when they are rolled forward repeatedly. Second, reanalysis data offers relatively few independent long range training examples, which makes it hard to train expressive long lead probabilistic models without overfitting.

The arXiv HTML version lists affiliations including NVIDIA Research (Santa Clara, California) and the University of Washington (Seattle).

What The Paper Proposes

The approach uses a teacher model to generate a very large synthetic climate dataset, then trains a student model to forecast directly at longer lead times.
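In other words, instead of composing many short forecast steps, the student is trained to jump straight to the target lead time. The sketch below is purely conceptual and not the paper's code; the step size, lead time, and placeholder models are illustrative assumptions.

```python
# Minimal conceptual sketch (not the paper's code): contrast a short-step
# autoregressive rollout with a single direct long-lead prediction.

def forecast_autoregressive(model_step, state, n_steps):
    """Roll a short-range model forward repeatedly; errors can compound."""
    for _ in range(n_steps):
        state = model_step(state)            # e.g. one 6-hour or 1-day step
    return state

def forecast_direct(student, history, lead_days):
    """A distilled student maps recent history straight to the target lead."""
    return student(history, lead_days)       # one call, no intermediate rollout

# Toy illustration with hypothetical placeholder models:
toy_step = lambda x: x + 1                    # stands in for a short-range model
toy_student = lambda hist, lead: hist + lead  # stands in for the distilled student

print(forecast_autoregressive(toy_step, 0, 45))   # 45 one-day steps
print(forecast_direct(toy_student, 0, 45))        # single 45-day-lead call
```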

In this work, the teacher is the Deep Learning Earth System Model (DLESyM), run via NVIDIA Earth2Studio. The authors say they initialise DLESyM from ERA5 and generate climate simulations across many decades to build a training set that is much larger than typical observational records used in ML weather training.

How The Synthetic Dataset Is Generated

The paper reports generating an ensemble of 200 simulations, initialised on dates evenly spaced between 1 Jan 2008 and 31 Dec 2016, with each run extended for 90 years. That totals 18,000 simulated years. The authors then pruned runs that became unstable, reporting an instability rate of 14%, and retained about 15,000 years for training and validation.

They also report the throughput: generating the ensemble took about 4 hours when run in parallel on 96 NVIDIA H100 GPUs.
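As a rough illustration of the ensemble bookkeeping described above, the snippet below reproduces the arithmetic from the reported figures; the exact initialisation scheme is the authors', and the evenly spaced date helper here is a hypothetical sketch.

```python
# Hypothetical sketch of the ensemble bookkeeping reported in the preprint:
# 200 runs, start dates evenly spaced between 1 Jan 2008 and 31 Dec 2016,
# each extended for 90 years, with unstable runs pruned afterwards.
from datetime import datetime

n_members, run_length_years = 200, 90
start, end = datetime(2008, 1, 1), datetime(2016, 12, 31)

spacing = (end - start) / (n_members - 1)
init_dates = [start + i * spacing for i in range(n_members)]

total_years = n_members * run_length_years
print(total_years)                     # 18000 simulated years before pruning

# The paper reports retaining about 15,000 years after pruning unstable runs.
retained_fraction = 15_000 / total_years
print(round(retained_fraction, 3))     # ~0.833 of the raw ensemble
```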

What The Student Model Looks Like

The student is a conditional diffusion model that produces probabilistic forecasts. The implementation uses a HEALPix64 representation of the globe and conditions on a short history of daily averaged states (the paper describes using four previous daily averages after downsampling the teacher output).

The DLESyM configuration used here predicts a limited set of variables including sea surface temperature, 2 m air temperature, 850 hPa temperature, total column water vapour, 10 m wind speed, and several geopotential height fields.
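To make the conditioning setup concrete, the sketch below lays out plausible tensor shapes for such a student. The HEALPix pixel count follows directly from nside = 64; the number of geopotential levels, the channel names, and the channel ordering are assumptions for illustration, not details from the paper.

```python
# Hypothetical shape sketch of the student's conditioning inputs (not the
# authors' code). HEALPix with nside = 64 gives 12 * 64**2 = 49,152 pixels.
import numpy as np

nside = 64
n_pixels = 12 * nside ** 2             # 49,152 cells on the HEALPix64 grid

# Variables named in the article; the count of geopotential levels is assumed.
variables = ["sst", "t2m", "t850", "tcwv", "ws10m"] + [f"z_level_{i}" for i in range(3)]
n_channels = len(variables)

n_history = 4                          # four previous daily averages as conditioning

# Conditioning tensor: history steps x channels x pixels (batch dim omitted).
conditioning = np.zeros((n_history, n_channels, n_pixels), dtype=np.float32)
print(conditioning.shape)              # (4, 8, 49152)
```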

What Results Are Reported

The preprint reports two main evaluation tracks:

  • Perfect model experiments where the student is trained and tested within the teacher generated world. The authors report the distilled student outperforms climatology and can approach the teacher’s skill while replacing hundreds of short autoregressive steps with a single long lead prediction step.

  • Real world evaluation where the student is adapted toward observations using ERA5 fine tuning and bias correction steps. The abstract claims the resulting system achieves subseasonal to seasonal forecast skill comparable to the ECMWF ensemble forecast after ERA5 fine tuning; a generic bias correction example is sketched after this list.
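The preprint's exact correction procedure is not spelled out in the abstract. The snippet below is only a generic example of a mean bias removal against ERA5, the broad family of techniques implied; the function names and array shapes are assumptions, not the authors' method.

```python
# Illustrative sketch only: a simple mean bias correction of student forecasts
# against ERA5, of the generic kind used in forecast post-processing.
import numpy as np

def fit_mean_bias(forecasts, era5_truth):
    """Mean forecast-minus-ERA5 error per grid cell over a training set.

    forecasts, era5_truth: arrays of shape (n_cases, n_pixels).
    """
    return (forecasts - era5_truth).mean(axis=0)   # shape (n_pixels,)

def apply_bias_correction(forecast, bias):
    """Subtract the fitted mean bias from a new forecast."""
    return forecast - bias
```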

Because this is a preprint, these outcomes should be treated as the authors’ reported results until independently replicated.

How It Fits With The Current AI Forecasting Wave

Over the past two years, AI weather forecasting has moved quickly in the medium range. For example, GraphCast demonstrated skill for forecasts out to 10 days in a peer reviewed Science paper, and GenCast reported probabilistic ensemble forecasts out to 15 days in Nature using a diffusion approach.

Operational centres have also started deploying AI guidance. ECMWF says it took the ensemble version of its Artificial Intelligence Forecasting System (AIFS) into operations on 1 July 2025, running side by side with its physics based system.

What differentiates the long range distillation paper is not speed at 10 to 15 days, but the attempt to scale training data for weeks ahead forecasting by using AI generated synthetic climate as a substitute for additional long range observations.

What to Interpret Carefully

The paper refers to a “40 year reanalysis record” when discussing overfitting risk for long lead training, which aligns with the common practice of training primarily on the modern satellite era subset.

However, it is also worth noting that ERA5 as a dataset covers January 1940 to present according to ECMWF and Copernicus documentation, even though earlier decades have different observational constraints and are not always used in ML training setups.
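To put the sample size concern in rough numbers, the back-of-envelope count below assumes roughly one independent example per year for a fixed season and lead time; that assumption is illustrative, not a figure from the paper.

```python
# Back-of-envelope comparison of independent seasonal training examples,
# assuming about one independent example per year for a fixed season and lead.
samples_40yr_reanalysis = 40             # the "40 year reanalysis record" cited
samples_full_era5 = 2025 - 1940          # ~85 years if the full ERA5 span were used
samples_synthetic = 15_000               # retained simulated years in the preprint

print(samples_40yr_reanalysis, samples_full_era5, samples_synthetic)
print(samples_synthetic / samples_40yr_reanalysis)   # ~375x more examples
```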

The manuscript also flags practical and scientific limits: the DLESyM configuration used here simulates a small set of variables compared with many leading medium range models, and the authors observed instability in a minority of long rollouts that they later pruned.

What to Watch Next

Two follow ups will matter for credibility and real world relevance.

First, independent evaluation on established benchmarks and across multiple variables and seasons, ideally by groups not involved in the work. Frameworks like WeatherBench are often used for like for like comparisons of ML and physics based systems.
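As a generic example of the kind of probabilistic verification used in such benchmark comparisons, the snippet below computes a standard ensemble CRPS estimator; it is not the preprint's evaluation code, and the toy values are illustrative.

```python
# Generic ensemble verification metric (CRPS) of the kind used in benchmark
# comparisons such as WeatherBench; not the preprint's own evaluation code.
import numpy as np

def crps_ensemble(members, observation):
    """CRPS for a 1-D ensemble at one grid point and lead time (lower is better).

    members: array of ensemble forecast values; observation: scalar truth.
    A skill score can then be formed relative to a climatology baseline.
    """
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - observation).mean()
    term2 = 0.5 * np.abs(members[:, None] - members[None, :]).mean()
    return term1 - term2

print(crps_ensemble([1.0, 2.0, 3.0], 2.5))   # toy example, ~0.39
```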

Second, reproducibility. The authors say their long range distillation code will be released in a public GitHub repository upon publication, while the DLESyM teacher model code is available via the Earth2Studio package.

