Clouds’ diversity and fleeting nature pose challenges
To simulate precipitation, we must go to its source: clouds. Clouds can exist at scales smaller than 100 meters, about the size of an athletic field, far below the kilometer-scale resolution of global weather models or the tens-of-kilometers-scale resolution of global climate models. Clouds come in different types and change quickly, and the intricate physics happening at even smaller scales can generate water droplets or ice crystals. All this complexity is impossible for large-scale models to resolve or calculate directly.
To account for the effect of small-scale atmospheric processes like cloud formation on the climate, models use approximations, called parameterizations, which estimate those effects from other variables that the model does resolve. Rather than depending on these parameterizations, NeuralGCM uses a neural network to learn the effects of such small-scale events directly from existing weather data.
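To make the idea of a parameterization concrete, here is a minimal sketch in Python. It is a deliberately simplified toy and not the scheme used by any real weather or climate model: the function name, the humidity threshold `rh_crit`, and the coefficient `rate_coeff` are all hypothetical values chosen for illustration. It shows how a hand-written rule can estimate unresolved precipitation from a resolved, grid-averaged variable.

```python
def toy_precip_parameterization(rel_humidity, rh_crit=0.8, rate_coeff=10.0):
    """Toy rule (hypothetical): precipitation in mm/day rises linearly once
    the grid-mean relative humidity exceeds a critical threshold."""
    excess = max(0.0, rel_humidity - rh_crit)
    return rate_coeff * excess


# Grid-mean relative humidity for three hypothetical grid cells.
grid_cells = [0.55, 0.82, 0.95]
precip = [toy_precip_parameterization(rh) for rh in grid_cells]
```

In this sketch the fixed rule and its two tunable constants stand in for the whole cloud process. A learned approach replaces the hand-written rule with a function whose parameters are fit to data.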
We improved the representation of precipitation in this version of our model by training the ML portion of NeuralGCM directly on satellite-based precipitation observations. The initial version of NeuralGCM was, like most ML weather models, trained on reanalyses: recreations of previous atmospheric conditions that combine physics-based models with observations to fill in gaps in the observational data. But the physics of clouds is so complex that even reanalyses struggle to get precipitation right. Training on output from reanalyses means reproducing their weaknesses, for example in precipitation extremes and the daily cycle.
Instead, we trained the precipitation part of NeuralGCM directly on NASA satellite-based precipitation observations spanning 2001 to 2018. NeuralGCM’s differentiable dynamical core is what made it possible to train on satellite observations. Previous hybrid models that combine physics and AI could only use output from high-fidelity simulations or reanalysis data. By training the AI component of NeuralGCM directly on high-quality satellite observations instead of relying on reanalyses, we are effectively finding a better, machine-learned parameterization for precipitation.
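The role of differentiability can be illustrated with a toy example. The sketch below is not NeuralGCM code, and every name and number in it is an assumption made for illustration: the "physics" is a one-variable moisture budget, the "neural network" is a single weight `w`, and the "observations" are synthetic values generated with a known weight. The derivative is carried through each physics step by hand, whereas a real system would rely on automatic differentiation. What it shows is that the mismatch between predicted and observed precipitation can be traced back through the physics step to the learned parameter, which is what allows training on observations directly.

```python
def rollout(w, q0=2.0, evap=1.0, dt=0.1, steps=20):
    """Run the toy model; return precipitation and its sensitivity to w."""
    q, dq_dw = q0, 0.0                 # moisture and its derivative w.r.t. w
    precip, dprecip_dw = [], []
    for _ in range(steps):
        p = w * q                      # learned precipitation (one weight)
        dp_dw = q + w * dq_dw          # product rule
        precip.append(p)
        dprecip_dw.append(dp_dw)
        q = q + dt * (evap - p)        # physics step: moisture budget
        dq_dw = dq_dw - dt * dp_dw     # derivative carried through the step
    return precip, dprecip_dw


def fit(observed, w=0.1, lr=0.05, iters=300):
    """Gradient descent on the mean squared precipitation error."""
    for _ in range(iters):
        precip, dprecip_dw = rollout(w)
        grad = sum(
            2.0 * (p - o) * g
            for p, o, g in zip(precip, observed, dprecip_dw)
        ) / len(observed)
        w -= lr * grad
    return w


W_TRUE = 0.3                           # hypothetical "true" weight
observed, _ = rollout(W_TRUE)          # synthetic stand-in for observations
w_fit = fit(observed)                  # should end up close to W_TRUE
```

Because each precipitation value depends on moisture that earlier steps have already modified, the gradient has to account for the physics step as well as the learned term. If the physics were a black box, only the direct dependence of precipitation on `w` would be available.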

