GPU & Architectures
GPU support in SpeedyWeather.jl is still a work in progress, and parts of this documentation may not always reflect the latest state. We will extend this documentation over time. Don't hesitate to contact us via GitHub issues or email if you have questions or want to collaborate.
Parts of SpeedyWeather.jl already support GPU acceleration, e.g. the barotropic model. Our development focuses on CUDA GPUs, but other architectures are possible in the future as well, because our approach relies on the device-agnostic KernelAbstractions.jl. The SpeedyWeather.jl submodule Architectures encodes all the information about the device we run our models on. To initialize a model on a GPU, we need to load the CUDA package and pass the architecture to the model constructor. For example, to initialize a barotropic model on a GPU, we can do the following:
using SpeedyWeather, CUDA
architecture = SpeedyWeather.GPU()
spectral_grid = SpectralGrid(trunc=41, nlayers=1, architecture=architecture)
model = BarotropicModel(spectral_grid=spectral_grid)
CUDA.@allowscalar simulation = initialize!(model)
run!(simulation, period=Day(10))
Note that we need to use CUDA.@allowscalar here during initialization, because we do not yet support fully GPU-accelerated model construction and initialization.
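For scripts that should also run on machines without a GPU, the architecture can be chosen at runtime. The following is a minimal sketch, not an official SpeedyWeather.jl recipe: it assumes that CUDA.jl is installed, uses CUDA.functional() from CUDA.jl to detect a usable device, and falls back to SpeedyWeather.CPU() otherwise. The fallback logic itself is our assumption; everything else mirrors the example above.

```julia
using SpeedyWeather, CUDA

# Assumption: fall back to the CPU when no functional CUDA device is available
architecture = CUDA.functional() ? SpeedyWeather.GPU() : SpeedyWeather.CPU()

spectral_grid = SpectralGrid(trunc=41, nlayers=1, architecture=architecture)
model = BarotropicModel(spectral_grid=spectral_grid)

# Scalar indexing is still needed during initialization on the GPU;
# on the CPU the macro has no effect on the result
CUDA.@allowscalar simulation = initialize!(model)
run!(simulation, period=Day(10))
```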
Architectures Utilities
To easily transfer our structures between CPU (e.g. for plotting and output) and GPU, we provide utilities that make use of the architecture object defined above and the on_architecture function, e.g. as follows:
using SpeedyWeather, CUDA
nlat_half = 6
arch_cpu = SpeedyWeather.CPU()
arch_gpu = SpeedyWeather.GPU()
grid_cpu = HEALPixGrid(nlat_half, arch_cpu)
grid_gpu = on_architecture(arch_gpu, grid_cpu)
field_cpu = rand(grid_cpu)
field_gpu = on_architecture(arch_gpu, field_cpu)
spectrum_cpu = Spectrum(trunc=41, architecture=arch_cpu)
spectrum_gpu = on_architecture(arch_gpu, spectrum_cpu)
spec_cpu = rand(spectrum_cpu)
spec_gpu = on_architecture(arch_gpu, spec_cpu)
Be aware that directly calling e.g. CuArray or adapt on the data structures is not recommended, as it can lead to unexpected behavior, e.g. mismatching internal architecture representations when launching kernels and other operations. Please use the on_architecture function instead for all transfers between devices.
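The same function should also cover the way back, e.g. to bring a field to the CPU for plotting or output. The sketch below assumes that on_architecture accepts the CPU architecture as its target in the same way it accepts the GPU one; the variable name field_back is purely illustrative, and the example requires a functional CUDA device.

```julia
using SpeedyWeather, CUDA

arch_cpu = SpeedyWeather.CPU()
arch_gpu = SpeedyWeather.GPU()

# Create a random field on the CPU, as in the example above
grid_cpu = HEALPixGrid(6, arch_cpu)
field_cpu = rand(grid_cpu)

# Move the field to the GPU ...
field_gpu = on_architecture(arch_gpu, field_cpu)

# ... and back to the CPU (assumed to work symmetrically)
field_back = on_architecture(arch_cpu, field_gpu)
```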
Benchmarks
More to follow...