diff --git a/docs/src/backends.md b/docs/src/backends.md
index 54075763..205a1242 100644
--- a/docs/src/backends.md
+++ b/docs/src/backends.md
@@ -1,14 +1,28 @@
-## Backends
+# Backends
 
-ClimaCalibrate can scale calibrations on different distributed computing environments, referred to as backends. Most of these are high-performance computing clusters.
+ClimaCalibrate can scale calibrations on different distributed computing environments, referred to as backends. Each backend is optimized for specific use cases and computing resources. The backend system is implemented through Julia's multiple dispatch, allowing seamless switching between different computing environments.
 
-Each backend has an associated `calibrate(::AbstractBackend, ...)` dispatch, which initializes and runs the calibration on the given backend.
+## Available Backends
 
-The following backends are currently supported:
+1. [`JuliaBackend`](@ref): The simplest backend that runs everything serially on a single machine. Best for initial testing and small calibrations that do not require parallelization.
 
-- [`JuliaBackend`](@ref)
-- [`WorkerBackend`](@ref)
-- [`CaltechHPCBackend`](@ref)
-- [`ClimaGPUBackend`](@ref)
-- [`DerechoBackend`](@ref)
+2. [`WorkerBackend`](@ref): Uses Julia's built-in distributed computing capabilities, assigning forward model runs to separate workers using Distributed.jl. Workers can be created using [`SlurmManager`](@ref), [`Distributed.addprocs`](https://docs.julialang.org/en/v1/stdlib/Distributed/#Distributed.addprocs), or by initializing julia with the `-p` option: `julia -p 2`. Available workers can be accessed using [`Distributed.workers()`](https://docs.julialang.org/en/v1/stdlib/Distributed/#Distributed.workers).
+3. HPC Cluster Backends: These backends schedule forward model runs on HPC clusters using Slurm or PBS.
+   - [`CaltechHPCBackend`](@ref): Caltech's Resnick HPC cluster
+   - [`ClimaGPUBackend`](@ref): CliMA's private GPU server
+   - [`DerechoBackend`](@ref): NSF NCAR Derecho supercomputing system.
+
+## Choosing the Right Backend
+
+The right backend is largely determined by the computational cost of your forward model.
+
+If your model is very simple or you are debugging, use the `JuliaBackend`.
+
+If your model requires just one CPU core or GPU, the best backend is the `WorkerBackend`.
+
+If your forward model requires parallelization across multiple cores or GPUs, choose one of the HPC Cluster backends. These allow you to allocate more resources to each forward model using Slurm or PBS.
+
+## Using a Backend
+
+Backends are the first argument to the [`calibrate`](@ref) function, which runs iterations of the forward model, updating model parameters based on observations.
diff --git a/docs/src/index.md b/docs/src/index.md
index 30147332..40769404 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -1,11 +1,14 @@
 # ClimaCalibrate.jl
 
-ClimaCalibrate.jl is a toolkit for developing scalable and reproducible model
-calibration pipelines using [EnsembleKalmanProcesses.jl](https://github.com/CliMA/EnsembleKalmanProcesses.jl/) with minimal boilerplate.
+ClimaCalibrate provides a scalable framework for calibrating forward models using
+the Ensemble Kalman Process (EKP). It integrates with [EnsembleKalmanProcesses.jl](https://github.com/CliMA/EnsembleKalmanProcesses.jl/)
+to enable distributed model calibration with minimal boilerplate code.
 
-This documentation assumes basical familiarity with inverse problems and [Ensemble Kalman Inversion](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/ensemble_kalman_inversion/#eki) in particular.
+Key Features
 
-To use this framework, component models define their own versions of the functions provided in the interface.
-Calibrations can either be run using just Julia, the Caltech central cluster, NCAR Derecho, or CliMA's GPU server.
+- Distributed computing support for multiple HPC environments
+- Integration with EnsembleKalmanProcesses.jl for parameter estimation
+- Flexible model interface for different component models
+- Support for emulation and sampling workflows
 
 For more information, see our [Getting Started page](https://clima.github.io/ClimaCalibrate.jl/dev/quickstart/).
diff --git a/docs/src/quickstart.md b/docs/src/quickstart.md
index 30965d79..27d1d0ed 100644
--- a/docs/src/quickstart.md
+++ b/docs/src/quickstart.md
@@ -1,10 +1,9 @@
 # Getting Started
 
-## Minimal Requirements
-Every calibration using Ensemble Kalman Inversion requires the following information:
-- A forward model that uses parameters to predict output
-- Observational data
-- Model parameters to calibrate on
+Every calibration requires:
+- a forward model, which uses input parameters to return diagnostic output
+- observational data, which can be a Vector or an [`EnsembleKalmanProcesses.Observation`](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/API/Observations/#EnsembleKalmanProcesses.Observation)
+- a prior parameter distribution. The easiest way to construct a distribution is with the [`EnsembleKalmanProcesses.ParameterDistributions.constrained_gaussian`](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/API/ParameterDistributions/#EnsembleKalmanProcesses.ParameterDistributions.constrained_gaussian) function.
 
 ## Implementing your experiment
 
@@ -78,6 +77,21 @@ And we can put it all together:
 
 `calibrate(ensemble_size, n_iterations, observations, noise, prior, output_dir)`
 
+Lastly, you need to set the output directory, ensemble size and the number of iterations to run for. A good rule of thumb for your ensemble size is 10x the number of free parameters.
+
+```julia
+n_iterations = 7
+ensemble_size = 10
+output_dir = "output/my_experiment"
+```
+Once all of this has been set up, you can put it all together using the [`calibrate`](@ref) function:
+
+```julia
+calibrate(ensemble_size, n_iterations, observations, noise, prior, output_dir)
+```
+
+For more information on parallelizing your calibration, see the [Backends](https://clima.github.io/ClimaCalibrate.jl/dev/backends/) page.
+
 # Example Calibration
 
 A good way to get started is to run the example experiment, `surface_fluxes_perfect_model`,