Need additional kwargs in observation map function when wrapped into CAL.calibrate #128

Open
Julians42 opened this issue Jan 15, 2025 · 2 comments

@Julians42
Member

When running calibrations using one of the new methods like:

calibrate(
    WorkerBackend,
    ekiobj,
    ensemble_size,
    n_iterations,
    prior,
    output_dir,
)

the observation map function has no access to the configuration arguments it needs to extract observations correctly. I can load them by hardcoding the config_dict read inside the function, but it would be nice to stick with the approach already used in ClimaAtmos.jl, e.g., https://github.com/CliMA/ClimaAtmos.jl/blob/8b6f11d86ce53fcadf126d41ff5ce8c500b43621/calibration/experiments/gcm_driven_scm/observation_map.jl#L9

Perhaps I'm missing something and this is already there - if so feel free to delete this :)

@nefrathenrici
Member

What are some of the passed configuration arguments you want to support? Are these related to the EKP.Observations module?

@Julians42
Member Author

Not related to EKP.Observations; this is about how the data is processed. To create the G_ensemble matrix we need to know the height of the column (i.e., the number of levels), the variables we want to extract, and the batch size. We also need to specify which section of the simulation we're processing for our observations, e.g., hours 60-72. I think it's easiest to pass a dict here (here's my workflow):

function observation_map(iteration; config_dict::Dict)

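    # Rows of G_ensemble: dims per variable * number of variables * batch size
    full_dim =
        config_dict["dims_per_var"] *
        length(config_dict["y_var_names"]) *
        config_dict["batch_size"]

    # One column per ensemble member; full_dim is a scalar, so no splat is needed
    G_ensemble =
        Array{Float64}(undef, full_dim, config_dict["ensemble_size"])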

    iter_path = path_to_iteration(config_dict["output_dir"], iteration)
    eki = JLD2.load_object(joinpath(iter_path, "eki_file.jld2"))
    for m in 1:config_dict["ensemble_size"]
        member_path =
            path_to_ensemble_member(config_dict["output_dir"], iteration, m)
        try
            G_ensemble[:, m] .= process_member_data(
                member_path,
                eki;
                y_names = config_dict["y_var_names"],
                t_start = config_dict["g_t_start_sec"],
                t_end = config_dict["g_t_end_sec"],
                z_max = config_dict["z_max"],
                norm_factors_dict = config_dict["norm_factors_by_var"],
                log_vars = config_dict["log_vars"],
            )
        catch err
            @info "Error during observation map for ensemble member $m" err
            G_ensemble[:, m] .= NaN
        end
    end
    return G_ensemble
end
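
One possible way to bridge the gap, sketched here under assumptions: capture config_dict in a closure so that the caller only has to supply the iteration number. The helper make_observation_map is hypothetical and not part of ClimaCalibrate, the observation_map below is a stub that only allocates an array of the right shape, and the config values (including the variable names) are illustrative. Whether calibrate could accept such a callable is exactly what this issue is asking about.

```julia
# Stub standing in for the real observation map: it only builds an array
# with the shape implied by the configuration.
function observation_map(iteration; config_dict::Dict)
    full_dim =
        config_dict["dims_per_var"] *
        length(config_dict["y_var_names"]) *
        config_dict["batch_size"]
    return fill(Float64(iteration), full_dim, config_dict["ensemble_size"])
end

# Hypothetical helper (not a ClimaCalibrate function): bind config_dict once
# and return a function that takes only the iteration number.
make_observation_map(config_dict::Dict) =
    iteration -> observation_map(iteration; config_dict = config_dict)

# Illustrative values only.
config_dict = Dict(
    "dims_per_var" => 30,
    "y_var_names" => ["thetaa", "hus"],
    "batch_size" => 5,
    "ensemble_size" => 10,
)

obs_map = make_observation_map(config_dict)
G_ensemble = obs_map(0)
println(size(G_ensemble))  # (300, 10): 30 * 2 * 5 rows, 10 ensemble members
```

The closure keeps the keyword-argument version of observation_map unchanged, so the existing ClimaAtmos.jl-style function could be reused as-is if the calibration driver accepted a one-argument callable.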
