SCT dual with fg

Cristian Lussana edited this page Mar 6, 2025 · 3 revisions

Introduction

We validate observations by defining an "event" that splits observed values into two groups: "yes" observations, where the event has occurred, and "no" observations, where it has not. Validation uses a Spatial Consistency Test (SCT dual) adapted for binary variables, which takes spatial properties and first-guess values into account. SCT dual compares the observed event occurrence at a location with the occurrence expected from neighboring observations and first-guess values.

The expected event occurrence at a given location is determined by dividing nearby observations into two subsets, one containing all "yes" observations and the other all "no" observations. Each subset is processed through statistical interpolation routines to produce an indicator of how likely a "yes" or a "no" event is at that location. Comparing these two indicators, called IDIv, gives the expected event occurrence. If the observed occurrence differs significantly from the expected one, the observation is flagged as suspect or bad (the two terms are used interchangeably).

All spatial analysis calculations are based on Optimal Interpolation (OI), and SCT dual is based on the SCT described in:

Lussana, C., Uboldi, F., and Salvati, M.R. (2010), A spatial consistency test for surface observations from mesoscale meteorological networks, Q.J.R. Meteorol. Soc., 136: 1075-1088. https://doi.org/10.1002/qj.622

Definitions
Pseudo-Algorithm
Function Signature
Diagnostic File


Definitions

  • Good Observation: An observation where the event occurrence aligns with the predicted value based on its nearest neighbors.
  • Suspect or Bad Observation: An observation where the event occurrence significantly deviates from the predicted value based on its nearest neighbors.
  • Centroid Observation: The observation at the center of two concentric circles, the outer circle and the inner circle, with radii outer_radius and inner_radius, respectively.
  • Outer Circle: The area used to select observations for assessing the quality of one or more observations simultaneously. It may include observations that help evaluate others but are not themselves assessed for quality.
  • Inner Circle: The area that allows multiple observations to be flagged at the same time. Checking more observations simultaneously speeds up the quality control process but increases the risk of misclassifying good observations as suspect.
  • IDIv (leave-one-out Integral Data Influence): Measures the sensitivity of an analysis at a location to variations in observed values at nearby locations. In practice, it is the OI leave-one-out analysis in which all observations are set to 1 (indicating "observation available here") and the background is set to 0 (indicating "no observation information available here"). The analysis uses Gaussian correlation functions, with the user-specified parameter eps2 controlling how closely the analysis fits the observations: smaller values push IDIv towards 1 at observation locations, while eps2 = 1 gives an observation less weight in the OI analysis. In SCT dual, observations and analyses are transformed into "yes" and "no" tags indicating event occurrence. When the observation and background tags differ, eps2 is set to 1, overriding the user setting.
  • IDIv(yes): IDIv calculated for all observations in a region, considering only those tagged as "yes".
  • IDIv(no): IDIv calculated for all observations in a region, considering only those tagged as "no".
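
The IDIv definition can be illustrated with a small sketch. It is not titanlib's implementation: the `Station` struct, the function names, the single horizontal length scale `dh`, and the direct solution of the leave-one-out system are all assumptions made for illustration. It also assumes that IDIv(yes) and IDIv(no) are obtained by using only the observations carrying the matching tag as data. Each observation is set to 1, the background to 0, and the target observation is excluded from its own analysis.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical station record: position in meters, event tag, and its eps2.
struct Station { double x; double y; bool yes; double eps2; };

// Gaussian correlation between two stations for horizontal length scale dh [m].
double correlation(const Station& a, const Station& b, double dh) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return std::exp(-0.5 * (dx * dx + dy * dy) / (dh * dh));
}

// Solve A w = b by Gaussian elimination with partial pivoting (small dense system).
std::vector<double> solve(std::vector<std::vector<double>> A, std::vector<double> b) {
    const std::size_t n = b.size();
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < n; ++r)
            if (std::fabs(A[r][c]) > std::fabs(A[p][c])) p = r;
        std::swap(A[c], A[p]);
        std::swap(b[c], b[p]);
        assert(std::fabs(A[c][c]) > 1e-12);  // the system must not be singular
        for (std::size_t r = c + 1; r < n; ++r) {
            const double f = A[r][c] / A[c][c];
            for (std::size_t k = c; k < n; ++k) A[r][k] -= f * A[c][k];
            b[r] -= f * b[c];
        }
    }
    std::vector<double> w(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= A[i][k] * w[k];
        w[i] = s / A[i][i];
    }
    return w;
}

// Leave-one-out IDI at station `target`, using only stations whose tag equals `tag`.
double idiv(const std::vector<Station>& st, std::size_t target, bool tag, double dh) {
    std::vector<std::size_t> used;
    for (std::size_t i = 0; i < st.size(); ++i)
        if (i != target && st[i].yes == tag) used.push_back(i);
    if (used.empty()) return 0.0;  // no data: the analysis stays at the background (0)
    const std::size_t n = used.size();
    std::vector<std::vector<double>> A(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            A[i][j] = correlation(st[used[i]], st[used[j]], dh)
                      + (i == j ? st[used[i]].eps2 : 0.0);
    // All observations are 1, so the right-hand side is a vector of ones.
    const std::vector<double> w = solve(A, std::vector<double>(n, 1.0));
    double result = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        result += correlation(st[target], st[used[i]], dh) * w[i];
    return result;
}
```

With a single neighbor, the result reduces to the correlation divided by (1 + eps2), so a smaller eps2 or a closer neighbor moves the value towards 1, consistent with the role of eps2 described above.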


Pseudo-Algorithm

SCT dual iteration: Main Loop

  • Tag each observation and background value as "yes" or "no" based on event occurrence.
  • SCT dual iteration
    • Gradually tighten flagging criteria to make it harder to classify observations as suspect.
    • Detection Loop: Identify suspect observations.
    • Stray Data Redemption Loop: Re-evaluate each flagged observation against non-flagged neighbors only, and restore good observations that were flagged because of suspect neighbors.
    • Flag Assignment Step: Assign a final flag to each observation based on detection and redemption results.
    • Exit Condition: Terminate the SCT dual iteration if no suspect observations are found or the maximum number of iterations is reached.

The final flag is assigned by the Stray Data Redemption Loop, which relies on the flags from the Detection Loop. Subsequent SCT dual iterations do not change suspect flags set in earlier iterations, but they may flag previously good observations as suspect.
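
The first step of the main loop, tagging values as "yes" or "no", can be sketched as follows. The `Condition` enum is a hypothetical stand-in that mirrors the condition names listed in the function description (Eq, Gt, Geq, Lt, Leq); the function names, and the assumption that one threshold is supplied per location, are illustrative and not taken from titanlib.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the event conditions named on this page.
enum class Condition { Eq, Gt, Geq, Lt, Leq };

// True when `value` satisfies the event definition for `threshold`.
bool event_occurs(float value, float threshold, Condition condition) {
    switch (condition) {
        case Condition::Eq:  return value == threshold;
        case Condition::Gt:  return value > threshold;
        case Condition::Geq: return value >= threshold;
        case Condition::Lt:  return value < threshold;
        case Condition::Leq: return value <= threshold;
    }
    return false;  // not reached for valid enum values
}

// Tag every value as "yes" (true) or "no" (false); assumes one threshold per location.
std::vector<bool> tag_events(const std::vector<float>& values,
                             const std::vector<float>& thresholds,
                             Condition condition) {
    assert(values.size() == thresholds.size());
    std::vector<bool> tags(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        tags[i] = event_occurs(values[i], thresholds[i], condition);
    return tags;
}
```

The same function would be applied once to the observed values and once to the background values, giving the two sets of tags that the rest of the iteration compares.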

Detection Loop

  • Loop over all observations:
    • Check if the current observation qualifies as a centroid observation.
    • If yes, gather all neighbors within the outer circle that were not flagged as suspect in previous SCT dual iterations.
    • If the observation is isolated, exit without flagging it.
    • For all selected observations, calculate IDIv(yes) and IDIv(no) values.
    • Flag as suspect any observations within the inner circle that meet either of the following conditions:
      • The observation is tagged as "no", and IDIv(yes) is greater than IDIv(no) plus a buffer (this buffer increases with each SCT dual iteration).
      • The observation is tagged as "yes", and IDIv(no) is greater than IDIv(yes) plus the same buffer as above.
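
The two flagging conditions above amount to a single comparison. The sketch below uses a hypothetical function name and treats the buffer as a non-negative number supplied by the caller; how the buffer grows between iterations is not specified here, so it is left as an input.

```cpp
#include <cassert>

// True when the neighbors support the opposite tag by more than `buffer`.
bool is_suspect(bool tagged_yes, float idiv_yes, float idiv_no, float buffer) {
    assert(buffer >= 0.0f);  // assumption: the buffer is never negative
    if (tagged_yes) return idiv_no > idiv_yes + buffer;
    return idiv_yes > idiv_no + buffer;
}
```

For example, a "no" observation with IDIv(yes) = 0.8 and IDIv(no) = 0.2 is suspect with a buffer of 0.1 but not with a buffer of 0.7, which shows how a larger buffer makes flagging harder in later iterations.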

Stray Data Redemption Loop

  • Loop over all observations flagged by the Detection Loop:
    • Treat each flagged observation as a centroid observation.
    • Retrieve all neighbors close to the centroid, keeping only those that are not flagged.
    • If the observation is isolated, flag it as suspect and stop processing it.
    • For all selected observations, calculate IDIv(yes) and IDIv(no) values.
    • Flag as suspect any observations within the inner circle that meet either of the following conditions:
      • The observation is tagged as "no", and IDIv(yes) is greater than IDIv(no) plus a buffer (this buffer increases with each SCT dual iteration).
      • The observation is tagged as "yes", and IDIv(no) is greater than IDIv(yes) plus the same buffer as above.


Function Signature

ivec titanlib::sct_dual_with_fg(const Points& points,
        const vec& values,
        const vec& background_values,
        const vec& event_thresholds,
        ConditionType condition,
        int num_min,
        int num_max,
        float inner_radius,
        float outer_radius,
        int num_iterations,
        float min_horizontal_scale,
        float max_horizontal_scale,
        float vertical_scale,
        const vec& eps2,
        bool diagnostics,
        const std::string& filename_diagnostics,
        vec& sct_cvidis_yes,
        vec& sct_cvidis_no,
        const ivec& obs_to_check)

Description:

  • points: Longitude, latitude, and elevation of observation locations
  • values: Observed values
  • background_values: First-guess values at observation locations
  • event_thresholds: Event thresholds
  • condition: Event definition, one of: Eq, Gt, Geq, Lt, Leq
  • num_min: Minimum required observations within the outer radius (must be > 1)
  • num_max: Maximum observations used for the test (must be > num_min)
  • inner_radius: Radius for flagging [m]
  • outer_radius: Radius for computing OI [m]
  • num_iterations: Maximum iterations (stops if no new flags are set)
  • min_horizontal_scale: Minimum horizontal decorrelation length [m]
  • max_horizontal_scale: Maximum horizontal decorrelation length [m]
  • vertical_scale: Vertical decorrelation length [m]
  • eps2: Observation-to-background error variance ratio (e.g., 0.5 means observations are trusted twice as much as the background)
  • diagnostics: Whether to write diagnostics to a file (true or false)
  • filename_diagnostics: Diagnostics filename
  • obs_to_check: Observations to be checked (1 = check, 0 = ignore)

Returns:

  • Flags identifying suspect observations (1 = suspect, 0 = good)
  • sct_cvidis_yes: Score (0-1) representing the expected likelihood of a "yes" observation based on neighbor values
  • sct_cvidis_no: Score (0-1) representing the expected likelihood of a "no" observation based on neighbor values


Diagnostic File

Header:

it;loop;curr;i;index;lon;lat;z;vyes;byes;dh;flags_d;cvidi_yes_d;cvidi_no_d;flags_r;cvidi_yes_r;cvidi_no_r;saved_r;flags;sct_cvidi_yes;sct_cvidi_no;

Description:

  • it: SCT iteration.
  • loop: Loop index—1: Detection Loop, 2: Cluster Preservation Loop, 3: Stray Data Redemption Loop, 4: Flag Assignment Step.
  • curr: Index of the centroid observation.
  • i: Index of an observation within the outer circle.
  • index: Index of an observation in the outer circle, referencing the full observation vector.
  • lon: Longitude.
  • lat: Latitude.
  • z: Elevation (m a.m.s.l.).
  • vyes: Observed event occurrence at the observation location (1 = yes, 0 = no).
  • byes: First-guess estimate of event occurrence at the observation location (1 = yes, 0 = no).
  • dh: Horizontal decorrelation length for the Gaussian correlation function used in OI (m).
  • flags_d: Flag assigned by the Detection Loop.
  • cvidi_yes_d: IDIv(yes) from the Detection Loop.
  • cvidi_no_d: IDIv(no) from the Detection Loop.
  • flags_r: Flag assigned by the Stray Data Redemption Loop.
  • cvidi_yes_r: IDIv(yes) from the Stray Data Redemption Loop.
  • cvidi_no_r: IDIv(no) from the Stray Data Redemption Loop.
  • saved_r: Indicates whether the Stray Data Redemption Loop saved this observation.
  • flags: Final assigned flag.
  • sct_cvidi_yes: IDIv(yes), either from the latest SCT iteration or from when the observation was flagged as suspect.
  • sct_cvidi_no: IDIv(no), either from the latest SCT iteration or from when the observation was flagged as suspect.
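
A diagnostic row can be read by pairing each value with its header name. The sketch below assumes that data rows are semicolon-delimited and end with a trailing semicolon, as the header does; the function names are hypothetical and the row used in the usage note is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a semicolon-delimited line; a trailing ';' does not add an empty field.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ';')) fields.push_back(field);
    return fields;
}

// Pair each header name with the value in the same column of a data row.
std::map<std::string, std::string> parse_row(const std::string& header,
                                             const std::string& row) {
    const std::vector<std::string> names = split_fields(header);
    const std::vector<std::string> values = split_fields(row);
    assert(names.size() == values.size());  // row must match the header layout
    std::map<std::string, std::string> record;
    for (std::size_t i = 0; i < names.size(); ++i) record[names[i]] = values[i];
    return record;
}
```

For instance, `parse_row("it;loop;curr;", "1;2;3;").at("loop")` yields the string "2", which the loop description above maps to a loop name. Values are kept as strings so that the caller decides how to convert each column.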
