Spatial consistency test dual
`sct_dual` is based on the concept of an event, which may or may not occur at a specific location. The Spatial Consistency Test (SCT) is applied to dichotomous variables, such as 'Yes-event: the event occurred' or 'No-event: the event did not occur.'
The method takes a vector of p observed values, `values`, together with their positions, `points` (latitude, longitude, and elevation). The p-vector `obs_to_check` selects which observations to assess: if the i-th element is 1, the i-th observation is checked; if it is 0, the i-th observation is used without being checked.
Similar to `sct` and `sct_resistant`, `sct_dual` employs a moving window approach across the domain. It identifies several central observations (centroids) and defines an inner and an outer circle around each. This lets the test adapt to local conditions and keeps the computational cost down.
The outer circle must contain at least `num_min_outer` observations, and only the `num_max_outer` observations closest to the centroid are considered.
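The selection rule for the outer circle can be sketched as follows. This is only an illustration, not the titanlib implementation: `outer_circle` is a hypothetical helper, distances to the centroid are assumed to be precomputed, and the centroid is assumed to count as one of the observations.

```r
# Hypothetical helper (not part of titanlib): indices of the observations
# used in the outer circle of a centroid, or NULL if there are too few.
outer_circle <- function(dist_to_centroid, outer_radius,
                         num_min_outer, num_max_outer) {
  inside <- which(dist_to_centroid <= outer_radius)
  # Too few observations: the centroid would be treated as isolated
  if (length(inside) < num_min_outer) return(NULL)
  # Keep only the num_max_outer observations closest to the centroid
  nearest <- inside[order(dist_to_centroid[inside])]
  head(nearest, num_max_outer)
}

# Made-up distances from the centroid
d <- c(0, 12000, 60000, 30000, 45000)
idx <- outer_circle(d, outer_radius = 50000,
                    num_min_outer = 3, num_max_outer = 3)
print(idx)
```

Here four observations lie inside the outer radius, and the three closest ones are retained.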
Each observation falls into one of two categories: 'Yes-event: the event occurred' or 'No-event: the event did not occur.' Event thresholds can vary by observation location (i.e., they are location-dependent) and are specified via `event_thresholds` and `condition`. `event_thresholds` can be any number, while `condition` must be one of the following: Eq, Gt, Geq, Lt, Leq (equal to, greater than, greater than or equal to, less than, less than or equal to).
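As an illustration of how `event_thresholds` and `condition` together define an event, the sketch below classifies a few made-up values. `classify_events` is a hypothetical helper written for this page, not a titanlib function.

```r
# Hypothetical helper (not part of titanlib): TRUE marks a Yes-event,
# FALSE a No-event.
classify_events <- function(values, event_thresholds, condition) {
  # Recycle a single threshold to all observation locations
  thr <- rep_len(event_thresholds, length(values))
  switch(condition,
    Eq  = values == thr,
    Gt  = values >  thr,
    Geq = values >= thr,
    Lt  = values <  thr,
    Leq = values <= thr,
    stop("condition must be one of: Eq, Gt, Geq, Lt, Leq")
  )
}

# Made-up values
precip <- c(0.0, 0.1, 0.5, 2.3)
is_yes <- classify_events(precip, event_thresholds = 0.1, condition = "Gt")
print(is_yes)
```

With `condition = "Gt"`, a value equal to the threshold counts as a No-event; `"Geq"` would turn it into a Yes-event.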
Within the outer circle around a centroid, the event definition divides the observations into two sub-networks: the yes-network and the no-network. We assume that information spreads in space according to correlation functions, as in statistical interpolation with error correlation matrices. The correlation between two points depends on their radial and vertical distances, each scaled by a characteristic length scale. The vertical scale, `vertical_scale`, is user-defined, while the horizontal scale is estimated from the average distance between observations in the outer circle, with `kth_closest_obs_horizontal_scale` setting the number of closest observations considered in the estimate. The estimate is constrained to the range between `min_horizontal_scale` and `max_horizontal_scale`.
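The text above does not give the functional form of the correlation. The sketch below assumes a Gaussian shape in both the horizontal and the vertical, which is a common choice in statistical interpolation but is an assumption here; `corr` and `clamp_scale` are hypothetical helpers.

```r
# Assumed Gaussian form: only the dependence on radial and vertical
# distances, scaled by two length scales, comes from the description.
corr <- function(dist_h, dist_z, horizontal_scale, vertical_scale) {
  exp(-0.5 * (dist_h / horizontal_scale)^2 -
      0.5 * (dist_z / vertical_scale)^2)
}

# Constrain an estimated horizontal scale to the user-defined range
clamp_scale <- function(estimate, min_horizontal_scale, max_horizontal_scale) {
  min(max(estimate, min_horizontal_scale), max_horizontal_scale)
}

# Made-up estimate that falls below the allowed minimum
dh <- clamp_scale(150, min_horizontal_scale = 250, max_horizontal_scale = 100000)
rho_near <- corr(dist_h = 100,  dist_z = 0, horizontal_scale = dh, vertical_scale = 200)
rho_far  <- corr(dist_h = 1000, dist_z = 0, horizontal_scale = dh, vertical_scale = 200)
print(c(dh, rho_near, rho_far))
```

Under this assumption the correlation equals 1 at zero separation and decays with distance, so nearby buddies carry more weight than distant ones.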
At each observation location, `sct_dual` first estimates from the neighboring observations whether the event is likely to have occurred. It then compares this prediction with the observed event occurrence.
For an arbitrary observation location, the guess on the event occurrence is calculated using the leave-one-out integral data influence (idiv, as per Uboldi 2008). The yes-network and the no-network are analyzed separately, yielding two idiv values at each location. The integral data influence measures how sensitive the analysis at the observation location is to variations in the observations anywhere on the network.
To quantify the difference between idiv(Yes) and idiv(No), we use the relative information content I(YwrtN) = idiv(Yes) * log( idiv(Yes) / idiv(No)) and I(NwrtY) = idiv(No) * log( idiv(No) / idiv(Yes)), as per Tarantola's "Inverse problem theory" (2005). These indices help determine how much more likely a Yes-event or a No-event is at a given location.
For example, an I(YwrtN) value of 0.1 indicates a 10% higher likelihood of a Yes-event than a No-event. Negative values of I(YwrtN) imply a higher likelihood of a No-event. When I(YwrtN) exceeds 1, a Yes-event occurrence is highly expected.
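The two indices are straightforward to evaluate. The sketch below uses made-up idiv values and assumes that `log` denotes the natural logarithm; `rel_info` is a hypothetical helper.

```r
# Relative information content of f1 with respect to f2
# (natural logarithm assumed)
rel_info <- function(f1, f2) f1 * log(f1 / f2)

# Made-up idiv values at one observation location
idiv_yes <- 0.9
idiv_no  <- 0.3

I_YwrtN <- rel_info(idiv_yes, idiv_no)   # positive: Yes-event favoured
I_NwrtY <- rel_info(idiv_no, idiv_yes)   # negative: No-event not favoured
print(c(I_YwrtN = I_YwrtN, I_NwrtY = I_NwrtY))
```

Note that the two indices are not mirror images of each other: each is weighted by its own idiv value, so I(YwrtN) is not simply -I(NwrtY).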
Figure I(YwrtN) and Figure I(NwrtY) depict these functions. The shaded areas represent the relative information content, while the dashed lines are isolines of the difference I(YwrtN)-I(NwrtY), shown only where the content is positive. As Figure I(YwrtN) shows, the likelihood of a Yes-event is not simply proportional to idiv(Yes)-idiv(No); it also depends on the specific value of idiv(Yes).
An observation is a candidate bad observation if either:
1) it is a Yes-event observation and its buddies indicate that a No-event is test_thresholds% more likely, or
2) it is a No-event observation and its buddies indicate that a Yes-event is test_thresholds% more likely.
Among all candidate bad observations in an inner circle, only the one with the worst score is flagged as bad. If all inner circle observations are good, they are directly flagged as such. Observations in the outer circle are flagged as good if they all belong to the same class (either all 'Yes' or all 'No').
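The selection of the single worst candidate inside an inner circle can be sketched as below. This is a simplified illustration, not the titanlib code: `flag_worst` is a hypothetical helper, and it assumes that the score of an observation is the relative information content in favour of the opposite class and that this score is compared directly with `test_thresholds`.

```r
# Hypothetical helper: index of the one observation to flag as bad in an
# inner circle, or NA if there is no candidate.
flag_worst <- function(is_yes, I_YwrtN, I_NwrtY, test_thresholds) {
  # Evidence from the buddies in favour of the opposite class
  score <- ifelse(is_yes, I_NwrtY, I_YwrtN)
  candidates <- which(score > test_thresholds)
  if (length(candidates) == 0) return(NA_integer_)
  # Only the candidate with the worst (largest) score is flagged
  candidates[which.max(score[candidates])]
}

# Made-up inner circle with four observations
is_yes  <- c(TRUE, TRUE, FALSE, FALSE)
I_YwrtN <- c(1.2, -0.3, 0.9, -0.2)
I_NwrtY <- c(-0.4, 1.5, -0.3, 0.6)
worst <- flag_worst(is_yes, I_YwrtN, I_NwrtY, test_thresholds = 0.8)
print(worst)
```

In this example observations 2 and 3 are both candidates, but only observation 2 is flagged; observation 3 would be re-examined in a later pass.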
The general algorithm for minimizing unwarranted rejections is not repeated here, as it closely follows the one used in `sct_resistant`.
The function returns a p-vector of quality flags:
flag | description |
---|---|
-999 | missing flag (observation not checked) |
0 | good observation |
1 | bad observation |
11 | isolated observation, it is the only observation inside the inner circle |
12 | isolated observation, fewer than num_min_outer observations inside the outer circle |
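A returned flag vector can be summarized with base R. The `flags` vector below is made up for illustration and is not real `sct_dual` output; the short labels are paraphrases of the table above.

```r
# Short labels for the flag values listed in the table
flag_labels <- c("-999" = "not checked",
                 "0"    = "good",
                 "1"    = "bad",
                 "11"   = "isolated (inner circle)",
                 "12"   = "isolated (outer circle)")

# Made-up example flags
flags <- c(0, 0, 1, 11, -999, 0, 12)

# Count the observations in each category
counts <- table(factor(flags,
                       levels = names(flag_labels),
                       labels = unname(flag_labels)))
print(counts)

# Indices of the observations flagged as bad
bad_idx <- which(flags == 1)
print(bad_idx)
```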
The Figure shows an example scenario. Half of the domain consists of 'Yes-event' observations, represented by blue dots, and the other half of 'No-event' observations, represented by red dots. In addition, a number of 'yes' observations are placed in the 'no' region and vice versa. The accompanying code runs the test for this setup.
Here, `sct_dual` is expected to flag the observations that are misplaced within the domain; these are marked by crosses in the figure. White squares denote the exceptions, that is, observations that have not been flagged as bad.
The border between the blue and red regions is more prone to testing errors. Adjusting the `test_thresholds` parameter may improve the test in these border areas and reduce incorrect flagging where 'Yes-event' and 'No-event' observations lie close together.
The thick line is the curve of the relative information content of f1 with respect to f2:
I = f1 * log( f1 / f2)
when f1=0.9. Note that I<0 for f2>f1. The thin line is f1-f2.
# R code
# Assumes npoints, points and precip_obs have already been defined
obs_to_check = rep(1, npoints)        # check every observation
event_thresholds = 0.1                # Yes-event: precip_obs > 0.1 (see condition)
test_thresholds = 0.8
condition = "Gt"
num_min_outer = 3
num_max_outer = 10
inner_radius = 20000                  # m
outer_radius = 50000                  # m
num_iterations = 10
min_horizontal_scale = 250            # m
max_horizontal_scale = 100000         # m
kth_closest_obs_horizontal_scale = 2
vertical_scale = 200                  # m
debug = TRUE
# res is the p-vector of quality flags
res <- sct_dual(points, precip_obs, obs_to_check, event_thresholds, condition,
                num_min_outer, num_max_outer, inner_radius, outer_radius,
                num_iterations, min_horizontal_scale, max_horizontal_scale,
                kth_closest_obs_horizontal_scale, vertical_scale,
                test_thresholds, debug)
Copyright © 2019-2023 Norwegian Meteorological Institute