Add KDVH migration package #30
base: trunk
Conversation
Flag definitions:

Related to point 4, sorry for the confusion! I was specifically asking since in lard/kdvh_importer/data_functions.go (lines 28 to 39 in 648d5f8)

But I'm with you, it doesn't make sense to dump them 👌

Hmm, could this be in the case of blob data?
Force-pushed from cc7a951 to 40b69e6
```sql
CREATE TABLE IF NOT EXISTS flags.kdvh (
    timeseries INT4 REFERENCES public.timeseries,
    obstime TIMESTAMPTZ NOT NULL,
    controlinfo TEXT NULL,
```
This is where the reverse-engineered flags would be stored? Or the original 5 digits of useinfo that KDVH has kept?
Both, the `controlinfo` is completely reverse-engineered, while the 5-digit KDVH flags are stored in `useinfo` with the 11 trailing digits set to their default values (line 242 in `kdvh/import_functions.go`). They should both follow what's defined in the Excel document you shared with me.
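For illustration, a minimal sketch of that padding idea; the function name and the all-"9" default tail are assumptions for this example, not what the importer actually does (the real logic lives around line 242 of `kdvh/import_functions.go`):

```go
package kdvh

import "strings"

// padUseinfo is a hypothetical helper: the 5-digit KDVH flag becomes the
// first 5 characters of a 16-character useinfo string, and the 11 trailing
// digits are left at a default value. The "9" defaults are placeholders.
func padUseinfo(kdvhFlag string) string {
	if len(kdvhFlag) != 5 {
		// malformed flag: fall back to an all-default useinfo
		return strings.Repeat("9", 16)
	}
	return kdvhFlag + strings.Repeat("9", 11)
}
```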
migrations/kdvh/import.go (outdated)
```go
	return data, nil
}

// TODO: add CALL_SIGN? It's not in stinfosys?
```
Where does this column appear? It might be ship names, but it might also be the initials of the observer for older data.
T_MDATA (basically the only table I've been testing against). I'll add this info to the code!
Now that I think about it, if T_MDATA (or any other table) contains data from moving stations, we probably should not store it in the same table as static stations, since they require additional metadata (position?). But we haven't decided on the table structure yet.
Non-moving installations like oil platforms also have call signs, and they make up most of that table, I think. We do also have ship data, and there are paramids for MLON and MLAT, which contain timeseries of the coordinates of these stations. That is how it was solved in KDVH. So if you see MLAT and MLON, that means there could be ship data in a table. (The table T_CDCV_DATA also contains data from buoys, though I don't remember if they move around a lot.)
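A minimal sketch of that heuristic, with a hypothetical helper name; the element-code list is assumed to come from whatever metadata query the importer already runs:

```go
package kdvh

// hasMobileStations applies the heuristic described above: if a table's
// element codes include both MLAT and MLON (coordinate timeseries), the
// table may contain data from moving stations such as ships or buoys.
func hasMobileStations(elemCodes []string) bool {
	var hasLat, hasLon bool
	for _, code := range elemCodes {
		switch code {
		case "MLAT":
			hasLat = true
		case "MLON":
			hasLon = true
		}
	}
	return hasLat && hasLon
}
```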
migrations/lard/import.go (outdated)

```go
	"github.com/jackc/pgx/v5/pgxpool"
)

// TODO: I'm not sure I like the interface solution
```
I'm not sure I like it either. Would it be better to generate all the rows up front and use `CopyFromRows`?
Mmm, do you mean decoupling `LardObs` to separate data and flags timeseries, instead of having them together? That sounds reasonable.
That's also a good idea, but I meant having methods on the timeseries type that generate `[][]any` (all the rows at once). Then you could drop the interface and just do `pgx.CopyFromRows(ts.new_method())`. Not sure what the performance implications would be; it depends a bit on Go's runtime and pgx's implementation.
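A rough sketch of that idea, assuming hypothetical type, method, and column names (`DataTimeseries`, `Rows`, the `public.data` table) rather than the actual ones in this PR:

```go
package lard

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// Obs and DataTimeseries are stand-ins for the importer's real types.
type Obs struct {
	Obstime time.Time
	Data    float64
}

type DataTimeseries struct {
	ID  int32
	Obs []Obs
}

// Rows materializes every observation as a []any row, ready for CopyFromRows.
func (ts *DataTimeseries) Rows() [][]any {
	rows := make([][]any, 0, len(ts.Obs))
	for _, obs := range ts.Obs {
		rows = append(rows, []any{ts.ID, obs.Obstime, obs.Data})
	}
	return rows
}

// InsertData copies all rows of a timeseries in one go.
func InsertData(ctx context.Context, pool *pgxpool.Pool, ts *DataTimeseries) (int64, error) {
	return pool.CopyFrom(
		ctx,
		pgx.Identifier{"public", "data"},
		[]string{"timeseries", "obstime", "obsvalue"},
		pgx.CopyFromRows(ts.Rows()),
	)
}
```

The trade-off is that `CopyFromRows` holds every row in memory before the copy starts, while a custom `CopyFromSource` can stream rows one at a time, which might matter for the largest KDVH tables.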
@Lun4m Do you think there would be a point in removing the indices while importing? I would think it's maybe redundant with [...]. In any case we should probably run [...]
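A sketch of what removing and recreating an index around the import could look like; the index and table names here are invented for illustration, not taken from the actual LARD schema:

```go
package lard

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// withIndexDropped drops an index before a bulk import and recreates it
// afterwards. Purely illustrative: the index and table names are made up.
func withIndexDropped(ctx context.Context, pool *pgxpool.Pool, importFn func() error) error {
	if _, err := pool.Exec(ctx, "DROP INDEX IF EXISTS data_timeseries_obstime_idx"); err != nil {
		return err
	}
	if err := importFn(); err != nil {
		return err
	}
	// recreate the index once the bulk load is done
	_, err := pool.Exec(ctx,
		"CREATE INDEX data_timeseries_obstime_idx ON public.data (timeseries, obstime)")
	return err
}
```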
I don't have an opinion about it, but there aren't any drawbacks at the moment, since we don't have users, right? I don't see any reason why we shouldn't do it 👍 On another note, do you think KDVH and Kvalobs flags should be stored in the same table?

Hmm, I'm not sure... I think it makes sense to store them in the same table, they would have been migrated to the same table in ODA. It's really a question of what Frost needs, so I would maybe ask Jo if and how Frost handles this. Louise would be the best person to ask if she wasn't on leave.
@Lun4m Are you filtering open and closed data? I don't remember seeing anything about that when reading the code, but perhaps I missed it

Good point, yeah, I forgot I need to filter those out for the time being
Force-pushed from 2337032 to 93fbb62
I can provide a list of which stnr / tables are defined as closed

Thanks, but I don't think that's necessary, we can fetch it from stinfosys
This PR tackles the first part of issue #14.

I adapted some scripts originally written by Ketil for ODA to dump tables from KDVH and import them into LARD. Almost happy with the code...

Things I'm still not 100% sure about:

- `T_MONTH_INTERPOLATED` and `T_DIURNAL_INTERPOLATED`, probably I need to connect to KDVH directly?
- `blobData` used for non-scalar parameters (`KLOBS` for example)?
- In `data_functions.go` we do a lot of postprocessing that I haven't touched since I don't really understand it. My main concern is with the insertion of "null" values (i.e. `-32767`, etc.), are these used somewhere downstream? Otherwise I might simply drop them, since our schema can handle real NULLs (see the sketch below).

@ketilt, do you have any insight regarding these points? Or comments about the code, if you have time to take a look at it?
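As an illustration of that last point, a minimal sketch of mapping the sentinel values to real NULLs on insert; the helper name and the handling of only `-32767` are assumptions, not what `data_functions.go` currently does:

```go
package kdvh

// toNullable maps the KDVH sentinel "null" value to a nil pointer so it ends
// up as a real NULL in LARD instead of a magic number. Only -32767 is handled
// here; other sentinels would need to be added once confirmed safe to drop.
func toNullable(value float64) *float64 {
	if value == -32767 {
		// KDVH placeholder for missing data
		return nil
	}
	return &value
}
```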
Still left to do:

- `CorrKDVH` field, since original and corrected have the same value
- Check that `controlinfo` and `useinfo` are correct