Clarification about handling of changed MH records in import API #25
Comments
Hey @emiliom,

You raise an excellent point regarding the INSERT statement into the CSO database: if someone changes the depth of a previous record, the code does appear to treat it as a new observation by the user, at least in terms of the import to the database. I'm not sure there is a good way around this in the current implementation, though. The risk of invalid data entering the database is inherent to trusting citizen science (i.e., the data could be spoofed), and determining that a record has been modified rather than newly created is a determination we are not currently making when importing from the three services (RegObs, SnowPilot, and MH).

As for the crypto function: if the data changes (e.g., the snow depth is modified), the SHA1 hash will also change, meaning the ID generated for that record in the database will also change. If the same record, with its data unchanged, were passed to that createHash function, it would always return the same hashed value; that is how the hash indicates the underlying data hasn't changed between runs. The same is true of downloaded software: you can often get the expected hash of a binary from the publisher's website and, after downloading, run the same hash function to determine whether the data was corrupted or changed in flight over the network.

If you would like, we could discuss possible fixes for this, but there are probably not so many user-modified records that they would skew the database's results wildly.
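To make the hash behavior above concrete, here is a minimal sketch using Node's built-in `crypto` module (the input strings are stand-ins, not actual CSO data):

```js
const crypto = require('crypto');

// SHA1 digest of a string, as hex.
const sha1 = (data) => crypto.createHash('sha1').update(data).digest('hex');

const a = sha1('depth=42'); // hash of the original data
const b = sha1('depth=42'); // same input, hashed again
const c = sha1('depth=55'); // the "same" record after an edit

console.log(a === b); // true  -- identical input always yields the identical digest
console.log(a === c); // false -- any change to the input produces a different digest
```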
Thanks @BobTorgerson. At this stage we're just trying to understand exactly how things currently work. The team is discussing what changes we might need if we want to accommodate subsequent edits, and I've reported that the import API doesn't handle them at this time. Gabe will probably contact you to schedule a team call over the next few weeks to brainstorm about this.

Regarding the SHA1 hash, that's what I figured. But I'll add the detail that the call to that function passes an id value as its argument; e.g., for the MH code, it's `id: generateId(record.observation._id)`. So, it looks like the hash is constructed from the id value alone, not from the entire record?
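If that reading is right, the consequence can be sketched like this (an assumption to be confirmed: I'm treating `generateId` as nothing more than a SHA1 over the single value passed in):

```js
const crypto = require('crypto');

// Assumed shape of generateId: a SHA1 digest of the one value passed in.
const generateId = (data) =>
  crypto.createHash('sha1').update(String(data)).digest('hex');

const original = { _id: 'abc123', depth: 42 };
const edited   = { _id: 'abc123', depth: 55 }; // same MH record, depth edited

// Only _id feeds the hash, so the edited record keeps the same generated id.
console.log(generateId(original._id) === generateId(edited._id)); // true
```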
@BobTorgerson @brucecrevensten I'm trying to clarify a behavior in the import API. The situation and examples arose in the context of MountainHub data, which I'll use for illustration, but the question applies to all three providers.
We need to know what happens when the import API encounters a recent record that was previously ingested but has since changed at the source; specifically, the snowpack depth has been modified. I know the import API only queries for records with timestamps < 1 week from run time, so the edited record does meet that criterion.
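(For concreteness, a hedged sketch of that one-week window; the record shape and field names are assumptions, not the actual import code:)

```js
// Hypothetical records; only the timestamp matters for the window filter.
const records = [
  { _id: 'abc123', timestamp: '2020-02-20T10:00:00Z', depth: 42 },
  { _id: 'def456', timestamp: '2019-01-01T10:00:00Z', depth: 30 },
];

// Keep only records whose timestamp falls within one week of run time.
const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const cutoff = Date.now() - ONE_WEEK_MS;
const recent = records.filter((r) => new Date(r.timestamp).getTime() >= cutoff);
```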
The SQL INSERT statement has a condition that will reject the insertion if the record fully matches one that already exists:
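(As a hedged sketch of the condition described, with hypothetical table and column names:)

```js
// Hedged sketch of a "skip if fully identical" insert guard; the actual
// table and column names in the CSO database may differ.
const insertSql = `
  INSERT INTO observations (id, source, timestamp, depth)
  SELECT $1, $2, $3, $4
  WHERE NOT EXISTS (
    SELECT 1 FROM observations
    WHERE id = $1 AND source = $2 AND timestamp = $3 AND depth = $4
  );
`;
// Because the guard compares every column, a record whose depth changed
// fails the "already exists" check and would be inserted as a new row.
```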
So, my interpretation is that since `depth` will be different, the record will be treated as new and inserted. Can you confirm?

In addition, the `id` is generated as a random hash via the `generateId` function. This function is called in each provider script; for MH, it's here. I don't know enough JavaScript to understand whether `crypto.createHash` would create a new "random" id even if `data` is the same as in a previous record. Can you clarify? Thanks!
cc @gjwolken