BFD-3664: Pipeline job for SAMHSA tag backfill. #2506
…if/samhsa/CcwTagKey.java Co-authored-by: aschey-forpeople <[email protected]>
    ConfigLoader config, boolean ccwPipelineEnabled) {
  boolean enabled = config.booleanOption(SSM_PATH_SAMHSA_BACKFILL_ENABLED).orElse(false);
  // We don't want to run if we're on a CCW Pipeline instance
  if (!enabled || ccwPipelineEnabled) {
It is a bit confusing that this runs on the RDA pipeline, but I understand the reasoning. Ideally this could run as its own pipeline, but that would increase the complexity a fair bit. I think this is fine for now and we can revisit once we're running in ECS.
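For reference, the guard quoted in the diff excerpt can be sketched in isolation. This is only an illustration of the condition, not the PR's actual class: `BackfillGuard` and `shouldRunBackfill` are hypothetical names, and the `Optional<Boolean>` parameter stands in for the result of `config.booleanOption(SSM_PATH_SAMHSA_BACKFILL_ENABLED)`.

```java
import java.util.Optional;

/** Hypothetical sketch of the enable check; the names here are not from the PR. */
public class BackfillGuard {

  /**
   * Returns true only when the backfill flag is set and this is not a CCW pipeline instance.
   *
   * @param enabledOption stand-in for the value read from configuration (assumed Optional)
   * @param ccwPipelineEnabled whether the CCW pipeline is enabled on this instance
   */
  public static boolean shouldRunBackfill(
      Optional<Boolean> enabledOption, boolean ccwPipelineEnabled) {
    // A missing flag is treated as "disabled", mirroring orElse(false) in the excerpt.
    boolean enabled = enabledOption.orElse(false);
    // Same condition as the excerpt, inverted: skip if disabled or on a CCW instance.
    return enabled && !ccwPipelineEnabled;
  }

  public static void main(String[] args) {
    System.out.println(shouldRunBackfill(Optional.empty(), false)); // prints "false"
    System.out.println(shouldRunBackfill(Optional.of(true), false)); // prints "true"
    System.out.println(shouldRunBackfill(Optional.of(true), true)); // prints "false"
  }
}
```

The job therefore runs only on instances where the flag is explicitly enabled and the CCW pipeline is not, which in practice means alongside the RDA pipeline.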
890887b to b75b481
/** Query to perform upsert on the backfill progress table for a given claim table. */
public static final String UPSERT_PROGRESS_QUERY =
    """
    INSERT INTO ccw.samhsa_backfill_progress
I suppose we have to remember to reset the progress table each time we want to run a new backfill job. I was thinking we could tie it to a job ID or something that can increment each time, but that's probably not worth the complexity. Probably just want to add a note to the runbook that this table will need to be reset.
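To illustrate why a reset would be needed, here is a small in-memory model of an upsert keyed by claim table. It rests on an assumption: the excerpt does not show which columns the progress table holds, so the idea that each row stores the last processed claim ID for one table is a guess. `ProgressModel` and the table name `claims_a` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a progress table keyed by claim table name. */
public class ProgressModel {
  // Assumption: one row per claim table, holding the last claim ID that was processed.
  private final Map<String, String> lastProcessed = new HashMap<>();

  /** Upsert: inserts a row for the table, or overwrites the existing one. */
  public void upsert(String claimTable, String lastClaimId) {
    lastProcessed.put(claimTable, lastClaimId);
  }

  /** Where a run would resume from; empty means "start from the beginning". */
  public Optional<String> resumePoint(String claimTable) {
    return Optional.ofNullable(lastProcessed.get(claimTable));
  }

  /** The manual step the comment suggests documenting in the runbook. */
  public void reset() {
    lastProcessed.clear();
  }

  public static void main(String[] args) {
    ProgressModel progress = new ProgressModel();
    progress.upsert("claims_a", "100");
    progress.upsert("claims_a", "200"); // overwrites the earlier row
    System.out.println(progress.resumePoint("claims_a")); // prints "Optional[200]"
    progress.reset();
    System.out.println(progress.resumePoint("claims_a")); // prints "Optional.empty"
  }
}
```

Because the key is the table alone, a second backfill would pick up the first run's resume point unless the rows are cleared. Adding a job ID to the key would avoid that, at the cost of the extra complexity mentioned above.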
JIRA Ticket:
BFD-3664
What Does This PR Do?
This PR adds a pipeline job to backfill the SAMHSA tags tables. The job can run concurrently with the RDA pipeline job, but is disabled on CCW pipeline instances.
The job processes all of the tables that could contain SAMHSA codes concurrently, each with its own entityManager. Some tradeoffs were made for the sake of performance, the biggest being that it does not construct entities from the SQL query results. Instead, each query returns arrays of objects, and the code must know the type at each array position. This is not ideal: if the array elements are read in the wrong order, the result can be a ClassCastException. However, without an entity class to map the columns to types, this is unavoidable in this implementation.
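As a rough illustration of the positional-array tradeoff, the sketch below reads an `Object[]` row through a small checked accessor, so that a misordered column fails with a descriptive message rather than a bare ClassCastException. Everything in it is hypothetical: `RowMapping`, `ClaimCode`, and the two column positions are invented for the example and do not reflect the queries in this PR.

```java
import java.util.Objects;

/** Hypothetical sketch of reading positional query rows; not the PR's code. */
public class RowMapping {
  // Assumed column order of an imaginary query: (claim id, code).
  public static final int COL_CLAIM_ID = 0;
  public static final int COL_CODE = 1;

  /** Simple value holder for one mapped row. */
  public static final class ClaimCode {
    public final long claimId;
    public final String code;

    public ClaimCode(long claimId, String code) {
      this.claimId = claimId;
      this.code = code;
    }
  }

  /** Returns row[index] as the given type, or throws with the column position on a mismatch. */
  public static <T> T column(Object[] row, int index, Class<T> type) {
    Object value = row[index];
    if (value != null && !type.isInstance(value)) {
      throw new IllegalStateException(
          "Column " + index + ": expected " + type.getSimpleName()
              + " but found " + value.getClass().getSimpleName());
    }
    return type.cast(value); // cast(null) returns null
  }

  /** Maps one row; the claim id must be present, the code may be null. */
  public static ClaimCode toClaimCode(Object[] row) {
    Number id = Objects.requireNonNull(column(row, COL_CLAIM_ID, Number.class), "claim id");
    String code = column(row, COL_CODE, String.class);
    return new ClaimCode(id.longValue(), code);
  }

  public static void main(String[] args) {
    ClaimCode ok = toClaimCode(new Object[] {42L, "F10.10"});
    System.out.println(ok.claimId + " " + ok.code); // prints "42 F10.10"
    try {
      toClaimCode(new Object[] {"F10.10", 42L}); // columns in the wrong order
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "Column 0: expected Number but found String"
    }
  }
}
```

Keeping the column positions as named constants next to the query text is one way to make the ordering dependency visible to future editors.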
What Should Reviewers Watch For?
If you're reviewing this PR, please check for these things in particular:
What Security Implications Does This PR Have?
Please indicate if this PR does any of the following:
Adds any new software dependencies
Modifies any security controls
Adds new transmission or storage of data
Any other changes that could possibly affect security?
I have considered the above security implications as it relates to this PR. (If one or more of the above apply, it cannot be merged without the ISSO or team security engineer's (@sb-benohe) approval.)
Validation
Have you fully verified and tested these changes? Are the acceptance criteria met? Please provide reproducible testing instructions, code snippets, or screenshots as applicable.