Register ingested groups using the correct URI #679

Merged
sgillies merged 3 commits into main from sg/sc-58644/simplify-soma-ingest
Nov 13, 2024

sgillies
Collaborator

@sgillies sgillies commented Nov 11, 2024

In the directory ingestion case, the deployed code creates groups at OUTPUT_URI/NAME/NAME/ (note the doubled NAME), where NAME is the basename/stem of an H5ad file. It then tries to register a group at OUTPUT_URI, where no group exists, so registration fails.

As part of solving this problem, I'm generalizing directory ingest to cover the single-file case. We no longer have two ingestion branches. There's much less code, and I think the intent of the remaining code is clearer. This change helped me fix the bug and helps us make sure it stays fixed.

There's an output change, though: every H5ad file becomes a group at OUTPUT_URI/NAME/, where NAME is the basename/stem of the H5ad file, no matter what. Whether a file is ingested on its own or as an item in a folder, the output is the same. In theory (and often in practice) this kind of symmetry and de-specialization is a win. I'm going to need some review from actual SOMA users and developers, for sure.
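A minimal sketch of the unified layout described above, not the code in this PR. The names `join_uri` and `plan_ingestion` are hypothetical, and the input is assumed to be a local path (the real ingestion code may read from object storage instead). It only illustrates that a single file, treated as a one-item collection, maps to the same OUTPUT_URI/NAME as a file found in a directory.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def join_uri(base_uri: str, name: str) -> str:
    """Append one segment to a URI with plain string handling (hypothetical helper)."""
    return base_uri.rstrip("/") + "/" + name


def plan_ingestion(input_path: str, output_uri: str) -> List[Tuple[str, str]]:
    """Return (h5ad_path, group_uri) pairs for a file or a directory of files.

    A single file is handled as a one-item collection, so both cases share
    one code path and produce the same OUTPUT_URI/NAME layout.
    """
    source = Path(input_path)
    entries = sorted(source.glob("*.h5ad")) if source.is_dir() else [source]
    return [(str(entry), join_uri(output_uri, entry.stem)) for entry in entries]


# Demo with empty placeholder files; only the names matter here.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.h5ad", "b.h5ad"):
        (Path(tmp) / name).touch()
    for _, group_uri in plan_ingestion(tmp, "s3://bucket/out"):
        print(group_uri)
    for _, group_uri in plan_ingestion(str(Path(tmp) / "a.h5ad"), "s3://bucket/out"):
        print(group_uri)
```

In this sketch, `a.h5ad` maps to `s3://bucket/out/a` in both calls, which is the symmetry the description is after.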

The test failures are unrelated. Trouble with task graph run times again.

Fixes the registration error in the directory case. There is much less
code, and its intent is clearer.
@sgillies sgillies self-assigned this Nov 11, 2024
@sgillies sgillies changed the title Generalize directory ingest to cover the single file case Register ingested groups using the correct URI Nov 11, 2024
pathlib loses the double slashes in our URIs, so we can't use that
everywhere.
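The slash-collapsing behavior mentioned in this commit message can be seen with the standard library alone. The URI below is an illustrative value, not one taken from this PR, and `posixpath.join` is shown only as one possible string-preserving alternative, not necessarily what the PR uses.

```python
import posixpath
from pathlib import PurePosixPath

uri = "tiledb://namespace/s3://bucket/soma"  # illustrative value

# pathlib treats the URI as a filesystem path and collapses repeated slashes.
mangled = str(PurePosixPath(uri) / "pbmc3k")
print(mangled)  # tiledb:/namespace/s3:/bucket/soma/pbmc3k

# posixpath.join only concatenates, so the "//" after each scheme survives.
joined = posixpath.join(uri, "pbmc3k")
print(joined)  # tiledb://namespace/s3://bucket/soma/pbmc3k

# Taking the stem of an input file is still safe with pathlib, because
# only the final path component is involved.
stem = PurePosixPath("s3://bucket/input/pbmc3k.h5ad").stem
print(stem)  # pbmc3k
```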
@sgillies sgillies marked this pull request as ready for review November 12, 2024 14:36
sgillies commented Nov 12, 2024

@johnkerl @aaronwolen can you check the soundness of my decision to normalize the ingestion outputs as I have done?

entry_output_uri = output_uri + "/" + base
if not output_uri.endswith("/"):
    entry_output_uri += "/"
entry_output_uri += base

@sgillies sgillies Nov 12, 2024

I don't understand why we appended base to the output path twice. Now we only do it once. If twice is a requirement, I can restore that behavior.

Collaborator

I think it's not a requirement.
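The doubled append discussed in this thread can be reproduced in isolation. This sketch assumes the four quoted lines are the pre-change code, read together as written; `old_entry_uri` wraps them in a function for illustration, and `new_entry_uri` is a hypothetical single-append version, not necessarily the code that was merged.

```python
def old_entry_uri(output_uri: str, base: str) -> str:
    """Mirror of the quoted lines: base ends up in the URI twice."""
    entry_output_uri = output_uri + "/" + base
    if not output_uri.endswith("/"):
        entry_output_uri += "/"
    entry_output_uri += base
    return entry_output_uri


def new_entry_uri(output_uri: str, base: str) -> str:
    """Hypothetical replacement: base is appended exactly once."""
    entry_output_uri = output_uri
    if not entry_output_uri.endswith("/"):
        entry_output_uri += "/"
    return entry_output_uri + base


print(old_entry_uri("s3://bucket/out", "pbmc3k"))  # s3://bucket/out/pbmc3k/pbmc3k
print(new_entry_uri("s3://bucket/out", "pbmc3k"))  # s3://bucket/out/pbmc3k
```

The first result matches the OUTPUT_URI/NAME/NAME/ layout from the PR description; the second matches the OUTPUT_URI/NAME/ layout that the PR settles on.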

@sgillies sgillies merged commit 06e3bbb into main Nov 13, 2024
16 of 18 checks passed
@sgillies sgillies deleted the sg/sc-58644/simplify-soma-ingest branch November 13, 2024 16:36