-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Register ingested groups using the correct URI #679
Conversation
Fixes the registration error in the directory case. Much less code and clearer about the intent of it.
pathlib loses the double slashes in our URIs, so we can't use that everywhere.
Directory ingestion works: https://beta.cloud.tiledb.com/taskgraphlogs/sean-gillies/dd293232-6537-44b5-aa20-9b7befc75787. Single file is succeeding, too: https://beta.cloud.tiledb.com/taskgraphlogs/sean-gillies/8f05b23d-e53b-41b7-a964-0ffe4159a69f. |
@johnkerl @aaronwolen can you check the soundness of my decision to normalize the ingestion outputs as I have done? |
entry_output_uri = output_uri + "/" + base | ||
if not output_uri.endswith("/"): | ||
entry_output_uri += "/" | ||
entry_output_uri += base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we appended base
to the output path twice. Now we only do it once. If twice is a requirement, I can restore that behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not a requirement
In the directory ingestion case, the deployed code creates groups at
OUTPUT_URI/NAME/NAME/
(note the doubledNAME
) whereNAME
is the basename/stem of an H5ad file. But then it tries to register a group atOUTPUT_URI
. There's no group there, so this fails.As a part of solving this problem, I'm generalizing directory ingest to cover the single file case. We no longer have two ingestion branches. There's much less code and I think the intent of the remaining code is more clear. This change helped me fix the bug and helps us make sure it stays fixed.
There's an output change, though: every H5ad file becomes a group at
OUTPUT_URI/NAME/
whereNAME
is the basename/stem of the H5ad file no matter what. Ingested as a single file, or as an item in a folder, the same output. In theory (and practice, often) this kind of symmetry and de-specialization is a win. I'm going to need some review from actual SOMA users and developers, for sure.The test failures are unrelated. Trouble with task graph run times again.