bug(store): deleting compute graphs will now delete all dependencies #987
Looks good overall, have some questions.
}
None => {}
}
txn.delete_cf(&IndexifyObjectsColumns::Tasks.cf_db(&db), &key)?;
I imagine this needs to be a SystemTask as well. Maybe add a TODO/FIXME here if we want to do this later.
I can add a comment, but it is already tracked here: #986
I would make this whole Delete a SystemTask. Let me know if the issue makes sense, I wrote it based on your explanation last week.
)?;
}

delete_cf_prefix(
Does this use an API like this under the hood or are we iterating? https://docs.rs/rocksdb/0.22.0/rocksdb/type.DBWithThreadMode.html#method.delete_range_cf
We are iterating right now:
indexify/server/state_store/src/state_machine.rs
Lines 412 to 418 in 6645d22
let iter = txn.iterator_cf_opt(cf, read_options, iterator_mode);
for key in iter {
let (key, _) = key?;
if !key.starts_with(prefix) {
break;
}
txn.delete_cf(cf, &key)?;
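To make the trade-off in this thread concrete, here is a minimal sketch of the two strategies: deleting matching keys one by one while iterating, and deleting a whole key range in one step. It uses a `BTreeMap` as a stand-in for a sorted column family, so it is not the RocksDB API; `ColumnFamily`, `delete_prefix_iterating`, `delete_prefix_range` and `prefix_upper_bound` are hypothetical names. Whether a range delete such as the linked `delete_range_cf` is usable on the transaction type in this code is not confirmed here.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Stand-in for one sorted column family (assumption: not the real store).
type ColumnFamily = BTreeMap<Vec<u8>, Vec<u8>>;

/// Iterating strategy: seek to the prefix, collect keys until the first
/// key that no longer matches, then delete them one at a time.
fn delete_prefix_iterating(cf: &mut ColumnFamily, prefix: &[u8]) -> usize {
    let range = (Bound::Included(prefix), Bound::<&[u8]>::Unbounded);
    let doomed: Vec<Vec<u8>> = cf
        .range::<[u8], _>(range)
        .map(|(key, _)| key)
        .take_while(|key| key.starts_with(prefix))
        .cloned()
        .collect();
    for key in &doomed {
        cf.remove(key);
    }
    doomed.len()
}

/// Smallest key that sorts after every key starting with `prefix`.
/// Returns None when no such key exists (empty prefix or all 0xFF bytes).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < u8::MAX {
            end.push(last + 1);
            return Some(end);
        }
    }
    None
}

/// Range strategy: cut out the half-open range [prefix, upper_bound) at once.
fn delete_prefix_range(cf: &mut ColumnFamily, prefix: &[u8]) -> usize {
    // Everything at or after the prefix moves into `doomed`.
    let mut doomed = cf.split_off(prefix);
    if let Some(end) = prefix_upper_bound(prefix) {
        // Keys at or after the upper bound do not match; put them back.
        let mut keep = doomed.split_off(end.as_slice());
        cf.append(&mut keep);
    }
    doomed.len()
}
```

Both functions remove the same keys. The range variant needs an exclusive end key, which is why the upper bound is computed by incrementing the last byte of the prefix that is not 0xFF.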
Some gaps exist in the delete compute graph code:
This PR correctly deletes all these dependencies.
This PR does not
Verification
Verification was performed using this dump script (manually added to `server/state_store/src/lib.rs`):
Storage at initial server startup
All column families are empty.
Storage dump
Storage after second server startup without any workloads
Storage dump
```
2024-10-29T13:40:23.864436Z INFO state_store: Listing all keys in the db
Column Family: StateMachineMetadata
Len = 0
Column Family: Executors
key: "FVeH-HtHccKoRPLJicFq1"
Value: "{\n \"addr\": \"\",\n \"executor_version\": \"0.2.19\",\n \"id\": \"FVeH-HtHccKoRPLJicFq1\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"labels\": {\n \"architecture\": \"arm64\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"os\": \"Darwin\",\n \"python_major_version\": 3,\n \"python_minor_version\": 11\n }\n}"
Len = 1
Column Family: Namespaces
key: "default"
Value: "{\n \"created_at\": 1730209140367,\n \"name\": \"default\"\n}"
Len = 1
Column Family: ComputeGraphs
Len = 0
Column Family: Tasks
Len = 0
Column Family: GraphInvocationCtx
Len = 0
Column Family: ReductionTasks
Len = 0
Column Family: GraphInvocations
Len = 0
Column Family: FnOutputs
Len = 0
Column Family: TaskOutputs
Len = 0
Column Family: StateChanges
key: "\0\0\0\0\0\0\0\0"
Value: "{\n \"change_type\": \"ExecutorAdded\",\n \"created_at\": 1730209145257,\n \"id\": 0,\n \"object_id\": \"FVeH-HtHccKoRPLJicFq1\",\n \"processed_at\": 1730209145260\n}"
Len = 1
Column Family: UnprocessedStateChanges
Len = 0
Column Family: TaskAllocations
Len = 0
Column Family: UnallocatedTasks
Len = 0
Column Family: GcUrls
Len = 0
Column Family: SystemTasks
Len = 0
Column Family: Stats
Len = 0
```
Storage after a compute graph run
Storage dump
```
2024-10-29T13:43:08.164077Z INFO state_store: Listing all keys in the db
Column Family: StateMachineMetadata
Len = 0
Column Family: Executors
key: "4qcYQMcXl5HuSvh3NgErV"
Value: "{\n \"addr\": \"\",\n \"executor_version\": \"0.2.19\",\n \"id\": \"4qcYQMcXl5HuSvh3NgErV\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"labels\": {\n \"architecture\": \"arm64\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"os\": \"Darwin\",\n \"python_major_version\": 3,\n \"python_minor_version\": 11\n }\n}"
Len = 1
Column Family: Namespaces
key: "default"
Value: "{\n \"created_at\": 1730209140367,\n \"name\": \"default\"\n}"
Len = 1
Column Family: ComputeGraphs
key: "default|object_detection_workflow"
Value: "{\n \"code\": {\n \"path\": \"file:///Users/seriousben/src/github.com/seriousben/indexify-detect-image-objects/indexify_storage/blobs/default_aGF4gRN2uBKH5_5ZfZSFy\",\n \"sha256_hash\": \"0a67e5c6a6814a05b2adfa9bb1013f8d0319f792a3a80ad5d2d4315d236bce1f\",\n \"size\": 13735\n },\n \"created_at\": 0,\n \"description\": \"\",\n \"edges\": {},\n \"name\": \"object_detection_workflow\",\n \"namespace\": \"default\",\n \"nodes\": {\n \"object_detector\": {\n \"Compute\": {\n \"description\": \"\",\n \"fn_name\": \"object_detector\",\n \"image_information\": {\n \"base_image\": \"python:3.10.15-slim-bookworm\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"run_strs\": [\n \"pip install indexify\"\n ],\n \"tag\": \"3.10\"\n },\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"name\": \"object_detector\",\n \"payload_encoder\": \"cloudpickle\",\n \"placement_constraints\": [],\n \"reducer\": false\n }\n }\n },\n \"runtime_information\": {\n \"major_version\": 3,\n \"minor_version\": 11\n },\n \"start_fn\": {\n \"Compute\": {\n \"description\": \"\",\n \"fn_name\": \"object_detector\",\n \"image_information\": {\n \"base_image\": \"python:3.10.15-slim-bookworm\",\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"run_strs\": [\n \"pip install indexify\"\n ],\n \"tag\": \"3.10\"\n },\n \"image_name\": \"tensorlake/indexify-executor-default\",\n \"name\": \"object_detector\",\n \"payload_encoder\": \"cloudpickle\",\n \"placement_constraints\": [],\n \"reducer\": false\n }\n },\n \"version\": 1\n}"
Len = 1
Column Family: Tasks
key: "default|object_detection_workflow|17eacf2d6ba0bf36|object_detector|47b6599c-8824-4bb1-a9a3-a847645a1856"
Value: "{\n \"compute_fn_name\": \"object_detector\",\n \"compute_graph_name\": \"object_detection_workflow\",\n \"creation_time\": {\n \"nanos_since_epoch\": 950807000,\n \"secs_since_epoch\": 1730209377\n },\n \"diagnostics\": {\n \"exception\": null,\n \"stderr\": {\n \"path\": \"file:///Users/seriousben/src/github.com/seriousben/indexify-detect-image-objects/indexify_storage/blobs/default.object_detection_workflow.object_detector.17eacf2d6ba0bf36.stderr\",\n \"sha256_hash\": \"d7921c2d8ea1fb4e5afc08f1aa7249f6d4325fc9311f061fbc61777d4c5c4c4d\",\n \"size\": 27\n },\n \"stdout\": {\n \"path\": \"file:///Users/seriousben/src/github.com/seriousben/indexify-detect-image-objects/indexify_storage/blobs/default.object_detection_workflow.object_detector.17eacf2d6ba0bf36.stdout\",\n \"sha256_hash\": \"7e942ba422f0d4a4af6ecedec681b0bd48294bdb5ffeda0cac788eb1d1ea8c9b\",\n \"size\": 183\n }\n },\n \"graph_version\": 1,\n \"id\": \"47b6599c-8824-4bb1-a9a3-a847645a1856\",\n \"input_node_output_key\": \"17eacf2d6ba0bf36\",\n \"invocation_id\": \"17eacf2d6ba0bf36\",\n \"namespace\": \"default\",\n \"outcome\": \"Success\",\n \"reducer_output_id\": null\n}"
Len = 1
Column Family: GraphInvocationCtx
key: "default|object_detection_workflow|17eacf2d6ba0bf36"
Value: "{\n \"completed\": true,\n \"compute_graph_name\": \"object_detection_workflow\",\n \"fn_task_analytics\": {\n \"object_detector\": {\n \"failed_tasks\": 0,\n \"pending_tasks\": 0,\n \"successful_tasks\": 1\n }\n },\n \"graph_version\": 1,\n \"invocation_id\": \"17eacf2d6ba0bf36\",\n \"is_system_task\": false,\n \"namespace\": \"default\",\n \"outstanding_tasks\": 0\n}"
Len = 1
Column Family: ReductionTasks
Len = 0
Column Family: GraphInvocations
key: "default|object_detection_workflow|17eacf2d6ba0bf36"
Value: "{\n \"compute_graph_name\": \"object_detection_workflow\",\n \"id\": \"17eacf2d6ba0bf36\",\n \"namespace\": \"default\",\n \"payload\": {\n \"path\": \"file:///Users/seriousben/src/github.com/seriousben/indexify-detect-image-objects/indexify_storage/blobs/01e9c656-eb5c-4f14-a6f9-b9d1661aaaca\",\n \"sha256_hash\": \"65ea2ac07eefe9812ba95e962dcffcae10b9bf78a82309650d2884a643027d1f\",\n \"size\": 296293\n }\n}"
Len = 1
Column Family: FnOutputs
key: "default|object_detection_workflow|17eacf2d6ba0bf36|object_detector|d6e77e36553267be"
Value: "{\n \"compute_fn_name\": \"object_detector\",\n \"compute_graph_name\": \"object_detection_workflow\",\n \"errors\": null,\n \"graph_version\": 1,\n \"id\": \"d6e77e36553267be\",\n \"invocation_id\": \"17eacf2d6ba0bf36\",\n \"namespace\": \"default\",\n \"payload\": {\n \"Fn\": {\n \"path\": \"file:///Users/seriousben/src/github.com/seriousben/indexify-detect-image-objects/indexify_storage/blobs/default.object_detection_workflow.object_detector.17eacf2d6ba0bf36.47b6599c-8824-4bb1-a9a3-a847645a1856.0\",\n \"sha256_hash\": \"85fcd79e6de768b203eea80693f9f59943b4fe19f87eb41e5db8fe1ca678f072\",\n \"size\": 890871\n }\n },\n \"reduced_state\": false\n}"
Len = 1
Column Family: TaskOutputs
key: "default|47b6599c-8824-4bb1-a9a3-a847645a1856|d6e77e36553267be"
Value: "\"default|object_detection_workflow|17eacf2d6ba0bf36|object_detector|d6e77e36553267be\""
Len = 1
Column Family: StateChanges
key: "\0\0\0\0\0\0\0\u{4}"
Value: "{\n \"change_type\": {\n \"TaskFinished\": {\n \"compute_fn\": \"object_detector\",\n \"compute_graph\": \"object_detection_workflow\",\n \"invocation_id\": \"17eacf2d6ba0bf36\",\n \"namespace\": \"default\",\n \"task_id\": \"47b6599c-8824-4bb1-a9a3-a847645a1856\"\n }\n },\n \"created_at\": 1730209381002,\n \"id\": 4,\n \"object_id\": \"47b6599c-8824-4bb1-a9a3-a847645a1856\",\n \"processed_at\": 1730209381003\n}"
key: "\0\0\0\0\0\0\0\u{3}"
Value: "{\n \"change_type\": \"TaskCreated\",\n \"created_at\": 1730209377950,\n \"id\": 3,\n \"object_id\": \"47b6599c-8824-4bb1-a9a3-a847645a1856\",\n \"processed_at\": 1730209377951\n}"
key: "\0\0\0\0\0\0\0\u{2}"
Value: "{\n \"change_type\": {\n \"InvokeComputeGraph\": {\n \"compute_graph\": \"object_detection_workflow\",\n \"invocation_id\": \"17eacf2d6ba0bf36\",\n \"namespace\": \"default\"\n }\n },\n \"created_at\": 1730209377950,\n \"id\": 2,\n \"object_id\": \"17eacf2d6ba0bf36\",\n \"processed_at\": 1730209377950\n}"
key: "\0\0\0\0\0\0\0\u{1}"
Value: "{\n \"change_type\": \"ExecutorRemoved\",\n \"created_at\": 1730209228876,\n \"id\": 1,\n \"object_id\": \"FVeH-HtHccKoRPLJicFq1\",\n \"processed_at\": 1730209228876\n}"
key: "\0\0\0\0\0\0\0\0"
Value: "{\n \"change_type\": \"ExecutorAdded\",\n \"created_at\": 1730209228738,\n \"id\": 0,\n \"object_id\": \"4qcYQMcXl5HuSvh3NgErV\",\n \"processed_at\": 1730209228741\n}"
Len = 5
Column Family: UnprocessedStateChanges
Len = 0
Column Family: TaskAllocations
Len = 0
Column Family: UnallocatedTasks
Len = 0
Column Family: GcUrls
Len = 0
Column Family: SystemTasks
Len = 0
Column Family: Stats
Len = 0
```
Storage after compute graph deletion
API and Storage dump
Files deleted by gc:
Storage:
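As a reading aid for the dump taken after the compute graph run: the keys in `Tasks`, `GraphInvocationCtx`, `GraphInvocations` and `FnOutputs` all start with `namespace|graph|`, while the `TaskOutputs` key starts with `namespace|task_id|`, so a graph-scoped prefix would not reach it. The sketch below only illustrates that observation with keys copied from the dump; `graph_prefix` and `keys_with_prefix` are hypothetical helpers, not functions from this codebase.

```rust
/// Hypothetical helper: prefix shared by keys scoped to one compute graph.
/// The trailing separator keeps a graph named `foo` from matching `foo_v2`.
fn graph_prefix(namespace: &str, graph: &str) -> String {
    format!("{}|{}|", namespace, graph)
}

/// Hypothetical helper: the keys a prefix scan would visit.
fn keys_with_prefix<'a>(keys: &[&'a str], prefix: &str) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|key| key.starts_with(prefix))
        .collect()
}
```

Under these assumptions, the `ComputeGraphs` key `default|object_detection_workflow` (no trailing separator) and the `TaskOutputs` key do not match the prefix, so each would need its own delete: the former by exact key, the latter via the task ID.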
Testing
`make fmt`.
`pip install -e .`, start server and executor, cd to `python-sdk/tests`, `python test_graph_behaviours.py`.