[Ingestion] trying to retry a job #1982

Open
sticky-note opened this issue Feb 17, 2025 · 2 comments

Comments

@sticky-note

Describe the bug
When we try to restart a failed job, we hit the following error in R2R v3.4.0:

500: Error during ingestion: Error parsing document: cannot access local variable 'lobject' where it is not associated with a value
Traceback (most recent call last):
  File "/app/core/providers/database/files.py", line 224, in _read_lobject
    raise R2RException(
shared.abstractions.exception.R2RException: Large object 352775 not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/core/main/services/ingestion_service.py", line 236, in parse_file
    await self.providers.database.files_handler.retrieve_file(
  File "/app/core/providers/database/files.py", line 153, in retrieve_file
    file_content = await self._read_lobject(conn, oid)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/core/providers/database/files.py", line 252, in _read_lobject
    await conn.execute("SELECT lo_close($1)", lobject)
                                              ^^^^^^^
UnboundLocalError: cannot access local variable 'lobject' where it is not associated with a value

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/core/main/orchestration/hatchet/ingestion_workflow.py", line 108, in parse
    async for extraction in extractions_generator:
  File "/app/core/main/services/ingestion_service.py", line 286, in parse_file
    raise R2RDocumentProcessingError(
shared.abstractions.exception.R2RDocumentProcessingError: Error parsing document: cannot access local variable 'lobject' where it is not associated with a value

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/hatchet_sdk/worker/runner/runner.py", line 139, in inner_callback
    output = task.result()
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/hatchet_sdk/worker/runner/runner.py", line 265, in async_wrapped_action_func
    raise e
  File "/usr/local/lib/python3.12/site-packages/hatchet_sdk/worker/runner/runner.py", line 243, in async_wrapped_action_func
    return await action_func(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/core/main/orchestration/hatchet/ingestion_workflow.py", line 300, in parse
    raise HTTPException(
fastapi.exceptions.HTTPException: 500: Error during ingestion: Error parsing document: cannot access local variable 'lobject' where it is not associated with a value
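
The first two tracebacks look like a cleanup step that reads a variable which was never assigned: the "not found" exception is raised before `lobject` is bound, and the later `lo_close` call then fails with `UnboundLocalError`, hiding the original error. The sketch below reproduces that pattern in isolation and shows one possible guard. It is not R2R's actual code: `FakeConn`, `read_buggy`, and `read_fixed` are hypothetical names, and `LookupError` stands in for `R2RException`.

```python
import asyncio


class FakeConn:
    """Hypothetical stand-in for a database connection; not R2R's API."""

    def __init__(self, existing_oids):
        self.existing_oids = set(existing_oids)
        self.closed = []

    async def exists(self, oid):
        return oid in self.existing_oids

    async def open(self, oid):
        return f"fd-{oid}"

    async def close(self, handle):
        self.closed.append(handle)


async def read_buggy(conn, oid):
    try:
        if not await conn.exists(oid):
            raise LookupError(f"Large object {oid} not found.")
        lobject = await conn.open(oid)
        return lobject
    finally:
        # `lobject` is unbound here if the existence check raised above,
        # so this line raises UnboundLocalError and masks the LookupError.
        await conn.close(lobject)


async def read_fixed(conn, oid):
    lobject = None  # bind before the try so the cleanup can always test it
    try:
        if not await conn.exists(oid):
            raise LookupError(f"Large object {oid} not found.")
        lobject = await conn.open(oid)
        return lobject
    finally:
        if lobject is not None:
            await conn.close(lobject)


async def demo():
    """Call both readers with a missing oid and record the exception type."""
    conn = FakeConn(existing_oids=[1])
    results = {}
    for name, reader in (("buggy", read_buggy), ("fixed", read_fixed)):
        try:
            await reader(conn, 352775)
        except Exception as exc:
            results[name] = type(exc).__name__
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the guard in place, the caller sees the original "not found" error instead of the `UnboundLocalError`, which makes the real cause (a missing or unreadable large object) much easier to diagnose.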

To Reproduce
Steps to reproduce the behavior:

  1. Import hundreds of thousands of files.
  2. Bring down a service used for ingestion (ollama/mbxai-embed/large, for example).
  3. The jobs fail.
  4. Bring the service back up.
  5. Retry a job.
  6. See the error.

This is just one way to reproduce it; we have hit this issue many times.

Expected behavior
The job retry succeeds.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: debian docker
  • Browser: Not relevant
  • Version: Not relevant
@sticky-note
Author

The culprits are files that were uploaded without content.

@TheMcSebi

TheMcSebi commented Feb 27, 2025

For me this was not the case: the file I provided the path to wasn't empty.
Windows / Python 3.12
Tried latest commit as well as latest tag

Edit: My problem lies deeper. I will open a separate issue for this.

Edit 2: Fixed it. The database user I created for R2R did not have permission on the pg_largeobject table. Executing GRANT SELECT ON pg_largeobject TO r2ruser; inside the database I created for R2R did the trick. Previously, the _read_lobject function could not run SELECT EXISTS(SELECT 1 FROM pg_largeobject WHERE loid = $1) with the oid.
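
A quick way to check for this situation before retrying jobs is to ask PostgreSQL whether the current role can read pg_largeobject. The sketch below uses PostgreSQL's built-in `has_table_privilege` function. It assumes an asyncpg-style connection exposing `fetchval`, which is an assumption based on the `$1` placeholders in the traceback; `can_read_large_objects` and `StubConn` are hypothetical names, not part of R2R.

```python
import asyncio

# Built-in PostgreSQL function: True if the current role may SELECT from the table.
PRIVILEGE_QUERY = "SELECT has_table_privilege('pg_largeobject', 'SELECT')"


async def can_read_large_objects(conn):
    """Return True if the connected role can SELECT from pg_largeobject.

    `conn` is assumed to be asyncpg-like: `await conn.fetchval(sql)` returns
    the first column of the first row.
    """
    return bool(await conn.fetchval(PRIVILEGE_QUERY))


class StubConn:
    """Hypothetical stub used only to exercise the helper without a database."""

    def __init__(self, answer):
        self.answer = answer
        self.queries = []

    async def fetchval(self, query):
        self.queries.append(query)
        return self.answer


if __name__ == "__main__":
    # A real check would pass a live connection instead of the stub.
    print(asyncio.run(can_read_large_objects(StubConn(False))))
```

If the check returns False, the GRANT statement quoted above is the fix that worked in this case.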
