Using docling with a Spark UDF or pandas UDF, but getting no parallelization #943
qian-yu-db asked this question in Q&A
-
I read more of the technical report. Section 3 states that "Docling implements a linear pipeline of operations, which execute sequentially on each given document". Should I conclude that running multiple pipelines in parallel on multiple documents is not supported? Thanks!
-
Hi,
I wrapped a pipeline in a pandas UDF to process a table in which one column contains the paths to PDF files.
However, the job ran sequentially instead of in parallel on the worker nodes. Any recommendations?
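A frequent cause of this symptom is a DataFrame with very few partitions. Spark runs one task per partition, and a small table of paths is often stored as one small file and read as a single partition. In that case, only one task ever calls the UDF.

The sketch below makes these assumptions:

- Spark 3.0 or later, for Iterator-of-Series pandas UDFs.
- docling's `DocumentConverter`.
- The names `markdown_batches`, `convert_pdfs`, `default_converter` and the `path` column are hypothetical.

It repartitions before applying the UDF and builds the converter once per task instead of once per batch.

```python
from typing import Callable, Iterator, Optional

import pandas as pd


def default_converter():
    # Imported lazily, on the executor, when the first batch is processed.
    from docling.document_converter import DocumentConverter
    return DocumentConverter()


def markdown_batches(
    batches: Iterator[pd.Series],
    make_converter: Optional[Callable[[], object]] = None,
) -> Iterator[pd.Series]:
    """Convert every path in the Arrow batches of one partition to Markdown.

    The converter is created once, when the first batch is requested, and is
    reused for all remaining batches handled by the same task.
    """
    converter = (make_converter or default_converter)()
    for paths in batches:
        yield paths.map(lambda p: converter.convert(p).document.export_to_markdown())


def convert_pdfs(paths_df, num_partitions: int, path_col: str = "path"):
    """Add a 'markdown' column to a Spark DataFrame that holds PDF paths.

    Parallelism is bounded by num_partitions and by the available executor cores.
    """
    from pyspark.sql.functions import col, pandas_udf

    @pandas_udf("string")
    def pdf_to_markdown(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        yield from markdown_batches(batches)

    # Spread the rows over several partitions so several tasks run the UDF.
    spread = paths_df.repartition(num_partitions)
    return spread.withColumn("markdown", pdf_to_markdown(col(path_col)))
```

For example, you could call `convert_pdfs(spark.table("pdf_paths"), num_partitions=8)`, where the table name is hypothetical. `df.rdd.getNumPartitions()` shows how many tasks the UDF stage can use.

Keep two limits in mind:

- The paths must be readable from every executor, for example on a shared filesystem.
- Each task loads its own models, which bounds how many tasks fit on one executor.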