Tika server 2.9.1 PDF Tesseract OCR #406

Tarik37 · 2024-03-30T04:52:33Z

Hello,
I am a beginner and need your help. I use Tika server to extract metadata and text with the OCR strategy set to auto. On native PDF documents there is no problem, as the processing time is low, but on scanned PDF files (hundreds of pages) I hit the request timeout, whether through Python or curl.
Is there a way to configure the tika-config.yml file so that OCR processes all the pages with the auto strategy?
Thanks in advance.
