How to question: OCR only part of scanned image #34

LostAccount · 2021-01-01T20:58:25Z

Hello

I have only started using tesseract with ocrmypdf.

I issue a command like this: ocrmypdf input_pdf_or_image output_pdf

This is not an ocrmypdf question.

Question
Is there any way to mask, or draw a bounding box around, part of a scanned image so that tesseract interprets it as a region to ignore? Some of my scanned images contain graphics or tables that get OCR'd. Although the effort is appreciated, I would rather exclude these regions, because the resulting PDF then contains selectable text that I do not want.

Any ideas or potential solutions would be very much appreciated as I have been trying to find workarounds for days now.
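
One possible workaround is to blank out the unwanted regions before the image reaches tesseract, so there is nothing left in them to recognise. The following is only a minimal sketch of that idea, not a tesseract or ocrmypdf feature. It assumes the page has already been decoded into rows of grayscale pixel values (0 = black, 255 = white) by an image library of your choice; the function name mask_regions and the box coordinates are hypothetical. Note that the masked area would also appear blank in any output built from the masked image, which this sketch does not address.

```python
from typing import List, Tuple

# A box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def mask_regions(
    pixels: List[List[int]], boxes: List[Box], fill: int = 255
) -> List[List[int]]:
    """Return a copy of a grayscale page with every box painted in `fill`.

    `pixels` is a list of rows, each row a list of 0-255 values.
    Painting a region white leaves no ink for the OCR engine to find there.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    # Work on a copy so the original scan stays untouched.
    masked = [row[:] for row in pixels]

    for left, top, right, bottom in boxes:
        # Clamp the box to the page so out-of-range coordinates are harmless.
        x0, x1 = max(0, left), min(width, right)
        y0, y1 = max(0, top), min(height, bottom)
        for y in range(y0, y1):
            for x in range(x0, x1):
                masked[y][x] = fill

    return masked


if __name__ == "__main__":
    # Hypothetical 4x4 all-black page with a 2x2 region to hide in the middle.
    page = [[0] * 4 for _ in range(4)]
    for row in mask_regions(page, [(1, 1, 3, 3)]):
        print(row)
```

The masked rows would then be saved back to an image file and passed to tesseract in place of the original scan.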

➜ ~ tesseract --version
tesseract 4.1.1
leptonica-1.80.0
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
Found AVX
Found SSE

Kind regards
—Alex
