Failed to load language error when using multiple languages for recognition #4284

captain-yoshi · 2024-07-13T01:48:29Z

Current Behavior

I get an error when trying to read text from this image:

$ tesseract 50uL.png - -l eng+ell

Error opening data file /usr/share/tesseract-ocr/5/tessdata/grc.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'grc'
Volume: 50 pl


$ ls /usr/share/tesseract-ocr/5/tessdata/
configs  ell.traineddata  eng.traineddata  pdf.ttf  tessconfigs

I am using the traineddata files from tessdata_best.

Expected Behavior

I would expect to be able to use multiple languages, as stated in the Tesseract documentation.

Suggested Fix

Is there a better way to recognize the Greek letter μ when it is used in English text? Maybe I have to train a new dataset...

tesseract -v

tesseract 5.4.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

Operating System

Ubuntu 24.04 Noble

Other Operating System

No response

uname -a

Linux captain-yoshi 5.15.0-113-generic #123~20.04.1-Ubuntu SMP Wed Jun 12 17:33:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Compiler

No response

CPU

Intel(R) Core(TM) i7-7700HQ

Virtualization / Containers

No response

Other Information

No response

captain-yoshi · 2024-07-13T02:24:27Z

OK, found my problem. The ell language has a dependency on grc, so grc.traineddata has to be present in the tessdata directory as well. All is good now :)

Using

$ tesseract micro-test.png - -l eng+grc

Is there a better way to recognize the μ greek letter when used in English texts ? Maybe | have to train a
new dataset...
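
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that prints which languages in a `-l` spec have no traineddata file in a given directory. `missing_langs` is a hypothetical helper written for illustration, not part of Tesseract, and it only looks at the languages named explicitly, so a dependency such as grc for ell has to be added to the spec by hand.

```shell
# Hypothetical helper (not part of Tesseract): print every language in a
# "-l" style spec such as "eng+ell" whose <lang>.traineddata file is
# missing from the given tessdata directory.
missing_langs() {
    spec=$1          # language spec, languages joined with '+'
    dir=$2           # tessdata directory to inspect
    old_ifs=$IFS
    IFS='+'          # split the spec on '+'
    for lang in $spec; do
        # Report the language if its traineddata file does not exist.
        [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
    done
    IFS=$old_ifs     # restore the caller's field separator
}
```

With the directory listing from the original report, `missing_langs eng+ell+grc /usr/share/tesseract-ocr/5/tessdata` would print `grc`. The languages Tesseract itself can see are listed by `tesseract --list-langs`.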
