"Failed loading language" error when using multiple languages for recognition #4284

Open
captain-yoshi opened this issue Jul 13, 2024 · 1 comment

Comments

@captain-yoshi

Current Behavior

I get an error when trying to read the text from this image:

[Image: 50uL]

$ tesseract 50uL.png - -l eng+ell

Error opening data file /usr/share/tesseract-ocr/5/tessdata/grc.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'grc'
Volume: 50 pl


$ ls /usr/share/tesseract-ocr/5/tessdata/
configs  ell.traineddata  eng.traineddata  pdf.ttf  tessconfigs

Using datasets from tessdata_best.

Expected Behavior

I would expect to be able to use multiple languages, as stated in the Tesseract documentation.

Suggested Fix

Is there a better way to recognize the Greek letter μ when it appears in English text? Maybe I have to train a new dataset...

tesseract -v

tesseract 5.4.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
Found libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3

Operating System

Ubuntu 24.04 Noble

Other Operating System

No response

uname -a

Linux captain-yoshi 5.15.0-113-generic #123~20.04.1-Ubuntu SMP Wed Jun 12 17:33:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Compiler

No response

CPU

Intel(R) Core(TM) i7-7700HQ

Virtualization / Containers

No response

Other Information

No response

@captain-yoshi

OK, found my problem: the ell language has a dependency on grc, so grc.traineddata must also be present in the tessdata directory. All is good now :)

[Image: micro-test]

Using

$ tesseract micro-test.png - -l eng+grc

Is there a better way to recognize the μ greek letter when used in English texts ? Maybe | have to train a
new dataset...
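
For anyone hitting the same error, here is a minimal sketch of a pre-flight check that reports which language codes from a `-l` style spec have no `.traineddata` file in a given directory. The helper name `missing_traineddata` is hypothetical and not part of Tesseract. It only checks the codes you pass in, so a dependency such as grc for ell has to be listed explicitly.

```python
from pathlib import Path


def missing_traineddata(lang_spec: str, tessdata_dir: str) -> list[str]:
    """Return the codes in a '+'-separated spec (e.g. 'eng+ell') that
    have no matching <code>.traineddata file in tessdata_dir.

    Hypothetical helper: it does not inspect the traineddata files, so it
    cannot discover dependencies (such as ell needing grc) on its own.
    """
    codes = [code for code in lang_spec.split("+") if code]
    root = Path(tessdata_dir)
    return [code for code in codes if not (root / f"{code}.traineddata").is_file()]


if __name__ == "__main__":
    # Path taken from the error message above; adjust for your install.
    tessdata = "/usr/share/tesseract-ocr/5/tessdata"
    # grc is listed explicitly because ell depends on it.
    print(missing_traineddata("eng+ell+grc", tessdata))
```

With the directory listing shown in the original report (only ell and eng present), this would report grc as missing.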
