
Provide more information on LANG_TYPE in the documentation #87

Open
giri-kum opened this issue Aug 17, 2022 · 2 comments
Labels
question Further information is requested

Comments

@giri-kum

Is there a lookup table of LANG_TYPE values for all the languages that Tesseract supports?

@stweil
Contributor

stweil commented Aug 17, 2022

Please give more details. What do you mean by "lookup table of LANG_TYPE"?

@stweil stweil added the question Further information is requested label Aug 17, 2022
@giri-kum
Author

@stweil By that I meant: which LANG_TYPE is used for each language during training? The documentation at https://github.com/tesseract-ocr/tesstrain defines LANG_TYPE, which can be Indic, RTL, or blank.

I assume it is blank for English, Indic for Hindi, and RTL for Arabic. A lookup table covering all the traineddata files in the tessdata repository would be helpful when fine-tuning.
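To make the request concrete, here is a minimal sketch of what such a lookup table might look like in Python. The entries are only the three assumptions stated above (blank for `eng`, Indic for `hin`, RTL for `ara`) and are not an official list; `ASSUMED_LANG_TYPE`, `lang_type_for`, and `make_args` are hypothetical names. The helper builds a tesstrain `make training MODEL_NAME=...` command and appends `LANG_TYPE=...` as a make variable only when the value is non-blank.

```python
# Hypothetical lookup table. The values reflect only the assumptions in this
# issue (blank for English, Indic for Hindi, RTL for Arabic), not an official list.
ASSUMED_LANG_TYPE = {
    "eng": "",       # blank: no special handling assumed
    "hin": "Indic",
    "ara": "RTL",
}

# The three values the tesstrain documentation mentions for LANG_TYPE.
VALID_LANG_TYPES = {"", "Indic", "RTL"}


def lang_type_for(lang_code: str) -> str:
    """Return the assumed LANG_TYPE for a tessdata language code."""
    if lang_code not in ASSUMED_LANG_TYPE:
        raise KeyError(f"no LANG_TYPE recorded for {lang_code!r}")
    value = ASSUMED_LANG_TYPE[lang_code]
    if value not in VALID_LANG_TYPES:
        raise ValueError(f"unexpected LANG_TYPE {value!r}")
    return value


def make_args(model_name: str, lang_code: str) -> list[str]:
    """Build a tesstrain `make training` argument list (hypothetical helper)."""
    args = ["make", "training", f"MODEL_NAME={model_name}"]
    lang_type = lang_type_for(lang_code)
    if lang_type:  # leave LANG_TYPE unset when it is blank
        args.append(f"LANG_TYPE={lang_type}")
    return args
```

For example, `make_args("ara_finetuned", "ara")` returns `["make", "training", "MODEL_NAME=ara_finetuned", "LANG_TYPE=RTL"]`, while a blank entry such as `eng` produces no `LANG_TYPE` argument at all. An official table would simply replace the assumed dictionary entries.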
