
DocTags output missing content within <formula> tags #1008


Hello,

We've updated our system to include a new Formula Model, which converts images of formulas into well-formatted LaTeX representations. This model is disabled by default. To enable it, you need to modify the pipeline options as follows:

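```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_page_images = True  # keep rendered page images in the result
pipeline_options.do_formula_enrichment = True  # enable the Formula Model (off by default)
# The options only take effect once they are passed to the converter for PDF inputs.
converter = DocumentConverter(format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)})
```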

With the Formula Model turned off, we no longer include any formula data in the exports. Previously, we provided a highly inaccurate representation that came straight out of the PDF parser and often resulted in significant losses, su…
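To check whether an export is affected, one option is to count the `<formula>` elements that carry no text. The sketch below is a stdlib-only illustration, not part of Docling: the helper names `formula_contents` and `count_empty_formulas` are hypothetical, and it assumes DocTags wraps each formula in a literal `<formula>...</formula>` pair, possibly preceded by location tokens shaped like `<loc_N>`. The sample string is made up for illustration.

```python
import re

# Hypothetical helper pattern: each <formula>...</formula> pair (non-greedy, may span lines).
_FORMULA_RE = re.compile(r"<formula>(.*?)</formula>", re.DOTALL)
# Assumed location-token shape, e.g. <loc_42>; removed before checking for text.
_LOC_RE = re.compile(r"<loc_\d+>")


def formula_contents(doctags: str) -> list[str]:
    """Return the text inside each <formula> element, with location tokens removed."""
    return [_LOC_RE.sub("", body).strip() for body in _FORMULA_RE.findall(doctags)]


def count_empty_formulas(doctags: str) -> int:
    """Count <formula> elements that contain no text once location tokens are stripped."""
    return sum(1 for text in formula_contents(doctags) if not text)


if __name__ == "__main__":
    # Illustrative input: one formula without text, one with a LaTeX transcription.
    sample = (
        "<text>Energy:</text>"
        "<formula><loc_10><loc_20><loc_90><loc_30></formula>"
        "<formula><loc_10><loc_40><loc_90><loc_50>E = mc^2</formula>"
    )
    print(formula_contents(sample))      # ['', 'E = mc^2']
    print(count_empty_formulas(sample))  # 1
```

A nonzero count on a real export would suggest that formulas were detected but not transcribed, which is what one would expect when the Formula Model is disabled.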

Replies: 1 comment

Answer selected by Matteo-Omenetti
Category: Q&A
2 participants