Replies: 7 comments
-
@aropb I've just added documentation, see here: https://github.com/UglyToad/PdfPig/wiki/Images#additional-filters
-
Thanks. With this in mind, wouldn't it be more reliable to render such PDF files to images? That is what I'm doing now, but which approach is more reliable and general?
"Since PDF content may define many different ColorSpaces for rendering not all of these are yet supported by PdfPig. Where the ColorSpace is common, e.g. DeviceGray, DeviceRGB, DeviceCMYK decoding of the image to a PNG is supported. Other ColorSpaces are either not supported or only have partial support. IPdfImage defines the ColorSpace and ColorSpaceDetails properties for more information of the active ColorSpace when this image was rendered to the page."
-
@aropb it depends on what you are trying to do. Are you trying to render the document pages as images? If so, you can use https://github.com/BobLd/PdfPig.Rendering.Skia
I have removed this part of the documentation, as it is no longer accurate.
-
I need to extract text with maximum quality and accuracy. The PDFs vary widely and I don't know in advance what kind I will get: they can be text documents or even scans of books.
-
Am I now guaranteed to get all the images when using the filters? I then run Tesseract OCR on them to get text. Previously there were cases where the image format was not suitable for it and it returned an error. I also see that I should extract the text directly from the PDF whenever it is present, since that is more reliable and faster.
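A rough sketch of that text-first, OCR-fallback flow, in case it helps make the question concrete. It only uses PdfPig calls from the core package (`PdfDocument.Open`, `GetPages`, `page.Text`, `page.GetImages`, `TryGetPng`). The file path, the output file names, and the `MinTextLength` threshold are placeholders I made up for illustration, and it does not register the additional filters described on the wiki page.

```csharp
using System;
using System.IO;
using UglyToad.PdfPig;

// Hypothetical threshold: a page with less text than this is treated as a scan.
const int MinTextLength = 20;

// "input.pdf" is a placeholder path.
using (PdfDocument document = PdfDocument.Open("input.pdf"))
{
    foreach (var page in document.GetPages())
    {
        // Prefer the embedded text layer when the page has one.
        string text = (page.Text ?? string.Empty).Trim();
        if (text.Length >= MinTextLength)
        {
            Console.WriteLine($"Page {page.Number}: embedded text, {text.Length} chars");
            continue;
        }

        // Otherwise write out every image PdfPig can convert to PNG,
        // so that it can be passed to an OCR engine afterwards.
        int index = 0;
        foreach (var image in page.GetImages())
        {
            if (image.TryGetPng(out byte[] png))
            {
                File.WriteAllBytes($"page{page.Number}_img{index}.png", png);
            }
            else
            {
                // Conversion failed (for example an unsupported filter or
                // colour space); rendering the whole page is the fallback.
                Console.WriteLine($"Page {page.Number}: image {index} not converted to PNG");
            }
            index++;
        }
    }
}
```

Pages where `TryGetPng` returns false, or where `GetImages()` returns nothing at all, are the ones I would still have to render as whole-page images before running OCR.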
-
Thanks a lot! https://github.com/UglyToad/PdfPig/wiki/Images#additional-filters I did as you wrote above, but this PDF doesn't work:
-
@aropb I believe all the issues are now fixed (see BobLd/UglyToad.PdfPig.Filters.Jbig2.PdfboxJbig2#2). I will mark the discussion as closed. Feel free to reopen if need be.
-
Here is an example document.
39. Trade Finance on the Blockchain.pdf
Page.GetImages() does not return the image inside this PDF. Is there any way to extract it now?