Searchability problem #27

Open
aditya-shrivastavv opened this issue Aug 21, 2024 · 1 comment

Comments

@aditya-shrivastavv

I think it solves the problem but simultaneously creates one too (correct me if I am wrong). This approach converts the PDF pages to images and sends them to the DLP API. DLP then does its work and returns the redacted images, and we combine those images back into a PDF. Right?

But the PDF is no longer searchable, because the original text data is lost. The resulting PDF will not be readable by an ATS, since these systems typically require text to be present, not images. Conversion to other formats such as Word or plain text will not work as expected either, as the text content is no longer available in its original form.
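
To make the concern concrete, here is a toy model of the flow described above. It is only a sketch and not this project's code: `Page`, `rasterize`, `redact_image`, `images_to_pdf`, `redact_pdf`, and `extract_text` are all hypothetical names, and `redact_image` is a stand-in for the real DLP call. It assumes a page can be modeled as an optional text layer plus some rendered content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Toy model of a PDF page (hypothetical, for illustration only)."""
    text: Optional[str]  # extractable text layer; None for an image-only page
    pixels: bytes        # stand-in for the rendered page content


def rasterize(page: Page) -> bytes:
    """Render a page to an image. Only the rendered content survives."""
    return page.pixels


def redact_image(image: bytes) -> bytes:
    """Hypothetical stand-in for the DLP image redaction step."""
    return image.replace(b"555-0100", b"########")


def images_to_pdf(images: List[bytes]) -> List[Page]:
    """Rebuild a PDF from images. The new pages have no text layer."""
    return [Page(text=None, pixels=image) for image in images]


def redact_pdf(pages: List[Page]) -> List[Page]:
    """PDF pages -> images -> redacted images -> PDF."""
    return images_to_pdf([redact_image(rasterize(page)) for page in pages])


def extract_text(pages: List[Page]) -> str:
    """What a search index or an ATS would see: the text layers only."""
    return "".join(page.text or "" for page in pages)


original = [Page(text="Call 555-0100", pixels=b"Call 555-0100")]
redacted = redact_pdf(original)
```

In this model, `extract_text(original)` returns the page text, while `extract_text(redacted)` returns an empty string: the redaction worked on the pixels, but nothing extractable is left in the rebuilt PDF.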

Am I right?

If this is a legit problem, I think I have a solution.

@felimartina
Collaborator

Hey @aditya-shrivastavv - apologies for the slow reply here!

That is definitely one of the drawbacks of the current implementation. I'm curious to hear your thoughts on how to overcome this challenge.

Please feel free to share more here and we can start a discussion on the topic.
