Dependencies not installed #552

Closed
DoomedJupiter opened this issue Mar 4, 2025 · 7 comments · Fixed by #554
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@DoomedJupiter
Collaborator

DoomedJupiter commented Mar 4, 2025

Describe the bug

It appears that installing camelot does not install one of its dependencies, pillow (PIL), which prevents the package from working "out of the box". Installing camelot into a fresh venv and running the README example fails with the message "error: Ghostscript is not installed." This error appears to be the result of pillow not being installed.

Discussion

I read the installation instructions and the associated Installing the dependencies page, which indicates that pdfium has replaced ghostscript as the default image conversion backend. I assume the intent here is to allow using the camelot package without having to install ghostscript.

After some digging it appears that camelot defaults to the pypdfium backend, and if that doesn't work, falls back to ghostscript backend. It appears that pypdfium, as used by camelot, requires pillow but the pypdfium package does not install this dependency and instead relies on users to install it if they need it (see here).
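To illustrate why the reported message points at Ghostscript rather than pillow, here is a minimal sketch of a try-in-order fallback. All names (`convert_with_fallback`, `fake_pdfium`, `fake_ghostscript`) are hypothetical stand-ins, not camelot's actual code; the assumption is only that the first backend's failure is swallowed and the last backend's error is the one the user sees.

```python
def convert_with_fallback(pdf_path, backends):
    """Try each (name, converter) pair in order and return the first success.

    If every backend fails, only the LAST failure is re-raised, so the
    earlier (and possibly more relevant) error is hidden from the user.
    """
    last_error = None
    for name, converter in backends:
        try:
            return name, converter(pdf_path)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            last_error = exc
    if last_error is None:
        raise ValueError("no backends given")
    raise last_error


def fake_pdfium(pdf_path):
    # Hypothetical stand-in for the default backend failing without pillow.
    raise ImportError("No module named 'PIL'")


def fake_ghostscript(pdf_path):
    # Hypothetical stand-in for the fallback backend on a machine without Ghostscript.
    raise OSError("Ghostscript is not installed.")
```

With both stand-ins in the list, the caller sees only the Ghostscript error, which would match the symptom described in this issue.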

It looks like adding pillow as a dependency for camelot will resolve this issue.

Note also that pillow is a dependency for matplotlib, so if there is a way to install camelot with matplotlib as a dependency, then this issue won't be observed.
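A quick way to confirm which of these packages are actually present in a venv is to ask the import system directly. This is a small diagnostic sketch using only the standard library; the helper name `missing_modules` is made up for illustration, and the import names `PIL`, `pypdfium2`, and `matplotlib` are my assumption of what to check.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to whatever backends you care about.
    print(missing_modules(["PIL", "pypdfium2", "matplotlib"]))
```

If `PIL` shows up in the printed list in a fresh venv, that would be consistent with pillow not being pulled in as a dependency.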

Steps to reproduce the bug

  1. Create new venv
  2. run pip install camelot-py[base]
  3. run README example
  4. observe the error: Ghostscript is not installed.

Expected behavior

  1. Create new venv
  2. run pip install camelot-py[base]
  3. run README example
  4. observe results as described in the README (works "out of the box")

Code

from README example

PDF

from README example

Screenshots

N/A

Environment

  • OS: Windows 11
  • Python version: 3.12
  • Numpy version: 2.2.3
  • OpenCV version: 4.11.0.86
  • Ghostscript version: N/A
  • camelot version: 1.0.0

Additional context

@bosd
Collaborator

bosd commented Mar 4, 2025

Added help wanted label, if someone wants to make a pr for this.

@DoomedJupiter
Collaborator Author

@bosd Looks like you're the only one running around here. If you add me, I can scrub the issues and PRs to see which ones can be closed. There is a lot of overlap between this repo and the previous pypdf-table-extraction, and it appears that some stuff here has been merged through that project or is no longer applicable. I can also help with documentation cleanup. I'm new to package dev, so I'm not quite looking to make major changes, but maybe as I become more familiar with the code I can work up to that.

This project was helpful for me and I'm looking to pay it forward and learn some new things.

@bosd
Collaborator

bosd commented Mar 6, 2025

@DoomedJupiter
Well, I'm not the only one running around here. But we are a very tiny team, with occasional contributors.
So I really appreciate the help and extra hands.

I know it is a lot of work to sync the repos. We went through that when we transferred to pypdf-table-extraction and had to sync with this and the old repo.

Housekeeping is a very important task.

I just discovered, that I don't have enough permissions yet to add maintainers.
See py-pdf#318
Will contact @vinayak-mehta because right now he is the single person having enough permissions over here.
Which is quite a risk for the project, in case he gets hit by the proverbial "bus".

Once that is settled, we will archive the pypdf-table-extraction repo.

@vinayak-mehta
Member

Sorry for the late response here. Just made @bosd an owner in the organization so he should have all the required permissions, sorry for being the bottleneck here. I would love for the development to continue in this organization 🙏🏽

@bosd
Collaborator

bosd commented Mar 10, 2025

@DoomedJupiter I've elevated your rights. You can now maintain issues and pr's.

@DoomedJupiter
Collaborator Author

@bosd Thanks! Is it best to mention other maintainers in various issues/prs if I have questions about the history or past decisions? Also, is there an offline means of discussion regarding design decisions, roadmap, etc? I'd like to clean up some of the issues/prs but I don't want to just go start closing stuff without everyone else being in the loop.

@bosd
Collaborator

bosd commented Mar 10, 2025

@DoomedJupiter
Yeah, you can go ahead and mention others in issues/pr's.
There is currently no offline means for discussions and decisions. It's all open source, so GitHub is our main means of communication.

Currently there is no roadmap I'm aware of. Good idea to start one.

Well, there is a lot of backlog. So maybe it's better to ask for forgiveness than permission.
We can always re-open.

bosd closed this as completed in #554 on Mar 10, 2025
3 participants