Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Split Pantograph and PyPantograph #75

Open
RexWzh opened this issue Feb 5, 2025 · 8 comments
Open

Feature Request: Split Pantograph and PyPantograph #75

RexWzh opened this issue Feb 5, 2025 · 8 comments
Labels
help wanted Extra attention is needed

Comments

@RexWzh
Copy link

RexWzh commented Feb 5, 2025

Proposal

Split the project into two components by a caching mechanism instead of using git submodule:

  1. Core REPL (Pantograph)
  2. Python wrapper (PyPantograph)

Implementation details

Cache dir for different systems:

PANTOGRAPH_CACHE_DIR = Path(platformdirs.user_cache_dir("pantograph"))

Use the cached proc in self.proc_path:

pexpect.spawn(f"{self.proc_path} {self.args}", cwd=self.cwd,...)

Planned cli tools:

  • pantograph list: Show available versions
  • pantograph install: Install specific version
  • pantograph update: Update to the latest tag
  • pantograph default: Set default version if the proc_path is not specified
    ...

Looking for feedback and suggestions on implementation. I would be happy to contribute to this effort :)

@lenianiva
Copy link
Member

How would this work when users want new features and they have to be developed in lockstep on upstream Pantograph and downstream PyPantograph?

@lenianiva lenianiva added the help wanted Extra attention is needed label Feb 5, 2025
@RexWzh
Copy link
Author

RexWzh commented Feb 5, 2025

For new features:
We'll tag Pantograph versions to match Lean releases like other Lean tools (REPL/Jixia): old versions keep their features, while new stuff goes into the latest version only.

Addressing the lockstep concern:
Once things are stable, we should keep the development backward compatible. Users won't need to match versions exactly. The latest PyPantograph code should work with a broad range of Pantograph versions (like >=4.14.0). That is, one need only to use the latest PyPantograph.

@lenianiva
Copy link
Member

lenianiva commented Feb 10, 2025

For new features: We'll tag Pantograph versions to match Lean releases like other Lean tools (REPL/Jixia): old versions keep their features, while new stuff goes into the latest version only.

Addressing the lockstep concern: Once things are stable, we should keep the development backward compatible. Users won't need to match versions exactly. The latest PyPantograph code should work with a broad range of Pantograph versions (like >=4.14.0). That is, one need only to use the latest PyPantograph.

What's the advantage of this system compared to pinning the Git commit on the user's end? I feel like this will involve a lot of testing overhead (i.e. each commit of PyP would need to be tested against multiple versions of P)

@RexWzh
Copy link
Author

RexWzh commented Feb 11, 2025

What's the advantage of this system compared to pinning the Git commit on the user's end?

Git submodules might not be the most convenient approach when publishing to PyPI, since the submodule code would be installed to site-packages/. Perhaps we could separate the non-Python code into a cached directory structure?
LeanDojo offers an example of this pattern - it stores traced repos in ~/.cache/lean_dojo/ for interactions.

each commit of PyP would need to be tested against multiple versions of P

In the current setup with submodules, each PyPantograph version is only tested against its corresponding Pantograph version. For version 4.x.x, users only need the latest Pantograph of that version, rather than maintaining exact commit matches.

This means we only need to ensure:

  • Latest PyP is compatible with current P
  • Tagged PyP versions work with their corresponding tagged P versions

Compared to submodules (which enforce one-to-one version mapping), this separate caching approach offers more flexibility. Users can still opt for tag-to-tag correspondence if desired.

As for the latest PyPantograph code should work with a broad range of Pantograph versions - this is perhaps better viewed as a development best practice rather than a strict requirement.

For instance, LeanDojo is version-agnostic and compatible with a wide range of Lean versions. While Pantograph may undergo significant changes, we can still aim for compatibility at the tagged version level as a minimum standard.

@lenianiva
Copy link
Member

lenianiva commented Feb 18, 2025

so we would have a manifest/ folder storing the commit hash for each pinned version, and the build.py script would download it by reading some environment variable?

In the current setup the compiled upstream executable is bundled together with the rest of the Python code in this library, so I don't know how you can get pantograph {install,list,update,default} implemented unless one of these happens:

  1. We have a meta tool (like elan is to lean) to manage pantograph versions
  2. We bundle the manifests in the built wheel, and require the user to have both lean and lake and likely elan installed on the machine that uses Pantograph
  3. We bundle the compiled executables of all manifest versions in the wheel.

None of these options seem particularly appealing

@RexWzh
Copy link
Author

RexWzh commented Feb 18, 2025

This is more like a requirement for version-free features.

The main thought is that users can use PyPantograph without being tied to a specific Lean version. They would then have two options:

  1. Specify a Pantograph version to load from cache
  2. Specify an exact path for development setup

I think it is workable since I've written sth similar using Click for installation scripts.

@lenianiva
Copy link
Member

This is more like a requirement for version-free features.

The main thought is that users can use PyPantograph without being tied to a specific Lean version. They would then have two options:

1. Specify a Pantograph version to load from cache

2. Specify an exact path for development setup

I think it is workable since I've written sth similar using Click for installation scripts.

Does this mean the version can be switched at runtime? If the only requirement is that the user can build wheels with different Lean versions, the simplest solution is to use an environment variable containing the preferred manifest version. If you want multiple versions to coexist in the same installation its harder and I don't see the benefit of doing so

@lenianiva
Copy link
Member

The opinions of people on Lean zulipchat seem to contradict this feature request: https://leanprover.zulipchat.com/#narrow/channel/219941-Machine-Learning-for-Theorem-Proving/topic/Incentives.20.26.20sorry-filling.20leaderboard/near/501438850

REPL, Mathlib, Aesop, etc. use tag based versioning, where each tag supports just one version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants