Avoid extra indirections in Python bindings #233

Ahajha · 2025-03-09T22:06:18Z

This is a neat project! I have a personal interest in Python bindings, and I see some room for improvement that should give a performance bump and memory savings. View this issue as more of a planning doc for me to dump these ideas I have - they're all vaguely connected to the same core idea of reducing indirection.

In the current state of things, we have several levels of indirection when we want to call a method bound in C++. I will use Grammar.union as an interesting example, which in Python is defined as follows:

grammar_handles = [grammar._handle for grammar in grammars]
return Grammar._create_from_handle(_core.Grammar.union(grammar_handles))

I mainly use this as an example of some low-hanging fruit, various other methods might be more complicated (and thus more difficult to improve), or less complicated (and thus have less headroom to improve). Though just about the entire API surface can have most of these improvements apply, if not all.

Call the method written in Python
Look up all of the underlying _handles of the input
Create an intermediate list of these results
Call the underlying C++ function - this involves converting the intermediate list to a std::vector<Grammar>. The underlying objects are "copied" (which are really just std::shared_ptrs).
This then calls the underlying PIMPL function
The result is wrapped in a shared_ptr and returned as a C++ Grammar
The C++ result is wrapped in a Python object and returned to Python
The Python handle is then passed to _create_from_handle to create another Python object, which is returned as the final output.

I am proposing that this can be cut down to:

Call the underlying C++ function - This involves creating an intermediate vector of references to all of the Grammars in the input list
The function (core logic) is called (no PIMPL)
The result is wrapped in a Python object, which is returned to Python

The memory layout is also worth looking at. Currently, to get from the top level Python Grammar object to the underlying C++ data, we need the following steps:

Access the _handle field of the Python object, which references another Python object
Look into the pybind11 internals of the object - by default this holds a std::unique_ptr to the underlying C++ object
Dereference the unique_ptr to get the C++ class
Dereference the shared_ptr in the C++ class to get to the PIMPL class, which contains the actual data.

I proposing that this can be cut down to:

Access the data directly in the Python object

I also propose that these changes can be made with minimal to no impact on the Python API, though with some modification to the C++ API, mainly with respect to ownership.

The main steps that are involved to achieve this are as follows (these can likely be done in any order, aside from the conversion to nanobind which should be done first)

Convert to nanobind. The main win that this gives is that it removes the unique_ptr indirection that pybind11 adds. This in theory should be a small performance bump - see Convert from pybind11 to nanobind #230 where I am implementing this step (so far performance is about on par with current, but that is okay)
Remove the wrapper classes written in Python and allow the C++ bindings to be used directly. The main challenge here comes with documentation - pybind11 has some tools for generating stubfiles but my knowledge is that nanobind's is better (and built-in/purpose-built). The documentation (docstrings, typing, default arguments, etc.) will all be written in C++. We would probably need an initial step to add stubfile generation, and then we can incrementally move classes or functions to C++, and then just re-export them where they are now. Note that I am not proposing to rewrite all of the Python code in C++ - just the wrapper classes where there isn't any real logic other than calling into the bindings.
Remove use of PIMPL. This one could be controversial - I currently see the use of PIMPL in these classes as a crutch to work around ownership issues, especially since these are shared_ptrs instead of unique_ptrs. We can reshape things so that the underlying PIMPL class becomes the 'real' C++ class, and then where we would need some sort of reference counting (to avoid copies), we would use nanobind::intrusive_base. This is comparable to std::enable_shared_from_this, in that it bakes a reference count into the class, but it goes one step further and actually allows this reference count to be used by both C++ and Python - this allows the C++ data to live directly in the Python object instead of within a shared_ptr. We can access this on the C++ side through a nanobind::ref, which is similar to a shared_ptr.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid extra indirections in Python bindings #233

Avoid extra indirections in Python bindings #233

Ahajha commented Mar 9, 2025

Avoid extra indirections in Python bindings #233

Avoid extra indirections in Python bindings #233

Comments

Ahajha commented Mar 9, 2025