Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative fix for bug when pickling function from unloaded module #536

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

leogama
Copy link
Contributor

@leogama leogama commented Aug 8, 2022

This patch modifies _locate_function() and save_function() to allow dill to save a function from a module that was unloaded, i.e. deleted from the sys.modules dictionary, by reference (if it would be saved as global had the module not been unloaded). Note: this is independent of the shared namespace problem.

It may be possible to do the same for classes and methods, but these cases are more complex.

Fixes: #532

@anivegesana
Copy link
Contributor

anivegesana commented Aug 9, 2022

I am confused about the purpose of caching the namespace. I have seen situations similar to #532 in popular packages (torch.package) but never for standard library modules, probably because you shouldn't change them, so there is no reason to reload them in the first place.

Plus, what if someone changes a file and calls importlib.reload to forcibly reload a package with new source code. The cached namespace wouldn't be updated and would cause confusion. Is there any reason that this should be included? If so, there should be a way to turn it off.

@leogama
Copy link
Contributor Author

leogama commented Aug 9, 2022

I am confused about the purpose of caching the namespace. I have seen situations similar to #532 in popular packages (torch.package) but never for standard library modules, probably because you shouldn't change them, so there is no reason to reload them in the first place.

I used a stdlib module for demonstration, but this mechanism would be triggered mainly by third-party packages, as in the example you mentioned. Reloaded modules are not covered by this patch, only "unloaded" ones. With a reloaded module, modified or not, dill (pickle) would still raise an "object is not the same as XXX.YYY" error.

Plus, what if someone changes a file and calls importlib.reload to forcibly reload a package with new source code. The cached namespace wouldn't be updated and would cause confusion.

This conflict would only happen with a series of events like:

import importlib
import sys
import dill
from module import func
dill.dumps(func)  # save as global
del sys.modules['module']
dill.dumps(func)  # add 'module' to _modules_copy's cache, save as global (this is the difference in behavior)
#--- Modify func in module.py ---#
import module  # re-import into sys.modules (creates a new module instance)
importlib.reload(module)  # required to update module.func?
dill.dumps(func)  # func is not module.func, raise error
del sys.modules['module']
dill.dumps(func)  # func is different from cached _modules_copy('module').func, save by value

In this last statement, the functions wouldn't match in the _locate_function() test and the logic would fall back to the old behavior (the function would be saved by value). The namespaces are cached because they are copies independent of the loaded modules and so that a new copy of the unloaded module isn't created for every object. Not caching would matter only in that case of "load -> import object -> unload -> pickle object -> reload with changes -> unload -> pickle object". The cache could be associated with the Pickler instance, although it wouldn't have effect for a sequence of calls to dill.dump[s] then.

Note: I have no preference for this or any of the other proposed solutions, I'm mostly throwing things against the wall.

@mmckerns mmckerns changed the title Alternative fix for bug when pickling function from unloaded module (fixes #532) Alternative fix for bug when pickling function from unloaded module Aug 13, 2022
@mmckerns mmckerns self-requested a review August 13, 2022 16:17
@mmckerns mmckerns added this to the dill-0.3.7 milestone Oct 23, 2022
@mmckerns mmckerns modified the milestones: dill-0.3.7, dill-0.3.8 Feb 5, 2023
@mmckerns mmckerns modified the milestones: dill-0.3.8, dill-0.3.9 Nov 14, 2023
@mmckerns mmckerns modified the milestones: dill-0.3.9, dill-0.4.0 Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__globals__ of function in deleted module restored inconsistently
3 participants