
Saving the original model file metadata might be needed #41

Open
paboum opened this issue Oct 11, 2023 · 3 comments


paboum commented Oct 11, 2023

I like the idea of pruning models, but running Autoprune has gotten me into some trouble:

  1. Since Civitai doesn't autoprune all of its models, the hashes of my pruned files no longer match theirs, so I can't easily use https://github.com/zixaphir/Stable-Diffusion-Webui-Civitai-Helper.git to update Loras.
  2. Since Civitai's search API only uses a hash of the whole file (see "[Bug]: LoRA hashes are not using correct hash function", civitai/civitai#742), I also can't simply retrieve the original model file again to fix issue no. 1.
  3. Since nobody enforces a standard in which the model includes a manifest with the original filename, the original weights hash, or even the author's homepage, I can't rely on any automated method of recovering the original files.

For these reasons, I think Autoprune should at least store the original size, hash, etc. in a separate directory (or even offer an option to save the original files, or the "delta" information needed to un-prune models) so that nobody runs into such issues again. Perhaps the dev teams could also collaborate on integrating the plugins better: e.g. if Autoprune somehow "marked" the pruned model file, Civitai Helper would know where to look for the original file hash and its search would succeed.
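A minimal sketch of the "store original size, hash etc. into a separate directory" idea, to be run before a file is pruned. Everything here is an assumption for illustration: the function name `record_original`, the JSON field names, and the choice of SHA-256 (the thread does not say which hash should be stored). It is not how Autoprune works today.

```python
import hashlib
import json
import os


def record_original(model_path, record_dir):
    """Write a JSON sidecar describing the unpruned file; return its path.

    Hypothetical helper: the field names and SHA-256 are assumptions.
    """
    sha256 = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Hash in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)

    record = {
        "original_filename": os.path.basename(model_path),
        "original_size": os.path.getsize(model_path),
        "original_sha256": sha256.hexdigest(),
    }

    os.makedirs(record_dir, exist_ok=True)
    out_path = os.path.join(record_dir, record["original_filename"] + ".json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)
    return out_path
```

A tool that searches by hash could then read `original_sha256` from the sidecar instead of hashing the pruned file.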

@arenasys
Owner

The hash changes if you replace the VAE, fix clip positions, remove an embedded controlnet, remove random pytorch lightning keys, remove EMA, convert to safetensors, etc. In all these cases the model remains the same to the user; it's the same model, just with less junk, yet the hash is completely different. So model hashes are really a ridiculous method of identifying models; the model name should be enough (posters just need to take 2 seconds to name their models).

On a side note, metadata can be embedded into safetensors files; this is a feature of the safetensors format, though it's not used at all in the SD community. Instead we opt to do stupid things like making .yaml config files with the same name as the model, which users have to make sure to download or the model is completely broken, etc.
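For reference, a rough sketch of reading that embedded metadata with only the standard library. A safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header; free-form string metadata lives under the `__metadata__` key. The function name is made up for this example.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the __metadata__ map of a safetensors file ({} if absent)."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing tensors and metadata.
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})
```

Only the header is read, so this stays cheap even for multi-gigabyte checkpoints.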

@paboum
Author

paboum commented Oct 12, 2023

The community will always do whatever is easiest and works unless they are forced to do things right. Could Model Toolkit then offer to embed metadata saying "Pruned by Model Toolkit version cf82458; original size: x bytes; original hash: xyz", and create the new file with a "_pruned" infix?
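Roughly what that could look like, as a sketch rather than anything Model Toolkit actually does: copy a safetensors file while merging extra entries into its `__metadata__` map and keeping whatever metadata was already there. The function name and the example keys (`pruned_by`, `original_size`, `original_sha256`) are hypothetical, and the pruning step itself is left out. It relies on two properties of the format: `__metadata__` values must be strings, and tensor `data_offsets` are relative to the end of the header, so the tensor bytes can be copied unchanged when the header grows.

```python
import json
import shutil
import struct


def add_provenance(src_path, dst_path, extra):
    """Copy src to dst (must be different paths), merging `extra` into __metadata__."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        (header_len,) = struct.unpack("<Q", src.read(8))
        header = json.loads(src.read(header_len).decode("utf-8"))

        # Preserve existing metadata; stringify new values as the format requires.
        metadata = dict(header.get("__metadata__", {}))
        metadata.update({str(k): str(v) for k, v in extra.items()})
        header["__metadata__"] = metadata

        new_header = json.dumps(header, separators=(",", ":")).encode("utf-8")
        # Trailing spaces are valid JSON whitespace; pad to a multiple of 8 bytes.
        new_header += b" " * (-len(new_header) % 8)

        dst.write(struct.pack("<Q", len(new_header)))
        dst.write(new_header)
        # Offsets are relative to the end of the header, so copy the data as-is.
        shutil.copyfileobj(src, dst)
```

For example, `add_provenance("lora.safetensors", "lora_pruned.safetensors", {"pruned_by": "Model Toolkit cf82458", "original_size": 123})` would produce the "_pruned" file with the note embedded; the file names and values here are placeholders.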

File names are apparently not a good enough method of identifying Loras: two authors can give their files the same name and put them in different repositories, so the filename conflict isn't detected automatically, right? Or they reuse the name when creating a new version of the same Lora. And Civitai doesn't allow filename search in its API (https://github.com/orgs/civitai/discussions/183), so I'm having a hard time finding the Loras I use on Civitai after pruning.

Civitai isn't the only model repository; the same models coexist on other sites under different filenames. Since neither hash nor filename is perfect, the tools for creating models should just put a UUID inside the file when generating it. I've suggested it (bmaltais/kohya_ss#1601) and hope Model Toolkit will preserve it while pruning.

@arenasys
Owner

A UUID is probably the best technical solution, but I wouldn't count on training software. If Civitai started to embed a UUID derived from the model page ID, it would very quickly solve this issue. Naming could also naturally fix itself as the struggle becomes more common (like how pruning became more common): posters can just include their name (and version if applicable). I'll patch the toolkit to preserve safetensors metadata.
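One way a UUID could be "derived from the model page ID" is a name-based (version 5) UUID, which is deterministic: the same ID always maps to the same UUID, so any site or tool could recompute it. This is only a sketch; the namespace choice, the `model_uuid` name, and the optional version argument are assumptions, and nothing here reflects an actual Civitai feature.

```python
import uuid

# Hypothetical namespace; any real scheme would need one fixed, published value.
MODEL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "civitai.com")


def model_uuid(model_page_id, version_id=None):
    """Derive a stable UUID from a model page ID (and optional version ID)."""
    if version_id is None:
        name = str(model_page_id)
    else:
        name = f"{model_page_id}/{version_id}"
    # uuid5 hashes namespace + name, so the result is reproducible everywhere.
    return uuid.uuid5(MODEL_NAMESPACE, name)
```

The resulting string could then be stored in the safetensors metadata, where it would survive pruning as long as the metadata is preserved.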
