
past_key_value(s) name inconsistency causing problems #36290

Open · 2 of 4 tasks
HDCharles opened this issue Feb 19, 2025 · 0 comments
HDCharles commented Feb 19, 2025

System Info

  • transformers version: 4.50.0.dev0
  • Platform: Linux-6.4.3-0_fbk14_zion_2601_gcd42476b84e9-x86_64-with-glibc2.34
  • Python version: 3.12.9
  • Huggingface_hub version: 0.28.1
  • Safetensors version: 0.5.2
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0.dev20241112+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA H100

Who can help?

@ArthurZucker probably others

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the example at https://huggingface.co/docs/transformers/main/en/quantization/torchao (see the sketch below).
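For reference, the failing snippet is roughly the generation example from that docs page (model name and quantization options as documented at the time; treat this as a paraphrase, not an exact copy):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_name = "meta-llama/Meta-Llama-3-8B"
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer("What are we having for dinner?", return_tensors="pt").to("cuda")

# cache_implementation="static" makes generate() compile the forward pass with
# torch.compile, which is where the device-placement skip goes wrong.
output = quantized_model.generate(**input_ids, max_new_tokens=10, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```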

Expected behavior

No error.


This error is related to #36289.

A bunch of models use past_key_value and past_key_values interchangeably. This causes problems because the kwarg names to skip during device placement are hardcoded in the _skip_keys_device_placement attribute, so whichever spelling is missing from that attribute is not skipped. That breaks any time torch.compile is used with an affected model.
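A minimal sketch of the failure mode (simplified and illustrative; the real placement logic lives in accelerate's hooks, and everything here except _skip_keys_device_placement is a made-up stand-in):

```python
# Simplified, self-contained illustration of the mismatch -- NOT the actual
# transformers/accelerate code. Only _skip_keys_device_placement is real;
# the class and helper below are stand-ins.
class DecoderLayer:
    # The skip list is hardcoded with the plural spelling...
    _skip_keys_device_placement = "past_key_values"

    # ...while the forward signature uses the singular spelling.
    def forward(self, hidden_states, past_key_value=None):
        pass

def kwargs_moved_to_device(module, kwargs):
    # Device placement moves every kwarg whose name is NOT in the skip list.
    skip = module._skip_keys_device_placement
    skip = {skip} if isinstance(skip, str) else set(skip)
    return [name for name in kwargs if name not in skip]

layer = DecoderLayer()
print(kwargs_moved_to_device(layer, {"hidden_states": 0, "past_key_value": 0}))
# ['hidden_states', 'past_key_value'] -- the cache kwarg is moved even though
# the attribute was meant to skip it, which is what trips up torch.compile.
```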
The above PR fixes the issue for Llama, but other models have the same problem, which is actually breaking CI for that PR:

  • src/transformers/models/moonshine/modeling_moonshine.py
  • src/transformers/models/mistral/modeling_mistral.py
  • src/transformers/models/emu3/modeling_emu3.py
  • ...etc.

This is also the cause of pytorch/ao#1705, which is where the problem first surfaced.

Is there a reason for these two names to be used instead of just one? If not, it seems like they should be consolidated entirely to avoid such issues; if so, then _skip_keys_device_placement needs to include both names across all models.
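For the latter option, the per-model change would presumably just be listing both spellings in the attribute (the class below is illustrative, not a proposed diff):

```python
from transformers import PreTrainedModel

# Hypothetical per-model fix: include both spellings so neither kwarg is
# moved during device placement.
class SomeModelPreTrainedModel(PreTrainedModel):
    _skip_keys_device_placement = ["past_key_values", "past_key_value"]
```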

@HDCharles HDCharles added the bug label Feb 19, 2025
@HDCharles HDCharles changed the title past_key_value name consistency past_key_value(s) name inconsistency causing problems Feb 19, 2025