
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results. #33498

Open

asmith26 opened this issue Sep 15, 2024 · 8 comments · Fixed by #33509 · May be fixed by #35753

asmith26 commented Sep 15, 2024

System Info

  • transformers version: 4.44.2
  • Platform: Linux-6.8.0-44-generic-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.24.7
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.1+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

speech models: @ylacombe, @eustlb
pipelines: @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch 
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# https://github.com/openai/whisper/blob/main/tests/jfk.flac
pipe("./jfk.flac")

Expected behavior

This does return the expected output:

{'text': ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.'}

But it also prints the following warning, so it would be nice to fix or suppress it:

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Thanks!
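(Not part of the original report.) For anyone who only needs to silence the warning in the meantime, a minimal sketch is to lower transformers' log verbosity before building the pipeline. Note this is a blunt instrument: it hides all transformers warnings, not just this one.

from transformers.utils import logging

# Hide all transformers warnings (errors still show).
# Call this before constructing or running the pipeline.
logging.set_verbosity_error()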

@asmith26 asmith26 added the bug label Sep 15, 2024
@asmith26 (Author)
Related: openai/whisper#2335

@Rocketknight1 (Member)
@asmith26 thanks for the issue! I've reproduced it here, will open a PR to fix in a sec.

@ritwikmishra
I observed this when I was fine-tuning an LLM with the PPO trainer. To resolve the warning, I passed the attention mask as a named parameter to the generate function, following this.

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask,
  pad_token_id=tokenizer.eos_token_id
)

But then I observed an error, "IndexError: too many indices for tensor of dimension 1", raised from this line of

lib/python3.9/site-packages/transformers/models/gemma/modeling_gemma.py
position_ids_expanded = position_ids[:, None, :].float()  # let us call this line_e

I turned off the attention mask and, using print statements before line_e, inspected what its ideal behavior is (the original warning still appeared, but I ignored it). I saw that the position ids are fed in one by one, so to resolve the error I just unsqueezed the attention mask:

outputs = model.generate(
  inputs['input_ids'], 
  attention_mask=attention_mask.unsqueeze(0),
  pad_token_id=tokenizer.eos_token_id
)

and it worked fine.
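As an aside (not from the thread): the IndexError suggests the hand-built attention mask was 1-D, i.e. missing the batch dimension. A minimal sketch that avoids the manual unsqueeze is to take the mask straight from the tokenizer, which already returns batched tensors; it assumes a prompt string plus the same tokenizer and model as above:

# Sketch (assumes `tokenizer`, `model`, and a `prompt` string are defined).
# The tokenizer returns input_ids and attention_mask already shaped
# (batch_size, seq_len), so no manual unsqueeze is needed.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
)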

asmith26 commented Nov 4, 2024

Thanks for your help with this @Rocketknight1. Just thought I'd mention I still seem to be getting the same warning (I'm currently running transformers == 4.47.0.dev0).

Thanks again!

@Rocketknight1 (Member)
@asmith26 I'm not getting that warning when I run the code sample above anymore. Did you change anything about it?

asmith26 commented Nov 5, 2024

Interesting, thanks for the info @Rocketknight1.

I've determined that if I add chunk_length_s=30 (i.e. outputs = pipe("./jfk.flac", chunk_length_s=30), following this tutorial), I get the The attention mask is not set and... warning again.

Happy to remove this argument for my need. Thanks again! :)
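For reference, a self-contained version of the call that still triggers the warning, combining the original report's pipeline with the chunk_length_s argument:

import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base.en",
    device="cpu",
    torch_dtype=torch.float32,
)

# Chunked long-form transcription: this variant still emits the warning.
outputs = pipe("./jfk.flac", chunk_length_s=30)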

@Rocketknight1 (Member)
That's still potentially an issue we should address, though! Even though you've found a fix, I'll reopen to make sure we don't lose track.

lolbus commented Jan 11, 2025

> [quoting @ritwikmishra's workaround from the comment above]

This is for LLMs, though. For ASR, I don't think we declare a tokenizer the same way you did, since the tokenizer is already well defined within the ASR model, such as Whisper.

@eustlb eustlb reopened this Jan 15, 2025
@huggingface huggingface deleted a comment from github-actions bot Jan 15, 2025
@eustlb eustlb linked a pull request Jan 17, 2025 that will close this issue