Add method to return attn-mask for HF Tokenizer. #60

cptspacemanspiff · 2025-02-11T23:06:12Z

I have been doing batched inference with models that use the HF tokenizer. Your library is great, but it does not expose the attention masks, which are needed when some of the inputs/outputs are padding.

This adds an additional tokenize method to the HF Tokenizer that returns both the token_ids and the attention masks.
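For readers unfamiliar with attention masks, here is a minimal standalone sketch of what such a method would hand back for a padded batch. It is not the code from this PR: `PaddedBatch`, `PadWithAttentionMask`, and the `pad_id` parameter are hypothetical names made up for illustration, and right-padding to the longest sequence is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical result type for illustration; not the type used in this PR.
struct PaddedBatch {
  std::vector<std::vector<int32_t>> token_ids;       // padded to a common length
  std::vector<std::vector<int32_t>> attention_mask;  // 1 = real token, 0 = padding
};

// Hypothetical helper: right-pads every sequence to the longest one in the
// batch and records which positions hold real tokens.
PaddedBatch PadWithAttentionMask(
    const std::vector<std::vector<int32_t>>& sequences, int32_t pad_id) {
  std::size_t max_len = 0;
  for (const auto& seq : sequences) {
    max_len = std::max(max_len, seq.size());
  }

  PaddedBatch out;
  out.token_ids.reserve(sequences.size());
  out.attention_mask.reserve(sequences.size());
  for (const auto& seq : sequences) {
    std::vector<int32_t> ids(seq);
    std::vector<int32_t> mask(seq.size(), 1);
    ids.resize(max_len, pad_id);  // fill the tail with the padding id
    mask.resize(max_len, 0);      // mark the tail as padding
    out.token_ids.push_back(std::move(ids));
    out.attention_mask.push_back(std::move(mask));
  }
  return out;
}
```

With this sketch, a batch of `{5, 6, 7}` and `{8}` padded with `pad_id = 0` gives token ids `{8, 0, 0}` and mask `{1, 0, 0}` for the second sequence, so a model can ignore the two padded positions.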

It depends on my previous pull request, #57, which separates the HF C++ header declarations from the implementations.

cptspacemanspiff added 4 commits January 25, 2025 15:26

Add HFTokenizerHeader

2e4f353

Moved the hf tokenizer defs to the header.

434e6b2

Added factories to HFTokenizer, Tokenizer factories call them.

6be4671

Added additional api for attn with masks.

81b1ca7
