How does llama.cpp use UNK token in generation? #4468

Answered by cmp-nct
m1chae1bx asked this question in Q&A

Did you ever see it used?

Mistral, llama2, and Falcon all use BPE tokenization, so they are not really short of expression.
UNK is meant for unknown words that cannot be tokenized. With BPE you can tokenize everything, and if something cannot be tokenized, llama.cpp currently crashes :) So no UNK there.

I don't know for certain, but my guess is that UNK is mostly a relic of older, smaller language models.
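To illustrate why BPE-style tokenizers don't need an UNK token, here is a toy sketch (not llama.cpp's actual code; the vocabulary, id scheme, and `tokenize` function are all hypothetical): byte-level fallback reserves a token id for every possible byte value, so any input, including emoji or out-of-vocabulary words, can always be tokenized.

```python
def tokenize(text, vocab):
    # Hypothetical byte-fallback tokenizer: ids 0..255 are reserved for
    # raw bytes, and learned word/merge tokens start at id 256.
    tokens = []
    for word in text.split():
        if word in vocab:
            # Known token: emit its learned id.
            tokens.append(vocab[word])
        else:
            # Byte fallback: every byte already has a token id,
            # so nothing is ever "unknown" and UNK is never emitted.
            tokens.extend(word.encode("utf-8"))
    return tokens

# Toy vocabulary; real BPE vocabularies also contain subword merges.
vocab = {"hello": 256, "world": 257}
print(tokenize("hello 🦙", vocab))  # the emoji falls back to its UTF-8 bytes
```

A real BPE tokenizer would also apply merge rules to the byte sequence, but the key point is the same: the fallback path guarantees full coverage, which is why UNK has no role in generation.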

Answer selected by m1chae1bx