Figure 4.17 Explanation (4.7 Generating text) #308
Hi @rasbt, could you please explain why we choose the last vector in the output matrix from the model (the box labeled "GPT") in Figure 4.17 as the vector "which corresponds to the next token that the GPT model is supposed to generate"? Thank you.
Section 5.1.2 probably answers my question in more detail, thanks a lot for it!
These are good points, and it sounds like there are two related questions. Let's talk about inference first ("generate"), which you mentioned at the top of this thread. Here, we take only the last token's output vector, because we already have all the other input tokens from the provided prompt. E.g., consider the input
"Sunday is my favorite day of the week, because"
In this case, it would be wasteful (and error prone) to have the model regenerate the input shifted by +1 token as we do during training. Instead, we are only interested in the token that comes after "because".
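To make that concrete, here is a minimal sketch of the selection step. The shapes and the random logits are illustrative stand-ins (not the book's actual model code): a GPT forward pass returns one logits vector per input position, and for generation we index only the last position, whose logits score the token that follows "because".

```python
import numpy as np

# Illustrative shapes: 1 prompt of 9 tokens, GPT-2-sized vocabulary.
# The logits here are random stand-ins for a real model's output.
batch_size, seq_len, vocab_size = 1, 9, 50257

rng = np.random.default_rng(123)
logits = rng.standard_normal((batch_size, seq_len, vocab_size))

# The model emits one vector per input position; positions 0..seq_len-2
# just predict tokens we already have. Only the LAST position predicts
# the genuinely new token, so we slice it out:
next_token_logits = logits[:, -1, :]   # shape: (batch_size, vocab_size)

# Greedy decoding: pick the highest-scoring token id.
next_token_id = int(next_token_logits.argmax(axis=-1)[0])
print(next_token_logits.shape, next_token_id)
```

During training, by contrast, all `seq_len` output positions are used, since each one is compared against its shifted-by-one target token.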
Then, you mentioned
For example, as I understand, for the first token in the training sample we have corresponding target (next token), b…