
Why instruction finetuning does not need to freeze the parameters #528

Answered by rasbt
azraelxuemo asked this question in Q&A

Hi there, this is commonly done for modeling-performance reasons. For classification finetuning, which is a simpler task, you don't need to update the earlier layers, since you can assume they are already good at extracting general information from text. For instruction finetuning, you are changing the behavior of the LLM, hence you update more layers (and often all of them).
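To illustrate the difference, here is a minimal PyTorch sketch (the toy model is hypothetical, not the book's architecture): for classification finetuning you freeze everything and unfreeze only the new head, whereas for instruction finetuning you would simply leave all parameters trainable.

```python
import torch.nn as nn

# Toy stand-in for a pretrained model plus a new classification head.
model = nn.Sequential(
    nn.Embedding(100, 16),  # "pretrained" layers
    nn.Linear(16, 16),
    nn.Linear(16, 2),       # newly added classification head
)

# Classification finetuning: freeze all parameters, then unfreeze the head.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Instruction finetuning would instead keep requires_grad=True everywhere.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total}")  # → trainable: 34/1906
```

Only the head's 34 parameters receive gradient updates here; the optimizer can then be built from `filter(lambda p: p.requires_grad, model.parameters())`.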

Appendix E discusses LoRA, a variant where you do keep the main model parameters frozen during instruction finetuning and instead train small low-rank adapter matrices.
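The LoRA idea can be sketched as follows (a minimal illustration, not the book's Appendix E code; the class name and the rank/alpha defaults are assumptions): the wrapped `Linear` layer stays frozen, and only the low-rank matrices A and B are trained. Because B starts at zero, the wrapped layer initially behaves exactly like the original one.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with trainable low-rank matrices A and B."""
    def __init__(self, linear: nn.Linear, rank: int = 4, alpha: int = 8):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False  # base weights stay frozen
        in_f, out_f = linear.in_features, linear.out_features
        self.A = nn.Parameter(torch.randn(in_f, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_f))  # zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen base output plus the scaled low-rank update x @ A @ B.
        return self.linear(x) + self.scale * (x @ self.A @ self.B)

layer = LoRALinear(nn.Linear(8, 8))
x = torch.randn(2, 8)
out = layer(x)  # identical to layer.linear(x) until A/B are trained
```

This keeps the number of trainable parameters small (rank × (in + out) per layer) while still changing the model's behavior during instruction finetuning.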

Replies: 4 comments 2 replies

Comment options

You must be logged in to vote
0 replies
Answer selected by rasbt
Comment options

You must be logged in to vote
1 reply
@rasbt
Comment options

Comment options

You must be logged in to vote
1 reply
@rasbt
Comment options

Comment options

You must be logged in to vote
0 replies