Why does instruction finetuning not freeze the parameters? #528
-
Hi, I have a simple question. I am not sure why the instruction finetuning here does not freeze the base model parameters.
Replies: 4 comments · 2 replies
-
Hi there, this is commonly done due to modeling performance reasons. For classification finetuning, which is a simpler task, you don't need to update the previous layers as you assume that the previous layers are already good for extracting general information from text. For instruction finetuning, you are changing the behavior of the LLM, hence you update more layers (and often all layers). Appendix E discusses LoRA, which is a variant where you actually keep main model parameters frozen during instruction finetuning.
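To make the difference concrete, here is a minimal PyTorch sketch of the two setups. The model and its dimensions are hypothetical stand-ins, not the book's actual `GPTModel`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained model (not the book's GPTModel).
model = nn.Sequential(
    nn.Embedding(50257, 768),  # token embedding
    nn.Linear(768, 768),       # stand-in for the transformer blocks
    nn.Linear(768, 50257),     # output head
)

# Classification finetuning: freeze everything and train only a new, small
# output head; the earlier layers are assumed to already extract
# general-purpose features from text.
for param in model.parameters():
    param.requires_grad = False
model[-1] = nn.Linear(768, 2)  # new 2-class head; requires_grad=True by default

# Instruction finetuning: update all layers, since the goal is to change
# the model's overall behavior rather than to add a small task head.
for param in model.parameters():
    param.requires_grad = True
```

And here is a sketch of the LoRA idea in the spirit of Appendix E, where the base weights stay frozen and only two small low-rank matrices per layer are trained. The class names and the rank/alpha defaults below are illustrative, not necessarily the appendix's exact code:

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Trainable low-rank update: alpha * (x @ A @ B)."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no change at start
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    """Wraps a frozen nn.Linear and adds the trainable LoRA update to its output."""
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear  # keep frozen during training
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)
```

With this setup, only the `A` and `B` matrices receive gradients, so you get the behavior change of instruction finetuning while the main model parameters stay frozen, which is why LoRA is the exception mentioned above.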
-
Thank you for your response.
-
Thank you very much for your help. I also have another question; could you please help me?
-
Thank you for your answer; it has been very helpful to me.