📝 Minor edits to Chapter12/5.mdx #819

Open · wants to merge 1 commit into base: main

4 changes: 2 additions & 2 deletions chapters/en/chapter12/5.mdx
@@ -41,7 +41,7 @@ import wandb
wandb.login()
```

- You can do this exercise without logging in to Weights & Biases, but it's recommended to do so to track your experiments and interpret the results.
+ You can do this exercise without logging in to Weights & Biases, but it's recommended so you can track your experiments and interpret the results.
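
If you do log in, the sketch below shows one way a tracked run could be wrapped around the training step; the project and run names are placeholders rather than values taken from the chapter.

```python
# Hypothetical sketch: the project and run names below are placeholders, not
# values used by the chapter. Starting a named run groups the metrics discussed
# later (reward, loss) per experiment so they are easy to compare.
import wandb

run = wandb.init(project="grpo-course-exercise", name="baseline-run")
# ... run the GRPO training here; a trainer configured to report to
#     Weights & Biases will log its metrics to this active run ...
run.finish()
```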

## Load the dataset

@@ -164,7 +164,7 @@ As you can see, the reward from the reward function moves closer to 0 as the mod
![Reward from reward function](https://huggingface.co/reasoning-course/images/resolve/main/grpo/13.png)

<!-- @qgallouedec @mlabonne could you review this section please!? -->
- You might notice that the loss starts at zero and then increases during training, which may seem counterintuitive. This behavior is expected in GRPO and is directly related to the mathematical formulation of the algorithm. The loss in GRPO is proportional to the KL divergence (the cap relative to original policy) . As training progresses, the model learns to generate text that better matches the reward function, causing it to diverge more from its initial policy. This increasing divergence is reflected in the rising loss value, which actually indicates that the model is successfully adapting to optimize for the reward function.
+ You might notice that the loss starts at zero and then increases during training, which may seem counterintuitive. This behavior is expected in GRPO and is directly related to the mathematical formulation of the algorithm. The loss in GRPO is proportional to the KL divergence (the cap relative to original policy). As training progresses, the model learns to generate text that better matches the reward function, causing it to diverge more from its initial policy. This increasing divergence is reflected in the rising loss value, which actually indicates that the model is successfully adapting to optimize for the reward function.

![Loss](https://huggingface.co/reasoning-course/images/resolve/main/grpo/14.png)
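
To make the connection between the rising loss and the KL term concrete, here is a small, self-contained sketch (an illustration under the stated assumptions, not the trainer's actual implementation) of a common per-token KL approximation, exp(d) - d - 1 with d the log-probability gap to the reference policy: it is exactly zero when the trained policy matches the reference and grows as the two diverge.

```python
# Illustrative sketch only (assumption: this mirrors the shape of the KL term,
# not the exact code of any particular trainer).
import torch

def per_token_kl(ref_logps: torch.Tensor, policy_logps: torch.Tensor) -> torch.Tensor:
    """Non-negative KL approximation exp(d) - d - 1, with d = ref - policy log-probs."""
    d = ref_logps - policy_logps
    return torch.exp(d) - d - 1

ref = torch.tensor([-2.0, -1.5, -3.0])      # reference-policy log-probs for a few tokens
for shift in [0.0, 0.2, 0.5, 1.0]:          # growing divergence from the reference policy
    policy = ref + shift                    # trained policy puts more mass on rewarded tokens
    kl = per_token_kl(ref, policy).mean().item()
    print(f"shift={shift:.1f}  mean KL term={kl:.4f}")
# shift=0.0 gives exactly 0; larger shifts give a larger KL term, which is why a
# KL-proportional loss starts at zero and rises as training pulls the policy
# away from its starting point.
```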
