From 492515b75ccaad1434b26678e7ed4f2d1f3d25f7 Mon Sep 17 00:00:00 2001
From: hamidrajabi
Date: Sat, 20 Nov 2021 01:32:09 +0330
Subject: [PATCH] Fix Typo "resujlt" to "result"

---
 lab3/RL.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lab3/RL.ipynb b/lab3/RL.ipynb
index 05abf42d..92266481 100644
--- a/lab3/RL.ipynb
+++ b/lab3/RL.ipynb
@@ -417,7 +417,7 @@
       "source": [
         "## 3.4 Learning algorithm\n",
         "\n",
-        "Now we can start to define the learing algorithm which will be used to reinforce good behaviors of the agent and discourage bad behaviours. In this lab, we will focus on *policy gradient* methods which aim to **maximize** the likelihood of actions that result in large rewards. Equivalently, this means that we want to **minimize** the negative likelihood of these same actions. We achieve this by simply **scaling** the probabilities by their associated rewards -- effectively amplifying the likelihood of actions that resujlt in large rewards.\n",
+        "Now we can start to define the learing algorithm which will be used to reinforce good behaviors of the agent and discourage bad behaviours. In this lab, we will focus on *policy gradient* methods which aim to **maximize** the likelihood of actions that result in large rewards. Equivalently, this means that we want to **minimize** the negative likelihood of these same actions. We achieve this by simply **scaling** the probabilities by their associated rewards -- effectively amplifying the likelihood of actions that result in large rewards.\n",
         "\n",
         "Since the log function is monotonically increasing, this means that minimizing **negative likelihood** is equivalent to minimizing **negative log-likelihood**. Recall that we can easily compute the negative log-likelihood of a discrete action by evaluting its [softmax cross entropy](https://www.tensorflow.org/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits). Like in supervised learning, we can use stochastic gradient descent methods to achieve the desired minimization. \n",
         "\n",
@@ -1109,4 +1109,4 @@
       ]
     }
   ]
-}
\ No newline at end of file
+}
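
For reference, the loss described in the paragraph touched by this patch (negative log-likelihood of each action via softmax cross entropy, scaled by the associated rewards) can be sketched in TensorFlow as follows. This is a minimal illustration only, not code taken from the notebook; the function name compute_loss and its argument names are assumptions.

    import tensorflow as tf

    def compute_loss(logits, actions, rewards):
        # logits  -- unnormalized action scores from the policy network, shape [T, n_actions]
        # actions -- integer indices of the actions actually taken, shape [T]
        # rewards -- rewards (e.g. discounted returns) associated with those actions, shape [T]
        # Negative log-likelihood of the chosen actions (softmax cross entropy).
        neg_logprob = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits)
        # Scale by the rewards so high-reward actions are amplified, then average.
        return tf.reduce_mean(neg_logprob * rewards)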