Used the Machin library to solve the continuous Mountain Car problem with PPO and TD3. Implemented suitable actor-critic networks and tuned the hyperparameters.
Expected return per iteration.
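TD3 trains its critics against a target that combines clipped double-Q learning with target policy smoothing. The function below is a minimal sketch of that target for a single transition with a scalar action in [-1, 1], which matches the continuous Mountain Car action range. It does not use Machin's API. The actor and critics are hypothetical plain callables standing in for the target networks, and the noise settings (0.2 and 0.5) are the common TD3 defaults, assumed here because the project's actual hyperparameters are not stated.

```python
import random


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """TD3 critic target for one transition with a scalar action.

    target_actor, target_q1 and target_q2 are hypothetical callables that
    stand in for the target networks (assumption, not Machin's API).
    """
    # Target policy smoothing: Gaussian noise, clipped to [-noise_clip, noise_clip].
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    # Perturbed target action, clipped back into the valid action range.
    next_action = max(action_low, min(action_high, target_actor(next_state) + noise))
    # Clipped double-Q: use the smaller of the two target critic estimates.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Do not bootstrap from terminal transitions.
    return reward + gamma * (1.0 - float(done)) * q_next
```

Taking the minimum of two critics counteracts the overestimation bias of a single Q estimate, and the smoothing noise keeps the actor from exploiting narrow peaks in the learned Q function.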
Used RobotDART and OpenAI Gym spaces, designed a reward function, and trained agents with PPO and TD3, using the frame-skipping technique. The initial position is x₀ = [π], the observation is the vector [cos θ, sin θ, torque], and the reward function depends on the angle θ, the torque, and the command sent to the robot.
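Frame skipping repeats each chosen action for several simulator steps, so the agent makes fewer, longer-lasting decisions. The sketch below assumes an environment object with the classic Gym interface `step(action) -> (obs, reward, done, info)`; the `FrameSkip` class, the default `skip=4`, and the `pendulum_observation` helper are hypothetical names used for illustration, not RobotDART calls. The helper builds the [cos θ, sin θ, torque] observation described above.

```python
import math


class FrameSkip:
    """Repeat each action for `skip` environment steps and sum the rewards.

    Assumes the wrapped env follows the classic Gym API:
    step(action) -> (obs, reward, done, info).
    """

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be >= 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # stop early if the episode ends mid-skip
                break
        return obs, total_reward, done, info


def pendulum_observation(theta, torque):
    # Encoding θ as (cos θ, sin θ) avoids the discontinuity at ±π.
    return [math.cos(theta), math.sin(theta), torque]
```

Returning the summed reward keeps the return scale comparable to running without skipping, while the early `break` avoids stepping past the end of an episode.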
TD3 - Expected return per iteration.
PPO - Expected return per iteration.
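Both setups above also train with PPO, whose policy update maximizes a clipped surrogate objective. The function below is a minimal pure-Python sketch of that objective over a batch of log-probabilities and advantage estimates; the name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions for illustration, and in practice the same expression would be computed on tensors so it can be differentiated.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Taking the minimum means a large ratio cannot increase the objective beyond the clip range when the advantage is positive, but is still fully penalized when the advantage is negative.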
Used RobotDART and OpenAI Gym spaces, designed a reward function, and trained agents with PPO and TD3. The observation is a vector containing the positions and velocities of all of the robot's joints, and the reward function is the norm of the difference between the final and current positions.
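The observation and reward described above can be sketched as two small functions. This assumes the positions in the reward are joint-space vectors of equal length (they could equally be end-effector coordinates), and the helper names are hypothetical. The text only says the reward is the norm of the position difference; negating it, so that maximizing reward drives the robot toward the final position, is an assumption made here.

```python
import math


def arm_observation(joint_positions, joint_velocities):
    # All joint positions followed by all joint velocities, as one flat vector.
    return list(joint_positions) + list(joint_velocities)


def distance_reward(final_positions, current_positions):
    """Negative Euclidean norm of (final - current).

    Assumption: the sign is flipped so that a smaller distance yields a
    higher reward; the source only states that the norm is used.
    """
    if len(final_positions) != len(current_positions):
        raise ValueError("position vectors must have the same length")
    squared = sum((f - c) ** 2 for f, c in zip(final_positions, current_positions))
    return -math.sqrt(squared)
```

A dense, distance-based reward like this gives the agent a learning signal at every step, rather than only when the target is reached.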