Open
Description
Hi, Rokas:
First of all, thanks for your great tutorial on reinforcement learning. I went through the whole series and learned a lot.
In the PPOAgent, I think there may be something wrong with this line. When I `vstack` the `discounted_r` (shape `(n, 1)`) and subtract the predicted values (shape `(n,)`) from it, NumPy broadcasting gives the advantages a shape of `(n, n)`. So I think we should not `vstack` `discounted_r`, but instead `vstack` the advantages, as in `advantages = np.vstack(discounted_r - values)`. The advantages then have shape `(n, 1)`, which is the expected result.
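Here is a minimal sketch that reproduces the shape problem. It uses made-up arrays rather than the agent's real data, so the numbers and `n = 4` are assumptions for illustration only:

```python
import numpy as np

# Hypothetical stand-ins for the agent's arrays (n = 4); not real training data.
discounted_r = np.array([1.0, 2.0, 3.0, 4.0])  # shape (n,)
values = np.array([0.5, 1.5, 2.5, 3.5])        # shape (n,)

# Stacking first turns discounted_r into shape (n, 1); subtracting a
# shape-(n,) array then broadcasts the result to shape (n, n).
bad_advantages = np.vstack(discounted_r) - values
print(bad_advantages.shape)   # (4, 4)

# Subtracting first keeps shape (n,); stacking afterwards gives (n, 1).
good_advantages = np.vstack(discounted_r - values)
print(good_advantages.shape)  # (4, 1)
```

Only the second form yields one advantage per time step.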
Thanks.