Saturday, February 13, 2021

Data Science Study Notes: reinforcement learning

Terminology: state, action, policy, reward, and state transition. The policy function is a probability density function (PDF) over actions given the current state. A policy network is a neural network used to approximate the policy function.
Policy-based reinforcement learning:
Policy-based reinforcement learning algorithm:
Policy-based method: if a good policy function π is known, the agent can be controlled by the policy by randomly sampling the action a_t from π. In reality we do not know π; finding it is the whole goal. So we approximate the policy function π with a policy network, which is where deep learning (convolutional and dense layers) comes into play. To search for the best policy, use the policy gradient algorithm to maximize the expectation of the reward.
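To make the sampling and policy gradient steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the "policy network" is reduced to one preference parameter per action passed through a softmax, the environment is a made-up two-action bandit with noisy rewards, and the names `softmax`, `sample_action`, and `train_bandit_policy` are hypothetical helpers, not part of any library.

```python
import math
import random


def softmax(prefs):
    """Turn raw action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Randomly sample an action index a_t according to the policy."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def train_bandit_policy(mean_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style policy gradient ascent on a toy bandit (assumed setup)."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(theta)
        a = sample_action(probs, rng)
        reward = mean_rewards[a] + rng.gauss(0.0, 0.1)  # noisy reward
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log  # step up the gradient
    return softmax(theta)


# Action 1 pays more on average, so the learned policy should favor it.
final_probs = train_bandit_policy([0.2, 1.0])
print(final_probs)
```

In a real policy network the per-action preferences would be the output of convolutional or dense layers fed with the state, and the gradient would be computed by automatic differentiation instead of by hand.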

The relationship:
Actor = athlete (the sports player). Critic = referee. How do we train the player to become a champion? The athlete needs a lot of practice and training, but how does the player know he is getting better? He has to rely on the referee for immediate feedback.

However, the referee may not know at the beginning exactly which actions are best, so the referee also needs to be trained to tell which actions perform well and which do not. After we have trained the value network (critic) for the referee, we use that network to train the player.
How do we train the athlete (the policy network = actor):
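The player/referee relationship can be sketched as a tiny tabular actor-critic loop. This is a sketch under several assumptions, not the tutorial's exact algorithm: both "networks" are plain tables instead of neural networks, the critic learns a state value and scores the actor with the temporal-difference (TD) error (one common actor-critic variant), and the four-state chain environment and all names (`env_step`, `train_actor_critic`) are made up for illustration.

```python
import math
import random

N_STATES, GOAL, GAMMA = 4, 3, 0.9  # toy chain: states 0..3, goal at state 3


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def env_step(state, action):
    """Hypothetical chain: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train_actor_critic(episodes=500, lr_actor=0.2, lr_critic=0.2, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor (the player)
    value = [0.0] * N_STATES                       # critic (the referee)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            probs = softmax(theta[s])
            a = rng.choices([0, 1], weights=probs, k=1)[0]
            s2, r, done = env_step(s, a)
            # Referee's feedback: was the outcome better than expected?
            target = r + (0.0 if done else GAMMA * value[s2])
            td_error = target - value[s]
            # Train the referee: move its estimate toward the TD target.
            value[s] += lr_critic * td_error
            # Train the player: reinforce actions the referee scored well.
            for k in range(2):
                grad_log = (1.0 if k == a else 0.0) - probs[k]
                theta[s][k] += lr_actor * td_error * grad_log
            s = s2
            if done:
                break
    return theta, value


theta, value = train_actor_critic()
policy = [softmax(t) for t in theta[:GOAL]]  # action probabilities per state
print(policy, value)
```

Both tables are updated from the same transition, so the referee and the player improve together. After training, only the actor is needed to pick actions.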
How do we train the referee/critic together:
Train two networks: policy network(player) and value network(critic), what's happening during the training and what's after:
Train the network -1:
Train the network -2:
Train the network -3:
Value-based reinforcement learning:
Gym is a toolkit for developing and comparing reinforcement learning algorithms. It includes classic control problems such as CartPole and Pendulum, and MuJoCo continuous control tasks (for example, making a humanoid walk continuously).
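The agent/environment loop that Gym standardizes looks roughly like the sketch below. To keep it runnable without installing anything, it uses a hypothetical stand-in class, `ToyCorridorEnv`, that only mimics the classic Gym interface, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. With Gym installed, the environment would instead come from something like `gym.make("CartPole-v1")`. Newer Gym releases and Gymnasium return `(observation, info)` from `reset()` and a five-element tuple from `step()`, so the loop would need a small adjustment there.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos  # the observation is just the current position

    def step(self, action):
        self.steps += 1
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        reached_goal = self.pos >= self.length
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        return self.pos, reward, done, {}


def run_episode(env, policy, rng):
    """Run one episode and return the total reward collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs, rng)
        obs, reward, done, info = env.step(action)
        total += reward
    return total


rng = random.Random(0)
random_return = run_episode(ToyCorridorEnv(), lambda obs, r: r.choice([0, 1]), rng)
always_right_return = run_episode(ToyCorridorEnv(), lambda obs, r: 1, rng)
print(random_return, always_right_return)
```

Swapping the lambda for a trained policy network is all it takes to evaluate a learned agent with the same loop.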

This tutorial is based on the work of reinforcement learning grandmaster Dr. Wang.
