
Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning (CS ML) ---用户7305506

Augmenting the reward with an entropy term is known to soften the greedy argmax policy into a softmax policy. The paper reformulates this entropy augmentation, which motivates adding an extra entropy term to the objective function in the form of a KL divergence that regularizes the optimization process. The result is a policy that interpolates between the current policy and the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously, and whose two extremes correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
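The abstract gives no formulas, but the core idea can be sketched in code: a softmax over Q-values replaces the hard argmax, and a mixing parameter interpolates between the current policy and that softmax greedy policy. This is a minimal illustration assuming a geometric mixture; the function names, the temperature parameter, and the exact interpolation form are my assumptions, not necessarily the paper's.

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Softmax over Q-values: entropy regularization softens the argmax."""
    z = q_values / temperature
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def interpolated_policy(current_policy, q_values, eta, temperature=1.0):
    """Interpolate between the current policy (eta=0) and the softmax
    greedy policy (eta=1) via a geometric mixture, then renormalize.
    Illustrative only; the paper's actual interpolation may differ."""
    soft = softmax_policy(q_values, temperature)
    mixed = current_policy ** (1.0 - eta) * soft ** eta
    return mixed / mixed.sum()
```

Sweeping `eta` from 0 to 1 then traces a continuous path: at one extreme the update uses the current policy (policy-gradient-like), at the other the softmax greedy policy (Q-learning-like), with intermediate values giving the mixtures the abstract refers to.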

Original title: Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Original abstract: Entropy augmented to reward is known to soften the greedy argmax policy to softmax policy. Entropy augmentation is reformulated and leads to a motivation to introduce an additional entropy term to the objective function in the form of KL-divergence to regularize optimization process. It results in a policy interpolating between the current policy and the softmax greedy policy. This policy is used to build a continuously parameterized algorithm which optimize policy and Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that there can be a performance gain using an intermediate algorithm.

Original author: Donghoon Lee

Original link: https://arxiv.org/abs/2005.08844

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning (CS ML).pdf --- from the Tencent Cloud community --- 用户7305506
