Augmenting the reward with entropy is known to soften the greedy argmax policy into a softmax policy. The entropy augmentation is reformulated, which motivates adding an extra entropy term to the objective function in the form of a KL-divergence that regularizes the optimization process. The result is a policy that interpolates between the current policy and the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously, and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
Original title: Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning
Original abstract: Entropy augmented to reward is known to soften the greedy argmax policy to softmax policy. Entropy augmentation is reformulated and leads to a motivation to introduce an additional entropy term to the objective function in the form of KL-divergence to regularize optimization process. It results in a policy interpolating between the current policy and the softmax greedy policy. This policy is used to build a continuously parameterized algorithm which optimize policy and Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that there can be a performance gain using an intermediate algorithm.
Original author: Donghoon Lee
Original link: https://arxiv.org/abs/2005.08844
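To make the interpolation idea concrete, below is a minimal sketch of how a policy could be blended between the current policy and the entropy-induced softmax-greedy policy. The geometric-mixture form, the temperature tau, and the mixing coefficient beta are illustrative assumptions for this sketch, not the paper's exact formulation; they only show how one knob could trace a path from a policy-gradient-like update (beta near 0) to a Q-learning-like one (beta near 1).

```python
import numpy as np

def softmax_greedy(q, tau=1.0):
    """Softmax (Boltzmann) policy induced by entropy-augmented rewards:
    pi(a) proportional to exp(Q(s, a) / tau)."""
    z = q / tau
    z -= z.max()              # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def interpolated_policy(pi_current, q, tau=1.0, beta=0.5):
    """Illustrative interpolation (assumed geometric mixture, renormalized)
    between the current policy and the softmax-greedy policy.
    beta = 0 keeps the current policy (policy-gradient-like limit);
    beta = 1 gives the softmax-greedy policy (Q-learning-like limit)."""
    pi_soft = softmax_greedy(q, tau)
    mix = pi_current ** (1.0 - beta) * pi_soft ** beta
    return mix / mix.sum()

# Example: three actions at a single state
q_values = np.array([1.0, 2.0, 0.5])
pi = np.array([0.5, 0.3, 0.2])        # current policy over the actions
print(interpolated_policy(pi, q_values, tau=0.5, beta=0.3))
```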