Evolution gave rise to human and animal intelligence here on Earth. We argue that the path to developing artificial human-like intelligence will pass through mimicking the evolutionary process in a nature-like simulation. In Nature, two processes drive the development of the brain: evolution and learning. Evolution acts slowly, across generations, and, among other things, it defines what agents learn by changing their internal reward function. Learning acts fast, within a single lifetime, and it quickly updates an agent's policy to maximise pleasure and minimise pain. Evolution slowly aligns the reward function with the fitness function; however, as agents evolve, the environment and its fitness function also change, increasing the misalignment between reward and fitness. Replicating these two processes in simulation is extremely computationally expensive. This work proposes Evolution via Evolutionary Reward (EvER), which allows learning to single-handedly drive the search for policies of increasing evolutionary fitness by keeping the reward function aligned with the fitness function. In this search, EvER makes use of the whole state-action trajectories that agents go through in their lifetimes. In contrast, current evolutionary algorithms discard this information and consequently limit their potential efficiency at tackling sequential decision problems. We test our algorithm in two simple bio-inspired environments and show its superiority at generating agents that are more capable of surviving and reproducing their genes, compared with a state-of-the-art evolutionary algorithm.

Original title: Mimicking Evolution with Reinforcement Learning
Original author: João Abrantes
Original source: https://arxiv.org/abs/2004.00048
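The abstract's key technical point is the contrast between fitness-only evolutionary search and learning driven by a reward signal that stays aligned with fitness over whole state-action trajectories. The sketch below illustrates that contrast on a toy tabular problem; it is not the paper's EvER algorithm, and the environment, fitness rule, hyperparameters, and all function names are illustrative assumptions. `evolutionary_step` uses only each lifetime's scalar fitness, while `trajectory_reward_step` credits every state-action pair visited during a lifetime with a fitness-aligned reward.

```python
# A minimal, illustrative sketch (not the paper's EvER algorithm).
# It contrasts two ways of improving a tabular softmax policy:
#   * evolutionary_step: uses only the scalar fitness of each lifetime and
#     discards the state-action trajectories (like a plain evolution strategy);
#   * trajectory_reward_step: hands every visited state-action pair a reward
#     tied to the lifetime's fitness, so learning itself chases fitness.
# The environment, fitness rule, and hyperparameters are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 4, 2, 20


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def run_lifetime(theta):
    """Roll out one agent lifetime; return its trajectory and final fitness."""
    trajectory, fitness, state = [], 0.0, 0
    for _ in range(HORIZON):
        action = rng.choice(N_ACTIONS, p=softmax(theta[state]))
        trajectory.append((state, action))
        fitness += 1.0 if action == state % N_ACTIONS else 0.0  # toy fitness rule
        state = (state + 1) % N_STATES
    return trajectory, fitness


def evolutionary_step(theta, pop_size=32, sigma=0.1, lr=0.05):
    """Fitness-only update: the trajectories are thrown away."""
    noise = rng.normal(size=(pop_size,) + theta.shape)
    fitness = np.array([run_lifetime(theta + sigma * n)[1] for n in noise])
    advantage = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    return theta + lr / (pop_size * sigma) * np.tensordot(advantage, noise, axes=1)


def trajectory_reward_step(theta, pop_size=32, lr=0.05):
    """Fitness-aligned reward update: every state-action pair gets credit."""
    rollouts = [run_lifetime(theta) for _ in range(pop_size)]
    baseline = np.mean([f for _, f in rollouts])
    grad = np.zeros_like(theta)
    for trajectory, fitness in rollouts:
        reward = fitness - baseline                 # reward aligned with fitness
        for state, action in trajectory:            # credit the whole trajectory
            grad_log = -softmax(theta[state])       # d log pi(a|s) / d logits
            grad_log[action] += 1.0
            grad[state] += reward * grad_log
    return theta + lr * grad / pop_size


def mean_fitness(theta, episodes=50):
    return np.mean([run_lifetime(theta)[1] for _ in range(episodes)])


if __name__ == "__main__":
    theta_es = np.zeros((N_STATES, N_ACTIONS))
    theta_tr = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(200):
        theta_es = evolutionary_step(theta_es)
        theta_tr = trajectory_reward_step(theta_tr)
    print("fitness-only evolution    :", mean_fitness(theta_es))
    print("trajectory-aligned reward :", mean_fitness(theta_tr))
```

The point of the contrast is only that the second update exploits per-step information the first one throws away; how the paper actually keeps reward and fitness aligned across generations is more involved than this fixed toy rule.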