Mechanism Design with Bandit Feedback (cs.GT) --- 用户7199428

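We study a multi-round mechanism design problem whose goal is to maximise welfare. In each round, the mechanism assigns an allocation to each agent in a set of agents and charges each of them a price; the agents then report their realised (stochastic) values back to the mechanism. The motivation comes from cloud markets and online advertising, where an agent may learn the value of an allocation only after experiencing it. The agents do not know the distributions of these values in advance, so the distributions must be learned over multiple rounds while the mechanism simultaneously searches for the socially optimal set of allocations. Our focus is on designing truthful and individually rational mechanisms that imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret: one for the welfare, one for each agent's individual utility (value minus price), and one for the mechanism's utility (revenue minus cost). An Ω(T^{2/3}) lower bound on the maximum of these three regrets after T rounds of allocation shows that they are interdependent. We describe a family of anytime algorithms that achieve this rate. The proposed framework allows flexible control of the pricing scheme, trading off agent regret against seller regret, and additionally controls the degree of truthfulness and individual rationality.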

Original title: Mechanism Design with Bandit Feedback

Original abstract: We study a multi-round welfare-maximising mechanism design problem, where, on each round, a mechanism assigns an allocation each to a set of agents and charges them a price. Then the agents report their realised (stochastic) values back to the mechanism. This is motivated by applications in cloud markets and online advertising where an agent may know her value for an allocation only after experiencing it. The distribution of these values is unknown to the agent beforehand which necessitates learning them over multiple rounds while simultaneously attempting to find the socially optimal set of allocations. Our focus is on designing truthful and individually rational mechanisms which imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret for the welfare, the individual utilities of each agent (value minus price) and that of the mechanism (revenue minus cost). We show that these three terms are interdependent via an Ω(T^{2/3}) lower bound for the maximum of these three terms after T rounds of allocations. We describe a family of anytime algorithms which achieve this rate. The proposed framework provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets, and additionally to control the degree of truthfulness and individual rationality.
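
For readers unfamiliar with the benchmark named in the abstract, here is a minimal sketch of the classical VCG mechanism in a one-shot setting where every agent's value for every item is already known. That is exactly the assumption the paper drops, so this is not the authors' algorithm, only the target their mechanisms are said to imitate in the long run. The function names (`best_welfare`, `vcg`), the toy values, and the one-item-per-agent setting are all assumptions made for illustration.

```python
from itertools import permutations


def best_welfare(values, agents, items):
    """Brute-force the welfare-maximising assignment (illustrative only).

    values[a][i] is agent a's known value for item i. Each agent receives
    at most one item and each item goes to at most one agent.
    """
    agents = list(agents)
    # Pad with None so that an agent may be left without an item.
    slots = list(items) + [None] * len(agents)
    best_total, best_assign = 0.0, {a: None for a in agents}
    for perm in permutations(slots, len(agents)):
        total = sum(values[a][i] for a, i in zip(agents, perm) if i is not None)
        if total > best_total:
            best_total, best_assign = total, dict(zip(agents, perm))
    return best_total, best_assign


def vcg(values, items):
    """Return (assignment, prices) under the classical VCG rule.

    Each agent pays the externality it imposes on the others: the welfare
    the others could reach without it, minus the welfare they actually get.
    """
    agents = list(values)
    welfare, assign = best_welfare(values, agents, items)
    prices = {}
    for a in agents:
        others = [b for b in agents if b != a]
        welfare_without_a, _ = best_welfare(values, others, items)
        own_value = values[a][assign[a]] if assign[a] is not None else 0.0
        prices[a] = welfare_without_a - (welfare - own_value)
    return assign, prices


# Hypothetical toy instance: two agents, two items.
values = {"a": {"x": 10, "y": 4}, "b": {"x": 8, "y": 3}}
assignment, prices = vcg(values, ["x", "y"])
print(assignment, prices)
```

In this toy instance agent `a` receives item `x` and pays the welfare loss it causes agent `b`, while `b` displaces nobody and pays nothing. In the bandit setting of the paper, the entries of `values` would have to be estimated from the agents' reported realised values over many rounds.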

Authors: Kirthevasan Kandasamy, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica

Original URL: https://arxiv.org/abs/2004.08924

Mechanism Design with Bandit Feedback (cs.GT).pdf --- from the Tencent Cloud community --- 用户7199428
