首页> 外文学位 >Efficient market mechanisms and simulation-based learning for multi-agent systems.
【24h】

Efficient market mechanisms and simulation-based learning for multi-agent systems.

机译:多代理系统的有效市场机制和基于仿真的学习。

获取原文
获取原文并翻译 | 示例

摘要

This dissertation has two independent theses.; In the first part, we study the design auction-based distributed mechanisms for resource allocation in multi-agent systems such as bandwidth allocation, network routing, electronic marketplaces, robot teams, and air-traffic control.; The work is motivated by a resource allocation problem in communication networks where there are buyers and sellers of bandwidth, each of them being independent and selfish. Buyers want routes while sellers offer bandwidth on individual links (we call such markets combinatorial). We first investigate the existence of competitive equilibrium in combinatorial markets. We first show how network topology affects existence of competitive equilibrium. We then adopt Aumann's continuum exchange economy as a model of perfect competition and show the existence of competitive equilibrium in it when money is also a good. We assume that preferences are continuous and monotonic in money. The existence of competitive equilibrium in the continuum combinatorial market is then used to show the existence of various enforceable and non-enforceable approximate competitive equilibria in finite markets.; We then propose a combinatorial market mechanism c-SeBiDA. We study the interaction between buyers and sellers when they act strategically and may not be truthful. We show that a Nash equilibrium exists in the c-SeBiDA auction game with complete information, and more surprisingly, the resulting allocation is efficient. In reality, the players may have incomplete information. So we consider the Bayesian-Nash equilibrium. When there is only one type of good, we show that the mechanism is asymptotically Bayesian incentive-compatible under the ex post individual rationality constraint and hence asymptotically efficient.; In the second part, we consider the multi-agent pursuit-evasion game as the motivating problem and study simulation-based learning for partially observable Markov decision processes (MDP) and games.; The value function of a Markov decision process assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We derive bounds on the number of runs needed for the uniform convergence of the empirical average to the expected reward uniformly for a class of policies, in terms of the Vapnik-Chervonenkis or Pseudo-dimension of the policy class. These results are extended for partially observed processes, and for Markov games. Uniform convergence results are also obtained for the average reward case, the only such known results in the literature. (Abstract shortened by UMI.)
机译:本文有两个独立的论点。在第一部分中,我们研究了基于设计拍卖的分布式机制,用于在多代理系统中进行资源分配,例如带宽分配,网络路由,电子市场,机器人团队和空中交通管制。这项工作是受通信网络中资源分配问题的推动的,在通信网络中,有带宽的买卖双方,每个人都是独立和自私的。买方需要路线,而卖方则在各个链路上提供带宽(我们称此类市场为组合市场)。我们首先研究组合市场中竞争均衡的存在。我们首先显示网络拓扑如何影响竞争均衡的存在。然后,我们采用Aumann的连续货币交换经济作为完全竞争的模型,并证明当金钱也很不错时,竞争均衡的存在。我们假设偏好是连续的并且在金钱上是单调的。然后,在连续组合市场中竞争均衡的存在被用来表明有限市场中各种可执行和不可执行的近似竞争均衡的存在。然后,我们提出了一种组合市场机制c-SeBiDA。我们研究买卖双方在采取战略行动时可能并不真实的互动。我们表明,具有完整信息的c-SeBiDA拍卖游戏中存在纳什均衡,更令人惊讶的是,所得分配是有效的。实际上,玩家可能拥有不完整的信息。因此,我们考虑了贝叶斯-纳什均衡。当只有一种商品时,我们证明了该机制在事后个人理性约束下是渐近的贝叶斯激励相容的,因此渐近有效。在第二部分中,我们将多主体追逃游戏视为动机问题,并针对部分可观察到的马尔可夫决策过程(MDP)和博弈研究基于仿真的学习。马尔可夫决策过程的价值函数为每个保单分配其预期的折现报酬。可以将预期的奖励估算为许多独立模拟运行中奖励的经验平均值。我们根据政策类别的Vapnik-Chervonenkis或伪维度,得出了将经验平均值统一收敛到一类政策所需的预期报酬所需的运行次数的界限。这些结果扩展到部分观察到的过程以及Markov游戏。对于平均奖励情况,也获得了一致的收敛结果,这是文献中仅有的已知结果。 (摘要由UMI缩短。)

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号