Conference: IEEE Conference on Computer Communications

Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

Abstract

In this paper, we study a novel stochastic Network Utility Maximization (NUM) problem in which the utilities of the agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller, and the operator seeks a resource allocation that maximizes the expected total utility of the network. We consider threshold-type utility functions: an agent obtains non-zero utility if the amount of resource it receives is higher than a certain threshold, and zero utility otherwise (the hard real-time case). We pose this NUM setup with unknown utilities as a regret minimization problem, where the goal is to identify a policy that performs as well as an oracle policy that knows the agents' utilities. We model the problem as a bandit setting in which the feedback obtained in each round depends on the resources allocated to the agents. We propose algorithms for this setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits, and show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of the proposed algorithms through numerical experiments.
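The threshold-utility model and the notion of regret against an oracle can be made concrete with a small sketch. This is an illustration under stated assumptions, not the paper's algorithm: the names `expected_utility`, `oracle_allocation`, and `regret` are hypothetical, an agent is assumed to earn its mean utility when its allocation is at least its threshold, and the oracle is a brute-force search that is only practical for a handful of agents.

```python
from itertools import combinations


def expected_utility(allocation, thresholds, mean_utilities):
    """Expected total utility of an allocation.

    Assumption: agent i contributes mean_utilities[i] only when
    allocation[i] >= thresholds[i]; otherwise it contributes zero.
    """
    return sum(
        mu
        for a, th, mu in zip(allocation, thresholds, mean_utilities)
        if a >= th
    )


def oracle_allocation(thresholds, mean_utilities, budget):
    """Brute-force oracle that knows thresholds and mean utilities.

    It tries every subset of agents, gives each chosen agent exactly
    its threshold, and keeps the best subset that fits the budget.
    """
    n = len(thresholds)
    best_value, best_alloc = 0.0, [0.0] * n
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            cost = sum(thresholds[i] for i in subset)
            if cost > budget:
                continue  # this subset cannot be served within the budget
            value = sum(mean_utilities[i] for i in subset)
            if value > best_value:
                best_value = value
                best_alloc = [
                    thresholds[i] if i in subset else 0.0 for i in range(n)
                ]
    return best_alloc, best_value


def regret(allocation, thresholds, mean_utilities, budget):
    """Per-round expected regret of an allocation relative to the oracle."""
    _, best_value = oracle_allocation(thresholds, mean_utilities, budget)
    return best_value - expected_utility(allocation, thresholds, mean_utilities)


# Hypothetical instance: three agents sharing one unit of resource.
thresholds = [0.5, 0.25, 0.75]
mean_utilities = [0.75, 0.5, 0.625]
budget = 1.0

best_alloc, best_value = oracle_allocation(thresholds, mean_utilities, budget)
round_regret = regret([0.0, 0.25, 0.75], thresholds, mean_utilities, budget)
```

In this instance the oracle serves the first two agents, while an allocation that serves the last two agents incurs positive regret. A learning policy would have to estimate the thresholds and mean utilities from per-round feedback instead of reading them directly, which is the part the bandit formulation addresses.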