IEEE Global Communications Conference

Adaptive proportional fair parameterization based LTE scheduling using continuous actor-critic reinforcement learning



Abstract

Maintaining a desired trade-off between system throughput maximization and user fairness satisfaction remains a problem that is far from solved. In LTE systems, different trade-off levels can be obtained through a proper parameterization of the Generalized Proportional Fair (GPF) scheduling rule. Our approach finds the parameterization policy that maximizes system throughput under the different fairness constraints imposed by the scheduler state. The proposed method adapts and refines the policy at each Transmission Time Interval (TTI) by using a Multi-Layer Perceptron Neural Network (MLPNN) as a non-linear function approximator between the continuous scheduler state and the optimal GPF parameter(s). The MLPNN is trained with Continuous Actor-Critic Learning Automata reinforcement learning (CACLA RL). The double GPF parameterization optimization problem is addressed by using CACLA RL with two continuous actions (CACLA-2). Five reinforcement learning algorithms serving as simpler parameterization techniques are compared against the proposed method. Simulation results indicate that CACLA-2 performs considerably better than the other candidates, which adjust only one scheduling parameter (e.g., CACLA-1). In particular, CACLA-2 outperforms CACLA-1 by reducing the percentage of TTIs in which the system is considered unfair. By attenuating fluctuations in the learned policy, CACLA-2 also achieves a higher throughput gain when severe changes occur in the scheduling environment, while maintaining the fairness optimality condition.
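The throughput/fairness trade-off described above is typically steered through the exponents of the GPF priority metric. The sketch below assumes a common two-exponent form, inst_rate^alpha / avg_rate^beta; the exact parameterization, rates, and user values are illustrative, not taken from the paper.

```python
def gpf_metric(inst_rate, avg_rate, alpha, beta):
    # Generalized Proportional Fair priority (assumed two-exponent form):
    # alpha = beta = 1 recovers classic Proportional Fair; a larger beta
    # favors poorly served users (fairness), a larger alpha favors users
    # with high instantaneous rates (throughput).
    return (inst_rate ** alpha) / (max(avg_rate, 1e-9) ** beta)

# One scheduling decision: serve the user with the highest metric this TTI.
inst = [5.0, 2.0, 8.0]   # hypothetical achievable rates this TTI (Mbit/s)
avg = [4.0, 1.0, 10.0]   # exponentially averaged served rates (Mbit/s)
metrics = [gpf_metric(i, a, alpha=1.0, beta=1.0) for i, a in zip(inst, avg)]
winner = max(range(len(metrics)), key=metrics.__getitem__)  # user 1 here
```

Adapting alpha and/or beta per TTI, as the paper proposes, shifts this decision rule between throughput-greedy and fairness-oriented behavior.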
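The CACLA update driving the parameter adaptation can be sketched as follows. This is a minimal single-action (CACLA-1-style) illustration with linear approximators standing in for the paper's MLPNN; the environment step, reward, and learning rates are hypothetical.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def cacla_step(state, actor_w, critic_w, env_step,
               alpha_a=0.01, alpha_c=0.05, gamma=0.95, sigma=0.1):
    """One CACLA iteration: TD(0) critic update, and an actor pulled
    toward the explored action only when the TD error is positive."""
    v = dot(critic_w, state)
    a = dot(actor_w, state)                    # deterministic actor output
    a_explored = a + random.gauss(0.0, sigma)  # Gaussian exploration
    reward, next_state = env_step(state, a_explored)
    td_error = reward + gamma * dot(critic_w, next_state) - v
    for i in range(len(critic_w)):             # critic: TD(0) update
        critic_w[i] += alpha_c * td_error * state[i]
    if td_error > 0:                           # actor: update only on improvement
        for i in range(len(actor_w)):
            actor_w[i] += alpha_a * (a_explored - a) * state[i]
    return td_error, next_state
```

CACLA-2 would emit two continuous actions (one per GPF exponent) from the same actor network. The sign test on the TD error, rather than a gradient scaled by its magnitude, is what distinguishes CACLA from standard actor-critic updates.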
