...

Self-learned suppression of roll oscillations based on model-free reinforcement learning

Abstract

Uncommanded roll oscillations at high angles of attack are dangerous and pose significant challenges for flight control. This paper investigates the feasibility and performance of an artificial-intelligence method, model-free reinforcement learning (MFRL), for this problem, in both simulation and experiment. In simulation, two algorithms, TD3 and SAC, were used to learn policies that suppress the roll oscillations of a widely used mathematical model. The agents used only the current states as the observation vector and achieved perfect results. In the experiments, the same two algorithms were used to learn policies that suppress the roll oscillations of a flying-wing model that uses spanwise blowing as its roll effector. Unlike in simulation, the experiments also investigated how the memory size of the observation vector influences the training results. The results show that, for both algorithms, the agents cannot learn good-enough policies when their observation spaces are constructed only from the current sensor data. This phenomenon, a major difference between MFRL experiments and simulations, is due to the non-Markovian character of real-world dynamics caused by inevitable latencies. However, constructing the observation space from both current and past sensor data helps both the TD3 and SAC agents learn effective policies that suppress the oscillations using spanwise blowing in the real world. Surprisingly, the trained agents showed some counterintuitive "smart" behaviors in tests. (C) 2021 Elsevier Masson SAS. All rights reserved.
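The abstract's central finding is that, once real-world latency makes the dynamics non-Markovian, an observation built from the current sensor reading alone is not enough, whereas stacking current and past readings is. The sketch below illustrates that idea only as a generic history buffer (often called frame stacking); it is not the authors' implementation. The class name `StackedObservation`, the two-element readings (imagined here as roll angle and roll rate), the parameter `memory_size`, and the choice to fill the buffer with the first reading at episode start are all hypothetical assumptions. The resulting flat vector is what a TD3 or SAC policy would receive as its observation.

```python
from collections import deque
from typing import Deque, List, Sequence


class StackedObservation:
    """Hypothetical helper: observation = current reading + (memory_size - 1) past readings.

    memory_size = 1 reproduces the "current sensor data only" case from the abstract.
    """

    def __init__(self, memory_size: int, sensor_dim: int) -> None:
        if memory_size < 1:
            raise ValueError("memory_size must be at least 1")
        self.memory_size = memory_size
        self.sensor_dim = sensor_dim
        # deque(maxlen=...) silently discards the oldest reading when full.
        self._history: Deque[List[float]] = deque(maxlen=memory_size)

    def _check(self, reading: Sequence[float]) -> List[float]:
        if len(reading) != self.sensor_dim:
            raise ValueError(f"expected {self.sensor_dim} values, got {len(reading)}")
        return [float(x) for x in reading]

    def reset(self, first_reading: Sequence[float]) -> List[float]:
        # Assumption: at episode start there is no past data, so the first
        # reading is repeated to fill the buffer (zero-padding is another option).
        reading = self._check(first_reading)
        self._history.clear()
        for _ in range(self.memory_size):
            self._history.append(list(reading))
        return self.observation()

    def step(self, reading: Sequence[float]) -> List[float]:
        # Push the newest sensor sample; the oldest one drops out automatically.
        self._history.append(self._check(reading))
        return self.observation()

    def observation(self) -> List[float]:
        # Flatten newest-first: [current, 1 step ago, 2 steps ago, ...].
        obs: List[float] = []
        for reading in reversed(self._history):
            obs.extend(reading)
        return obs


if __name__ == "__main__":
    stack = StackedObservation(memory_size=3, sensor_dim=2)
    print(stack.reset([0.10, 0.00]))  # [0.1, 0.0, 0.1, 0.0, 0.1, 0.0]
    print(stack.step([0.12, 0.40]))   # [0.12, 0.4, 0.1, 0.0, 0.1, 0.0]
    print(stack.step([0.15, 0.60]))   # [0.15, 0.6, 0.12, 0.4, 0.1, 0.0]
```

The observation grows linearly with `memory_size`, so the window length trades extra information about delayed effects against a larger input for the policy network; the abstract reports that studying this memory size was what separated the experiments from the simulations.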

Bibliographic record

  • Source
    Aerospace Science and Technology | 2021, No. 9 | pp. 106850.1-106850.13 | 13 pages
  • Author affiliations

    Key Laboratory of Unsteady Aerodynamics and Flow Control, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Nanjing 210016, Jiangsu, People's Republic of China;

    Key Laboratory of Unsteady Aerodynamics and Flow Control, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Nanjing 210016, Jiangsu, People's Republic of China;

    National Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing, Jiangsu, People's Republic of China;

    Key Laboratory of Unsteady Aerodynamics and Flow Control, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, 29 Yudao Street, Nanjing 210016, Jiangsu, People's Republic of China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Reinforcement learning; Real-world; Uncommanded roll oscillation; Closed-loop blowing;
