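1. Technology Center, Norinco Group Testing and Research Institute (中国兵器工业试验测试研究院), Xi'an 710116, Shaanxi, China
2. School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
3. School of Astronautics, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China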
Received: 2022-03-21,
Published online: 2023-07-19,
Published in print: 2023-06-30
Chao LI, Ruixing WANG, Jianzhong HUANG, et al. Autonomous Decision-making and Intelligent Collaboration of UAV Swarms Based on Reinforcement Learning with Sparse Rewards[J]. Acta Armamentarii, 2023, 44(6): 1537-1546. DOI: 10.12382/bgxb.2022.0177.
UAV swarms are expected to profoundly reshape the pattern of warfare. To improve the autonomous decision-making capability of UAV swarms, this paper studies autonomous decision-making methods for attack-defense confrontation between heterogeneous UAV swarms. It first gives an overview of the design of the swarm confrontation model and then models the attack-defense confrontation scenario. To address the sparse reward problem, which is widespread when reinforcement learning is applied to swarm autonomous decision-making, a reward-setting method based on local reward reshaping is proposed. Prioritized experience replay is then added on top of this method, which effectively mitigates the sparse reward problem. Finally, program simulation and a demonstration system are used to verify the superiority of the method. The method accelerates network convergence of reinforcement-learning-based autonomous decision-making algorithms for UAV swarms and is of significance to research on such algorithms.
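The two ingredients named in the abstract, a locally reshaped reward and prioritized experience replay, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the distance-based shaping term in `shaped_reward`, the class name `PrioritizedReplayBuffer`, and the defaults `alpha`, `beta`, `eps`, and `weight` are all assumptions chosen for illustration. The buffer follows the generic proportional-prioritization scheme, in which a transition is sampled with probability proportional to its priority raised to the power `alpha`.

```python
import random


def shaped_reward(sparse_reward, prev_dist, curr_dist, weight=0.1):
    """Add a dense local term to a sparse reward.

    Hypothetical shaping rule (an assumption, not the paper's formula):
    reward the agent for closing the distance to its target in one step.
    """
    return sparse_reward + weight * (prev_dist - curr_dist)


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: P(i) = p_i**alpha / sum_k p_k**alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps        # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority,
        # so they are likely to be sampled before being overwritten.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform
        # sampling; they are normalised by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Transitions with a larger temporal-difference error are replayed more often.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop under these assumptions, each stored transition would carry the shaped reward, `sample` would supply a batch together with its importance-sampling weights for the loss, and `update_priorities` would be called with the new temporal-difference errors after every update.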