Simulation of Ground-air Cooperative Combat Based on Reinforcement Learning in Localization Environment
LI Li, LI Xuguang, GUO Kaijie, SHI Chao, CHEN Zhaowen
(Department of Vehicle Integrated Electronics Research and Development, Institute of Computer Application Technology, Norinco Group, Beijing 100089, China)
LI Li, LI Xuguang, GUO Kaijie, SHI Chao, CHEN Zhaowen. Simulation of Ground-air Cooperative Combat Based on Reinforcement Learning in Localization Environment[J]. Acta Armamentarii, 2022, 43(S1): 74-81.