Multi-target Assignment and Intelligent Decision Based on Reinforcement Learning

doi:10.3969/j.issn.1000-1093.2021.09.025

Abstract

To improve attack-defense effectiveness in highly dynamic cooperative attack scenarios, a reinforcement learning-based method for swarm multi-target assignment and intelligent decision-making is studied. A composite criterion for evaluating attack performance is established, comprising an attack superiority evaluation based on relative motion information and a threat evaluation based on the inherent information of the target. A cost-effectiveness ratio index for attack and defense is then designed by combining attack performance, penetration probability, and attack cost. A multi-target decision-making architecture based on reinforcement learning is constructed, with an action space whose basic elements are allocation vectors and a state space built on the quantified performance indices. Q-Learning is used to make intelligent decisions on the cooperative attack plan, including missile selection and the form of target assignment. Simulation results show that reinforcement learning achieves online multi-target decision-making with optimal attack-defense effectiveness, and that its computational efficiency is clearly superior to that of particle swarm optimization.

Key words: target assignment, cooperative attack, attack-defense effectiveness, intelligent decision, reinforcement learning

CLC number: TJ761.1⁺4

ZHU Jianwen, ZHAO Changjian, LI Xiaoping, BAO Weimin. Multi-target Assignment and Intelligent Decision Based on Reinforcement Learning[J]. Acta Armamentarii, 2021, 42(9): 2040-2048. (in Chinese)
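
The decision loop outlined in the abstract can be illustrated with a minimal tabular Q-Learning sketch. It is not the authors' implementation: the scenario data, the form of the cost-effectiveness index (threat-weighted kill expectation divided by launch cost plus a fixed overhead), the quantisation into four performance levels, and all names such as `cost_effectiveness`, `quantise` and `BASE_COST` are assumptions made only for illustration. What it keeps from the abstract is the structure: actions are allocation vectors (one entry per missile, with one value meaning "not selected"), states are quantised values of the performance index, and Q-Learning picks the allocation.

```python
import itertools
import math
import random

# Hypothetical scenario data (not from the paper): 3 missiles, 2 targets.
ADVANTAGE = [[0.8, 0.5], [0.6, 0.7], [0.4, 0.9]]  # superiority of missile i vs target j
PENETRATION = [0.7, 0.9]   # probability of penetrating the defence of target j
THREAT = [0.9, 0.6]        # threat degree of target j
COST = [1.0, 1.0, 1.0]     # consumption of launching missile i
BASE_COST = 2.0            # fixed mission overhead (assumed)
UNUSED = -1                # allocation entry meaning "missile not selected"

# Action space: every allocation vector (one entry per missile).
ACTIONS = list(itertools.product([UNUSED, 0, 1], repeat=len(COST)))
N_LEVELS = 4               # number of quantised performance levels (states)
INDEX_SCALE = 0.28         # assumed upper bound of the index, used for binning


def cost_effectiveness(allocation):
    """Assumed index: threat-weighted kill expectation divided by total cost."""
    gain = 0.0
    for j, threat in enumerate(THREAT):
        # Probability that every missile assigned to target j fails.
        miss = math.prod(
            1.0 - ADVANTAGE[i][j] * PENETRATION[j]
            for i, target in enumerate(allocation) if target == j
        )
        gain += threat * (1.0 - miss)
    cost = BASE_COST + sum(c for c, t in zip(COST, allocation) if t != UNUSED)
    return gain / cost


def quantise(index):
    """Map a continuous index value to one of N_LEVELS discrete states."""
    return min(N_LEVELS - 1, int(index / INDEX_SCALE * N_LEVELS))


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.5, epsilon=0.5, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    for _ in range(episodes):
        state = 0  # start from the lowest performance level
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=q[state].__getitem__)
            reward = cost_effectiveness(ACTIONS[action])
            next_state = quantise(reward)
            # Standard Q-Learning update with a greedy bootstrap target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
best_action = max(range(len(ACTIONS)), key=q_table[0].__getitem__)
learned_allocation = ACTIONS[best_action]
print(learned_allocation, round(cost_effectiveness(learned_allocation), 4))
```

With only 27 allocation vectors, exhaustive search is trivial here, so the toy problem serves only to show how the state, action and reward fit together. The efficiency comparison with particle swarm optimization reported in the abstract concerns the authors' own simulations and is not reproduced by this sketch.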




