End-to-end Intelligent Construction Algorithm of Air-defense Firepower Network for OODA Operation Process

doi:10.12382/bgxb.2023.1010

Abstract

To address the large number of targets, difficult equipment coordination, and slow system-of-systems response in air-defense battlefield environments, an end-to-end intelligent construction algorithm for air-defense firepower networks oriented to the observe-orient-decide-act (OODA) operation process is proposed. Around the OODA operation process, an air-defense system-of-systems framework composed of an intelligence network, a command-and-control network, and a firepower network is constructed. Based on this framework, the work focuses on the intelligent construction of the firepower network, which is key to success or failure on the battlefield. The destruction of targets by interceptor weapons is modeled as a Markov decision process, and the corresponding state space, action space, and reward strategy are given. On this basis, the standard end-to-end proximal policy optimization (PPO) algorithm is improved to raise model accuracy and reduce training time. The proposed algorithm is evaluated and verified on a joint regional air-defense and anti-missile operation scenario. The results show that, compared with rule-based and heuristic algorithms, the proposed method generates air-defense firepower network design schemes quickly and accurately. In large-scale combat scenarios of the same size in particular, it shows more pronounced advantages in computational efficiency and operational cost, providing a basis for constructing kill webs across the whole process of the operation system-of-systems.

Key words: air-defense firepower network, improved PPO algorithm, OODA operation process, battlefield situation, end-to-end training

CLC number:

E917

LUO Yuyu, DING Wei, MING Zhenjun, LI Chuanhao, WANG Guoxin, YAN Yan, WANG Yuqian. End-to-end Intelligent Construction Algorithm of Air-defense Firepower Network for OODA Operation Process[J]. Acta Armamentarii, 2024, 45(12): 4231-4245.


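Table 1 Features and application scopes of various algorithms

Algorithm	Category	Features	Limitations	Scope of application	Typical algorithms
Exact algorithms		Can find the exact optimal solution; easy to understand and implement.	Difficult to solve large-scale problems; generally require an exact mathematical model of the problem.	Small- and medium-scale problems with simple constraints.	Enumeration, branch and bound, the Hungarian algorithm, etc.
Approximate algorithms	Rule-based	Make full use of domain knowledge, can shorten solution time, and have relatively high credibility.	Relatively long solution time; solution quality depends on how the rules are set.	Static small- and medium-scale problems with few constraints.	Game-based and auction-rule-based algorithms.
Approximate algorithms	Heuristic	Fixed algorithmic framework; can explore a wide solution space.	Performance is not stable enough; prone to local optima and unlikely to reach the optimal solution.	Static large-scale problems.	Genetic algorithm, ant colony algorithm, bird swarm algorithm, particle swarm optimization, etc.
Approximate algorithms	Reinforcement learning	Highly scalable and general; a trained model executes efficiently and solves quickly.	Depends on substantial computing power.	Dynamic large-scale problems.	Deep deterministic policy gradient (DDPG), asynchronous advantage actor-critic (A3C), and proximal policy optimization (PPO).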

Fig.1 Design framework of the air-defense system-of-systems for the OODA operation process

Fig.2 Three major performance requirements to be satisfied in constructing the air-defense firepower network

Fig.3 Improved policy network structure and action output module

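Algorithm 1: Core pseudocode for IPPO policy network training

1. Define the state space S, action space A, reward function R, and policy π_θ_old(a|s, inv_a_t);
2. Define the hyperparameters of the IPPO algorithm: learning rate, discount factor, etc.;
3. Initialize the training parameters θ of the policy π(a|s);
4. for epoch = 1, 2, …, epoch do
5.   for steps = 1, 2, …, step do
6.     Input the state s_t and the infeasible actions inv_a_t, and obtain the action a_t from the policy π_θ_old(a|s, inv_a_t);
7.     Execute the action a_t, and obtain the reward r_t, the next state s_t+1, and the infeasible actions inv_a_t+1;
8.     Add (s_t, inv_a_t, a_t, r_t, s_t+1, inv_a_t+1) to the experience pool;
9.     if the experience pool is full, update
10.      for k = 1, 2, … do
11.        Update the policy π by maximizing the PPO-Clip objective;
12.        r_t(θ) = ∏_{t=1}^{n} π_θ(a_it|s_t, inv_a_t) / π_θ_old(a|s, inv_a_t)
13.        J^C(θ) = E_t[min(r_t(θ)A_t, clip(r_t(θ), 1-ε, 1+ε)A_t)]
14.        Update θ by Adam stochastic gradient ascent
15.        π_θ_old(a|s, inv_a_t) = π_θ(a_it|s_t, inv_a_t)
16.      end for
17.  end for
18. end for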
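Steps 6 and 11 to 13 of Algorithm 1 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, infeasible actions are handled by zeroing their probability before normalization (one common form of invalid-action masking), and the product in step 12 is read as running over the components of a composite action, which is an assumption about the garbled original.

```python
import math

def masked_softmax(logits, infeasible):
    """Softmax restricted to feasible actions.

    infeasible[i] is True when action i is not allowed (the inv_a_t input).
    """
    feasible = [x for x, bad in zip(logits, infeasible) if not bad]
    if not feasible:
        raise ValueError("at least one action must be feasible")
    m = max(feasible)  # subtract the max for numerical stability
    exps = [0.0 if bad else math.exp(x - m) for x, bad in zip(logits, infeasible)]
    total = sum(exps)
    return [e / total for e in exps]

def joint_log_prob(head_probs, chosen):
    """Log-probability of a composite action (assumed reading of step 12).

    A product of per-component probabilities becomes a sum of logs.
    """
    return sum(math.log(p[a]) for p, a in zip(head_probs, chosen))

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Batch mean of min(r*A, clip(r, 1-eps, 1+eps)*A), as in step 13."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # r_t(theta) from log-probabilities
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Example: the second of three actions is infeasible and gets zero probability.
probs = masked_softmax([2.0, 1.0, 0.5], [False, True, False])
```

Because a masked action has exactly zero probability, it can never be sampled in step 6, and the ratio in step 12 is only ever evaluated on feasible actions. The clipping range eps=0.2 is a placeholder, not a value taken from the source.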


Table 2 Assignment statistics by equipment type

Target No. / Total	Target type	Weapon type 1	Weapon type 2	Weapon type 3	Weapon type 4
1	1	1	0	0	0
2	3	0	0	2	0
3	3	0	1	1	0
4	2	1	2	0	0
5	4	1	0	0	1
6	1	1	1	0	0
7	4	2	0	0	0
8	1	1	0	0	0
9	2	0	0	3	1
10	1	1	1	0	0
Total		8	5	6	2

Table 3 Assignment statistics by equipment number

Target No. / Total	Target type	Weapon 1 (type 1)	Weapon 2 (type 1)	Weapon 3 (type 2)	Weapon 4 (type 2)	Weapon 5 (type 3)	Weapon 6 (type 4)
1	1	1	0	0	0	0	0
2	3	0	0	0	0	2	0
3	3	0	0	0	1	1	0
4	2	1	0	0	2	0	0
5	4	1	0	0	0	0	1
6	1	1	0	1	0	0	0
7	4	0	2	0	0	0	0
8	1	0	1	0	0	0	0
9	2	0	0	0	0	3	1
10	1	0	1	1	0	0	0
Total		4	4	2	3	6	2
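Tables 2 and 3 describe the same assignment at two levels of detail, so the per-weapon counts should sum to the per-type counts. The sketch below checks this. The weapon-to-type mapping (weapons 1 and 2 are type 1, weapons 3 and 4 are type 2, weapon 5 is type 3, weapon 6 is type 4) is an assumption inferred from the column totals of the two tables, and the names WEAPON_TYPE, TABLE3, and by_type are hypothetical.

```python
# Assumed mapping from weapon number to weapon type, inferred from the totals.
WEAPON_TYPE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 4}

# Rows of Table 3: target number -> counts assigned from weapons 1..6.
TABLE3 = {
    1: [1, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 0, 2, 0],
    3: [0, 0, 0, 1, 1, 0],
    4: [1, 0, 0, 2, 0, 0],
    5: [1, 0, 0, 0, 0, 1],
    6: [1, 0, 1, 0, 0, 0],
    7: [0, 2, 0, 0, 0, 0],
    8: [0, 1, 0, 0, 0, 0],
    9: [0, 0, 0, 0, 3, 1],
    10: [0, 1, 1, 0, 0, 0],
}

def by_type(row):
    """Collapse one per-weapon row into counts for weapon types 1..4."""
    totals = [0, 0, 0, 0]
    for weapon_no, count in enumerate(row, start=1):
        totals[WEAPON_TYPE[weapon_no] - 1] += count
    return totals

per_target = {target: by_type(row) for target, row in TABLE3.items()}
column_totals = [sum(col) for col in zip(*per_target.values())]
print(column_totals)  # [8, 5, 6, 2]
```

Under this mapping every aggregated row matches the corresponding row of Table 2, and the column totals match its Total row.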

Fig.4 Construction of the air-defense firepower network considering missile element quantity and the nearest-distance principle

Fig.5 Construction of the joint regional air-defense and anti-missile operation scenario

Table 4 Operation information of incoming targets (Blue)

Type	Incoming target	Threat degree	Flight speed/(km·h^-1)
1	Stealth fighter	0.85~0.95	800~1000
2	Cruise missile	0.80~0.90	800~900
3	Medium UAV	0.50~0.60	100~250
4	Small UAV	0.40~0.60	80~100

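Table 5 Operation information of interception equipment (Red)

Type	Red equipment	Cost/million CNY	Total element quantity	Number of channels	Maximum course shortcut/km	Maximum interceptable speed/(m·s^-1)	Minimum launch distance/km	Target types engaged
1	Missile 1	5	24	6	6	6	6	Cruise missile, stealth fighter
2	Missile 2	3	12	6	6	6	6	Cruise missile, stealth fighter, medium UAV
3	Missile 3	1	96	8	8	8	8	Medium UAV, small UAV
4	High-power microwave	0.3	40	1	1	1	1	Small UAV
5	High-energy laser	0.1	20	1	1	1	1	Small UAV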

Table 6 Damage probability of interception equipment against each target type

Equipment	Stealth fighter	Cruise missile	Medium UAV	Small UAV
Missile 1	0.85	0.90	0	0
Missile 2	0.65	0.75	0.85	0
Missile 3	0	0	0.70	0.65
High-power microwave	0	0	0	0.55
High-energy laser	0	0	0	0.45
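The zero entries in Table 6 mark weapon-target pairings that cannot succeed, and they should agree with the "target types engaged" column of Table 5. The sketch below encodes both tables and cross-checks them. Treating a zero damage probability as an infeasible action is an assumption made here for illustration, and the names DAMAGE_PROB, ENGAGEABLE, infeasible_mask, and feasible_targets are hypothetical.

```python
TARGETS = ["stealth fighter", "cruise missile", "medium UAV", "small UAV"]

# Table 6: damage probability per weapon, in the order of TARGETS.
DAMAGE_PROB = {
    "Missile 1": [0.85, 0.90, 0.0, 0.0],
    "Missile 2": [0.65, 0.75, 0.85, 0.0],
    "Missile 3": [0.0, 0.0, 0.70, 0.65],
    "High-power microwave": [0.0, 0.0, 0.0, 0.55],
    "High-energy laser": [0.0, 0.0, 0.0, 0.45],
}

# Table 5: target types each weapon is listed as able to engage.
ENGAGEABLE = {
    "Missile 1": {"cruise missile", "stealth fighter"},
    "Missile 2": {"cruise missile", "stealth fighter", "medium UAV"},
    "Missile 3": {"medium UAV", "small UAV"},
    "High-power microwave": {"small UAV"},
    "High-energy laser": {"small UAV"},
}

def infeasible_mask(weapon):
    """True where the weapon has zero damage probability (assumed infeasible)."""
    return [p == 0.0 for p in DAMAGE_PROB[weapon]]

def feasible_targets(weapon):
    """Target types with a non-zero damage probability in Table 6."""
    return {t for t, p in zip(TARGETS, DAMAGE_PROB[weapon]) if p > 0.0}

# The two tables agree if every weapon's non-zero entries match Table 5.
consistent = all(feasible_targets(w) == ENGAGEABLE[w] for w in DAMAGE_PROB)
print(consistent)  # True
```

A mask built this way has the same shape as the infeasible-action input inv_a_t in Algorithm 1, although the source does not say how that input is actually derived.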

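Table 7 Hyperparameter settings of the compared algorithms

Algorithm	Parameter	Value
Rule-based algorithm (cooperative game)	Number of iterations	500
Rule-based algorithm (cooperative game)	Trade-off ratio between strike efficiency and cost	0.6
Heuristic NSGA-II algorithm	Number of iterations	500
Heuristic NSGA-II algorithm	Population size	120
Heuristic NSGA-II algorithm	Crossover operator	0.6
Heuristic NSGA-II algorithm	Mutation operator	0.05
DRL IPPO algorithm	Maximum number of training iterations	20000
DRL IPPO algorithm	Maximum steps per episode	10
DRL IPPO algorithm	Fully connected hidden layer dimension	64
DRL IPPO algorithm	Steps per policy update	2000
DRL IPPO algorithm	Actor network learning rate	0.001
DRL IPPO algorithm	Critic network learning rate	0.002
DRL IPPO algorithm	Selection rate	0.95
DRL IPPO algorithm	Discount factor	0.9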

Table 8 Statistical analysis

Number of targets	Number of fire units	Algorithm	Response time/s	Strike effectiveness	Operational cost/million CNY	Composite index
10 (small scale)	10	Rule-based	0.0068	0.7971	19.1	9.403986
10 (small scale)	10	NSGA-II	0.0235	0.7434	18.8	2.578329
10 (small scale)	10	IPPO	0.0318	0.7412	18.4	1.941029
10 (small scale)	15	Rule-based	0.0067	0.8551	29.7	8.334818
10 (small scale)	15	NSGA-II	0.0234	0.8053	24.4	2.735662
10 (small scale)	15	IPPO	0.0314	0.8128	23.9	2.100713
10 (small scale)	20	Rule-based	0.0069	0.884	33.7	7.798923
10 (small scale)	20	NSGA-II	0.0237	0.8313	24.5	2.937005
10 (small scale)	20	IPPO	0.0317	0.8296	25.6	2.097157
50 (medium scale)	50	Rule-based	0.0071	0.7103	90.96	6.900191
50 (medium scale)	50	NSGA-II	0.0359	0.7012	88.35	1.386976
50 (medium scale)	50	IPPO	0.0083	0.7091	80.03	6.697376
50 (medium scale)	75	Rule-based	0.0073	0.8263	138.32	6.745441
50 (medium scale)	75	NSGA-II	0.0364	0.8015	129.89	1.397355
50 (medium scale)	75	IPPO	0.0084	0.828	120.33	6.752391
50 (medium scale)	100	Rule-based	0.0076	0.8559	158.24	6.209361
50 (medium scale)	100	NSGA-II	0.0359	0.8213	142.38	1.401884
50 (medium scale)	100	IPPO	0.0087	0.8521	122.13	6.996852
300 (large scale)	300	Rule-based	0.0088	0.6956	540.12	6.014687
300 (large scale)	300	NSGA-II	0.0389	0.6045	525.39	1.215602
300 (large scale)	300	IPPO	0.0037	0.6639	504.87	14.606554
300 (large scale)	450	Rule-based	0.0089	0.8111	717.36	5.673290
300 (large scale)	450	NSGA-II	0.0383	0.7432	698.45	1.240679
300 (large scale)	450	IPPO	0.0039	0.8018	643.21	14.273695
300 (large scale)	600	Rule-based	0.0091	0.8489	751.34	5.589298
300 (large scale)	600	NSGA-II	0.0381	0.7298	734.33	1.174265
300 (large scale)	600	IPPO	0.0036	0.8327	651.23	15.989331
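The source does not state how the composite index in Table 8 is computed. The tabulated values for the first scenario (10 targets, 10 fire units) are reproduced to about three decimal places by normalizing each metric by its sum over the three algorithms and dividing normalized effectiveness by the product of normalized response time and normalized cost. The sketch below shows that calculation. The formula is an inference from the numbers, not a definition taken from the paper, and composite_index is a hypothetical name.

```python
def composite_index(rows):
    """Assumed composite index for one scenario.

    rows: list of (name, response_time_s, strike_effectiveness, cost).
    Each metric is normalized by its sum over the algorithms in the scenario;
    higher effectiveness raises the index, higher time and cost lower it.
    """
    t_sum = sum(r[1] for r in rows)
    e_sum = sum(r[2] for r in rows)
    c_sum = sum(r[3] for r in rows)
    return {
        name: (e / e_sum) / ((t / t_sum) * (c / c_sum))
        for name, t, e, c in rows
    }

# First scenario of Table 8: 10 targets, 10 fire units.
scenario = [
    ("Rule-based", 0.0068, 0.7971, 19.1),
    ("NSGA-II", 0.0235, 0.7434, 18.8),
    ("IPPO", 0.0318, 0.7412, 18.4),
]
index = composite_index(scenario)
for name, value in index.items():
    print(f"{name}: {value:.4f}")
```

Under this reading the index is a relative score within one scenario, so values are comparable across the three algorithms in a row group but not across scenarios of different scale.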

Fig.6 Comparison of response time indexes of the three algorithms

Fig.7 Comparison of strike effectiveness indexes of the three algorithms

Fig.8 Comparison of operational cost indexes of the three algorithms

Fig.9 Comparison of composite indexes of the three algorithms

Fig.10 Convergence of policy network training when complying with different numbers of rules

