Acta Armamentarii (兵工学报), 2021, Vol. 42, Issue (9): 2040-2048. doi: 10.3969/j.issn.1000-1093.2021.09.025

• Paper •

Multi-target Assignment and Intelligent Decision Based on Reinforcement Learning

ZHU Jianwen1, ZHAO Changjian2, LI Xiaoping1, BAO Weimin1,3

  1. School of Aerospace Science and Technology, Xidian University, Xi'an 710126, Shaanxi, China
  2. China Academy of Launch Vehicle Technology, Beijing 100076, China
  3. China Aerospace Science and Technology Corporation, Beijing 100048, China
  • Online: 2021-10-20
  • Corresponding author: LI Xiaoping (born 1961), female, professor, doctoral supervisor. E-mail: xpli@xidian.edu.cn
  • About the author: ZHU Jianwen (born 1987), male, lecturer, Ph.D. E-mail: zhujianwen1117@163.com
  • Funding:
    National Natural Science Foundation of China (61703409); China Postdoctoral Science Foundation (2019M66364)

Abstract: To improve attack-defense effectiveness in highly dynamic cooperative attack scenarios, a reinforcement learning-based method for intelligent multi-target assignment and decision-making in a swarm is proposed. Composite evaluation criteria for attack performance are established, comprising an attack superiority evaluation based on relative motion information and a threat evaluation based on the inherent information of the target. To evaluate attack-defense effectiveness, a cost-effectiveness ratio index is designed that combines attack performance, penetration probability, and attack cost. A multi-target decision-making architecture based on reinforcement learning is then constructed, with an action space whose basic elements are allocation vectors and a state space based on quantified performance indices. Q-Learning is used to make intelligent decisions on the cooperative attack plan, including missile selection and the form of target assignment. Simulation results show that reinforcement learning can achieve online multi-target decision-making with optimal attack-defense effectiveness, and that it is computationally more efficient than particle swarm optimization.

Key words: target assignment, cooperative attack, attack-defense effectiveness, intelligent decision, reinforcement learning
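
The Q-Learning decision loop summarized in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation. The kill probabilities, threat weights, reward function, hyperparameters, and the sequential state encoding (missiles assigned one at a time) are all assumptions made for illustration. The paper itself defines its action space over allocation vectors and its state space over quantified performance indices, and its performance index also accounts for penetration probability and attack cost, which this toy omits.

```python
import itertools
import random

# Hypothetical toy data (not from the paper):
# KILL_PROB[m][t] is an assumed probability that missile m destroys target t.
KILL_PROB = [[0.9, 0.4], [0.5, 0.6], [0.3, 0.8]]
THREAT = [0.6, 0.4]  # assumed threat weight of each target
N_MISSILES, N_TARGETS = len(KILL_PROB), len(THREAT)


def effectiveness(assignment):
    """Threat-weighted expected damage of a full missile -> target assignment."""
    total = 0.0
    for target, weight in enumerate(THREAT):
        survive = 1.0
        for missile, chosen in enumerate(assignment):
            if chosen == target:
                survive *= 1.0 - KILL_PROB[missile][target]
        total += weight * (1.0 - survive)
    return total


def q_learning(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning; a state is the tuple of targets chosen so far."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to an estimated return, default 0.0
    for _ in range(episodes):
        state = ()
        while len(state) < N_MISSILES:
            # Epsilon-greedy choice of the target for the next missile.
            if rng.random() < epsilon:
                action = rng.randrange(N_TARGETS)
            else:
                action = max(range(N_TARGETS),
                             key=lambda a: q.get((state, a), 0.0))
            next_state = state + (action,)
            done = len(next_state) == N_MISSILES
            # Reward is given only once the assignment is complete.
            reward = effectiveness(next_state) if done else 0.0
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in range(N_TARGETS))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q


def greedy_assignment(q):
    """Read the learned plan by following the highest-valued action per state."""
    state = ()
    while len(state) < N_MISSILES:
        action = max(range(N_TARGETS), key=lambda a: q.get((state, a), 0.0))
        state += (action,)
    return state


q_table = q_learning()
plan = greedy_assignment(q_table)
# Exhaustive search is feasible only because the toy problem is tiny.
best = max(itertools.product(range(N_TARGETS), repeat=N_MISSILES),
           key=effectiveness)
print(plan, best)
```

Because the toy problem has only eight complete assignments, the learned plan can be compared directly with exhaustive enumeration. That check is impractical at realistic scale, which is where a learned policy or a heuristic such as particle swarm optimization becomes relevant.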
