
Acta Armamentarii (兵工学报)



Multi-Agent Collaborative Dynamic Search Decision Method for Unknown Moving Targets

FU Jinbo1, ZHANG Dong1,*, WANG Mengyang1, DENG Jie1,2

  1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; 2. Sichuan Jiuzhou Electric Appliance Group Co., Ltd., Mianyang 621000, Sichuan, China
  • Received: 2025-01-16  Revised: 2025-03-21
  • Corresponding author: ZHANG Dong, E-mail: zhangdong@nwpu.edu.cn
  • Supported by: National Natural Science Foundation of China (52472417)

Abstract: In order to efficiently guide a drone swarm in searching for multiple unknown moving targets within a designated area, a deep reinforcement learning-based, prediction-driven collaborative search decision method (DRL-P-CSDM) is proposed. Using a grid-based approach, the method integrates environmental information and historical search information to construct an environment information map and an information certainty map, and a time-decay factor is designed into the state variables to guide drones to revisit regions, countering the targets' active evasion and improving search efficiency. A functionally partitioned deep neural network architecture is designed, incorporating target location prediction, survival decision-making, and action judgment; it predicts the environment autonomously, avoiding the poor adaptability commonly associated with manually designed models. A reward function is formulated within the reinforcement learning framework, where capture probability is introduced into the dense reward to accelerate convergence. A distributed architecture is employed that accommodates an arbitrary number of drones and can still complete the task under limited communication range and delayed information updates. The effectiveness of the method is validated through algorithm comparison, robustness analysis, and hardware-in-the-loop (semi-physical) simulation. Simulation results show that, compared with a conventional deep reinforcement learning method, DRL-P-CSDM improves the target detection rate by 13.72%, reduces mission completion time by 48.65%, and increases drone survival probability by 14.86%. The method offers strong comprehensiveness, robustness, and generality, operates stably in multi-scale complex environments without restriction on swarm size, and holds broad engineering application value in fields such as security monitoring, battlefield reconnaissance, forest inspection, and post-disaster rescue.

Key words: multi-target search, area reconnaissance, swarm collaboration, deep reinforcement learning, wide-area search
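The abstract describes a grid-based state construction in which an information certainty map, combined with a time-decay factor, steers drones back toward long-unvisited cells to counter evasive moving targets. The snippet below is a minimal Python sketch of that idea only; the class name CertaintyMap, the decay_rate parameter, and the update rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class CertaintyMap:
    """Grid map of how certain the swarm is about each cell's contents.

    A cell's certainty jumps to 1.0 when any drone observes it and then
    decays toward 0 over time, so cells that have gone unvisited for a
    while become attractive again and drones are encouraged to revisit
    areas a moving target may have re-entered.
    """

    def __init__(self, rows: int, cols: int, decay_rate: float = 0.02):
        self.certainty = np.zeros((rows, cols))  # 0 = unknown, 1 = just observed
        self.decay_rate = decay_rate             # assumed per-step time-decay factor

    def step(self, observed_cells: list[tuple[int, int]]) -> None:
        # Exponential time decay applied to the whole map at every decision step.
        self.certainty *= (1.0 - self.decay_rate)
        # Cells inside any drone's sensor footprint become fully certain again.
        for r, c in observed_cells:
            self.certainty[r, c] = 1.0


if __name__ == "__main__":
    m = CertaintyMap(rows=20, cols=20)
    m.step(observed_cells=[(5, 5), (5, 6)])   # first pass over two cells
    for _ in range(50):                       # 50 steps without a revisit
        m.step(observed_cells=[])
    print(f"certainty of cell (5, 5) after 50 idle steps: {m.certainty[5, 5]:.3f}")
```

Stacked with an environment information map, such a grid would presumably form the image-like state fed to the network; the decay rate controls how quickly unvisited cells become worth revisiting.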

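The abstract also notes that capture probability is folded into the dense reward to speed up convergence. The function below is a hedged sketch of what such reward shaping could look like; the weights and the names newly_covered, capture_prob, and step_cost are assumptions made for illustration and do not come from the paper.

```python
def dense_reward(newly_covered: int, capture_prob: float, step_cost: float = 0.01) -> float:
    """Illustrative per-step reward for one drone.

    newly_covered: number of grid cells observed this step for the first time
                   (or after their certainty decayed), rewarding coverage.
    capture_prob:  estimated probability that a target inside the sensor
                   footprint is actually detected/captured this step.
    step_cost:     small time penalty that pushes toward shorter missions.
    """
    coverage_term = 0.1 * newly_covered   # assumed weight on coverage gain
    capture_term = 1.0 * capture_prob     # assumed weight on capture probability
    return coverage_term + capture_term - step_cost


# Example: covering 3 new cells with a 0.4 capture probability this step.
print(dense_reward(newly_covered=3, capture_prob=0.4))  # 0.69
```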