School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
Received: 2025-04-14
Published online first: 2026-03-09
AI Yuhao, LIU Zhenghua*, WANG Weihong. Collaborative area coverage algorithm based on iterative teacher-student training[J/OL]. Acta Armamentarii, 2026(2026-03-09). https://doi.org/10.12382/bgxb.2025.0274. (in Chinese)
To address cooperative area coverage path planning for unmanned aerial vehicles (UAVs) under conditions with no prior information, this paper proposes an iterative teacher-student training framework that integrates curriculum learning with the Heterogeneous-Agent Proximal Policy Optimization (HAPPO) algorithm. A multi-stage curriculum training method decomposes the complex coverage task into a sequence of sub-tasks ordered from simple to complex, which significantly improves the training efficiency and convergence speed of HAPPO. To address the limited generalization of deep reinforcement learning algorithms across multiple scenarios, an iterative teacher-student learning mechanism is built on top of curriculum training: the model that converges in the primary curriculum serves as the teacher network and guides the student network through knowledge distillation during a second round of curriculum training, thereby enhancing the model's adaptability to different environments. Simulation results show that, with curriculum training and iterative teacher-student learning, the proposed algorithm converges faster during training and, in testing, achieves shorter area coverage time and a lower cell revisit rate, improving the efficiency and robustness of multi-UAV cooperative area coverage.
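The abstract names three ingredients without giving formulas: staged curriculum training, teacher-to-student knowledge distillation, and the cell revisit rate used for evaluation. The following is a minimal sketch of how such pieces are commonly written, not the authors' implementation. Every name and constant here is an assumption made for illustration: the KL-divergence form of the distillation term, the weight `beta`, the threshold-based stage promotion rule, and the definition of revisit rate as the fraction of visits that land on an already-visited cell.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over discrete action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_kl(teacher_logits: Sequence[float],
                    student_logits: Sequence[float]) -> float:
    """KL(teacher || student) over actions (assumed form of the distillation term)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def student_loss(policy_loss: float,
                 teacher_logits: Sequence[float],
                 student_logits: Sequence[float],
                 beta: float = 0.5) -> float:
    """Student objective: RL policy loss plus a weighted distillation penalty.

    `beta` is a hypothetical weight; the abstract does not specify one.
    """
    return policy_loss + beta * distillation_kl(teacher_logits, student_logits)


def next_stage(stage: int,
               recent_coverage: Sequence[float],
               threshold: float = 0.95,
               num_stages: int = 3) -> int:
    """Move to the next (harder) curriculum stage once the mean coverage ratio
    over recent episodes reaches `threshold` (hypothetical promotion rule)."""
    mean_cov = sum(recent_coverage) / len(recent_coverage)
    if mean_cov >= threshold and stage < num_stages - 1:
        return stage + 1
    return stage


def revisit_rate(visits: Sequence[Tuple[int, int]]) -> float:
    """Fraction of cell visits that hit an already-visited cell (assumed definition)."""
    if not visits:
        return 0.0
    return (len(visits) - len(set(visits))) / len(visits)
```

In an iterative scheme of the kind the abstract describes, the student produced by one round would become the teacher for the next round, with `student_loss` evaluated on each agent's action logits during the second pass through the curriculum.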