Ai Yuhao, Liu Zhenghua*, Wang Weihong. Collaborative area coverage algorithm based on iterative teacher-student training[J/OL]. Acta Armamentarii, 2026(2026-03-09). https://doi.org/10.12382/bgxb.2025.0274. (in Chinese)
Collaborative Area Coverage Algorithm Based on Iterative Teacher-Student Training
Ai Yuhao, Liu Zhenghua*, Wang Weihong
To address cooperative area coverage path planning for unmanned aerial vehicles (UAVs) when no prior information is available, this paper proposes an iterative teacher-student training framework that integrates curriculum learning with the Heterogeneous-Agent Proximal Policy Optimization (HAPPO) algorithm. A multi-stage curriculum training method decomposes the complex coverage task into a sequence of sub-tasks ordered from simple to complex, significantly improving the training efficiency and convergence speed of the HAPPO algorithm. To address the poor generalization of deep reinforcement learning algorithms across multiple scenarios, an iterative teacher-student learning mechanism is built on top of curriculum training. Specifically, the model that converges during primary curriculum training serves as the teacher network and guides the student network via knowledge distillation during secondary curriculum training, thereby enhancing the model's adaptability to different environments. Simulation results demonstrate that curriculum training greatly accelerates convergence, and that the proposed algorithm achieves a shorter area coverage time and a lower cell revisit rate, providing an efficient and robust solution for UAV cooperative area coverage tasks.
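The teacher-student step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the distillation loss, so the KL divergence between the teacher's and the student's action distributions used here is an assumption, as are the function names, the temperature, and the weighting factor `beta`. It is not the authors' implementation and omits the HAPPO update itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw action scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one discrete action distribution.

    Assumption: KL divergence is one common choice of distillation loss;
    the abstract does not specify which loss the paper uses.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Clamp q away from zero so the logarithm stays finite.
    return sum(pi * math.log(pi / max(qi, 1e-12))
               for pi, qi in zip(p, q) if pi > 0.0)


def student_objective(policy_loss, teacher_logits, student_logits, beta=0.5):
    """Hypothetical combined objective: RL loss plus a weighted distillation term."""
    return policy_loss + beta * distillation_loss(teacher_logits, student_logits)
```

When the student reproduces the teacher's action preferences exactly, the distillation term vanishes and only the reinforcement learning loss remains; a larger `beta` keeps the student closer to the teacher during secondary curriculum training.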