
Acta Armamentarii, 2023, Vol. 44, Issue (S2): 101-113. doi: 10.12382/bgxb.2023.0881

Special Issue: Swarm Collaboration and Autonomous Technology


Multi-agent Coverage Path Planning Based on Security Reinforcement Learning

LI Song1, MA Zhuangzhuang1, ZHANG Yunlin1, SHAO Jinliang1,2,3,*

  1. School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China
  2. Research Center on Crowd Spectrum Intelligence, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen 518054, Guangdong, China
  3. Laboratory of Electromagnetic Space Cognition and Intelligent Control, Beijing 100089, China
  • Received: 2023-09-06 Online: 2024-01-10
  • Contact: SHAO Jinliang

Abstract:

Coverage path planning aims to find a safe path for each agent that covers the task area efficiently while avoiding obstacles and neighboring agents. Because coverage tasks often involve large, complex areas, it is worth exploring how to keep agents safe and strengthen their collaboration so as to improve the task efficiency and capacity of the cluster. To this end, a discrete mathematical model of coverage path planning is established on raster maps, a safe multi-agent reinforcement learning algorithm based on a value decomposition network is proposed, and its soundness is demonstrated theoretically. By decomposing the group value function, the algorithm keeps agents from receiving spurious rewards, which strengthens the learning of collaborative coverage strategies and speeds up convergence. A shield introduced into the training process corrects unsafe agent behaviors, such as going out of bounds and colliding, and thereby guarantees agent safety throughout the task. Simulation and semi-physical experiments show that the algorithm maintains both the coverage efficiency and the safety of the agents.
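
The shield idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the action set, the `is_safe` and `shield` helpers, and the rule of replacing an unsafe action with "stay" are all assumptions made for illustration. It assumes a raster map with 4-connected moves and agents that start on distinct, safe cells.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical action set for a 4-connected raster map (row, col offsets).
ACTIONS: Dict[str, Cell] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
    "stay": (0, 0),
}


def is_safe(cell: Cell, rows: int, cols: int,
            obstacles: Set[Cell], blocked: Set[Cell]) -> bool:
    """A cell is safe if it is inside the map, obstacle-free, and unclaimed."""
    r, c = cell
    in_bounds = 0 <= r < rows and 0 <= c < cols
    return in_bounds and cell not in obstacles and cell not in blocked


def shield(positions: List[Cell], proposed: List[str],
           rows: int, cols: int, obstacles: Set[Cell]) -> List[str]:
    """Replace each unsafe proposed action with 'stay' (assumed correction rule).

    Agents are processed in index order. An agent may not enter a cell already
    claimed by an earlier agent, nor a cell still held by a later agent, so
    staying in place is always a safe fallback. This is deliberately conservative.
    """
    corrected: List[str] = []
    claimed: Set[Cell] = set()  # next cells of agents already processed
    for i, (pos, action) in enumerate(zip(positions, proposed)):
        waiting = set(positions[i + 1:])  # later agents might stay put
        dr, dc = ACTIONS[action]
        target = (pos[0] + dr, pos[1] + dc)
        if not is_safe(target, rows, cols, obstacles, claimed | waiting):
            action, target = "stay", pos
        corrected.append(action)
        claimed.add(target)
    return corrected
```

In a training loop, a filter of this kind would sit between the policy's action selection and the environment step, so that out-of-bounds moves and collisions are never executed. How the corrected action feeds back into learning is a design choice that the abstract does not specify.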

Key words: multi-agent system, coverage path planning, safe reinforcement learning, value decomposition network
