Original scientific paper
https://doi.org/10.21278/brod77316
Reinforcement learning-driven continuous maneuvering decision system for maritime collision prevention using proximal deterministic policy gradient
Xiao Yang
School of Information and Engineering, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
Chunlei Wang
School of Information and Engineering, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
*
Lei Zhou
Jiangsu Province Engineering Research Center of Smart Poultry Farming and Intelligent Equipment, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
Haiyan Wang
Jiangsu Province Engineering Research Center of Smart Poultry Farming and Intelligent Equipment, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
Fengying Wang
Jiangsu Province Engineering Research Center of Smart Poultry Farming and Intelligent Equipment, Suqian University, Huanghe Road, 23800, Suqian City, Jiangsu Province, China
* Corresponding author.
Abstract
Continuous ship steering control is a highly nonlinear, complex task because the vessel is subject to wave and wind disturbances, yet it is crucial for timely obstacle avoidance and effective maneuvering. Reinforcement learning (RL) combined with deep neural networks (DNNs) has shown significant potential for controlling systems with nonlinear dynamics, which makes it well suited to decision-making and planning in such scenarios. However, existing approaches struggle to guarantee optimal control performance. To address this limitation, this paper proposes an improved deep reinforcement learning approach, based on the Pathwise Derivative Policy Gradient (PDPG) algorithm, for intelligent collision avoidance under continuous ship steering. The method uses the MMG ship maneuvering model as the basis for learning a steering control policy with DNNs, takes a range of control actions into account, and evaluates steering performance with a dedicated evaluation (critic) network. The structure of the PDPG policy network is optimized to increase its representational capacity and to balance exploration and exploitation. In addition, an adaptive exploration rate and a dynamic balancing algorithm for stochastic policies are introduced to fine-tune the exploration-exploitation trade-off. Simulations of continuous ship steering control verify the performance of the improved method.
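The abstract does not give the network architectures or update equations, so the following Python/NumPy sketch only illustrates the general pathwise-derivative idea behind PDPG: a deterministic actor is improved by pushing the critic's action gradient back through the policy, ∇θ Q(s, πθ(s)) = ∇a Q(s, a) evaluated at a = πθ(s), multiplied by ∇θ πθ(s). Everything concrete here is an assumption, not the authors' design: the two-dimensional state (heading error, yaw rate), the linear-tanh actor, the 35° rudder limit, and the fixed quadratic critic with hypothetical constants `W_TARGET` and `RHO`, which stands in for the learned evaluation network. The MMG ship model, the optimized policy-network structure, and the adaptive exploration scheme are not modeled. A finite-difference check confirms the analytic pathwise gradient.

```python
import numpy as np

DELTA_MAX = np.deg2rad(35.0)     # assumed rudder-angle limit in radians (hypothetical)
W_TARGET = np.array([0.8, 2.0])  # hypothetical PD-like gains used only by the toy critic
RHO = 5.0                        # hypothetical weight on the critic's action term


def actor(theta, s):
    """Deterministic policy pi_theta(s): rudder angle = DELTA_MAX * tanh(s . theta)."""
    return DELTA_MAX * np.tanh(s @ theta)


def critic(s, a):
    """Hypothetical fixed critic Q(s, a), standing in for a learned evaluation network.

    It penalizes heading error and yaw rate and prefers rudder commands close to a
    clipped PD-like target; it is differentiable in a, which is what PDPG relies on.
    """
    target = np.clip(-(s @ W_TARGET), -DELTA_MAX, DELTA_MAX)
    return -np.sum(s**2, axis=-1) - RHO * (a - target) ** 2


def dq_da(s, a):
    """Analytic dQ/da of the toy critic (a learned critic would use autodiff)."""
    target = np.clip(-(s @ W_TARGET), -DELTA_MAX, DELTA_MAX)
    return -2.0 * RHO * (a - target)


def objective(theta, s):
    """J(theta): mean over sampled states of Q(s, pi_theta(s))."""
    return np.mean(critic(s, actor(theta, s)))


def pathwise_grad(theta, s):
    """Pathwise-derivative gradient: mean of dQ/da * d pi_theta(s) / d theta."""
    z = s @ theta
    a = DELTA_MAX * np.tanh(z)
    da_dtheta = (DELTA_MAX * (1.0 - np.tanh(z) ** 2))[:, None] * s  # shape (N, 2)
    return np.mean(dq_da(s, a)[:, None] * da_dtheta, axis=0)


def finite_diff_grad(theta, s, eps=1e-6):
    """Central finite differences of J, used only to check pathwise_grad."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (objective(theta + e, s) - objective(theta - e, s)) / (2.0 * eps)
    return g


rng = np.random.default_rng(0)
# Sampled states [heading error (rad), yaw rate (rad/s)], a stand-in for replay-buffer data.
states = rng.normal(scale=[0.3, 0.05], size=(256, 2))

theta = np.zeros(2)
j_before = objective(theta, states)
for _ in range(1000):
    theta += 1.0 * pathwise_grad(theta, states)  # gradient ascent on J(theta)
j_after = objective(theta, states)

print("theta:", theta, "J before:", j_before, "J after:", j_after)
```

In a full actor-critic implementation the critic would itself be trained from transitions and the chain rule would be handled by an autodiff framework; the hand-derived gradient here only keeps the sketch dependency-light and easy to verify.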
Keywords
Continuous ship steering control; deep reinforcement learning; Pathwise Derivative Policy Gradient; MMG model; policy network
Hrčak ID: 345657
Publication date: 1 July 2026