Tehnički vjesnik, Vol. 33 No. 3, 2026.
Original scientific paper
https://doi.org/10.17559/TV-20251211003197
Research on Intelligent Transformer Fault Diagnosis Model Based on Multimodal Data Fusion and Deep Learning
Cunjian Tian; Extra-high Voltage Branch of State Grid Fujian Electric Power Co., Ltd., Fuzhou, Fujian, 350011, China *
* Corresponding author.
Abstract
To improve the accuracy of transformer fault diagnosis, this study designs a hybrid intelligent diagnosis mechanism that integrates multimodal data fusion with an adaptive deep learning model. The key diagnostic information is derived from dissolved gas analysis of the transformer oil and serves as the input features of the deep model. An adaptive learning mechanism is proposed that dynamically adjusts the learning rate during iteration according to the real-time convergence trend, improving both the convergence accuracy and the training efficiency of the model. Important parameters of the adaptive deep learning model, such as the number of hidden layers and the learning-rate adjustment coefficient, were determined through specific cases. The experimental results show that the proposed method performs well in feature extraction and analysis, with faster convergence and higher convergence accuracy, and can significantly improve the accuracy of transformer fault diagnosis. To address the problem that data alignment and series fusion are often ignored in traditional multimodal data fusion, this paper further proposes an image-text multimodal fusion model based on the cross-attention mechanism. The model first uses BERT and ConvNeXt to extract text and image features, respectively. The attention mechanism in the Image Transformer then extracts further detail from the feature map output by ConvNeXt, yielding higher-level image features and ensuring that the image and text features have the same dimension. Finally, the image and text features are aligned and fused by the cross-attention module.
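The cross-attention fusion step can be illustrated with a minimal sketch. The abstract does not give the model's projection layers, head count, or feature sizes, so the code below assumes plain single-head scaled dot-product attention with no learned weights, and the names `cross_attention`, `text_feats`, and `image_feats` are hypothetical. It only shows how each text feature could attend over image features of the same dimension.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    Each query vector (e.g. a text token feature) attends over the keys and
    values of the other modality (e.g. image patch features). Learned
    projections are omitted; this is an illustrative assumption.
    """
    d = len(keys[0])
    n_out = len(values[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(n_out)])
    return fused


# Toy features: 2 text tokens and 3 image patches, all of dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Text queries attend over image keys/values; one fused vector per text token.
fused = cross_attention(text_feats, image_feats, image_feats)
```

Because the queries and the keys must share a dimension for the dot product, this sketch also shows why the abstract stresses making the image and text features consistent in dimension before fusion.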
Experiments on the three datasets of MSAW-Single, MSAW-Multiple and MMSD show that the classification accuracy of the image-text multimodal fusion model based on cross-attention reaches 75.21%, 73.15% and 85.85% respectively, verifying the effectiveness of this method.
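The adaptive learning-rate idea mentioned in the abstract can be sketched in a similar way. The abstract does not state the actual adjustment rule, so the code below substitutes a simple, well-known heuristic (grow the rate while the loss falls, shrink it when the loss rises) on a toy one-dimensional objective. The function names and the coefficients `up` and `down` are assumptions for illustration, not the paper's values.

```python
def adapt_learning_rate(lr, prev_loss, loss, up=1.05, down=0.5):
    """Grow the rate while the loss is falling; shrink it otherwise.

    `up` and `down` stand in for a learning-rate adjustment coefficient;
    their values here are illustrative assumptions.
    """
    return lr * up if loss < prev_loss else lr * down


def loss_fn(w):
    """Toy objective with its minimum at w = 2."""
    return (w - 2.0) ** 2


def grad_fn(w):
    """Gradient of the toy objective."""
    return 2.0 * (w - 2.0)


def train(w=5.0, lr=0.1, steps=50):
    """Gradient descent whose learning rate follows the loss trend."""
    prev_loss = loss_fn(w)
    for _ in range(steps):
        w -= lr * grad_fn(w)
        loss = loss_fn(w)
        lr = adapt_learning_rate(lr, prev_loss, loss)
        prev_loss = loss
    return w, lr


w, lr = train()
```

In this sketch the rate rises slowly while training makes progress and is cut sharply once a step overshoots, which is one common way to trade training speed against stable convergence.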
Keywords
adaptive deep learning model; fault diagnosis; learning rate; multimodal analysis; multimodal data fusion; transformer
Hrčak ID: 346732
Publication date: 30.4.2026.