Introduction: Ventricular late potentials (VLPs) are low-amplitude, high-frequency signals found at the end of the QRS complex, often associated with areas of fibrosis in the heart. Detecting VLPs is crucial for identifying patients at risk of ventricular arrhythmias and sudden cardiac death. Traditional methods, such as the signal-averaged electrocardiogram (SAECG), rely on averaging many beats, but these techniques have limited sensitivity and specificity (1-3).
Aim: To develop and evaluate statistical learning models capable of detecting VLPs from a single cardiac beat, potentially reducing the need for time-consuming signal averaging and improving the accuracy of real-time VLP identification.
Methods: We used four statistical learning models (Decision Tree, Random Forest, TimeSeries Forest, and MiniRocket) to classify ECG beats as containing VLPs or not. The dataset consisted of 4500 beats across three leads (II, V2, V6), with equal representation of beats containing synthetic VLPs. Features were extracted in both the time and frequency domains, with particular emphasis on features derived from the Fast Fourier Transform (FFT) and on feature engineering to improve model performance. Model interpretability was assessed with SHapley Additive exPlanations (SHAP), which were used to analyze feature importance across all models.
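As an illustration of the frequency-domain feature extraction described above, the following is a minimal sketch, not the study's actual pipeline. It assumes the "first derivative" of the FFT means the slope of the magnitude spectrum along the frequency axis. The function name `fft_features`, the 100-250 Hz band, the 1000 Hz sampling rate, and the toy beat (a Gaussian QRS-like pulse with a small 150 Hz burst standing in for a synthetic VLP) are all hypothetical choices made for the example.

```python
import numpy as np


def fft_features(beat, fs, band=(100.0, 250.0)):
    """Return (band_energy, mean_abs_slope) for a single beat.

    `band` is an illustrative high-frequency range in Hz (an assumption,
    not a value taken from the study).
    """
    beat = np.asarray(beat, dtype=float)
    # Remove the mean and apply a Hann window to limit spectral leakage.
    windowed = (beat - beat.mean()) * np.hanning(beat.size)
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(beat.size, d=1.0 / fs)
    # First derivative of the magnitude spectrum with respect to frequency.
    slope = np.gradient(magnitude, freqs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = float(np.sum(magnitude[in_band] ** 2))
    mean_abs_slope = float(np.mean(np.abs(slope[in_band])))
    return band_energy, mean_abs_slope


# Toy single-lead beat: 400 ms sampled at an assumed 1000 Hz.
fs = 1000.0
t = np.arange(400) / fs
clean = np.exp(-0.5 * ((t - 0.15) / 0.01) ** 2)  # smooth QRS-like pulse
burst = (t >= 0.20) & (t < 0.24)                 # 40 ms window after the pulse
with_vlp = clean + 0.02 * np.sin(2 * np.pi * 150.0 * t) * burst

energy_clean, slope_clean = fft_features(clean, fs)
energy_vlp, slope_vlp = fft_features(with_vlp, fs)
print("band energy  clean / with VLP:", energy_clean, energy_vlp)
print("mean |slope| clean / with VLP:", slope_clean, slope_vlp)
```

In this toy case both features are larger for the beat carrying the high-frequency burst. Feature vectors of this kind, computed per beat and per lead, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`; the features actually used in the study may differ.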
Results: The Random Forest model achieved the highest accuracy across all leads, with its strongest performance on lead V2 (97.9%). The Decision Tree reached an accuracy of 84.7% on lead V2, whereas the TimeSeries Forest was consistent but markedly lower, with an accuracy of around 50% across all leads. MiniRocket, while fast, gave the least consistent results, especially in capturing the subtle features associated with VLPs. Feature importance analysis showed that frequency-domain features, particularly those derived from the FFT and its first derivative, were the most influential in detecting VLPs across models. SHAP analysis confirmed that these features had the greatest impact on model predictions, particularly in distinguishing beats containing VLPs from those without. Models trained on individual leads consistently outperformed those trained on all leads combined, indicating that lead-specific characteristics play a significant role in classification accuracy.
Conclusion: Statistical learning models, particularly Random Forest, show strong potential for detecting VLPs from single cardiac beats. Training models on individual leads yields better classification performance than training on combined leads, indicating that lead-specific characteristics are critical for accurate VLP detection. The findings suggest that frequency-domain features, especially those derived from the FFT, are essential for model performance. This study demonstrates the feasibility of moving from traditional signal averaging to real-time, single-beat VLP detection with statistical learning approaches, which could offer a more efficient and accurate method of risk stratification in patients prone to ventricular arrhythmias.
