Goal: To rank outcome predictors in a population of pulmonary embolism (PE) patients with follow-up longer than one year, using contemporary machine learning models.
Patients and Methods: Machine learning models (LightGBM variant of XGBoost) were used to analyse the outcome data of a PE cohort. Patients were recruited from November 2013 until November 2018 at two academic hospitals in a metropolitan area and followed up by telephone interview or hospital visit. The primary outcome was all-cause mortality. In all patients, the PE diagnosis was established by computed tomography. Two models were generated with both XGBoost and frequentist analysis: 1) a model with 19 variables and 2) a model with 8 variables. Both models were recreated from previously published results (1,2).
Results: The study population of 761 patients (predominantly female (57.4%), aged 73 (61-81)) has been described previously (1,2). Median follow-up was 675 days (114-1331). Death during follow-up occurred in 335 cases (44.0%). In the XGBoost algorithm, the Pulmonary Embolism Severity Index (PESI) score and body mass index (BMI) were the two strongest predictors of the primary outcome. For BMI, this is contrary to the results of frequentist statistical inference, in which BMI failed to enter the Cox proportional hazards model. Overall, the models were accurate, with areas under the curve of 0.840 and 0.864.
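For readers less familiar with the area under the curve reported above, the following is a minimal sketch of how such a value can be computed from predicted risks and observed outcomes, using the rank-sum (Mann-Whitney) identity. It is not the authors' code: the function name roc_auc and the six-patient data are hypothetical and purely illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = event (e.g. death), 0 = no event
    scores: predicted risk, higher = more likely to have the event
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")

    # U counts the (event, non-event) pairs in which the event scored higher.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data, not taken from the study cohort.
died = [0, 0, 1, 0, 1, 1]
risk = [0.10, 0.35, 0.40, 0.50, 0.80, 0.90]
print(round(roc_auc(died, risk), 3))
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of patients who died from those who survived.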
Conclusion: Outcome analysis with XGBoost, a machine learning framework better suited to handling non-linear data, yielded different results from frequentist statistical inference. Since such non-normally distributed data prevail in health care databases, machine learning models may provide deeper insight into the impact of variables on outcome.
