Prediction of all-cause mortality for chronic kidney disease patients using four models of machine learning
Prediction tools for all-cause mortality that were developed from general-population data are poorly suited to patients with chronic kidney disease (CKD), because this population has a higher mortality risk. This study aimed to create a clinical prediction tool with good predictive performance for the 2-year all-cause mortality of patients with stage 4 or stage 5 CKD.
Four modelling approaches (deep learning, random forest, Bayesian network and logistic regression) were used to build four prediction tools, whose performance was compared using 10-fold cross-validation. The model that performed best at predicting mortality in the Photo-Graphe 3 cohort was selected and then optimized using synthetic data and a selected subset of explanatory variables. The ability of the optimized prediction tool to correctly predict the 2-year mortality of the patients included in the Photo-Graphe 3 database was then assessed.
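The 10-fold cross-validation comparison can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the performance metric (area under the ROC curve, AUC), the toy data, the binary predictors, and every function name (`k_fold_indices`, `auc`, `fit_logistic`, `fit_naive_bayes`, `cross_validate`, `make_toy_data`) are assumptions made for illustration. Only two of the four models are shown, and the Bayesian network is stood in for by Bernoulli naive Bayes, the simplest Bayesian network structure (each predictor depends only on the outcome); the study's actual network structure and its synthetic-data step are not reproduced here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Model = Callable[[Row], float]  # maps a patient row to a predicted probability of death


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices once, then deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a fold contains only one class
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X: List[Row], y: List[int], lr: float = 0.1, epochs: int = 200) -> Model:
    """Logistic regression fitted by plain batch gradient descent on the log-loss."""
    d = len(X[0])
    w = [0.0] * (d + 1)  # w[:d] are feature weights, w[d] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, target in zip(X, y):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return lambda row: 1.0 / (1.0 + math.exp(-(w[-1] + sum(wi * xi for wi, xi in zip(w, row)))))


def fit_naive_bayes(X: List[Row], y: List[int], alpha: float = 1.0) -> Model:
    """Bernoulli naive Bayes with Laplace smoothing; features must be 0/1."""
    d = len(X[0])
    ones = {c: [alpha] * d for c in (0, 1)}     # smoothed count of x_j == 1 within class c
    totals = {c: 2 * alpha for c in (0, 1)}     # smoothed number of rows in class c
    for row, c in zip(X, y):
        totals[c] += 1
        for j, xj in enumerate(row):
            ones[c][j] += xj

    def predict(row: Row) -> float:
        log_joint = {}
        for c in (0, 1):
            lp = math.log(totals[c] / sum(totals.values()))  # smoothed class prior
            for j, xj in enumerate(row):
                p = ones[c][j] / totals[c]
                lp += math.log(p if xj else 1.0 - p)
            log_joint[c] = lp
        m = max(log_joint.values())  # subtract the max for numerical stability
        e0, e1 = (math.exp(log_joint[c] - m) for c in (0, 1))
        return e1 / (e0 + e1)

    return predict


def cross_validate(fit: Callable[[List[Row], List[int]], Model],
                   X: List[Row], y: List[int], k: int = 10, seed: int = 0) -> float:
    """Mean held-out AUC over k folds (folds with a single class are skipped)."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(auc([model(X[i]) for i in test_idx], [y[i] for i in test_idx]))
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid)


def make_toy_data(n: int, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Hypothetical toy cohort: 7 binary predictors, only the first 3 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [float(rng.random() < 0.5) for _ in range(7)]
        risk = -2.0 + 1.5 * row[0] + 1.0 * row[1] + 0.8 * row[2]
        y.append(int(rng.random() < 1.0 / (1.0 + math.exp(-risk))))
        X.append(row)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(300, seed=42)
    for name, fit in [("logistic regression", fit_logistic), ("naive Bayes", fit_naive_bayes)]:
        print(name, round(cross_validate(fit, X, y, k=10), 3))
```

Every model is scored on the same shuffled folds, so the per-model means are directly comparable; a real analysis would also compare the fold-level scores statistically, as the study did when judging the Bayesian network and logistic regression to be not significantly different.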
Prediction tools developed using the Bayesian network and logistic regression tended to perform best. Although its performance was not significantly different from that of logistic regression, the Bayesian network tool was chosen because of its advantages and was then optimized. The optimized prediction tool, developed using synthetic data and the seven variables with the best predictive value (age, erythropoiesis-stimulating agent use, cardiovascular history, smoking status, and 25-hydroxy vitamin D, parathyroid hormone and ferritin levels), had satisfactory internal performance.
A Bayesian network was used to create a seven-variable prediction tool to predict the 2-year all-cause mortality in patients with stage 4–5 CKD.