%0 Journal Article
%T Machine learning-based statistical prediction of disease progression in patients with diabetes mellitus
%A Bernabe Canqui Flores
%A Edward Torres-Cruz
%A Jose Panfilo Tito Lipa
%A Fredy Heric Villasante Saravia
%A Percy Huata Panca
%A Angel Javier Quispe Carita
%A Milton Vladimir Mamani Calizaya
%J Journal of Advanced Pharmacy Education and Research
%@ 2249-3379
%D 2026
%V 16
%N 2
%R 10.51847/RAxQLHoMH9
%P 128-138
%X Diabetes progression causes avoidable morbidity through worsening glycemic control, treatment intensification, kidney disease, cardiovascular events, and microvascular complications. Conventional clinical risk scores are useful but often provide only moderate predictive accuracy for individual patients. More realistic prediction requires models that can use routine clinical data while acknowledging missingness, treatment changes, and irregular follow-up. Traditional regression models are interpretable but can underperform when disease progression depends on nonlinear interactions among glycemia, kidney function, treatment adherence, obesity, and comorbidity. Real-world EHR data are also noisy, incomplete, and unevenly sampled across patients. These properties make diabetes progression prediction a practical machine learning problem rather than a purely theoretical modeling exercise. This article develops and validates machine learning models for predicting 2-year disease progression in adults with type 2 diabetes mellitus. The main models are elastic net logistic regression, random forest, and XGBoost, with optional survival modeling for time-to-event extensions. The goal is not to propose an idealized model, but a realistic clinical prediction workflow suitable for retrospective EHR data. A retrospective cohort of 3,000 adults with type 2 diabetes is specified, with at least 2 years of follow-up after an index encounter. Disease progression is defined as HbA1c worsening of at least 1% or initiation of insulin therapy within 2 years, with secondary microvascular outcomes considered where coding is reliable. Candidate predictors include demographics, HbA1c, fasting glucose, BMI, blood pressure, lipids, eGFR, albuminuria, medication classes, adherence proxies, smoking, physical activity, and area-level socioeconomic indicators. 
Conceptually, XGBoost achieves an AUROC of 0.82 with a 95% confidence interval of 0.79–0.85, compared with 0.74 with a 95% confidence interval of 0.71–0.77 for elastic net logistic regression. The strongest predictors are baseline HbA1c, diabetes duration, medication adherence, BMI, eGFR, albuminuria, and recent treatment intensification. Calibration and decision curve analysis support clinical usefulness only at intermediate risk thresholds, which is realistic for EHR-based prediction. Gradient boosting can improve 2-year prediction of diabetes progression compared with regularized logistic regression when applied to carefully preprocessed clinical data. SHAP explanations can make individual predictions more transparent by showing whether risk is driven by glycemia, duration, adherence, kidney function, or obesity. The model should be viewed as a risk stratification aid requiring external validation, not as a stand-alone clinical decision maker.
%U https://japer.in/article/machine-learning-based-statistical-prediction-of-disease-progression-in-patients-with-diabetes-melli-zwxnpurjvq07zjp