A Machine Learning Approach to Predicting Future Onset of Type II Diabetes
Abstract
Type 2 Diabetes is a critical global health concern, and this study aims to improve prediction of its onset using the National Health and Nutrition Examination Survey (NHANES) dataset. The research assesses how accurately machine learning models predict Type 2 Diabetes onset in the U.S., using NHANES data from 1988 to 2018 and a broad spectrum of factors drawn from examination, dietary, questionnaire, and demographic data. The study trains Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost models, together with an ensemble model that combines them, and its feature selection draws critical variables from each of these data categories.
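The abstract does not state how the ensemble combines its base models. As a minimal sketch, assuming soft voting (averaging each model's predicted probability of diabetes), the combination could look like the following. The `soft_vote` helper, the optional weights, and the probabilities are hypothetical illustrations, not the authors' implementation or data.

```python
def soft_vote(prob_by_model, weights=None):
    """Average positive-class probabilities across models (soft voting).

    prob_by_model maps a model name to a list of probabilities, one per patient.
    weights optionally maps a model name to a non-negative weight.
    """
    names = list(prob_by_model)
    if not names:
        raise ValueError("at least one model is required")
    n_samples = len(prob_by_model[names[0]])
    if any(len(prob_by_model[m]) != n_samples for m in names):
        raise ValueError("all models must score the same patients")
    if weights is None:
        weights = {m: 1.0 for m in names}  # assumption: equal weighting
    total = sum(weights[m] for m in names)
    return [
        sum(weights[m] * prob_by_model[m][i] for m in names) / total
        for i in range(n_samples)
    ]


# Hypothetical probabilities for three patients (not taken from the study).
example = {
    "logistic_regression": [0.30, 0.70, 0.55],
    "svm": [0.20, 0.80, 0.45],
    "random_forest": [0.10, 0.90, 0.60],
    "xgboost": [0.15, 0.85, 0.40],
}
ensemble = soft_vote(example)
print(ensemble)
```

Averaging probabilities rather than hard labels keeps a continuous score per patient, which is what a ranking metric such as ROC-AUC needs.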
The models, evaluated on ROC-AUC, Precision, Recall, and F1 Score, showed notable performance. In Case I, targeting Diabetic and Non-Diabetic patients, Logistic Regression achieved an AUC of 0.662649, SVM 0.739073, Random Forest 0.865298, XGBoost 0.856807, and the Ensemble model 0.856879. In Case II, emphasizing Undiagnosed Diabetic and Pre-Diabetic patients, Logistic Regression achieved an AUC of 0.837121, SVM 0.851885, Random Forest 0.891081, XGBoost 0.892435, and the Ensemble model 0.885736. When evaluated using 20% test data for Cases I and II, the models demonstrated high efficacy, particularly the Random Forest and XGBoost models, which exhibited nearly perfect ROC-AUC scores in Case I.
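For readers unfamiliar with the four metrics named above, the following is a minimal sketch of how they can be computed from true labels and predicted probabilities. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration; they are not the study's code or data.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(y_true, scores, threshold=0.5):
    """Precision, recall, and F1 after thresholding the scores (assumed cutoff 0.5)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy held-out labels (1 = diabetic) and model scores; hypothetical values.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.55]

auc = roc_auc(y_true, scores)
precision, recall, f1 = precision_recall_f1(y_true, scores)
print(auc, precision, recall, f1)
```

ROC-AUC depends only on how the scores rank patients, whereas Precision, Recall, and F1 depend on the chosen threshold, so the two kinds of metric can disagree on which model looks best.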
These results underscore the potential of machine learning in accurately predicting Type 2 Diabetes onset. The developed models, particularly the ensemble model, show high accuracy and offer a comprehensive view of risk factors. The study highlights the ongoing need for research in this area to refine predictive models and improve their applicability in real-world healthcare settings.
Data Availability Statement
Source of data used in this research paper: https://www.kaggle.com/datasets/nguyenvy/nhanes-19882018
License
Copyright (c) 2024 Intersect: The Stanford Journal of Science, Technology, and Society
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).