Change in Life Expectancy Across Countries
Abstract
This paper used machine learning (ML) techniques to examine which factors contribute most to life expectancy. First, background research established that life expectancy is an effective indicator of a country’s overall health. Next, an initial data analysis identified which features of the data were relevant to this study by examining the factors that affect life expectancy. After the features were selected, three ML models were fitted to the data: multiple linear regression, random forest regression, and decision tree regression. The ML models were instrumental in identifying how these features relate to one another and to life expectancy. The random forest regression model returned the highest R-squared value, so it is the model used for this study. The R-squared value indicates how closely the model’s predictions match the actual test data, expressed as the proportion of the variance in life expectancy that the model explains. To determine which features affected life expectancy most, feature importance was used. Feature importance is a metric that shows how strongly each feature influences the output value of an ML model. Computing feature importance for the random forest regression model showed that a country’s gross domestic product (GDP) had the greatest effect on life expectancy. GDP is the total value of the final goods and services produced by a nation’s economy in a year. This result conveys the importance of economic activity to a country’s overall health. When a resource-constrained country performs better economically and raises its GDP, its output of goods and services increases, creating jobs and bringing more money into the nation. The additional financial resources give resource-constrained nations an opportunity to spend more on institutions such as health care and education, which in turn improve life expectancy.
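The model-selection step described in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of R-squared, 1 - SS_res / SS_tot; the abstract does not say how the value was computed in the study. The function names `r_squared` and `best_model`, the model labels, and all numbers are hypothetical and are not taken from the paper or its dataset.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def best_model(y_true, predictions):
    """Return (name, score) for the candidate with the highest R-squared."""
    scores = {name: r_squared(y_true, y_pred) for name, y_pred in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Made-up held-out life expectancy values (years) and made-up predictions
# from two hypothetical candidate models; illustration only.
y_test = [60.0, 65.0, 70.0, 75.0, 80.0]
predictions = {
    "model_a": [62.0, 64.0, 71.0, 73.0, 79.0],
    "model_b": [61.0, 65.0, 70.0, 75.0, 79.0],
}

name, score = best_model(y_test, predictions)
print(f"selected {name} with R-squared = {score:.3f}")
```

A score of 1 means the predictions match the test data exactly, and a score of 0 means the model does no better than always predicting the mean of the test values.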
Data Availability Statement
The dataset used for this paper is available on Kaggle at https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who. The dataset was originally published by the World Health Organization.
License
Copyright (c) 2025 Intersect: The Stanford Journal of Science, Technology, and Society
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).