


Dilharajay/global-happiness-prediction


Predicting Happiness and Its Drivers

This report summarizes the findings from an analysis of global development indicators to predict national happiness scores and understand their underlying drivers.

1. Project Overview

The goal of this project was to answer the question: "How do life expectancy and governance interact with wellbeing trends?" To achieve this, we consolidated multiple datasets, prepared them for modeling, and then built and interpreted a predictive model for happiness scores.

2. Data Preparation

The initial phase involved collecting and cleaning data from seven different sources covering happiness, GDP, freedom, corruption, life expectancy, human development, and women's rights.

The key steps were:

  • Loading and Standardization: Each dataset was loaded, and column names were standardized for consistency.
  • Merging: The datasets were merged into a single master table based on country and year.
  • Imputation: Missing values were a significant issue. We used K-Nearest Neighbors (KNN) Imputation to fill these gaps. This method was validated by artificially masking some existing data and confirming that the imputed values were close to the true values (Mean Absolute Error of 0.04, which was only 6.22% of the mean value).
  • Filtering: The dataset was filtered to include only modern data (2010-2024).
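The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes each source is already loaded as a pandas DataFrame with a country and a year column (under any capitalization). The helper names `standardize_columns` and `build_master_table`, the outer join, and `n_neighbors=5` are hypothetical choices. Imputation runs before the year filter, following the order of the steps listed above.

```python
import pandas as pd
from sklearn.impute import KNNImputer


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case, snake_case column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[\s\-]+", "_", regex=True)
    )
    return out


def build_master_table(frames, n_neighbors=5, start=2010, end=2024):
    """Hypothetical helper: merge sources on (country, year), KNN-impute, filter years."""
    frames = [standardize_columns(f) for f in frames]

    # Outer joins keep every country-year that appears in at least one source.
    master = frames[0]
    for frame in frames[1:]:
        master = master.merge(frame, on=["country", "year"], how="outer")

    # Impute every numeric indicator except the year key itself.
    # Values are not rescaled here, so large-valued columns dominate the KNN distance.
    indicators = master.select_dtypes("number").columns.drop("year")
    imputer = KNNImputer(n_neighbors=n_neighbors)
    master[indicators] = imputer.fit_transform(master[indicators])

    # Keep only the modern period used for modeling.
    in_range = master["year"].between(start, end)
    return master[in_range].reset_index(drop=True)
```

Because KNN distances depend on scale, indicators with very different ranges (for example GDP versus a 0-1 freedom score) would normally be rescaled before imputation. The sketch omits that step for brevity.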

The final, cleaned dataset contains a comprehensive set of indicators for modeling.
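The masking check described in the Imputation step can be sketched like this. It assumes a purely numeric feature matrix and uses a hypothetical helper, `masked_imputation_error`. The setup is not stated above, so three details are our assumptions: the 10% masking rate, the random seed, and expressing the error as a percentage of the mean of the hidden true values.

```python
import numpy as np
from sklearn.impute import KNNImputer


def masked_imputation_error(X, mask_frac=0.1, n_neighbors=5, seed=0):
    """Hide a random share of observed entries, impute them, and score the result.

    Returns (mae, mae_as_pct_of_mean). The percentage is relative to the mean
    of the true values that were hidden (an assumed definition).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Only entries that were actually observed can be used for validation.
    observed = ~np.isnan(X)
    hide = observed & (rng.random(X.shape) < mask_frac)

    X_masked = X.copy()
    X_masked[hide] = np.nan
    X_imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(X_masked)

    # Compare imputed values with the true values at the hidden positions only.
    errors = np.abs(X_imputed[hide] - X[hide])
    mae = errors.mean()
    return mae, 100 * mae / X[hide].mean()
```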

3. Modeling and Evaluation

To predict the happiness_score, we tested a variety of regression models, including Linear Regression, Ridge, Lasso, Decision Trees, and Random Forests. To address the research question about interactions directly, we generated degree-2 polynomial features. These add squared terms and pairwise products of the indicators (e.g., life_expectancy * freedom_score), so a model can capture the combined effect of two variables.

  • Best Model: The Ridge Regression (α=1.0) model was selected as the best performer.
  • Performance: It achieved a Test R² of 0.885, meaning it explains approximately 88.5% of the variance in happiness scores on the held-out test set. The gap between training and test performance was very small (Train R² - Test R² = 0.005). This indicates little overfitting and suggests that the model generalizes well to new data.
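The model comparison and the overfitting gap can be sketched as below. This is an illustrative setup under stated assumptions, not the project's code. Several settings are hypothetical: the 80/20 random split, the `StandardScaler` step, the Lasso `alpha`, the tree depth, and the forest size. Only Ridge's `alpha=1.0` comes from the results above. Each model is wrapped in a pipeline so the polynomial expansion and scaling are fit on the training split only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, test_size=0.2, seed=42):
    """Fit each candidate on degree-2 features and report train/test R² and their gap."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Hyperparameters other than Ridge's alpha=1.0 are illustrative assumptions.
    candidates = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.01, max_iter=10_000),
        "tree": DecisionTreeRegressor(max_depth=5, random_state=seed),
        "forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }

    results = {}
    for name, model in candidates.items():
        # Expansion and scaling are fit on the training split only.
        pipe = make_pipeline(
            PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), model
        )
        pipe.fit(X_train, y_train)
        train_r2 = pipe.score(X_train, y_train)
        test_r2 = pipe.score(X_test, y_test)
        results[name] = {"train_r2": train_r2, "test_r2": test_r2, "gap": train_r2 - test_r2}
    return results
```

The best model can then be selected with `max(results, key=lambda name: results[name]["test_r2"])`. The `gap` entry corresponds to the Train R² - Test R² figure reported above.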

4. Key Findings: What Drives Happiness?

The model interpretation, using SHAP values and permutation importance, revealed several key drivers of happiness.
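Permutation importance can be computed with scikit-learn's `sklearn.inspection.permutation_importance`, as in the minimal sketch below. The helper name `rank_by_permutation_importance` and `n_repeats=20` are assumptions. SHAP values require the separate `shap` package and are not shown here. Suppose the fitted model is a pipeline that performs the polynomial expansion internally. Then shuffling one raw indicator also scrambles every interaction term built from it, so the scores are reported per original indicator.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge


def rank_by_permutation_importance(model, X_test, y_test, n_repeats=20, seed=0):
    """Mean drop in the model's score (R² for regressors) when each column is shuffled."""
    result = permutation_importance(
        model, X_test, y_test, n_repeats=n_repeats, random_state=seed
    )
    return pd.Series(result.importances_mean, index=X_test.columns).sort_values(
        ascending=False
    )


if __name__ == "__main__":
    # Synthetic illustration only: column names mimic the report, data are random.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(400, 3)), columns=["ahdi_score", "life_expectancy", "noise"])
    y = 3 * X["ahdi_score"] + X["life_expectancy"] + 0.1 * rng.normal(size=400)
    model = Ridge(alpha=1.0).fit(X.iloc[:300], y.iloc[:300])
    print(rank_by_permutation_importance(model, X.iloc[300:], y.iloc[300:]))
```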

Top Individual Predictors:

The most significant factors influencing a country's happiness score are:

  1. AHDI Score (Augmented Human Development Index): This was the single most powerful predictor, combining health, education, and standard of living.
  2. Life Expectancy: Strongly and positively associated with higher happiness.
  3. GDP per Capita: A strong positive driver, indicating that higher national income is linked to higher wellbeing.
  4. Freedom Score: The degree of political and civil freedom is a significant positive factor.

The Interaction of Governance and Life Expectancy:

The analysis of interaction terms provided a direct answer to our research question:

  • life_expectancy * ahdi_score: This was the most powerful interaction term. It suggests that the positive effect of a long life on happiness is amplified in countries with high human development (good education, high income). In other words, longer life expectancy is associated with a larger gain in happiness in highly developed countries than in less developed ones.
  • life_expectancy * freedom_score: This interaction also had a positive coefficient. It implies that the happiness boost from a long life is greater when citizens also enjoy high levels of freedom.
  • corruption_cpi * life_expectancy: This term also had a positive coefficient. Because a higher CPI score indicates lower corruption, this means that in countries with long life expectancies, lower corruption has an even stronger positive association with happiness.
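To make the reading of these coefficients concrete, here is a worked derivative. It assumes the fitted model is linear in the degree-2 features. The symbols are introduced here for illustration only: L = life_expectancy, A = ahdi_score, F = freedom_score, C = corruption_cpi, and each β is the coefficient of the matching term. In the first line, the trailing dots collect every term that does not involve L, including β_C C and β_CC C². In the last line, they stand for C's interactions with the other indicators.

```latex
\begin{align*}
\hat{y} &= \beta_0 + \beta_L L + \beta_{LL} L^2 + \beta_{LA} L A + \beta_{LF} L F + \beta_{LC} L C + \cdots \\
\frac{\partial \hat{y}}{\partial L} &= \beta_L + 2\beta_{LL} L + \beta_{LA} A + \beta_{LF} F + \beta_{LC} C \\
\frac{\partial \hat{y}}{\partial C} &= \beta_C + 2\beta_{CC} C + \beta_{LC} L + \cdots
\end{align*}
```

A positive β_LA means the slope of predicted happiness in life expectancy increases with the AHDI score. This is the amplification described for life_expectancy * ahdi_score, and the same reasoning applies to β_LF. Because the product term is symmetric, a positive β_LC also makes the slope in C grow with L, which is the reading given for corruption_cpi * life_expectancy. If the features are standardized before fitting, the coefficients change scale, but these sign-based readings are unchanged.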

5. Conclusion

The analysis successfully identified the primary drivers of national happiness, with the AHDI score, life expectancy, and GDP being the most critical factors.

Crucially, the model indicates that governance and life expectancy do not act in isolation: their combined effect is greater than the sum of their parts. A long and healthy life is associated with a larger happiness gain when it is experienced in a country with high human development, strong freedoms, and low corruption. This highlights the importance of holistic development strategies that focus not just on health and wealth, but also on the quality of governance and civil liberties.
