This project is a data analysis and visualization of Italy's well-being measurement dataset. The main goal was to determine which of Italy's 20 regions is most suitable to visit and which is best to live in for about a year.
The data was collected from Istat (Istituto Nazionale di Statistica), the Italian National Statistics Institute, and can be accessed here.
This project was implemented using Python and Jupyter Notebook. The Python libraries used include Pandas, Seaborn, Matplotlib, and NumPy.
The project involved data cleaning, data manipulation, and exploratory data analysis. The original dataset contained 152 variables, which were narrowed down to the 26 most relevant to the project's main questions.
The analysis found that Tuscany is the best region to visit, while Trentino is the best one to live in for about a year.
One of the challenges in this project was the large number of variables in the dataset. Each variable had to be weighed for its relevance to the main questions before the set was narrowed down to the 26 used in the analysis.
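As a rough illustration of this kind of narrowing, the sketch below keeps only a shortlist of columns from a wide table. The region names, column names, and values are hypothetical placeholders, not the actual Istat variables, and the shortlist stands in for whatever criteria were applied in the notebook.

```python
import pandas as pd

# Hypothetical wide table: one row per region, many indicator columns.
# Names and values are illustrative only, not taken from the Istat dataset.
df = pd.DataFrame(
    {
        "region": ["Region X", "Region Y"],
        "indicator_a": [1.0, 3.0],
        "indicator_b": [2.0, 4.0],
        "indicator_c": [5.0, 6.0],
    }
)

# Hypothetical shortlist of variables judged relevant to the questions.
relevant = ["indicator_a", "indicator_c"]

# Fail early if a shortlisted name is missing from the data.
missing = [col for col in relevant if col not in df.columns]
if missing:
    raise KeyError(f"Columns not found: {missing}")

# Keep the region label plus the shortlisted indicators.
subset = df[["region"] + relevant]
print(subset)
```

Keeping the shortlist in an explicit list makes the selection easy to review and adjust as the questions change.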
Through this project, I learned how to transpose rows and columns in complex datasets using Python.
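A minimal sketch of such a transpose with Pandas is shown below. It assumes a layout with one row per indicator and one column per region, which may differ from the actual Istat file, and the names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per indicator, one column per region.
# Names and values are illustrative only, not taken from the Istat dataset.
raw = pd.DataFrame(
    {
        "indicator": ["indicator_a", "indicator_b"],
        "Region X": [1.0, 2.0],
        "Region Y": [3.0, 4.0],
    }
)

# Use the indicator names as the index, then swap rows and columns
# so that each region becomes a row and each indicator a column.
by_region = raw.set_index("indicator").T
by_region.index.name = "region"
by_region.columns.name = None

print(by_region)
```

Setting the index before transposing keeps the indicator names as column labels rather than leaving them as an ordinary data row.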
In the future, I plan to enhance this project by incorporating machine learning algorithms.