nunesdev/data-science

📊 Data Science & AI Portfolio


👨‍💻 About Me

Hi, I’m Bruno Reis, a software developer with 15+ years of experience, currently pursuing a Bachelor’s Degree in Data Science & Artificial Intelligence at Instituto Infnet, Brazil.

This repository showcases my learning journey in data analysis, statistics, SQL, and machine learning foundations, combining my background in programming with a new focus on data-driven problem solving.


📂 Featured Projects

  • Description: Comprehensive data quality diagnosis of the kc_house_data.csv dataset. The project focuses on distinguishing actual data errors from natural outliers in the real estate market.

  • Key Techniques: Comparative statistical outlier detection (IQR, MAD, Z-Score), referential integrity verification, and temporal logical consistency validation.

  • Stack: Python, Pandas, NumPy, Seaborn.

  • Highlight: Showed that the MAD method is roughly 9x more sensitive than Z-Score on long-tailed data distributions (such as real estate).

  • 📎 Project README
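
The three detection rules named above can be compared side by side. The sketch below is illustrative only: it uses Python's standard library rather than the project's Pandas/NumPy stack, the thresholds (3.0 for Z-Score, 1.5 for IQR, 3.5 for the MAD-based modified z-score) are common defaults assumed here, and the `prices` list is made-up toy data, not values from kc_house_data.csv.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median absolute
    deviation) exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical long-tailed sample (toy numbers, in thousands).
prices = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 900, 2500]

z_flags = zscore_outliers(prices)
iqr_flags = iqr_outliers(prices)
mad_flags = mad_outliers(prices)
```

On this toy sample the extreme value inflates the mean and standard deviation, so the Z-Score rule misses the moderate outlier that the median-based rules catch. The exact sensitivity ratio depends on the data and the thresholds chosen.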


🏋️‍♂️ Data Analysis with Python – Gym Members

  • Description: Exploratory Data Analysis (EDA) of a fictional gym membership dataset (1,000 members).
  • Tools: Python, Pandas.
  • Highlights:
    • Descriptive statistics & distribution analysis.
    • Behavior analysis (training days, visits, group classes, services).
    • Key insights: avg. 2.68 training days/week, 36 favorite drinks, most popular class BodyPump.
  • 📎 Project README

🎓 Student Performance Analysis with SQL

  • Description: SQL-based analysis of student performance across Math, Reading, and Writing.
  • Tools: SQL, Matplotlib (for visualizations).
  • Highlights:
    • Score segmentation & demographic analysis.
    • Correlation between Math, Reading & Writing.
    • Finding: completing a test preparation course is associated with higher scores across all subjects.
  • 📎 Project README
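
A minimal sketch of the kind of queries this analysis involves, run through Python's built-in `sqlite3` module. The table name `students`, its column names, the score bands, and the four rows are all hypothetical and chosen for illustration; they are not taken from the project's dataset or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE students (
           test_prep TEXT,
           math_score INTEGER,
           reading_score INTEGER,
           writing_score INTEGER
       )"""
)

# Made-up toy rows, only to make the queries runnable.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        ("completed", 80, 85, 90),
        ("completed", 70, 75, 80),
        ("none", 60, 65, 70),
        ("none", 50, 55, 45),
    ],
)

# Average score per subject, split by test preparation status.
avg_by_prep = conn.execute(
    """SELECT test_prep,
              AVG(math_score), AVG(reading_score), AVG(writing_score)
       FROM students
       GROUP BY test_prep
       ORDER BY test_prep"""
).fetchall()

# Score segmentation: bucket math scores into bands and count students.
segments = conn.execute(
    """SELECT CASE
                  WHEN math_score >= 70 THEN 'high'
                  WHEN math_score >= 60 THEN 'medium'
                  ELSE 'low'
              END AS math_band,
              COUNT(*)
       FROM students
       GROUP BY math_band
       ORDER BY math_band"""
).fetchall()
```

The `CASE` expression handles segmentation and `GROUP BY` handles the demographic split. The band cut-offs here are arbitrary placeholders.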

🎵 Spotify Tracks Dataset – Data Analysis

  • Description: Analysis of Spotify music tracks dataset from Kaggle.
  • Dataset: Spotify Tracks Dataset
  • Tools: Python, Pandas, Matplotlib, Seaborn.
  • Highlights:
    • Temporal insights: analysis of release years and popularity over time.
    • Genre-based comparisons (distribution and top genres).
    • Visualization of continuous variables (e.g., danceability, energy).
    • Key findings on how musical features vary by genre and popularity.
  • 📎 Project README

🏬 Superstore Sales Analysis with SQL

  • Description: Sales and profitability analysis of the Superstore dataset using SQL queries.
  • Tools: SQL (PostgreSQL), Pandas (for result handling), Matplotlib (for charts).
  • Highlights:
    • Time-based sales trends with DATE_TRUNC().
    • Regional sales ranking and performance breakdown.
    • Customer purchase history tracking with window functions.
    • Top 5 customers with highest profit growth (using LAG() and month-over-month comparison).
  • 📎 Project README
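
The month-over-month comparison with LAG() can be sketched as follows. This is an assumption-laden illustration, not the project's query: it runs on SQLite through Python's `sqlite3` module (window functions require SQLite 3.25 or newer), so `strftime('%Y-%m', ...)` stands in for PostgreSQL's `DATE_TRUNC('month', ...)`. The `orders` table, its columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, profit REAL)")

# Made-up toy rows, only to make the query runnable.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ana", "2024-01-05", 100.0),
        ("Ana", "2024-01-20", 50.0),
        ("Ana", "2024-02-10", 210.0),
        ("Bo", "2024-01-15", 80.0),
        ("Bo", "2024-02-03", 60.0),
    ],
)

query = """
WITH monthly AS (
    -- Aggregate profit per customer per month
    -- (in PostgreSQL: DATE_TRUNC('month', order_date)).
    SELECT customer,
           strftime('%Y-%m', order_date) AS month,
           SUM(profit) AS profit
    FROM orders
    GROUP BY customer, month
)
SELECT customer,
       month,
       profit,
       -- LAG() fetches the same customer's previous month; NULL for the first.
       profit - LAG(profit) OVER (PARTITION BY customer ORDER BY month) AS mom_change
FROM monthly
ORDER BY customer, month
"""

result = conn.execute(query).fetchall()
```

Ranking customers by `mom_change` in descending order and keeping the first five rows would give a "top 5 by profit growth" list. How the project defines growth (absolute or percentage) is not stated, so the absolute difference here is just one option.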

🛠 Tech Stack

Area            Tools & Technologies
Programming     Python (Pandas, Matplotlib, Seaborn), SQL
Data Handling   Data cleaning, exploratory data analysis, descriptive statistics
Visualization   Matplotlib, Seaborn
Soft Skills     Analytical thinking, problem-solving, data storytelling
Background      15+ years in software development

🚀 Next Steps

  • Expand with projects in Machine Learning and Neural Networks.
  • Apply predictive modeling to datasets (e.g., customer churn, education outcomes).
  • Build interactive dashboards and advanced data visualizations.

✍️ Author: Bruno Reis
📅 Portfolio started in 2025
