Hi, I’m Bruno Reis, a software developer with 15+ years of experience, currently pursuing a Bachelor’s Degree in Data Science & Artificial Intelligence at Instituto Infnet, Brazil.
This repository showcases my learning journey in data analysis, statistics, SQL, and machine learning foundations, combining my background in programming with a new focus on data-driven problem solving.
- Description: Comprehensive data quality diagnosis of the kc_house_data.csv dataset. The project focuses on distinguishing between actual data errors and natural outliers within the real estate market.
- Key Techniques: Comparative statistical outlier detection (IQR, MAD, Z-Score), referential integrity verification, and temporal logical consistency validation.
- Stack: Python, Pandas, NumPy, Seaborn.
- Highlight: Demonstrated that the MAD method is ~9x more sensitive than Z-Score for long-tail data distributions (real estate).
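
The three detection rules named above can be compared side by side. The following is a minimal sketch, not the project's actual code: the function names, the thresholds (1.5 for IQR, 3.0 for Z-Score, 3.5 for the MAD-based modified z-score) and the synthetic log-normal sample are all assumptions chosen for illustration, and the sample merely stands in for a long-tailed price column.

```python
import numpy as np


def iqr_outliers(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold


def mad_outliers(x: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Flag values whose modified z-score (median/MAD based) exceeds `threshold`."""
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # Degenerate case: more than half the values equal the median.
        return np.zeros(x.shape, dtype=bool)
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold


# Synthetic long-tailed sample (assumption: stands in for a price column).
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=13.0, sigma=0.5, size=5000)

for name, rule in [("IQR", iqr_outliers), ("Z-Score", zscore_outliers), ("MAD", mad_outliers)]:
    print(f"{name}: {int(rule(prices).sum())} values flagged")
```

The mean and standard deviation are themselves pulled upward by extreme values, so the Z-Score rule tends to flag fewer points on skewed data than the median-based MAD rule. The exact ratio depends on the dataset and the thresholds chosen.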
- Description: Exploratory Data Analysis (EDA) of a fictional gym membership dataset (1,000 members).
- Tools: Python, Pandas.
- Highlights:
- Descriptive statistics & distribution analysis.
- Behavior analysis (training days, visits, group classes, services).
- Key insights: an average of 2.68 training days per week, 36 favorite drinks, and BodyPump as the most popular class.
- 📎 Project README
- Description: SQL-based analysis of student performance across Math, Reading, and Writing.
- Tools: SQL, Matplotlib (for visualizations).
- Highlights:
- Score segmentation & demographic analysis.
- Correlation between Math, Reading & Writing.
- Finding: test preparation courses are associated with better performance across all subjects.
- 📎 Project README
- Description: Analysis of Spotify music tracks dataset from Kaggle.
- Dataset: Spotify Tracks Dataset
- Tools: Python, Pandas, Matplotlib, Seaborn.
- Highlights:
- Temporal insights: analysis of release years and popularity over time.
- Genre-based comparisons (distribution and top genres).
- Visualization of continuous variables (e.g., danceability, energy).
- Key findings on how musical features vary by genre and popularity.
- 📎 Project README
- Description: Sales and profitability analysis of the Superstore dataset using SQL queries.
- Tools: SQL (PostgreSQL), Pandas (for result handling), Matplotlib (for charts).
- Highlights:
- Time-based sales trends with `DATE_TRUNC()`.
- Regional sales ranking and performance breakdown.
- Customer purchase history tracking with window functions.
- Top 5 customers with highest profit growth (using `LAG()` and month-over-month comparison).
- 📎 Project README
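
For readers more at home in Pandas than SQL, the month-over-month comparison can be approximated in Python. This is only a sketch of the idea, not the project's SQL: the column names (`customer_id`, `order_date`, `profit`) and the tiny sample frame are hypothetical. Truncating dates to the month plays the role of `DATE_TRUNC('month', ...)`, and `shift(1)` within each customer group plays the role of `LAG()`.

```python
import pandas as pd


def monthly_profit_growth(orders: pd.DataFrame) -> pd.DataFrame:
    """Sum profit per customer and month, then compare each month with the previous row."""
    monthly = (
        orders.assign(
            # Truncate each order date to the first day of its month.
            month=orders["order_date"].dt.to_period("M").dt.to_timestamp()
        )
        .groupby(["customer_id", "month"], as_index=False)["profit"]
        .sum()
        .sort_values(["customer_id", "month"])
    )
    # Previous row's profit within the same customer (NaN for the first month).
    monthly["prev_profit"] = monthly.groupby("customer_id")["profit"].shift(1)
    monthly["growth"] = monthly["profit"] - monthly["prev_profit"]
    return monthly


# Hypothetical sample data, for illustration only.
orders = pd.DataFrame(
    {
        "customer_id": ["A", "A", "A", "B", "B"],
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-10", "2024-01-15", "2024-02-01"]
        ),
        "profit": [100.0, 50.0, 200.0, 80.0, 60.0],
    }
)

result = monthly_profit_growth(orders)
top = result.dropna(subset=["growth"]).nlargest(5, "growth")
print(top[["customer_id", "month", "growth"]])
```

As with `LAG()`, the comparison is against the previous available row for that customer, which is the previous calendar month only when the customer has orders in every month.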
| Area | Tools & Technologies |
|---|---|
| Programming | Python (Pandas, Matplotlib, Seaborn), SQL |
| Data Handling | Data cleaning, exploratory data analysis, descriptive statistics |
| Visualization | Matplotlib, Seaborn |
| Soft Skills | Analytical thinking, problem-solving, data storytelling |
| Background | 15+ years in software development |
- Expand with projects in Machine Learning and Neural Networks.
- Apply predictive modeling to datasets (e.g., customer churn, education outcomes).
- Build interactive dashboards and advanced data visualizations.
✍️ Author: Bruno Reis
📅 Portfolio started in 2025