🏗️ ML-Based Class 5 Construction Cost Estimator

Predicting Early-Stage Construction Costs with Machine Learning

CSE6748 - Applied Analytics Practicum
Georgia Institute of Technology | Fall 2025

🚀 Quick Start • 📊 Features • 📖 Documentation • 👥 Team


🎯 Achievement: 21.97% test MAPE, 3.03 percentage points better than the 25% target


🎯 Overview

A production-ready web application that uses a Random Forest model to predict early-stage construction costs. Built for Construction Cost Database LLC, the tool provides Class 5 estimates (±25% accuracy) for infrastructure projects.

🏆 Key Achievement

  • Target Requirement: MAPE < 25%
  • Actual Performance: MAPE = 21.97%
  • Result: ✅ Beat the target by 3.03 percentage points

🎓 Academic Context

  • Course: CSE6748 - Applied Analytics Practicum
  • Institution: Georgia Institute of Technology
  • Semester: Fall 2025
  • Client: Construction Cost Database LLC
  • Dataset: 17,025 historical projects (2010-2025)

✨ Key Features

📊 Analytics

Real-time exploratory data analysis with interactive visualizations

Features:

  • Cost distribution charts
  • Geographic heatmaps
  • Project type analysis
  • Statistical insights

💰 Cost Prediction

Instant ML-powered cost estimates with confidence intervals

Features:

  • 13-feature input form
  • ±25% confidence range
  • Similar project matching
  • Detailed breakdowns

📈 Model Insights

Comprehensive model performance tracking and comparison

Features:

  • 4 algorithm comparison
  • Feature importance
  • Learning curves
  • Residual analysis

🎯 Model Performance

Current Production Model: Random Forest

| Metric | Value | Status |
| --- | --- | --- |
| Test MAPE | 21.97% | ✅ Target Met |
| R² Score | 0.9463 | 🎯 Excellent |
| Test MAE | $271,543 | 📊 Strong |
| Test RMSE | $412,583 | 📈 Reliable |
| Dataset Size | 17,025 projects | 📦 Large Scale |
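For readers unfamiliar with the metrics in the table, here is a minimal sketch of how MAPE, MAE, RMSE, and R² are conventionally computed from paired actual and predicted costs. The function name `regression_metrics` and the three sample values are illustrative assumptions, not code or data from this repository.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAPE (%), MAE, RMSE, and R² for paired cost values.

    MAPE is undefined when an actual value is zero, so callers should
    filter such rows out first.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mape": mape, "mae": mae, "rmse": rmse, "r2": r2}


# Illustrative values only (not taken from the project dataset).
actual = [100_000.0, 200_000.0, 400_000.0]
predicted = [110_000.0, 180_000.0, 400_000.0]
metrics = regression_metrics(actual, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because MAPE averages relative errors, a model can post a high R² and still a sizeable MAPE if it misses badly on small projects.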

Model Comparison Results

| Model | CV MAPE | Test MAPE | R² Score | Status |
| --- | --- | --- | --- | --- |
| 🌲 Random Forest | 23.30% | 21.97% ✅ | 0.9463 | 🚀 Deployed |
| ⚡ XGBoost | 36.16% | 36.37% | 0.9258 | 📋 Alternative |
| 💡 LightGBM | 36.93% | 37.62% | 0.9232 | 📋 Alternative |
| 📊 Gradient Boosting | 42.48% | 43.75% | 0.9015 | ⚠️ Above Target |

Why Random Forest?

  • ✅ Lowest test MAPE of the four models (21.97%)
  • ✅ Highest R² score (0.9463)
  • ✅ Interpretable through feature importances
  • ✅ Robust to outliers
  • ✅ Fast training and prediction
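The selection logic implied by the comparison can be sketched in a few lines: pick the model with the lowest test MAPE and check it against the 25% target. The dictionary below copies the figures reported in the comparison table. The names `RESULTS`, `TARGET_MAPE`, and `pick_model` are hypothetical and not taken from the project code.

```python
TARGET_MAPE = 25.0  # Class 5 requirement: MAPE below 25%

# Figures copied from the model comparison table.
RESULTS = {
    "Random Forest": {"cv_mape": 23.30, "test_mape": 21.97, "r2": 0.9463},
    "XGBoost": {"cv_mape": 36.16, "test_mape": 36.37, "r2": 0.9258},
    "LightGBM": {"cv_mape": 36.93, "test_mape": 37.62, "r2": 0.9232},
    "Gradient Boosting": {"cv_mape": 42.48, "test_mape": 43.75, "r2": 0.9015},
}


def pick_model(results, target):
    """Return (best_name, meets_target) using the lowest test MAPE."""
    best = min(results, key=lambda name: results[name]["test_mape"])
    return best, results[best]["test_mape"] < target


best_name, meets_target = pick_model(RESULTS, TARGET_MAPE)
margin = round(TARGET_MAPE - RESULTS[best_name]["test_mape"], 2)
print(f"{best_name}: meets target={meets_target}, margin={margin} percentage points")
```

On these figures, Random Forest is the only candidate below the 25% threshold.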

🛠️ Technology Stack

Backend

🐍 Python 3.11+         - Core programming language
🌶️ Flask 3.0.0          - Web framework
🤖 scikit-learn 1.3.2   - Machine learning
🐼 Pandas 2.1.4         - Data manipulation
🔢 NumPy 1.26.2         - Numerical computing
📊 Matplotlib 3.8.2     - Visualizations
🎨 Seaborn 0.13.0       - Statistical plots

Frontend

📄 HTML5 / CSS3         - Modern web standards
🎨 Bootstrap 5          - Responsive UI framework
📊 Chart.js             - Interactive charts
⚡ Vanilla JavaScript   - Dynamic interactions

Machine Learning Pipeline

🌲 Random Forest        - Primary algorithm
📏 StandardScaler       - Feature normalization
🏷️ OneHotEncoder        - Categorical encoding
🎯 K-Means Clustering   - Geographic regions
✅ 5-Fold CV            - Model validation

🚀 Quick Start

Prerequisites Checklist

  • Python 3.11 or higher installed
  • pip package manager available
  • 500MB free disk space
  • Modern web browser

One-Command Setup

# Clone, setup, and run in one go
git clone https://github.com/kraryal/construction_new.git && \
cd construction_new && \
python -m venv venv && \
source venv/bin/activate && \
pip install -r requirements.txt && \
python app.py

Windows Users: Replace source venv/bin/activate with venv\Scripts\activate

Access the Application

🌐 Open browser: http://localhost:5000

That's it! 🎉


📦 Installation Guide

Step 1: Clone Repository

git clone https://github.com/kraryal/construction_new.git
cd construction_new

Step 2: Create Virtual Environment

Windows:

python -m venv venv
venv\Scripts\Activate.ps1

Mac/Linux:

python3 -m venv venv
source venv/bin/activate

Step 3: Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Step 4: Verify Installation

python -c "import flask, pandas, sklearn; print('✅ Setup complete!')"

Step 5: Prepare Data

# Ensure dataset is in correct location
ls data/base_data_for_model.csv
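The `ls` command is not available in the Windows Command Prompt, so a cross-platform check may be handy. The sketch below uses only the standard library. The helper name `dataset_ready` is an assumption, and the check only confirms that the file exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path


def dataset_ready(path="data/base_data_for_model.csv"):
    """Return True if the dataset file exists and is not empty."""
    file = Path(path)
    return file.is_file() and file.stat().st_size > 0


if dataset_ready():
    print("Dataset found.")
else:
    print("Dataset missing: place base_data_for_model.csv in the data/ folder.")
```

Run it from the repository root so the relative path resolves correctly.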

Step 6: Run Application

python app.py

Expected Output:

================================================================================
🏗️  ML-BASED CLASS 5 CONSTRUCTION COST ESTIMATOR
================================================================================

✅ Dataset loaded successfully: 17025 projects with 38 features
✅ Model loaded successfully
✅ System Ready!
🌐 Access at: http://localhost:5000
================================================================================

💡 Usage Examples

Web Interface

  1. Navigate to Cost Estimator page
  2. Fill project details form
  3. Click "Calculate Cost Estimate"
  4. View prediction with confidence interval

Input Example

| Field | Value |
| --- | --- |
| Project Type | Pavement Markers |
| Budget Range | $3M-$6M |
| Complexity | Category 4 |
| State | Michigan (MI) |
| County | Alcona County |
| Area Type | Rural |
| Inflation Factor | 1.05 |
| ACF | 1.01 |

Output

Estimated Cost: $4,358,432.11
Confidence Range: $3,268,824 - $5,448,040
Similar Projects: 6,147 found
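The confidence range in this output is consistent with a fixed ±25% band around the point estimate, which matches the Class 5 definition used throughout this README. A minimal sketch of that arithmetic follows. The function name `class5_range` is hypothetical, and the application may compute its range differently.

```python
def class5_range(estimate, tolerance=0.25):
    """Return (lower, upper) bounds at +/- tolerance around a point estimate."""
    if estimate < 0:
        raise ValueError("estimate must be non-negative")
    lower = round(estimate * (1 - tolerance), 2)
    upper = round(estimate * (1 + tolerance), 2)
    return lower, upper


lower, upper = class5_range(4_358_432.11)
print(f"Confidence Range: ${lower:,.2f} - ${upper:,.2f}")
```

For the sample estimate of $4,358,432.11, this gives bounds of about $3,268,824 and $5,448,040, the same range shown in the output above.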

API Usage (Python)

import requests

# Endpoint
url = 'http://localhost:5000/estimate_cost'

# Project data
data = {
    'inflation_factor': 1.05,
    'official_budget_range': '$3M-$6M',
    'ciqs_complexity_category': 'Category 4',
    'cnt_division': 6,
    'cnt_item_code': 6,
    'county_name': 'Alcona County',
    'area_type': 'Rural',
    'acf': 1.01,
    'project_type': 'Pavement Markers',
    'project_category': 'Civil',
    'project_state': 'MI',
    'region': 'Region_3'
}

# Make request (form-encoded body); fail fast on network or HTTP errors
response = requests.post(url, data=data, timeout=30)
response.raise_for_status()
result = response.json()

# Display results
if result.get('success'):
    interval = result['confidence_interval']
    print(f"💰 Estimated Cost: {result['estimated_cost_formatted']}")
    print(f"📊 Confidence Range: {interval['lower_formatted']} - {interval['upper_formatted']}")
    print(f"🔍 Similar Projects: {result['similar_projects']['count']}")

cURL Example

curl -X POST http://localhost:5000/estimate_cost \
  --data-urlencode "inflation_factor=1.05" \
  --data-urlencode "official_budget_range=\$3M-\$6M" \
  --data-urlencode "ciqs_complexity_category=Category 4" \
  --data-urlencode "cnt_division=6" \
  --data-urlencode "cnt_item_code=6" \
  --data-urlencode "county_name=Alcona County" \
  --data-urlencode "area_type=Rural" \
  --data-urlencode "acf=1.01" \
  --data-urlencode "project_type=Pavement Markers" \
  --data-urlencode "project_category=Civil" \
  --data-urlencode "project_state=MI" \
  --data-urlencode "region=Region_3"
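Both the Python and cURL examples send the fields as a form-encoded request body. As a rough illustration of what that encoding looks like on the wire, the standard-library sketch below encodes a few of the sample fields. Values containing spaces or `$` are percent-encoded, which is why unencoded values can be fragile when typed by hand.

```python
from urllib.parse import urlencode

# A subset of the sample fields from the examples above.
fields = {
    "official_budget_range": "$3M-$6M",
    "ciqs_complexity_category": "Category 4",
    "county_name": "Alcona County",
    "project_state": "MI",
}

body = urlencode(fields)  # application/x-www-form-urlencoded payload
print(body)
```

Libraries such as `requests` perform this encoding automatically when a dictionary is passed through the `data=` argument.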

📁 Project Structure

construction_new/
│
├── 📄 app.py                       # Main Flask application (500+ lines)
├── 📋 requirements.txt             # Python dependencies
├── 📖 README.md                    # Project documentation (this file)
├── 📚 INSTRUCTIONS.md              # Detailed setup guide
│
├── 📂 data/
│   └── 📊 base_data_for_model.csv # Training dataset (17,025 projects)
│
├── 📂 models/
│   ├── 🤖 construction_cost_model.pkl  # Trained Random Forest model
│   └── 📈 model_metrics.json           # Performance metrics
│
├── 📂 templates/                   # HTML templates
│   ├── 🏠 home.html               # Landing page with cards
│   ├── 📊 eda.html                # Exploratory data analysis
│   ├── 📈 model_comparison.html   # Algorithm comparison
│   ├── 🎯 dashboard.html          # Performance dashboard
│   ├── 💰 cost_estimator.html     # Prediction form (main feature)
│   ├── 🗂️ data_overview.html      # Dataset information
│   ├── 📚 documentation.html      # API docs & team info
│   ├── 🧭 base.html               # Base template with navigation
│   └── ❌ error.html              # Error handling page
│
└── 📂 static/
    ├── 🎨 css/
    │   └── styles.css             # Custom styles
    └── 🖼️ images/                  # Generated plots & assets

🔬 Model Details

Features Used (13 Features)

Economic Factors (2)

  • Inflation Factor - Range: 1.00 - 1.34 | Adjusts for year-over-year cost changes
  • Area Cost Factor (ACF) - Range: 0.80 - 1.19 | Geographic cost adjustment multiplier

Project Classification (4)

  • Project Type - Categorical | Specific construction work type (e.g., Pavement Markers)
  • Project Category - Categorical | General classification (e.g., Civil, Water & Sewer)
  • CIQS Complexity Category - Category 1-4 | Complexity rating from simple to complex
  • Official Budget Range - Categorical | Budget bracket (e.g., $3M-$6M, Less than 1M)

Geographic Location (4)

  • Project State - 50 US states | Location identifier
  • County Name - Varies by state | Specific county location
  • Area Type - Urban/Rural | Development density classification
  • Region - Region_0 to Region_3 | K-Means clustered geographic zones

Construction Details (3)

  • CNT Division Code - Range: 1 - 29 | Construction division taxonomy
  • CNT Item Code - Range: 1 - 61 | Specific item classification
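The documented ranges lend themselves to a simple input check before a request is sent. The sketch below is one possible client-side validator, assuming the field names from the API examples in this README. The helper `validate_inputs` is hypothetical, and whether the application itself rejects out-of-range values is not stated here.

```python
# Numeric ranges as documented in the feature list.
NUMERIC_RANGES = {
    "inflation_factor": (1.00, 1.34),
    "acf": (0.80, 1.19),
    "cnt_division": (1, 29),
    "cnt_item_code": (1, 61),
}

# Categorical values as documented in the feature list.
ALLOWED_VALUES = {
    "area_type": {"Urban", "Rural"},
    "region": {f"Region_{i}" for i in range(4)},
    "ciqs_complexity_category": {f"Category {i}" for i in range(1, 5)},
}


def validate_inputs(form):
    """Return a list of problems; an empty list means the form looks valid."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(form[field])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{field}: missing or not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{field}: {value} is outside {low} to {high}")
    for field, allowed in ALLOWED_VALUES.items():
        if form.get(field) not in allowed:
            problems.append(f"{field}: expected one of {sorted(allowed)}")
    return problems


example = {
    "inflation_factor": 1.05,
    "acf": 1.01,
    "cnt_division": 6,
    "cnt_item_code": 6,
    "area_type": "Rural",
    "region": "Region_3",
    "ciqs_complexity_category": "Category 4",
}
print(validate_inputs(example))
```

Values outside the documented ranges fall outside what the model saw during training, so flagging them early is a cautious default.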

Training Pipeline

graph LR
    A[Raw Data<br/>17,025 projects] --> B[Data Cleaning<br/>Fill missing values]
    B --> C[Feature Engineering<br/>K-Means clustering]
    C --> D[Train/Test Split<br/>80/20]
    D --> E[Preprocessing<br/>StandardScaler + OneHotEncoder]
    E --> F[Model Training<br/>Random Forest]
    F --> G[Validation<br/>5-Fold CV]
    G --> H[Production Model<br/>21.97% MAPE]

Preprocessing Steps

  1. Missing Value Imputation

    • Numerical: Median
    • Categorical: Mode

  2. Feature Engineering

    • K-Means clustering of state coordinates into 4 geographic regions (Region_0 to Region_3)

  3. Feature Scaling and Encoding

    • StandardScaler for numerical features
    • OneHotEncoder for categorical features

  4. Train/Test Split

    • 80% training (13,620 samples)
    • 20% testing (3,405 samples)
    • Random state: 42 (reproducible)
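The regional clustering step can be illustrated with scikit-learn's `KMeans`. This is a sketch under stated assumptions: the state coordinates below are rough, hand-picked approximations for eight states, not the project's data, and the resulting cluster numbering will not match the Region_0 to Region_3 labels in the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Approximate (latitude, longitude) per state; illustrative values only.
STATE_COORDS = {
    "MI": (44.3, -85.6),
    "NY": (43.0, -75.5),
    "GA": (32.7, -83.4),
    "FL": (27.8, -81.7),
    "TX": (31.0, -99.0),
    "CO": (39.0, -105.5),
    "CA": (36.8, -119.4),
    "WA": (47.4, -120.5),
}

coords = np.array(list(STATE_COORDS.values()))

# Four clusters, matching the number of regions described above.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(coords)

# Map each state to a region label derived from its cluster index.
regions = {
    state: f"Region_{label}"
    for state, label in zip(STATE_COORDS, kmeans.labels_)
}
print(regions)
```

The region label then becomes a categorical feature like any other and is one-hot encoded with the rest.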

Model Configuration

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=100,  # Number of decision trees
    random_state=42,   # Reproducibility seed
    n_jobs=-1,         # Use all CPU cores
)
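To show how the preprocessing steps and this configuration could fit together, here is a hedged end-to-end sketch using scikit-learn's `Pipeline` and `ColumnTransformer`. It runs on a small synthetic frame with only four of the documented columns, and the cost formula is invented for the demonstration. It is not the authors' training script and will not reproduce the reported metrics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["inflation_factor", "acf"]
CATEGORICAL = ["area_type", "project_state"]


def build_pipeline():
    """Median/mode imputation, scaling, one-hot encoding, then Random Forest."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Synthetic stand-in data (assumption): 200 rows, invented cost formula.
rng = np.random.default_rng(42)
n_rows = 200
frame = pd.DataFrame({
    "inflation_factor": rng.uniform(1.00, 1.34, n_rows),
    "acf": rng.uniform(0.80, 1.19, n_rows),
    "area_type": rng.choice(["Urban", "Rural"], n_rows),
    "project_state": rng.choice(["MI", "GA", "TX"], n_rows),
})
cost = 1_000_000 * frame["inflation_factor"] * frame["acf"]

# 80/20 split with a fixed seed, as described in the preprocessing steps.
X_train, X_test, y_train, y_test = train_test_split(
    frame, cost, test_size=0.2, random_state=42
)
pipeline = build_pipeline().fit(X_train, y_train)
predictions = pipeline.predict(X_test)

# 5-fold cross-validated MAPE, reported as a percentage.
scores = cross_val_score(
    build_pipeline(), frame, cost, cv=5,
    scoring="neg_mean_absolute_percentage_error",
)
cv_mape = -scores.mean() * 100
print(f"CV MAPE on synthetic data: {cv_mape:.2f}%")
```

Keeping preprocessing inside the pipeline means the imputers, scaler, and encoder are fit only on each training fold, which avoids leaking test information into cross-validation.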

📚 API Documentation

Endpoints

1. Cost Estimation

POST /estimate_cost

Request Body (form-encoded fields, shown here as JSON for readability):

{
  "inflation_factor": 1.05,
  "official_budget_range": "$3M-$6M",
  "ciqs_complexity_category": "Category 4",
  "cnt_division": 6,
  "cnt_item_code": 6,
  "county_name": "Alcona County",
  "area_type": "Rural",
  "acf": 1.01,
  "project_type": "Pavement Markers",
  "project_category": "Civil",
  "project_state": "MI",
  "region": "Region_3"
}

Success Response (200):

{
  "success": true,
  "estimated_cost": 4358432.11,
  "estimated_cost_formatted": "$4,358,432.11",
  "confidence_interval": {
    "lower": 3268824.08,
    "upper": 5448040.14,
    "lower_formatted": "$3,268,824.08",
    "upper_formatted": "$5,448,040.14"
  },
  "similar_projects": {
    "count": 6147,
    "avg_cost_formatted": "$1,310,815.37",
    "median_cost_formatted": "$856,470.56",
    "match_type": "exact"
  },
  "timestamp": "2025-11-25 14:30:22"
}
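A client may want to turn this response into a typed object rather than passing raw dictionaries around. The sketch below parses the documented success payload with the standard library. The `Estimate` dataclass and `parse_estimate` helper are hypothetical names. The shape of an error response is not documented here, so the code only checks the `success` flag.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    cost: float
    lower: float
    upper: float
    similar_count: int


def parse_estimate(payload):
    """Parse a JSON success response from POST /estimate_cost."""
    body = json.loads(payload)
    if not body.get("success"):
        raise ValueError("estimate request was not successful")
    interval = body["confidence_interval"]
    return Estimate(
        cost=float(body["estimated_cost"]),
        lower=float(interval["lower"]),
        upper=float(interval["upper"]),
        similar_count=int(body["similar_projects"]["count"]),
    )


# Trimmed copy of the documented success response.
SAMPLE = """
{
  "success": true,
  "estimated_cost": 4358432.11,
  "confidence_interval": {"lower": 3268824.08, "upper": 5448040.14},
  "similar_projects": {"count": 6147, "match_type": "exact"}
}
"""

estimate = parse_estimate(SAMPLE)
print(estimate)
```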

2. Dataset Statistics

GET /api/dataset_stats

Response:

{
  "total_projects": 17025,
  "avg_cost": 1142356.78,
  "median_cost": 856470.56,
  "min_cost": 10500.00,
  "max_cost": 15200000.00
}

3. Model Metrics

GET /api/model_metrics

Response:

{
  "test_mape": 21.97,
  "r2_score": 0.9463,
  "mae": 271543.90,
  "rmse": 412583.00,
  "n_features": 13
}

📸 Screenshots

Home Page

Cost Estimator Form (Empty)

Clean input form for entering construction project details.

Cost Estimator Form (Filled Example)

Example of the form with sample data entered for cost prediction.

Model Comparison

Comparison of different machine learning models' performance metrics.

Performance Dashboard

Interactive dashboard showing key project analytics and insights.


👥 Team

Project Contributors

Krishna Aryal
Data Engineering & Model Development
📧 Email • 💻 GitHub

Kumar Sawan
Feature Engineering & Optimization
📧 Email

Neema Kafwimi
Model Evaluation & Deployment
📧 Email


🙏 Acknowledgments

  • Construction Cost Database LLC - Dataset provider and project client
  • Georgia Tech CSE6748 - Course faculty and teaching assistants
  • scikit-learn Community - Open-source ML library
  • Flask Team - Web framework development
  • Stack Overflow Community - Problem-solving support

📝 License & Citation

This project is part of an academic practicum for Georgia Institute of Technology.

Citation

If you use this work, please cite:

@misc{construction_cost_estimator_2025,
  title={ML-Based Class 5 Construction Cost Estimator},
  author={Aryal, Krishna and Sawan, Kumar and Kafwimi, Neema},
  year={2025},
  institution={Georgia Institute of Technology},
  course={CSE6748 - Applied Analytics Practicum}
}

⚠️ Important Disclaimers

Class 5 Estimates Only
This model provides conceptual estimates with ±25% accuracy. They are not suitable for detailed bidding, final estimates, or contractual commitments.

Historical Data Limitation
The model was trained on 2010-2025 data and may not capture unprecedented market conditions, novel construction methods, or future trends.

Professional Validation Required
Always validate estimates with construction professionals and adjust for project-specific factors not captured by the model.


🔮 Future Enhancements

  • Real-time cost index updates
  • User authentication & project history
  • Export to PDF/Excel
  • Mobile responsive improvements
  • Additional ML models (Neural Networks)
  • Integration with external cost databases
  • API rate limiting & authentication
  • Multi-language support

📞 Support & Contact

Need Help?

Project Links

  • 🌐 Live Demo: [Coming Soon]
  • 📊 Dataset: PCS Historical Project Database
  • 🎓 Course: CSE6748 - Applied Analytics Practicum

⭐ Star this repository if you find it helpful!

Built with ❤️ by the Georgia Tech Team


"Accurate Early-Stage Cost Estimation Powered by Machine Learning"

🏠 Home • 📊 Dashboard • 💰 Estimate • 📚 Docs
