
Conversation

@snigenigmatic

#129 - Add metrics logging with thread-safe collector and /metrics endpoint

📌 Description

This PR implements comprehensive metrics logging for the PESUAuth API to enable monitoring and observability of authentication requests, errors, and system performance.

What is the purpose of this PR?

  • Adds a thread-safe metrics collection system to track authentication successes, failures, and various error types
  • Implements a new /metrics endpoint to expose collected metrics in JSON format
  • Integrates metrics tracking throughout the FastAPI application lifecycle

What problem does it solve?

  • Provides visibility into API usage patterns and authentication success rates
  • Enables monitoring of error types and frequencies for better debugging and system health assessment
  • Tracks CSRF token refresh operations for background task monitoring
  • Facilitates performance analysis and capacity planning

Background:
The API previously had no metrics or monitoring capabilities, making it difficult to assess system health, debug issues, or understand usage patterns. This implementation provides comprehensive tracking without impacting performance.
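At its core the collector is a locked dictionary of counters. A minimal sketch of the idea (illustrative only; the actual implementation lives in app/metrics.py and may differ in method names and details):

```python
# Illustrative sketch only; the real collector is in app/metrics.py.
import threading
from collections import defaultdict


class MetricsCollector:
    """Thread-safe in-memory counters keyed by metric name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters: defaultdict[str, float] = defaultdict(float)

    def inc(self, name: str, value: float = 1.0) -> None:
        """Increment a counter by `value`, creating it on first use."""
        with self._lock:
            self._counters[name] += value

    def snapshot(self) -> dict[str, float]:
        """Return a point-in-time copy of all counters (used by /metrics)."""
        with self._lock:
            return dict(self._counters)


# Shared collector instance used throughout the app.
metrics = MetricsCollector()
```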

ℹ️ Fixes / Related Issues
Fixes: #129

🧱 Type of Change

  • ✨ New feature – Adds functionality without breaking existing APIs
  • 📝 Documentation update – README, docstrings, OpenAPI tags, etc.
  • 🧪 Test suite change – Adds/updates unit, functional, or integration tests
  • 🕵️ Debug/logging enhancement – Adds or improves logging/debug support

🧪 How Has This Been Tested?

  • Unit Tests (tests/unit/)
  • Functional Tests (tests/functional/)
  • Integration Tests (tests/integration/)
  • Manual Testing

⚙️ Test Configuration:

  • OS: Windows 11
  • Python: 3.13.2 via uv
  • Docker build tested

Testing Details:

  • Unit Tests: 10 comprehensive tests for MetricsCollector covering thread safety, concurrent access, edge cases, and special character handling (an illustrative sketch follows this list)
  • Integration Tests: End-to-end FastAPI testing with mock authentication flows to verify metrics collection in real scenarios
  • Manual Testing: Live API testing confirmed correct metrics tracking for success/failure scenarios and endpoint functionality
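The thread-safety checks follow the usual pattern of hammering the collector from several threads and asserting that no increments are lost. An illustrative sketch (not verbatim from tests/unit/test_metrics.py, and assuming the collector exposes inc() and snapshot()):

```python
import threading

from app.metrics import MetricsCollector  # assumed import path


def test_concurrent_increments() -> None:
    collector = MetricsCollector()

    def worker() -> None:
        for _ in range(1000):
            collector.inc("auth_success_total")

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # With a lock around each increment, no updates are lost.
    assert collector.snapshot()["auth_success_total"] == 8000
```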

✅ Checklist

  • My code follows the CONTRIBUTING.md guidelines
  • I've performed a self-review of my changes
  • I've added/updated necessary comments and docstrings
  • I've updated relevant docs (README or endpoint docs)
  • No new warnings introduced
  • I've added tests to cover my changes
  • All tests pass locally (scripts/run_tests.py)
  • I've run linting and formatting (pre-commit run --all-files)
  • Docker image builds and runs correctly
  • Changes are backwards compatible (if applicable)
  • Feature flags or .env vars updated (if applicable) - Not applicable
  • I've tested across multiple environments (if applicable)
  • Benchmarks still meet expected performance (scripts/benchmark_auth.py)

🛠️ Affected API Behaviour

  • app/app.py – Modified /authenticate route logic

New API Endpoint:

  • /metrics - New GET endpoint that returns current application metrics in JSON format
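For orientation, the handler can be as small as the sketch below (the real route lives in app/app.py and is documented in app/docs/metrics.py; names and the response model may differ):

```python
# Illustrative /metrics handler; the actual implementation in app/app.py may differ.
from datetime import datetime, timezone

from fastapi import FastAPI

from app.metrics import metrics  # assumed shared collector instance

app = FastAPI()


@app.get("/metrics")
async def get_metrics() -> dict:
    return {
        "status": True,
        "message": "Metrics retrieved successfully",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics.snapshot(),
    }
```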

🧩 Models

  • app/models/response.py – Used existing response model for metrics endpoint formatting

New Files Added:

  • app/metrics.py - Core MetricsCollector implementation with thread-safe operations
  • app/docs/metrics.py - OpenAPI documentation for the new /metrics endpoint
  • tests/unit/test_metrics.py - Comprehensive unit tests for MetricsCollector
  • tests/integration/test_metrics_integration.py - Integration tests for metrics collection

🐳 DevOps & Config

  • Dockerfile – No changes to build process
  • .github/workflows/*.yaml – No CI/CD pipeline changes required
  • pyproject.toml / requirements.txt – No new dependencies added
  • .pre-commit-config.yaml – No linting or formatting changes

📊 Benchmarks & Analysis

  • scripts/benchmark_auth.py – No changes to benchmark scripts
  • scripts/analyze_benchmark.py – No changes to analysis tools
  • scripts/run_tests.py – No changes to test runner

📸 Screenshots / API Demos

🎯 Metrics Endpoint in Action

(Screenshot) Live metrics collection showing authentication success/failure tracking

Metrics Endpoint Response Example

```json
{
  "status": true,
  "message": "Metrics retrieved successfully",
  "timestamp": "2025-08-28T15:30:45.123456+05:30",
  "metrics": {
    "auth_success_total": 150,
    "auth_failure_total": 12,
    "validation_error_total": 8,
    "pesu_academy_error_total": 5,
    "unhandled_exception_total": 0,
    "csrf_token_error_total": 2,
    "profile_fetch_error_total": 1,
    "profile_parse_error_total": 0,
    "csrf_token_refresh_success_total": 45,
    "csrf_token_refresh_failure_total": 1
  }
}
```

🔧 Testing Results Dashboard

(Screenshot) Comprehensive test suite covering unit, integration, and functional scenarios

Updated API Endpoints Table

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Serves the interactive API documentation (Swagger UI). |
| `/authenticate` | POST | Authenticates a user using their PESU credentials. |
| `/health` | GET | A health check endpoint to monitor the API's status. |
| `/readme` | GET | Redirects to the project's official GitHub repository. |
| `/metrics` | GET | Returns current application metrics and counters. |

Metrics Tracked:

  1. auth_success_total - Successful authentication attempts
  2. auth_failure_total - Failed authentication attempts
  3. validation_error_total - Request validation failures
  4. pesu_academy_error_total - PESU Academy service errors
  5. unhandled_exception_total - Unexpected application errors
  6. csrf_token_error_total - CSRF token extraction failures
  7. profile_fetch_error_total - Profile page fetch failures
  8. profile_parse_error_total - Profile parsing errors
  9. csrf_token_refresh_success_total - Successful background CSRF refreshes
  10. csrf_token_refresh_failure_total - Failed background CSRF refreshes

@snigenigmatic requested review from a team and aditeyabaral as code owners · August 31, 2025, 10:48
Comment on lines +8 to +40
```python
200: {
    "description": "Metrics retrieved successfully",
    "content": {
        "application/json": {
            "examples": {
                "metrics_response": {
                    "summary": "Current Metrics",
                    "description": (
                        "All current application metrics including authentication counts and error rates"
                    ),
                    "value": {
                        "status": True,
                        "message": "Metrics retrieved successfully",
                        "timestamp": "2025-08-28T15:30:45.123456+05:30",
                        "metrics": {
                            "auth_success_total": 150,
                            "auth_failure_total": 12,
                            "validation_error_total": 8,
                            "pesu_academy_error_total": 5,
                            "unhandled_exception_total": 0,
                            "csrf_token_error_total": 2,
                            "profile_fetch_error_total": 1,
                            "profile_parse_error_total": 0,
                            "csrf_token_refresh_success_total": 45,
                            "csrf_token_refresh_failure_total": 1,
                        },
                    },
                }
            }
        }
    },
},
```
Create a model for this. The response model will also need an update.
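For illustration, a response model along these lines would cover the example payload (field names mirror the JSON above; the exact model and its placement are up to the author):

```python
# Hypothetical response models for /metrics; not part of the PR as submitted.
from pydantic import BaseModel


class MetricsData(BaseModel):
    auth_success_total: int = 0
    auth_failure_total: int = 0
    validation_error_total: int = 0
    pesu_academy_error_total: int = 0
    unhandled_exception_total: int = 0
    csrf_token_error_total: int = 0
    profile_fetch_error_total: int = 0
    profile_parse_error_total: int = 0
    csrf_token_refresh_success_total: int = 0
    csrf_token_refresh_failure_total: int = 0


class MetricsResponse(BaseModel):
    status: bool
    message: str
    timestamp: str
    metrics: MetricsData
```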

Comment on lines +23 to +24
"auth_success_total": 150,
"auth_failure_total": 12,

We should also track how many auth requests are received, including a split for how many with and without profile data
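Something along these lines inside the /authenticate handler would cover that split (illustrative; assumes the parsed request body exposes a boolean profile flag, as the existing API does for profile data):

```python
# Hypothetical counters for the requested split; counter names are illustrative.
metrics.inc("auth_requests_total")
if payload.profile:
    metrics.inc("auth_requests_with_profile_total")
else:
    metrics.inc("auth_requests_without_profile_total")
```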

app/__init__.py Outdated
Comment on lines 2 to 5

```python
from app.metrics import metrics

__all__ = ["metrics"]
```

Remove this; we should not initialize a global collector outside the entry point.
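One way to keep the collector scoped to the entry point (an illustrative sketch, not prescribed by this review) is to construct it in the FastAPI lifespan and hang it off app.state:

```python
# Illustrative alternative: create the collector at startup instead of at import time.
from contextlib import asynccontextmanager

from fastapi import FastAPI

from app.metrics import MetricsCollector


@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.metrics = MetricsCollector()  # created once, owned by the app
    yield


app = FastAPI(lifespan=lifespan)
# Handlers can then reach it via request.app.state.metrics
```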

Comment on lines +131 to +140
```python
exc_type = type(exc).__name__.lower()
if "csrf" in exc_type:
    metrics.inc("csrf_token_error_total")
elif "profilefetch" in exc_type:
    metrics.inc("profile_fetch_error_total")
elif "profileparse" in exc_type:
    metrics.inc("profile_parse_error_total")
elif "authentication" in exc_type:
    metrics.inc("auth_failure_total")
```


There is a much cleaner solution. Look into a middleware layer. Here is some pseudo code to get you started:

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    metrics.inc("requests_total")
    start_time = time.time()

    try:
        response: Response = await call_next(request)
        latency = time.time() - start_time

        # Track successes vs failures
        if 200 <= response.status_code < 300:
            metrics.inc("requests_success")
        else:
            metrics.inc("requests_failed")
            metrics.inc(f"requests_failed_status_{response.status_code}")

        # Latency metrics
        metrics.inc("request_latency_sum", latency)

        # Also add route metrics: route = request.scope.get("route")

        return response

    except Exception as e:
        latency = time.time() - start_time
        metrics.inc("requests_failed")
        metrics.inc(f"requests_failed_exception_{type(e).__name__}")
        metrics.inc("request_latency_sum", latency)

        raise

Note, you will need to accordingly increment other metrics like how many with and without profile data by parsing the request.
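For the route-level counters hinted at in the pseudocode, something like this could sit after call_next (illustrative; the matched route is typically available on the request scope once routing has run):

```python
# Hypothetical per-route counter using the matched route's path template,
# e.g. "requests_total_/authenticate".
route = request.scope.get("route")
if route is not None:
    metrics.inc(f"requests_total_{route.path}")
```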

Contributor @achyu-dev left a comment:

Please address the requested changes from @aditeyabaral and me.


Do these dependencies need to be added?



Merging this pull request will close the linked issue: Metric Logging.