feat: add OpenTelemetry plugin for database contention detection #20

joamag · 2025-12-22T07:36:58Z

Implements a comprehensive OpenTelemetry instrumentation plugin for the
Colony entity manager to detect and monitor database transaction
contention and performance issues.

Features:

Automatic instrumentation of entity manager operations
Distributed tracing for transactions, queries, and locks
Database-specific contention detectors for PostgreSQL and MySQL
Real-time blocking query detection
Lock wait monitoring and analysis
Transaction statistics collection
Slow query identification
Deadlock and serialization failure detection

Components:

telemetry_plugin.py: Main plugin following Colony patterns
system.py: Core telemetry system with instrumentation logic
decorators.py: Instrumented transaction decorators
contention.py: Database-specific contention detection
test.py: Unit tests for telemetry functionality

Configuration:

Environment-based configuration (OTEL_* variables)
Support for OTLP exporters (Jaeger, Tempo, etc.)
Console exporter for development/debugging
Configurable slow query thresholds

Metrics:

db.transaction.duration: Transaction execution time
db.query.duration: Query execution time
db.lock.wait_duration: Lock acquisition time
db.transaction.active: Active transaction count
db.contention.events: Contention event counter

Documentation:

README.md: Comprehensive usage guide
QUICK_START.md: Quick setup instructions
example_usage.py: Practical examples

The implementation is non-intrusive and follows the Colony plugin
architecture, making it easy to integrate with existing applications.

Summary by CodeRabbit

New Features
- OpenTelemetry plugin added to instrument entity manager: transaction/query tracing, metrics, and engine-specific contention detectors (PostgreSQL, MySQL, SQLite).
- Transaction and query decorators for automatic tracing and contention-aware behavior.
- Example scripts demonstrating MySQL/SQLite/PostgreSQL telemetry, monitoring and periodic checks.
Documentation
- Comprehensive telemetry README, quick-start guide, and database-specific support docs with usage, best practices, and troubleshooting.
Tests
- New test suite covering plugin loading and decorator behaviors.
Chores
- Added OpenTelemetry runtime dependencies.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Implements a comprehensive OpenTelemetry instrumentation plugin for the Colony entity manager to detect and monitor database transaction contention and performance issues. Features: - Automatic instrumentation of entity manager operations - Distributed tracing for transactions, queries, and locks - Database-specific contention detectors for PostgreSQL and MySQL - Real-time blocking query detection - Lock wait monitoring and analysis - Transaction statistics collection - Slow query identification - Deadlock and serialization failure detection Components: - telemetry_plugin.py: Main plugin following Colony patterns - system.py: Core telemetry system with instrumentation logic - decorators.py: Instrumented transaction decorators - contention.py: Database-specific contention detection - test.py: Unit tests for telemetry functionality Configuration: - Environment-based configuration (OTEL_* variables) - Support for OTLP exporters (Jaeger, Tempo, etc.) - Console exporter for development/debugging - Configurable slow query thresholds Metrics: - db.transaction.duration: Transaction execution time - db.query.duration: Query execution time - db.lock.wait_duration: Lock acquisition time - db.transaction.active: Active transaction count - db.contention.events: Contention event counter Documentation: - README.md: Comprehensive usage guide - QUICK_START.md: Quick setup instructions - example_usage.py: Practical examples The implementation is non-intrusive and follows the Colony plugin architecture, making it easy to integrate with existing applications.

Enhances the OpenTelemetry telemetry plugin with full MySQL 5.7+ and 8.0+ compatibility for database contention detection and monitoring. MySQL-Specific Features: - InnoDB lock wait detection with fallback for MySQL 5.7/8.0+ - Deadlock detection via error code 1213 and InnoDB status parsing - Transaction statistics from information_schema.innodb_trx - Slow query detection from PROCESSLIST - SHOW ENGINE INNODB STATUS parsing for recent deadlocks - Automatic version detection (sys.innodb_lock_waits for 8.0+) Enhanced Error Detection: - MySQL deadlock error code 1213 detection - Improved contention type classification (deadlock, lock_timeout, serialization_failure, lock_wait) - Database-specific error code parsing - Enhanced span attributes for contention events MySQLContentionDetector Methods: - get_blocking_queries(): Detects InnoDB lock waits - get_lock_waits(): Returns lock wait statistics - get_transaction_stats(): InnoDB transaction metrics + deadlock count - get_slow_queries(): Queries from PROCESSLIST exceeding threshold - get_deadlock_info(): Parses SHOW ENGINE INNODB STATUS Updated Instrumentation (system.py): - MySQL-specific error code detection in execute_query wrapper - Contention type attribute (db.contention.type) in spans - Improved error categorization for both MySQL and PostgreSQL Documentation: - example_mysql.py: Complete MySQL usage examples - README.md: Added MySQL-specific examples and usage - MySQL best practices for contention avoidance Query Compatibility: - Uses sys.innodb_lock_waits for MySQL 8.0+ - Falls back to information_schema.innodb_lock_waits for 5.7 - CASE WHEN aggregation instead of FILTER for MySQL compatibility - GLOBAL_STATUS for deadlock count retrieval The implementation maintains full backward compatibility with existing PostgreSQL functionality while adding comprehensive MySQL support.

coderabbitai · 2025-12-22T07:37:08Z

Walkthrough

Adds a new Telemetry plugin with OpenTelemetry integration: instrumentation, decorators, a Telemetry system, cross-database contention detectors (PostgreSQL, MySQL, SQLite), examples, docs, tests, and requirements updates to include OpenTelemetry packages.

Changes

Cohort / File(s)	Summary
Dependencies `requirements.txt`	Added OpenTelemetry packages: `opentelemetry-api>=1.20.0`, `opentelemetry-sdk>=1.20.0`, `opentelemetry-exporter-otlp>=1.20.0`.
Documentation `telemetry/README.md`, `telemetry/QUICK_START.md`, `telemetry/MYSQL_SUPPORT.md`, `telemetry/SQLITE_SUPPORT.md`	New comprehensive telemetry docs and DB-specific support guides (MySQL, SQLite), quick-starts, examples, compatibility notes, and troubleshooting.
Examples `telemetry/example_usage.py`, `telemetry/example_mysql.py`, `telemetry/example_sqlite.py`	New example scripts demonstrating setup, instrumentation, transaction tracing, contention detection, deadlock handling, optimization, best practices, and periodic monitoring for PostgreSQL, MySQL, and SQLite.
Plugin Bridge `telemetry/src/telemetry_plugin.py`	New `TelemetryPlugin` class integrating with Colony plugin system; declares metadata, dependencies, load logic, and exposes instrumentation, decorator access, contention detector factory, and configuration entrypoints.
Core Telemetry System `telemetry/src/telemetry/system.py`	New `Telemetry` class: conditional OpenTelemetry initialization, tracer/meter setup, metric instruments, configure() entrypoint, and `instrument_entity_manager()` to wrap EM operations and emit spans/metrics.
Decorators / Instrumentation Helpers `telemetry/src/telemetry/decorators.py`	New decorators: `transaction(...)` and `monitored_query` that manage transaction lifecycle, create spans, record events/metrics, detect contention patterns, and degrade gracefully when OpenTelemetry absent.
Contention Detectors `telemetry/src/telemetry/contention.py`	New `ContentionDetector` base plus `PgSQLContentionDetector`, `MySQLContentionDetector`, and `SQLiteContentionDetector` implementations to collect blocking queries, lock waits, transaction stats, slow queries, deadlocks, DB info, and optimization helpers.
Module Init / Exports `telemetry/src/telemetry/__init__.py`	Adds module metadata and public exports: `ContentionDetector`, `PgSQLContentionDetector`, `MySQLContentionDetector`, `SQLiteContentionDetector`, `Telemetry`, `TelemetryTest`, and retains existing submodule imports.
Tests `telemetry/src/telemetry/test.py`	New test module with `TelemetryTest` and `TelemetryTestCase` validating plugin load, decorator behavior (with/without OTEL), exception handling, and base contention detector behavior.
Support / Notes `telemetry/MYSQL_SUPPORT.md`, `telemetry/SQLITE_SUPPORT.md`	DB-specific support documents outlining instrumentation, metrics, error detection, and recommended configs for MySQL and SQLite.

Sequence Diagram

sequenceDiagram
    actor App as Application
    participant Dec as Decorator
    participant Tele as Telemetry System
    participant EM as Entity Manager
    participant DB as Database
    participant OTEL as OpenTelemetry

    App->>Dec: Call decorated transactional function
    Dec->>Tele: Request tracer/span
    Tele->>OTEL: Start span (db.transaction.<name>)
    Dec->>EM: begin_transaction()
    EM->>DB: BEGIN
    DB-->>EM: OK
    Dec->>App: Execute function body (calls EM queries)
    App->>EM: execute_query() / lock()
    EM->>DB: SQL statements
    DB-->>EM: Results / errors
    EM->>Tele: Report metrics/events (query duration, lock wait)
    Tele->>OTEL: Emit metrics & events
    alt Success
        Dec->>EM: commit_transaction()
        EM->>DB: COMMIT
        DB-->>EM: OK
        Tele->>OTEL: Mark span OK, add commit event
    else Exception (lock/deadlock/timeout)
        EM->>Dec: raise exception
        Dec->>Tele: Classify contention, increment retry metadata
        Dec->>EM: rollback_transaction()
        EM->>DB: ROLLBACK
        DB-->>EM: OK
        Tele->>OTEL: Record error, retry_count, and contention event
    end
    Dec->>OTEL: End span
    OTEL-->>Dec: Closed
    Dec-->>App: Return or raise

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Focus review on:
- telemetry/src/telemetry/contention.py — DB-specific SQL, fallbacks, deadlock parsing, and safe defaults.
- telemetry/src/telemetry/system.py — instrumentation wrappers, metric creation, lifecycle and thread-safety.
- telemetry/src/telemetry/decorators.py — transaction state handling, exception classification for contention/retries.
- telemetry/src/telemetry_plugin.py — plugin dependency declarations and load lifecycle.
- Examples and tests for realistic usage and graceful degradation when OpenTelemetry is absent.

Poem

🐰 I hopped into traces, swift and spry,
Counting locks and spans that flutter by.
From WAL to InnoDB, each query I spy,
I nibble contention and watch metrics fly.
A telemetry carrot — observability pie. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 68.09% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main change: introducing an OpenTelemetry plugin for database contention detection. It is specific, concise, and directly reflects the primary objective of the changeset.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch claude/opentelemetry-db-contention-AEbmu

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (17)

telemetry/src/telemetry/test.py (1)

142-142: Consider extracting the exception message.

The error message "Test error" could be extracted to a constant for better maintainability, though this is a test-only concern.

Based on static analysis hints.
telemetry/src/telemetry_plugin.py (1)
46-58: Consider adding ClassVar annotations for mutable class attributes.

Static analysis suggests annotating these mutable class attributes with typing.ClassVar to indicate they are class-level rather than instance-level. However, this may be a Colony framework convention that doesn't require type hints.
Proposed enhancement
+from typing import ClassVar
+
 class TelemetryPlugin(colony.Plugin):
     ...
-    platforms = [colony.CPYTHON_ENVIRONMENT]
-    capabilities = ["telemetry"]
-    dependencies = [
+    platforms: ClassVar = [colony.CPYTHON_ENVIRONMENT]
+    capabilities: ClassVar = ["telemetry"]
+    dependencies: ClassVar = [
         ...
     ]
-    main_modules = ["telemetry"]
+    main_modules: ClassVar = ["telemetry"]
Based on static analysis hints.
telemetry/src/telemetry/system.py (1)
321-324: Consider restructuring the return statement.

The return statement at line 324 could be moved to an else block for better code structure, though the current implementation is functionally correct.
Proposed refactor
                 try:
                     result = original_execute_query(query, close_cursor)
                     span.set_status(Status(StatusCode.OK))
-                    return result
-
-                except Exception as e:
+                except Exception as e:
+                    # error handling...
+                    raise
+                else:
+                    return result
Based on static analysis hints.
telemetry/src/telemetry/decorators.py (5)
81-84: Bare except swallows all exceptions silently.

Using a bare except with pass hides failures when obtaining the tracer. This can mask issues like misconfigured OpenTelemetry or unexpected errors. Consider catching specific exceptions or at minimum logging a debug message.
🔎 Proposed fix
             if OTEL_AVAILABLE and trace_enabled:
                 try:
                     tracer = trace.get_tracer("colony.entity_manager")
-                except:
-                    pass
+                except Exception:
+                    # Tracer initialization failed; proceed without tracing
+                    tracer = None
91-94: Bare except when retrieving engine name.

Similar to the tracer acquisition, this bare except silently catches all exceptions. The fallback to "unknown" is reasonable, but consider catching Exception explicitly for clarity.
🔎 Proposed fix
                 try:
                     engine_name = self_value.entity_manager.engine.get_engine_name()
-                except:
+                except Exception:
                     engine_name = "unknown"
106-106: Unused ctx_token variable.

The result of trace.set_span_in_context(span) is assigned but never used. If context propagation is intended, this token should be used to activate the context. Otherwise, consider removing it. Note: start_as_current_span context manager (used in monitored_query) handles this automatically.
🔎 Proposed fix - remove if not needed
-                ctx_token = trace.set_span_in_context(span)
+                # Note: span is manually managed; context propagation not required here
150-155: retry_count is misleading—increment without retry logic.

The retry_count is incremented on contention detection and recorded as a span attribute, but the decorator does not implement actual retry logic. This could mislead users into thinking retries are handled automatically. Consider renaming to contention_detected (boolean) or documenting that retry logic is the caller's responsibility.
🔎 Proposed fix - clarify intent
-                    if is_contention:
-                        span.set_attribute("db.transaction.contention", True)
-                        span.add_event("contention_detected", {"error.type": type(e).__name__})
-                        retry_count += 1
-
-                    span.set_attribute("db.transaction.retry_count", retry_count)
+                    if is_contention:
+                        span.set_attribute("db.transaction.contention", True)
+                        span.set_attribute("db.transaction.retry_suggested", True)
+                        span.add_event("contention_detected", {"error.type": type(e).__name__})
174-187: Unused *args, **kwargs in decorator signature.

The decorator function accepts *args, **kwargs but never uses them. This appears to be a remnant of boilerplate. Remove to avoid confusion.
🔎 Proposed fix
-    def decorator(function, *args, **kwargs):
+    def decorator(function):
         """
         The decorator function for the instrumented transaction decorator.

         :type function: Function
         :param function: The function to be decorated.
         :rtype: Function
         :return: The decorator interceptor function.
         """

         decorator_interceptor_function = create_decorator_interceptor(function)
         return decorator_interceptor_function
telemetry/example_mysql.py (4)
33-40: Hardcoded credentials in example code.

The example uses hardcoded credentials (root/root). While acceptable for demonstration purposes, consider adding a comment noting this is example-only and should use environment variables in production, or use placeholder values like "your_password".
🔎 Proposed improvement
     # Create MySQL entity manager
+    # NOTE: Use environment variables for credentials in production
     entity_manager = entity_manager_plugin.load_entity_manager_properties("mysql", {
         "id": "mysql_database",
         "host": "localhost",
         "port": 3306,
-        "user": "root",
-        "password": "root",
+        "user": os.environ.get("MYSQL_USER", "root"),
+        "password": os.environ.get("MYSQL_PASSWORD", ""),
         "database": "example_db",
         "isolation": "read committed",
     })
119-119: f-string without placeholders.

Line 119 uses an f-string but contains no interpolation. Remove the f prefix.
🔎 Proposed fix
-            print(f"\nDeadlock/Block detected:")
+            print("\nDeadlock/Block detected:")
164-164: f-string without placeholders.

Same issue as above—remove the f prefix.
🔎 Proposed fix
-            print(f"\nSlow query detected:")
+            print("\nSlow query detected:")
280-318: Monitor thread lacks graceful shutdown mechanism.

The monitor function runs an infinite while True loop with no exit condition. Consider adding a shutdown flag or using threading.Event for graceful termination. The broad exception handling is acceptable here to keep the monitoring thread alive.
🔎 Proposed improvement for graceful shutdown
-def monitor_mysql_periodically(entity_manager, telemetry_plugin, interval=30):
+def monitor_mysql_periodically(entity_manager, telemetry_plugin, interval=30, stop_event=None):
     """
     Sets up periodic MySQL monitoring for production environments.
     ...
+    :type stop_event: threading.Event
+    :param stop_event: Optional event to signal shutdown.
     """

     import time
     import threading

     detector = telemetry_plugin.get_contention_detector("mysql")
+    if stop_event is None:
+        stop_event = threading.Event()

     def monitor():
         print(f"\nStarted MySQL monitoring (checking every {interval}s)")

-        while True:
+        while not stop_event.is_set():
             try:
                 # ... monitoring logic ...
-                time.sleep(interval)
+                stop_event.wait(interval)
             except Exception as e:
                 print(f"Error in monitoring thread: {e}")
-                time.sleep(interval)
+                stop_event.wait(interval)
telemetry/src/telemetry/contention.py (2)
383-385: Silent exception swallowing in fallback path.

The try-except-pass pattern silently ignores why the MySQL 8.0+ query failed. Consider logging at debug level to aid troubleshooting version compatibility issues.
🔎 Proposed fix
         except Exception:
             # Fallback to information_schema for older MySQL versions
-            pass
+            # Note: sys.innodb_lock_waits may not exist in MySQL < 8.0
+            pass  # Proceed to legacy fallback
Or if the telemetry system supports debug logging:
-        except Exception:
+        except Exception as e:
             # Fallback to information_schema for older MySQL versions
-            pass
+            self.telemetry_system.plugin.debug(
+                "sys.innodb_lock_waits unavailable, using legacy query: %s" % str(e)
+            )
512-519: Silent exception when fetching deadlock count.

The nested try-except-pass silently ignores errors fetching the deadlock count from GLOBAL_STATUS. This could hide permission issues or MySQL configuration problems.
🔎 Proposed fix
             try:
                 cursor = entity_manager.execute_query(deadlock_query)
                 row = cursor.fetchone()
                 if row:
                     stats["deadlocks"] = int(row[0]) if row[0] else 0
                 cursor.close()
-            except Exception:
-                pass
+            except Exception as e:
+                # GLOBAL_STATUS may not be accessible; deadlock count unavailable
+                stats["deadlocks"] = None
telemetry/example_usage.py (3)
32-39: Hardcoded credentials in example code.

Similar to the MySQL example, consider adding a note about using environment variables in production.
🔎 Proposed improvement
-    # Create entity manager with your configuration
+    # Create entity manager with your configuration
+    # NOTE: Use environment variables for credentials in production
     entity_manager = entity_manager_plugin.load_entity_manager_properties("pgsql", {
         "id": "example_database",
         "host": "localhost",
-        "user": "postgres",
-        "password": "postgres",
+        "user": os.environ.get("POSTGRES_USER", "postgres"),
+        "password": os.environ.get("POSTGRES_PASSWORD", ""),
         "database": "example_db",
         "isolation": "serializable",
     })
129-129: f-strings without placeholders at lines 129, 142, 164.

Multiple f-strings lack interpolation. Remove the f prefix from these lines.
🔎 Proposed fixes
-            print(f"\nBlocking situation detected:")
+            print("\nBlocking situation detected:")
-            print(f"\nLock wait detected:")
+            print("\nLock wait detected:")
-            print(f"\nSlow query detected:")
+            print("\nSlow query detected:")
229-254: Monitor thread lacks graceful shutdown mechanism.

Same issue as the MySQL example—the infinite loop has no exit condition. Consider adding a stop_event parameter for graceful termination.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 422bca6 and 89ea4a7.

📒 Files selected for processing (12)

requirements.txt
telemetry/MYSQL_SUPPORT.md
telemetry/QUICK_START.md
telemetry/README.md
telemetry/example_mysql.py
telemetry/example_usage.py
telemetry/src/telemetry/__init__.py
telemetry/src/telemetry/contention.py
telemetry/src/telemetry/decorators.py
telemetry/src/telemetry/system.py
telemetry/src/telemetry/test.py
telemetry/src/telemetry_plugin.py

🧰 Additional context used

🧬 Code graph analysis (7)

telemetry/src/telemetry/test.py (2)

telemetry/src/telemetry/decorators.py (1)

transaction (42-187)

telemetry/src/telemetry/contention.py (10)

ContentionDetector (32-88)

get_blocking_queries (51-62)

get_blocking_queries (97-166)

get_blocking_queries (339-426)

get_lock_waits (64-75)

get_lock_waits (168-212)

get_lock_waits (428-466)

get_transaction_stats (77-88)

get_transaction_stats (214-273)

get_transaction_stats (468-527)

telemetry/example_mysql.py (3)

data/src/entity_manager_plugin.py (1)

load_entity_manager_properties (78-91)

telemetry/src/telemetry/system.py (2)

instrument_entity_manager (226-461)

get_contention_detector (463-478)

telemetry/src/telemetry/contention.py (9)

get_blocking_queries (51-62)

get_blocking_queries (97-166)

get_blocking_queries (339-426)

get_lock_waits (64-75)

get_lock_waits (168-212)

get_lock_waits (428-466)

get_slow_queries (275-329)

get_slow_queries (529-581)

get_deadlock_info (583-625)

telemetry/src/telemetry_plugin.py (2)

telemetry/src/telemetry/system.py (4)

Telemetry (52-478)

instrument_entity_manager (226-461)

get_contention_detector (463-478)

configure (198-224)

telemetry/src/telemetry/decorators.py (1)

transaction (42-187)

telemetry/src/telemetry/contention.py (1)

telemetry/src/telemetry/test.py (1)

warning (161-162)

telemetry/src/telemetry/system.py (2)

telemetry/src/telemetry_plugin.py (1)

instrument_entity_manager (68-79)

telemetry/src/telemetry/contention.py (3)

PgSQLContentionDetector (91-329)

MySQLContentionDetector (332-625)

ContentionDetector (32-88)

telemetry/src/telemetry/decorators.py (3)

telemetry/src/telemetry/test.py (6)

begin (90-91)

begin (127-128)

commit (93-94)

commit (130-131)

rollback (96-97)

rollback (133-134)

threads/src/thread_pool/system.py (1)

set_status (947-955)

template_engine/src/template_engine/system.py (1)

end (597-598)

telemetry/src/telemetry/__init__.py (3)

telemetry/src/telemetry/contention.py (2)

ContentionDetector (32-88)

PgSQLContentionDetector (91-329)

telemetry/src/telemetry/system.py (1)

Telemetry (52-478)

telemetry/src/telemetry/test.py (1)

TelemetryTest (34-49)

🪛 GitHub Actions: Main Workflow

requirements.txt

[error] 1-1: pip install failed: No matching distribution found for opentelemetry-api>=1.20.0.

🪛 Ruff (0.14.8)

telemetry/src/telemetry/test.py

142-142: Avoid specifying long messages outside the exception class

(TRY003)

telemetry/example_mysql.py

68-68: Undefined name Warehouse

(F821)

75-75: Undefined name Warehouse

(F821)

93-93: Do not catch blind exception: Exception

(BLE001)

119-119: f-string without any placeholders

Remove extraneous f prefix

(F541)

164-164: f-string without any placeholders

Remove extraneous f prefix

(F541)

311-311: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry_plugin.py

46-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

47-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

48-57: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

58-58: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

telemetry/src/telemetry/contention.py

51-51: Unused method argument: entity_manager

(ARG002)

64-64: Unused method argument: entity_manager

(ARG002)

77-77: Unused method argument: entity_manager

(ARG002)

160-160: Consider moving this statement to an else block

(TRY300)

162-162: Do not catch blind exception: Exception

(BLE001)

206-206: Consider moving this statement to an else block

(TRY300)

208-208: Do not catch blind exception: Exception

(BLE001)

267-267: Consider moving this statement to an else block

(TRY300)

269-269: Do not catch blind exception: Exception

(BLE001)

323-323: Consider moving this statement to an else block

(TRY300)

325-325: Do not catch blind exception: Exception

(BLE001)

381-381: Consider moving this statement to an else block

(TRY300)

383-385: try-except-pass detected, consider logging the exception

(S110)

383-383: Do not catch blind exception: Exception

(BLE001)

420-420: Consider moving this statement to an else block

(TRY300)

422-422: Do not catch blind exception: Exception

(BLE001)

460-460: Consider moving this statement to an else block

(TRY300)

462-462: Do not catch blind exception: Exception

(BLE001)

518-519: try-except-pass detected, consider logging the exception

(S110)

518-518: Do not catch blind exception: Exception

(BLE001)

521-521: Consider moving this statement to an else block

(TRY300)

523-523: Do not catch blind exception: Exception

(BLE001)

575-575: Consider moving this statement to an else block

(TRY300)

577-577: Do not catch blind exception: Exception

(BLE001)

619-619: Consider moving this statement to an else block

(TRY300)

621-621: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/system.py

324-324: Consider moving this statement to an else block

(TRY300)

telemetry/example_usage.py

67-67: Undefined name Person

(F821)

73-73: Undefined name Address

(F821)

89-89: Undefined name Person

(F821)

92-92: Undefined name Person

(F821)

112-112: Do not catch blind exception: Exception

(BLE001)

129-129: f-string without any placeholders

Remove extraneous f prefix

(F541)

142-142: f-string without any placeholders

Remove extraneous f prefix

(F541)

164-164: f-string without any placeholders

Remove extraneous f prefix

(F541)

248-248: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/decorators.py

83-83: Do not use bare except

(E722)

83-84: try-except-pass detected, consider logging the exception

(S110)

93-93: Do not use bare except

(E722)

106-106: Local variable ctx_token is assigned to but never used

Remove assignment to unused variable ctx_token

(F841)

174-174: Unused function argument: args

(ARG001)

174-174: Unused function argument: kwargs

(ARG001)

220-220: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (18)

telemetry/README.md (1)

1-294: Well-structured documentation.

The README provides comprehensive coverage of features, configuration, usage patterns, metrics, and troubleshooting. The examples for both PostgreSQL and MySQL are clear and helpful.

telemetry/QUICK_START.md (1)

1-184: Effective quick-start guide.

The guide provides a clear path from installation through basic usage, with practical examples and troubleshooting advice.

telemetry/MYSQL_SUPPORT.md (1)

1-328: Comprehensive MySQL support documentation.

This document provides excellent coverage of MySQL-specific features, including version compatibility, error detection, and InnoDB monitoring. The compatibility matrix and examples are particularly helpful.

telemetry/src/telemetry/__init__.py (1)

1-44: Clean module initialization.

The module structure follows Python best practices with appropriate exports for the public API.

telemetry/src/telemetry/test.py (1)

57-174: Solid test coverage for core functionality.

The tests appropriately cover plugin loading, decorator behavior with and without OpenTelemetry, exception handling with rollback, and base detector functionality using mock objects.

telemetry/src/telemetry_plugin.py (1)

60-112: Well-structured plugin implementation.

The plugin follows Colony patterns correctly and provides a clean API surface for instrumentation, configuration, and contention detection.

telemetry/src/telemetry/system.py (6)

38-49: Proper graceful degradation for optional dependency.

The try/except block correctly handles the case where OpenTelemetry packages are not installed, setting OTEL_AVAILABLE flag appropriately.

86-151: Robust default configuration with fallback options.

The initialization logic properly checks for external configuration, falls back to OTLP or console exporters, and handles missing dependencies gracefully.

153-196: Clean metrics initialization.

All metric instruments are properly created with appropriate names, descriptions, and units following OpenTelemetry conventions.

336-355: Comprehensive contention detection with MySQL support.

The error detection logic properly identifies MySQL error code 1213 (deadlock) and falls back to string matching for other databases. The contention type classification (deadlock, lock_timeout, serialization_failure, lock_wait) is well-structured.

226-461: Thorough entity manager instrumentation.

The instrumentation wraps all critical entity manager methods (begin, commit, rollback, execute_query, lock, lock_table) with proper tracing spans, metrics recording, and error handling. The implementation correctly tracks transaction lifetimes and detects slow queries.

463-478: Clean contention detector factory.

The factory method provides appropriate detector instances based on the database engine.

requirements.txt (1)

2-4: The OpenTelemetry packages specified in requirements.txt are available at the requested versions. Latest versions are 1.39.1, released in December 2025, which exceeds the >=1.20.0 constraint. No changes needed.

Likely an incorrect or invalid review comment.

telemetry/src/telemetry/decorators.py (2)

162-170: return_value may be referenced before assignment.

If an exception occurs, the finally block executes and then raise re-throws the exception. However, the return return_value on line 170 is unreachable after a raise. This is actually safe because the raise on line 160 exits before reaching line 170, but the code structure is slightly confusing.

The current flow is correct—no fix needed.

190-231: LGTM!

The monitored_query decorator is well-structured, using the start_as_current_span context manager correctly for automatic span lifecycle management. The graceful degradation when OTEL is unavailable is properly implemented.

telemetry/example_mysql.py (1)

320-347: LGTM!

The main() function provides clear guidance on MySQL-specific features and usage instructions. The documentation within example_mysql_best_practices() is informative and aligns with MySQL/InnoDB best practices.

telemetry/src/telemetry/contention.py (1)

32-89: LGTM!

The base ContentionDetector class provides a clean interface with sensible default implementations (empty list/dict). The unused entity_manager parameters in base methods are intentional to define the interface contract for subclasses.

telemetry/example_usage.py (1)

259-281: LGTM!

The main() function provides clear usage instructions and documentation references. The overall example structure is well-organized and educational.

coderabbitai · 2025-12-22T07:40:59Z

telemetry/example_mysql.py

+            # Lock source warehouse (this order of locking can cause deadlocks)
+            self.entity_manager.lock(Warehouse, from_warehouse)
+
+            # Simulate some processing
+            import time
+            time.sleep(0.1)
+
+            # Lock destination warehouse
+            self.entity_manager.lock(Warehouse, to_warehouse)


⚠️ Potential issue | 🟡 Minor

Undefined Warehouse class referenced.

The Warehouse class is used but never defined or imported. For a runnable example, consider either defining a stub class or adding a comment explaining this is pseudocode.

🔎 Proposed fix - add stub or comment

+# Note: Warehouse is a placeholder entity class representing your domain model +# In a real application, this would be an Entity subclass defined in your codebase + class OrderService: def __init__(self, entity_manager): self.entity_manager = entity_manager

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.14.8)

68-68: Undefined name Warehouse

(F821)

75-75: Undefined name Warehouse

(F821)

🤖 Prompt for AI Agents

In telemetry/example_mysql.py around lines 67 to 75 the code references an undefined Warehouse class when locking warehouses; either define a minimal stub Warehouse class in this module (e.g., a simple dataclass or class with an id/name) above these lines, or add an explicit import from the module where Warehouse is defined, or replace the reference with a clear comment that this is pseudocode and requires a real Warehouse model in real usage; implement one of these fixes so the example is runnable and the reference is resolved.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/example_usage.py

+    class UserService:
+        def __init__(self, entity_manager):
+            self.entity_manager = entity_manager
+
+        @decorators.transaction()
+        def create_user_with_address(self, user_data, address_data):
+            """
+            Creates a user and address in a single transaction.
+            This will be automatically traced.
+            """
+
+            # Create user entity
+            user = Person()
+            user.name = user_data["name"]
+            user.email = user_data["email"]
+            self.entity_manager.save(user)
+
+            # Create address entity
+            address = Address()
+            address.street = address_data["street"]
+            address.city = address_data["city"]
+            address.person = user
+            self.entity_manager.save(address)
+
+            return user
+
+        @decorators.transaction()
+        def update_user_email(self, user_id, new_email):
+            """
+            Updates a user's email.
+            Demonstrates lock acquisition tracking.
+            """
+
+            # Lock the user record
+            self.entity_manager.lock(Person, user_id)
+
+            # Find and update
+            user = self.entity_manager.get(Person, user_id)
+            user.email = new_email
+            self.entity_manager.update(user)
+
+            return user
+


⚠️ Potential issue | 🟡 Minor

Undefined Person and Address classes referenced.

These entity classes are used but never defined or imported. Add a comment clarifying these are placeholder domain entities.

🔎 Proposed fix

+ # Note: Person and Address are placeholder entity classes + # In a real application, these would be Entity subclasses from your domain model class UserService: def __init__(self, entity_manager): self.entity_manager = entity_manager

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class UserService:

def __init__(self, entity_manager):

self.entity_manager = entity_manager

@decorators.transaction()

def create_user_with_address(self, user_data, address_data):

"""

Creates a user and address in a single transaction.

This will be automatically traced.

"""

# Create user entity

user = Person()

user.name = user_data["name"]

user.email = user_data["email"]

self.entity_manager.save(user)

# Create address entity

address = Address()

address.street = address_data["street"]

address.city = address_data["city"]

address.person = user

self.entity_manager.save(address)

return user

@decorators.transaction()

def update_user_email(self, user_id, new_email):

"""

Updates a user's email.

Demonstrates lock acquisition tracking.

"""

# Lock the user record

self.entity_manager.lock(Person, user_id)

# Find and update

user = self.entity_manager.get(Person, user_id)

user.email = new_email

self.entity_manager.update(user)

return user

# Note: Person and Address are placeholder entity classes

# In a real application, these would be Entity subclasses from your domain model

class UserService:

def __init__(self, entity_manager):

self.entity_manager = entity_manager

@decorators.transaction()

def create_user_with_address(self, user_data, address_data):

"""

Creates a user and address in a single transaction.

This will be automatically traced.

"""

# Create user entity

user = Person()

user.name = user_data["name"]

user.email = user_data["email"]

self.entity_manager.save(user)

# Create address entity

address = Address()

address.street = address_data["street"]

address.city = address_data["city"]

address.person = user

self.entity_manager.save(address)

return user

@decorators.transaction()

def update_user_email(self, user_id, new_email):

"""

Updates a user's email.

Demonstrates lock acquisition tracking.

"""

# Lock the user record

self.entity_manager.lock(Person, user_id)

# Find and update

user = self.entity_manager.get(Person, user_id)

user.email = new_email

self.entity_manager.update(user)

return user

🧰 Tools

🪛 Ruff (0.14.8)

67-67: Undefined name Person

(F821)

73-73: Undefined name Address

(F821)

89-89: Undefined name Person

(F821)

92-92: Undefined name Person

(F821)

🤖 Prompt for AI Agents

In telemetry/example_usage.py around lines 55 to 97, the code references undefined domain entity classes Person and Address; add a short comment near the top of this snippet (or import area) stating that Person and Address are placeholder domain entities used for example purposes and should be replaced with real model definitions or imports in production, or alternatively import/define minimal stub classes for the example so the sample code is self-contained and clear to readers.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/example_usage.py

+            # Complex query that might be slow
+            query = """
+            SELECT u.name, u.email, COUNT(o.id) as order_count
+            FROM users u
+            LEFT JOIN orders o ON o.user_id = u.id
+            WHERE o.created_at BETWEEN %s AND %s
+            GROUP BY u.id, u.name, u.email
+            ORDER BY order_count DESC
+            LIMIT 100
+            """
+
+            cursor = self.entity_manager.execute_query(query % (start_date, end_date))
+            results = cursor.fetchall()
+            cursor.close()
+
+            return results


⚠️ Potential issue | 🟡 Minor

SQL injection vulnerability in example code.

The generate_user_report method uses string interpolation (%) to build the query with start_date and end_date. Even in example code, this demonstrates an insecure pattern that users might copy. Consider using parameterized queries or adding a security warning comment.

🔎 Proposed fix

# Complex query that might be slow + # WARNING: Use parameterized queries in production to prevent SQL injection query = """ SELECT u.name, u.email, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id - WHERE o.created_at BETWEEN %s AND %s + WHERE o.created_at BETWEEN ? AND ? GROUP BY u.id, u.name, u.email ORDER BY order_count DESC LIMIT 100 """ - cursor = self.entity_manager.execute_query(query % (start_date, end_date)) + # Use parameterized query (syntax depends on your database driver) + cursor = self.entity_manager.execute_query(query, params=(start_date, end_date))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In telemetry/example_usage.py around lines 190 to 205, the SQL is built via Python string interpolation which exposes an SQL injection risk; replace the string interpolation with a parameterized query call using the DB driver's parameter substitution (e.g., pass the query with placeholders and a tuple of (start_date, end_date) to execute), or use the entity_manager/ORM parameter binding API if available, and add a brief comment warning not to interpolate user input directly into SQL.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/QUICK_START.md

+```bash
+pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
+```


⚠️ Potential issue | 🔴 Critical

Update installation command with correct package versions.

This installation command will fail due to the invalid version constraint (>=1.20.0) mentioned in requirements.txt. Update once the correct versions are determined.

🤖 Prompt for AI Agents

In telemetry/QUICK_START.md around lines 9–11, the pip install example uses unpinned packages but conflicts with the version constraints in requirements.txt (the >=1.20.0 constraint is causing install failures); update the command to use the concrete, compatible package versions listed in requirements.txt (or valid semver specifiers that match those constraints), e.g. replace the current line with a pip install that pins each opentelemetry package to the exact versions from requirements.txt or to compatible >=/<== ranges that are valid.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/README.md

+```bash
+pip install opentelemetry-api opentelemetry-sdk
+```


⚠️ Potential issue | 🔴 Critical

Update installation instructions with correct package versions.

The installation command references OpenTelemetry packages, but the version constraint in requirements.txt (>=1.20.0) appears to be invalid based on the pipeline failure. Once the correct versions are verified, ensure this documentation is updated accordingly.

🤖 Prompt for AI Agents

In telemetry/README.md around lines 19-21, the install instruction lists opentelemetry packages without correct version constraints that conflict with requirements.txt (the >=1.20.0 constraint appears invalid and caused pipeline failures); verify the exact, published OpenTelemetry package versions on PyPI that are compatible with the project, then update the README install command to include those explicit, working version(s) that match requirements.txt (or alternatively fix requirements.txt to a valid constraint) so docs and CI are consistent.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/src/telemetry/contention.py

+    def get_slow_queries(self, entity_manager, duration_threshold="5 seconds"):
+        """
+        Returns currently executing slow queries.
+
+        :type entity_manager: EntityManager
+        :param entity_manager: The entity manager to query.
+        :type duration_threshold: String
+        :param duration_threshold: PostgreSQL interval for slow query threshold.
+        :rtype: list
+        :return: List of slow query information.
+        """
+
+        query = """
+        SELECT
+            pid,
+            usename,
+            application_name,
+            client_addr,
+            state,
+            query,
+            NOW() - query_start AS duration,
+            wait_event_type,
+            wait_event
+        FROM pg_stat_activity
+        WHERE state != 'idle'
+          AND query_start IS NOT NULL
+          AND NOW() - query_start > interval '%s'
+        ORDER BY duration DESC;
+        """ % duration_threshold


⚠️ Potential issue | 🟠 Major

SQL injection vulnerability via string interpolation.

The duration_threshold parameter is directly interpolated into the SQL query using % string formatting. If user input reaches this parameter, it could lead to SQL injection. Use parameterized queries instead.

🔎 Proposed fix

The fix depends on whether the entity manager's execute_query supports parameterized queries. If it does:

query = """ SELECT pid, usename, application_name, client_addr, state, query, NOW() - query_start AS duration, wait_event_type, wait_event FROM pg_stat_activity WHERE state != 'idle' AND query_start IS NOT NULL - AND NOW() - query_start > interval '%s' + AND NOW() - query_start > interval %s ORDER BY duration DESC; - """ % duration_threshold + """ + # Pass duration_threshold as a parameter + cursor = entity_manager.execute_query(query, params=(duration_threshold,))

If parameterized queries aren't supported, validate the input:

import re if not re.match(r'^\d+\s*(second|seconds|minute|minutes|hour|hours)$', duration_threshold, re.IGNORECASE): raise ValueError("Invalid duration_threshold format")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In telemetry/src/telemetry/contention.py around lines 275 to 303, the duration_threshold is directly interpolated into the SQL string causing a SQL injection risk; change the implementation to use a parameterized query (pass the interval value as a bound parameter to entity_manager.execute_query) or, if the entity manager does not support parameter binding for interval literals, validate and whitelist the input format (e.g., allow only numeric value + unit like "5 seconds|minutes|hours" via a strict regex) and raise ValueError on invalid input before formatting into the SQL.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/src/telemetry/contention.py

+    def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
+        """
+        Returns currently executing slow queries from processlist.
+
+        :type entity_manager: EntityManager
+        :param entity_manager: The entity manager to query.
+        :type duration_threshold_seconds: int
+        :param duration_threshold_seconds: Minimum query duration in seconds.
+        :rtype: list
+        :return: List of slow query information.
+        """
+
+        query = """
+        SELECT
+            ID as pid,
+            USER as user,
+            HOST as host,
+            DB as db,
+            COMMAND as command,
+            TIME as duration_seconds,
+            STATE as state,
+            INFO as query
+        FROM information_schema.PROCESSLIST
+        WHERE COMMAND != 'Sleep'
+          AND TIME > %d
+          AND INFO IS NOT NULL
+        ORDER BY TIME DESC;
+        """ % duration_threshold_seconds


⚠️ Potential issue | 🟠 Major

SQL injection vulnerability via string interpolation.

Similar to the PostgreSQL detector, duration_threshold_seconds is directly interpolated into the SQL query. Although the docstring specifies int, the type is not enforced at runtime.

🔎 Proposed fix

- def get_slow_queries(self, entity_manager, duration_threshold_seconds=5): + def get_slow_queries(self, entity_manager, duration_threshold_seconds=5): + # Ensure parameter is an integer to prevent SQL injection + duration_threshold_seconds = int(duration_threshold_seconds) + """ Returns currently executing slow queries from processlist. ...

Or use parameterized queries if supported by the entity manager.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):

"""

Returns currently executing slow queries from processlist.

:type entity_manager: EntityManager

:param entity_manager: The entity manager to query.

:type duration_threshold_seconds: int

:param duration_threshold_seconds: Minimum query duration in seconds.

:rtype: list

:return: List of slow query information.

"""

query = """

SELECT

ID as pid,

USER as user,

HOST as host,

DB as db,

COMMAND as command,

TIME as duration_seconds,

STATE as state,

INFO as query

FROM information_schema.PROCESSLIST

WHERE COMMAND != 'Sleep'

AND TIME > %d

AND INFO IS NOT NULL

ORDER BY TIME DESC;

""" % duration_threshold_seconds

def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):

"""

Returns currently executing slow queries from processlist.

:type entity_manager: EntityManager

:param entity_manager: The entity manager to query.

:type duration_threshold_seconds: int

:param duration_threshold_seconds: Minimum query duration in seconds.

:rtype: list

:return: List of slow query information.

"""

# Ensure parameter is an integer to prevent SQL injection

duration_threshold_seconds = int(duration_threshold_seconds)

query = """

SELECT

ID as pid,

USER as user,

HOST as host,

DB as db,

COMMAND as command,

TIME as duration_seconds,

STATE as state,

INFO as query

FROM information_schema.PROCESSLIST

WHERE COMMAND != 'Sleep'

AND TIME > %d

AND INFO IS NOT NULL

ORDER BY TIME DESC;

""" % duration_threshold_seconds

🤖 Prompt for AI Agents

In telemetry/src/telemetry/contention.py around lines 529 to 556, the SQL builds a query by interpolating duration_threshold_seconds directly into the string which allows SQL injection; change this to use a parameterized query or at minimum enforce an integer cast before using it. Replace the string interpolation with a parameter placeholder supported by the entity_manager (e.g. "%s" or "?" depending on its DB API) and pass duration_threshold_seconds as a bound parameter to entity_manager.execute; if parameterization is not possible, explicitly validate/cast duration_threshold_seconds = int(duration_threshold_seconds) and use that integer in the query string.

coderabbitai · 2025-12-22T07:41:00Z

telemetry/src/telemetry/contention.py

+    def get_deadlock_info(self, entity_manager):
+        """
+        Returns recent deadlock information from InnoDB status.
+
+        :type entity_manager: EntityManager
+        :param entity_manager: The entity manager to query.
+        :rtype: dict
+        :return: Deadlock information.
+        """
+
+        query = "SHOW ENGINE INNODB STATUS"
+
+        try:
+            cursor = entity_manager.execute_query(query)
+            row = cursor.fetchone()
+
+            if row and len(row) > 2:
+                status_text = row[2]
+
+                # Parse the latest deadlock information
+                deadlock_info = {}
+                if "LATEST DETECTED DEADLOCK" in status_text:
+                    # Extract deadlock section
+                    start = status_text.find("LATEST DETECTED DEADLOCK")
+                    end = status_text.find("------------", start + 1)
+                    if end > start:
+                        deadlock_section = status_text[start:end]
+                        deadlock_info["has_recent_deadlock"] = True
+                        deadlock_info["deadlock_text"] = deadlock_section[:500]  # First 500 chars
+                else:
+                    deadlock_info["has_recent_deadlock"] = False
+
+                cursor.close()
+                return deadlock_info
+
+            cursor.close()
+            return {}
+
+        except Exception as e:
+            self.telemetry_system.plugin.warning(
+                "Failed to get deadlock info: %s" % str(e)
+            )
+            return {}


⚠️ Potential issue | 🟡 Minor

Deadlock parsing may miss section boundary.

The parsing logic looks for "------------" as the end marker, but InnoDB status sections can have varying delimiter counts. If the delimiter isn't found exactly, end becomes -1, causing end > start to fail, and deadlock_info remains empty (not even has_recent_deadlock: False).

🔎 Proposed fix

if "LATEST DETECTED DEADLOCK" in status_text: # Extract deadlock section start = status_text.find("LATEST DETECTED DEADLOCK") end = status_text.find("------------", start + 1) - if end > start: + if end > start and end != -1: deadlock_section = status_text[start:end] deadlock_info["has_recent_deadlock"] = True deadlock_info["deadlock_text"] = deadlock_section[:500] # First 500 chars + else: + # Section found but couldn't determine end; still flag as detected + deadlock_info["has_recent_deadlock"] = True + deadlock_info["deadlock_text"] = status_text[start:start+500] else: deadlock_info["has_recent_deadlock"] = False

🧰 Tools

🪛 Ruff (0.14.8)

619-619: Consider moving this statement to an else block

(TRY300)

621-621: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

telemetry/src/telemetry/contention.py around lines 583 to 625: the current logic looks for the exact string "------------" as the end of the deadlock section which can fail (find returns -1) and leaves deadlock_info empty; update parsing to locate the next section delimiter more robustly — after finding the "LATEST DETECTED DEADLOCK" start, search for the next line that is a sequence of hyphens of length >= 3 using a regex or a loop over lines, treat a not-found end as end-of-text, set deadlock_info["has_recent_deadlock"] = True when a section is found and always set it to False when the marker isn't present, trim the captured section to the first 500 chars, and ensure the cursor is closed in all paths (preferably via finally or context manager).

Adds complete SQLite support to the OpenTelemetry telemetry plugin for database monitoring and contention detection. SQLite's file-based architecture requires different monitoring approaches than client-server databases. SQLite-Specific Features: - Database file monitoring (path, size, page statistics) - PRAGMA-based diagnostics for database configuration - SQLITE_BUSY error detection (database locked errors) - Database integrity checks (PRAGMA integrity_check) - Optimization support (VACUUM and ANALYZE) - Lock configuration monitoring (busy_timeout, locking_mode) - Journal mode detection (DELETE, WAL, etc.) SQLiteContentionDetector Methods: - get_database_info(): File path, size, PRAGMA settings - get_lock_waits(): Lock configuration (timeout, mode) - get_transaction_stats(): Database statistics and transaction state - get_integrity_check(): Runs PRAGMA integrity_check - optimize_database(): Runs VACUUM and ANALYZE - get_blocking_queries(): Returns empty (no system tables) - get_slow_queries(): Returns empty (use OTel traces) Error Detection Enhancements (system.py): - SQLite OperationalError detection for "database is locked" - Contention type classification: "database_locked" - Enhanced error keyword detection including "busy" - Automatic SQLITE_BUSY error categorization Key Differences from PostgreSQL/MySQL: - File-based database (not client-server) - Database-level locks (not row-level) - No system tables for monitoring connections - PRAGMA statements for diagnostics - VACUUM needed for space reclamation - WAL mode recommended for concurrency Documentation: - example_sqlite.py: Complete SQLite usage examples - README.md: SQLite-specific examples and usage - Best practices for WAL mode and optimization - Monitoring strategies for file-based databases Components Updated: - contention.py: Added SQLiteContentionDetector class (338 lines) - system.py: Enhanced error detection for SQLite - __init__.py: Export SQLiteContentionDetector - README.md: SQLite examples and documentation The implementation provides appropriate monitoring for SQLite's unique characteristics while maintaining API compatibility with the PostgreSQL and MySQL detectors.

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (3)

telemetry/src/telemetry/contention.py (3)

275-303: SQL injection vulnerability remains unaddressed.

The duration_threshold parameter is still directly interpolated into the SQL query. This was flagged in a previous review. Consider validating input format or using parameterized queries.

529-556: SQL injection vulnerability via integer interpolation remains unaddressed.

The duration_threshold_seconds parameter is interpolated without explicit type enforcement. This was flagged in a previous review.

604-616: Deadlock parsing edge case remains unaddressed.

If the "------------" delimiter isn't found after the deadlock section start, end will be -1 and no has_recent_deadlock key is set in the returned dict. This was flagged in a previous review.

🧹 Nitpick comments (1)

telemetry/SQLITE_SUPPORT.md (1)
207-222: Add language specifiers to fenced code blocks.

The trace attribute code blocks at lines 207-213 and 216-222 are missing language specifiers. Consider adding text or yaml for better rendering and markdown linter compliance.
🔎 Suggested fix
 ## 🔍 Trace Attributes for SQLite
 
 ### Transaction Spans
-```
+```text
 db.system = "sqlite"
 db.transaction.type = "required"
 db.transaction.duration_ms = 125
 db.transaction.contention = true  # If SQLITE_BUSY occurred
 db.contention.type = "database_locked"  # SQLite-specific
Query Spans

- +text
db.system = "sqlite"
db.statement = "UPDATE products SET..."
db.slow_query = true
db.lock.contention = true
db.contention.type = "database_locked"

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89ea4a7 and 8135727.

📒 Files selected for processing (6)

telemetry/README.md
telemetry/SQLITE_SUPPORT.md
telemetry/example_sqlite.py
telemetry/src/telemetry/__init__.py
telemetry/src/telemetry/contention.py
telemetry/src/telemetry/system.py

🚧 Files skipped from review as they are similar to previous changes (1)

telemetry/src/telemetry/init.py

🧰 Additional context used

🧬 Code graph analysis (2)

telemetry/example_sqlite.py (3)

data/src/entity_manager_plugin.py (1)

load_entity_manager_properties (78-91)

telemetry/src/telemetry/system.py (1)

get_contention_detector (471-488)

telemetry/src/telemetry/contention.py (10)

get_database_info (637-738)

get_lock_waits (64-75)

get_lock_waits (168-212)

get_lock_waits (428-466)

get_lock_waits (755-788)

get_transaction_stats (77-88)

get_transaction_stats (214-273)

get_transaction_stats (468-527)

get_transaction_stats (790-858)

get_integrity_check (932-963)

telemetry/src/telemetry/contention.py (1)

telemetry/src/telemetry/test.py (1)

warning (161-162)

🪛 markdownlint-cli2 (0.18.1)

telemetry/SQLITE_SUPPORT.md

207-207: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

216-216: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

330-330: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

361-361: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

452-452: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

telemetry/README.md

299-299: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🪛 Ruff (0.14.8)

telemetry/src/telemetry/system.py

324-324: Consider moving this statement to an else block

(TRY300)

telemetry/example_sqlite.py

34-34: Probable insecure usage of temporary file or directory: "/tmp/example.db"

(S108)

64-64: Undefined name Order

(F821)

76-76: Do not catch blind exception: Exception

(BLE001)

166-166: f-string without any placeholders

Remove extraneous f prefix

(F541)

295-295: Do not catch blind exception: Exception

(BLE001)

345-345: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/contention.py

51-51: Unused method argument: entity_manager

(ARG002)

64-64: Unused method argument: entity_manager

(ARG002)

77-77: Unused method argument: entity_manager

(ARG002)

160-160: Consider moving this statement to an else block

(TRY300)

162-162: Do not catch blind exception: Exception

(BLE001)

206-206: Consider moving this statement to an else block

(TRY300)

208-208: Do not catch blind exception: Exception

(BLE001)

267-267: Consider moving this statement to an else block

(TRY300)

269-269: Do not catch blind exception: Exception

(BLE001)

323-323: Consider moving this statement to an else block

(TRY300)

325-325: Do not catch blind exception: Exception

(BLE001)

381-381: Consider moving this statement to an else block

(TRY300)

383-385: try-except-pass detected, consider logging the exception

(S110)

383-383: Do not catch blind exception: Exception

(BLE001)

420-420: Consider moving this statement to an else block

(TRY300)

422-422: Do not catch blind exception: Exception

(BLE001)

460-460: Consider moving this statement to an else block

(TRY300)

462-462: Do not catch blind exception: Exception

(BLE001)

518-519: try-except-pass detected, consider logging the exception

(S110)

518-518: Do not catch blind exception: Exception

(BLE001)

521-521: Consider moving this statement to an else block

(TRY300)

523-523: Do not catch blind exception: Exception

(BLE001)

575-575: Consider moving this statement to an else block

(TRY300)

577-577: Do not catch blind exception: Exception

(BLE001)

619-619: Consider moving this statement to an else block

(TRY300)

621-621: Do not catch blind exception: Exception

(BLE001)

662-663: try-except-pass detected, consider logging the exception

(S110)

662-662: Do not catch blind exception: Exception

(BLE001)

678-679: try-except-pass detected, consider logging the exception

(S110)

678-678: Do not catch blind exception: Exception

(BLE001)

688-689: try-except-pass detected, consider logging the exception

(S110)

688-688: Do not catch blind exception: Exception

(BLE001)

698-699: try-except-pass detected, consider logging the exception

(S110)

698-698: Do not catch blind exception: Exception

(BLE001)

709-710: try-except-pass detected, consider logging the exception

(S110)

709-709: Do not catch blind exception: Exception

(BLE001)

719-720: try-except-pass detected, consider logging the exception

(S110)

719-719: Do not catch blind exception: Exception

(BLE001)

729-730: try-except-pass detected, consider logging the exception

(S110)

729-729: Do not catch blind exception: Exception

(BLE001)

732-732: Consider moving this statement to an else block

(TRY300)

734-734: Do not catch blind exception: Exception

(BLE001)

740-740: Unused method argument: entity_manager

(ARG002)

782-782: Consider moving this statement to an else block

(TRY300)

784-784: Do not catch blind exception: Exception

(BLE001)

813-814: try-except-pass detected, consider logging the exception

(S110)

813-813: Do not catch blind exception: Exception

(BLE001)

829-830: try-except-pass detected, consider logging the exception

(S110)

829-829: Do not catch blind exception: Exception

(BLE001)

839-840: try-except-pass detected, consider logging the exception

(S110)

839-839: Do not catch blind exception: Exception

(BLE001)

849-850: try-except-pass detected, consider logging the exception

(S110)

849-849: Do not catch blind exception: Exception

(BLE001)

852-852: Consider moving this statement to an else block

(TRY300)

854-854: Do not catch blind exception: Exception

(BLE001)

860-860: Unused method argument: entity_manager

(ARG002)

860-860: Unused method argument: duration_threshold_seconds

(ARG002)

904-904: Do not catch blind exception: Exception

(BLE001)

905-905: Use explicit conversion flag

Replace with conversion flag

(RUF010)

913-913: Do not catch blind exception: Exception

(BLE001)

914-914: Use explicit conversion flag

Replace with conversion flag

(RUF010)

924-924: Consider moving this statement to an else block

(TRY300)

926-926: Do not catch blind exception: Exception

(BLE001)

953-957: Consider moving this statement to an else block

(TRY300)

959-959: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (9)

telemetry/src/telemetry/system.py (5)

249-261: LGTM!

The per-connection transaction timing tracking using id(connection) as the key is appropriate for the use case. The closure-scoped dictionary ensures isolation per instrumented entity manager.

263-303: LGTM!

The commit and rollback wrappers correctly handle timing and metrics. The finally block ensures active_transactions is decremented even if the underlying operation fails.

305-400: LGTM!

The query instrumentation is comprehensive with proper span creation, query truncation (500 chars), contention detection for multiple database engines, and configurable slow query thresholds. The error classification logic correctly identifies MySQL deadlocks (error 1213), SQLite BUSY errors, and generic lock-related errors.

401-458: LGTM!

The lock method wrappers correctly instrument both row-level and table-level locks with appropriate span attributes and duration metrics.

471-488: LGTM!

Clean factory method for creating engine-specific contention detectors with a sensible fallback to the base ContentionDetector.

telemetry/example_sqlite.py (1)

13-44: LGTM!

The setup function follows the correct Colony plugin loading pattern and demonstrates proper instrumentation of the SQLite entity manager.

telemetry/src/telemetry/contention.py (3)

637-738: LGTM!

The get_database_info method comprehensively collects SQLite database information using PRAGMA commands. The individual try-except blocks provide graceful degradation if specific PRAGMAs fail, which is appropriate for diagnostic tooling.

877-930: LGTM!

The optimize_database method correctly handles both ANALYZE and VACUUM operations with appropriate error handling. The comment noting that VACUUM cannot run inside a transaction is helpful, and failures are properly captured in the results dict.

932-963: LGTM!

The get_integrity_check method correctly implements SQLite's PRAGMA integrity_check with proper result parsing and structured return values.

coderabbitai · 2025-12-22T08:00:32Z

telemetry/example_sqlite.py

+            order = self.entity_manager.get(Order, order_id)
+            order.status = "processed"
+            self.entity_manager.update(order)


⚠️ Potential issue | 🟡 Minor

Order class is undefined.

The Order class is used but never imported or defined. This will raise a NameError if the example is executed. Consider either:

Adding a comment explaining that Order should be replaced with the user's entity class, or

Creating a placeholder class for the example.

🔎 Suggested fix

def example_sqlite_lock_handling(entity_manager): """ Example demonstrating SQLite lock detection and handling. """ + # Note: Replace 'Order' with your actual entity class + # class Order: + # status = None + class OrderService: def __init__(self, entity_manager): self.entity_manager = entity_manager @decorators.transaction() def process_order(self, order_id): """ Processes an order. If the database is locked, OpenTelemetry will detect it. """ - # Simulate some database work + # Example: Replace Order with your entity class order = self.entity_manager.get(Order, order_id)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.14.8)

64-64: Undefined name Order

(F821)

🤖 Prompt for AI Agents

In telemetry/example_sqlite.py around lines 64 to 66, the example uses Order but Order is not defined or imported causing a NameError; fix by either adding a brief inline comment above this snippet stating "Replace Order with your entity class" or by adding a minimal placeholder dataclass/entity definition for Order in the example (with the fields used, e.g., id and status) and import it appropriately so the snippet runs as-is.

coderabbitai · 2025-12-22T08:00:32Z

telemetry/example_sqlite.py

+    print("\nRunning VACUUM and ANALYZE...")
+    results = detector.optimize_database(entity_manager)
+
+    print(f"\nOptimization Results:")


⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string prefix.

The string has no placeholders, so the f prefix is unnecessary.

🔎 Suggested fix

- print(f"\nOptimization Results:") + print("\nOptimization Results:")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

print(f"\nOptimization Results:")

print("\nOptimization Results:")

🧰 Tools

🪛 Ruff (0.14.8)

166-166: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents

In telemetry/example_sqlite.py at line 166, the print statement uses an unnecessary f-string prefix for a plain string; remove the leading "f" so the statement becomes a normal string literal (print("\nOptimization Results:")) to avoid misleading use of f-strings.

coderabbitai · 2025-12-22T08:00:32Z

telemetry/example_sqlite.py

+                # Check integrity periodically (e.g., daily)
+                import datetime
+                if datetime.datetime.now().hour == 3:  # 3 AM
+                    integrity = detector.get_integrity_check(entity_manager)
+                    if not integrity.get('is_ok'):
+                        print("\n[ALERT] Database integrity check failed!")


⚠️ Potential issue | 🟡 Minor

Integrity check may run multiple times during 3 AM hour.

The condition datetime.datetime.now().hour == 3 will evaluate to True for every iteration within the 3 AM hour, causing the integrity check to run repeatedly (every interval seconds). Consider tracking whether the check was already performed today.

🔎 Suggested fix

+ last_integrity_date = None + def monitor(): print(f"\nStarted SQLite monitoring (checking every {interval}s)") + nonlocal last_integrity_date while True: try: # ... existing code ... # Check integrity periodically (e.g., daily) import datetime - if datetime.datetime.now().hour == 3: # 3 AM + today = datetime.date.today() + if datetime.datetime.now().hour == 3 and last_integrity_date != today: integrity = detector.get_integrity_check(entity_manager) if not integrity.get('is_ok'): print("\n[ALERT] Database integrity check failed!") + last_integrity_date = today

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In telemetry/example_sqlite.py around lines 336 to 341, the integrity check conditional uses datetime.datetime.now().hour == 3 which will be True for every loop iteration during the 3 AM hour and cause repeated checks; fix this by adding a persistent tracker (e.g., last_integrity_date variable declared outside the loop) and only run the check when the current date is different from last_integrity_date and the hour is 3, then update last_integrity_date to today after running the check so it only runs once per day.

coderabbitai · 2025-12-22T08:00:32Z

telemetry/README.md

+## License
+
+Apache License, Version 2.0
+
+#### SQLite Example
+
+```python
+# Get a SQLite contention detector
+detector = telemetry_plugin.get_contention_detector("sqlite")
+
+# Get database information (file path, size, PRAGMA settings)
+info = detector.get_database_info(entity_manager)
+print(f"File path: {info['file_path']}")
+print(f"File size: {info['file_size_mb']} MB")
+print(f"Journal mode: {info['journal_mode']}")
+print(f"Synchronous mode: {info['synchronous']}")
+print(f"Busy timeout: {info['busy_timeout_ms']} ms")
+
+# Get lock configuration
+lock_info = detector.get_lock_waits(entity_manager)
+for info in lock_info:
+    print(f"Busy timeout: {info['busy_timeout_ms']} ms")
+    print(f"Locking mode: {info['locking_mode']}")
+
+# Get transaction statistics
+stats = detector.get_transaction_stats(entity_manager)
+print(f"Database size: {stats['database_size_mb']} MB")
+print(f"Page count: {stats['page_count']}")
+print(f"Freelist count: {stats['freelist_count']}")
+print(f"Active transaction: {stats['active_transaction']}")
+
+# Run integrity check
+integrity = detector.get_integrity_check(entity_manager)
+if integrity['is_ok']:
+    print("✓ Database integrity OK")
+else:
+    print("⚠️ Database integrity issues detected")
+
+# Optimize database (VACUUM and ANALYZE)
+results = detector.optimize_database(entity_manager)
+print(f"VACUUM: {results['vacuum']}")
+print(f"ANALYZE: {results['analyze']}")
+print(f"Space saved: {results.get('space_saved_mb', 0)} MB")
+```
+
+**SQLite-Specific Notes:**
+- SQLite uses database-level locks (not row-level)
+- Contention detected via SQLITE_BUSY/OperationalError
+- No processlist or system tables like PostgreSQL/MySQL
+- Use PRAGMA statements for diagnostics
+- Enable WAL mode for better concurrency
+
+See [example_sqlite.py](example_sqlite.py) for complete SQLite examples.
+


⚠️ Potential issue | 🟡 Minor

SQLite Example section is misplaced after License.

The SQLite Example section (lines 299-348) appears after the License section, which breaks the document flow. It should be placed alongside the PostgreSQL and MySQL examples in the "Contention Detection" section (around line 128).

Additionally, the heading level #### doesn't follow the expected increment from ## (should start at ###).

🔎 Suggested restructure

Move the SQLite example block (lines 299-348) to after the MySQL example (line 168), changing the heading to:

-#### SQLite Example +### SQLite Example

Then ensure the License section remains at the end of the document.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

299-299: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🤖 Prompt for AI Agents

telemetry/README.md lines 295-348: the "SQLite Example" block is incorrectly placed after the License and uses an incorrect heading level; cut the entire block (lines 299-348) and paste it into the "Contention Detection" examples area immediately after the MySQL example (around line ~168), change the heading from "#### SQLite Example" to "### SQLite Example" so it follows the document hierarchy, and confirm the License section stays at the end of the file.

coderabbitai · 2025-12-22T08:00:32Z

telemetry/SQLITE_SUPPORT.md

+}
+```
+
+**PRAGMACommands Used:**


⚠️ Potential issue | 🟡 Minor

Typo: Missing space in "PRAGMACommands".

Should be "PRAGMA Commands" for readability.

🔎 Suggested fix

-**PRAGMACommands Used:** +**PRAGMA Commands Used:**

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

**PRAGMACommands Used:**

**PRAGMA Commands Used:**

🤖 Prompt for AI Agents

In telemetry/SQLITE_SUPPORT.md around line 64, the heading "PRAGMACommands Used:" has a missing space; change it to "PRAGMA Commands Used:" (i.e., insert a space between "PRAGMA" and "Commands") to improve readability and consistency with other headings.

claude added 3 commits December 22, 2025 07:12

docs: add comprehensive MySQL support documentation

89ea4a7

coderabbitai bot reviewed Dec 22, 2025

View reviewed changes

claude added 2 commits December 22, 2025 07:54

docs: add comprehensive SQLite support documentation

8135727

coderabbitai bot reviewed Dec 22, 2025

View reviewed changes

joamag assigned Copilot and joamag Dec 22, 2025

	print(f"\nOptimization Results:")
	print("\nOptimization Results:")

feat: add OpenTelemetry plugin for database contention detection #20

Are you sure you want to change the base?

feat: add OpenTelemetry plugin for database contention detection #20

Uh oh!

Conversation

joamag commented Dec 22, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Query Spans

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joamag commented Dec 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 22, 2025 •

edited

Loading