Skip to content

Conversation

@joamag
Copy link
Contributor

@joamag joamag commented Dec 22, 2025

Implements a comprehensive OpenTelemetry instrumentation plugin for the
Colony entity manager to detect and monitor database transaction
contention and performance issues.

Features:

  • Automatic instrumentation of entity manager operations
  • Distributed tracing for transactions, queries, and locks
  • Database-specific contention detectors for PostgreSQL and MySQL
  • Real-time blocking query detection
  • Lock wait monitoring and analysis
  • Transaction statistics collection
  • Slow query identification
  • Deadlock and serialization failure detection

Components:

  • telemetry_plugin.py: Main plugin following Colony patterns
  • system.py: Core telemetry system with instrumentation logic
  • decorators.py: Instrumented transaction decorators
  • contention.py: Database-specific contention detection
  • test.py: Unit tests for telemetry functionality

Configuration:

  • Environment-based configuration (OTEL_* variables)
  • Support for OTLP exporters (Jaeger, Tempo, etc.)
  • Console exporter for development/debugging
  • Configurable slow query thresholds

Metrics:

  • db.transaction.duration: Transaction execution time
  • db.query.duration: Query execution time
  • db.lock.wait_duration: Lock acquisition time
  • db.transaction.active: Active transaction count
  • db.contention.events: Contention event counter

Documentation:

  • README.md: Comprehensive usage guide
  • QUICK_START.md: Quick setup instructions
  • example_usage.py: Practical examples

The implementation is non-intrusive and follows the Colony plugin
architecture, making it easy to integrate with existing applications.

Summary by CodeRabbit

  • New Features

    • OpenTelemetry plugin added to instrument entity manager: transaction/query tracing, metrics, and engine-specific contention detectors (PostgreSQL, MySQL, SQLite).
    • Transaction and query decorators for automatic tracing and contention-aware behavior.
    • Example scripts demonstrating MySQL/SQLite/PostgreSQL telemetry, monitoring and periodic checks.
  • Documentation

    • Comprehensive telemetry README, quick-start guide, and database-specific support docs with usage, best practices, and troubleshooting.
  • Tests

    • New test suite covering plugin loading and decorator behaviors.
  • Chores

    • Added OpenTelemetry runtime dependencies.

✏️ Tip: You can customize this high-level summary in your review settings.

Implements a comprehensive OpenTelemetry instrumentation plugin for the
Colony entity manager to detect and monitor database transaction
contention and performance issues.

Features:
- Automatic instrumentation of entity manager operations
- Distributed tracing for transactions, queries, and locks
- Database-specific contention detectors for PostgreSQL and MySQL
- Real-time blocking query detection
- Lock wait monitoring and analysis
- Transaction statistics collection
- Slow query identification
- Deadlock and serialization failure detection

Components:
- telemetry_plugin.py: Main plugin following Colony patterns
- system.py: Core telemetry system with instrumentation logic
- decorators.py: Instrumented transaction decorators
- contention.py: Database-specific contention detection
- test.py: Unit tests for telemetry functionality

Configuration:
- Environment-based configuration (OTEL_* variables)
- Support for OTLP exporters (Jaeger, Tempo, etc.)
- Console exporter for development/debugging
- Configurable slow query thresholds

Metrics:
- db.transaction.duration: Transaction execution time
- db.query.duration: Query execution time
- db.lock.wait_duration: Lock acquisition time
- db.transaction.active: Active transaction count
- db.contention.events: Contention event counter

Documentation:
- README.md: Comprehensive usage guide
- QUICK_START.md: Quick setup instructions
- example_usage.py: Practical examples

The implementation is non-intrusive and follows the Colony plugin
architecture, making it easy to integrate with existing applications.
Enhances the OpenTelemetry telemetry plugin with full MySQL 5.7+ and
8.0+ compatibility for database contention detection and monitoring.

MySQL-Specific Features:
- InnoDB lock wait detection with fallback for MySQL 5.7/8.0+
- Deadlock detection via error code 1213 and InnoDB status parsing
- Transaction statistics from information_schema.innodb_trx
- Slow query detection from PROCESSLIST
- SHOW ENGINE INNODB STATUS parsing for recent deadlocks
- Automatic version detection (sys.innodb_lock_waits for 8.0+)

Enhanced Error Detection:
- MySQL deadlock error code 1213 detection
- Improved contention type classification (deadlock, lock_timeout,
  serialization_failure, lock_wait)
- Database-specific error code parsing
- Enhanced span attributes for contention events

MySQLContentionDetector Methods:
- get_blocking_queries(): Detects InnoDB lock waits
- get_lock_waits(): Returns lock wait statistics
- get_transaction_stats(): InnoDB transaction metrics + deadlock count
- get_slow_queries(): Queries from PROCESSLIST exceeding threshold
- get_deadlock_info(): Parses SHOW ENGINE INNODB STATUS

Updated Instrumentation (system.py):
- MySQL-specific error code detection in execute_query wrapper
- Contention type attribute (db.contention.type) in spans
- Improved error categorization for both MySQL and PostgreSQL

Documentation:
- example_mysql.py: Complete MySQL usage examples
- README.md: Added MySQL-specific examples and usage
- MySQL best practices for contention avoidance

Query Compatibility:
- Uses sys.innodb_lock_waits for MySQL 8.0+
- Falls back to information_schema.innodb_lock_waits for 5.7
- CASE WHEN aggregation instead of FILTER for MySQL compatibility
- GLOBAL_STATUS for deadlock count retrieval

The implementation maintains full backward compatibility with existing
PostgreSQL functionality while adding comprehensive MySQL support.
@coderabbitai
Copy link

coderabbitai bot commented Dec 22, 2025

Walkthrough

Adds a new Telemetry plugin with OpenTelemetry integration: instrumentation, decorators, a Telemetry system, cross-database contention detectors (PostgreSQL, MySQL, SQLite), examples, docs, tests, and requirements updates to include OpenTelemetry packages.

Changes

Cohort / File(s) Summary
Dependencies
requirements.txt
Added OpenTelemetry packages: opentelemetry-api>=1.20.0, opentelemetry-sdk>=1.20.0, opentelemetry-exporter-otlp>=1.20.0.
Documentation
telemetry/README.md, telemetry/QUICK_START.md, telemetry/MYSQL_SUPPORT.md, telemetry/SQLITE_SUPPORT.md
New comprehensive telemetry docs and DB-specific support guides (MySQL, SQLite), quick-starts, examples, compatibility notes, and troubleshooting.
Examples
telemetry/example_usage.py, telemetry/example_mysql.py, telemetry/example_sqlite.py
New example scripts demonstrating setup, instrumentation, transaction tracing, contention detection, deadlock handling, optimization, best practices, and periodic monitoring for PostgreSQL, MySQL, and SQLite.
Plugin Bridge
telemetry/src/telemetry_plugin.py
New TelemetryPlugin class integrating with Colony plugin system; declares metadata, dependencies, load logic, and exposes instrumentation, decorator access, contention detector factory, and configuration entrypoints.
Core Telemetry System
telemetry/src/telemetry/system.py
New Telemetry class: conditional OpenTelemetry initialization, tracer/meter setup, metric instruments, configure() entrypoint, and instrument_entity_manager() to wrap EM operations and emit spans/metrics.
Decorators / Instrumentation Helpers
telemetry/src/telemetry/decorators.py
New decorators: transaction(...) and monitored_query that manage transaction lifecycle, create spans, record events/metrics, detect contention patterns, and degrade gracefully when OpenTelemetry absent.
Contention Detectors
telemetry/src/telemetry/contention.py
New ContentionDetector base plus PgSQLContentionDetector, MySQLContentionDetector, and SQLiteContentionDetector implementations to collect blocking queries, lock waits, transaction stats, slow queries, deadlocks, DB info, and optimization helpers.
Module Init / Exports
telemetry/src/telemetry/__init__.py
Adds module metadata and public exports: ContentionDetector, PgSQLContentionDetector, MySQLContentionDetector, SQLiteContentionDetector, Telemetry, TelemetryTest, and retains existing submodule imports.
Tests
telemetry/src/telemetry/test.py
New test module with TelemetryTest and TelemetryTestCase validating plugin load, decorator behavior (with/without OTEL), exception handling, and base contention detector behavior.
Support / Notes
telemetry/MYSQL_SUPPORT.md, telemetry/SQLITE_SUPPORT.md
DB-specific support documents outlining instrumentation, metrics, error detection, and recommended configs for MySQL and SQLite.

Sequence Diagram

sequenceDiagram
    actor App as Application
    participant Dec as Decorator
    participant Tele as Telemetry System
    participant EM as Entity Manager
    participant DB as Database
    participant OTEL as OpenTelemetry

    App->>Dec: Call decorated transactional function
    Dec->>Tele: Request tracer/span
    Tele->>OTEL: Start span (db.transaction.<name>)
    Dec->>EM: begin_transaction()
    EM->>DB: BEGIN
    DB-->>EM: OK
    Dec->>App: Execute function body (calls EM queries)
    App->>EM: execute_query() / lock()
    EM->>DB: SQL statements
    DB-->>EM: Results / errors
    EM->>Tele: Report metrics/events (query duration, lock wait)
    Tele->>OTEL: Emit metrics & events
    alt Success
        Dec->>EM: commit_transaction()
        EM->>DB: COMMIT
        DB-->>EM: OK
        Tele->>OTEL: Mark span OK, add commit event
    else Exception (lock/deadlock/timeout)
        EM->>Dec: raise exception
        Dec->>Tele: Classify contention, increment retry metadata
        Dec->>EM: rollback_transaction()
        EM->>DB: ROLLBACK
        DB-->>EM: OK
        Tele->>OTEL: Record error, retry_count, and contention event
    end
    Dec->>OTEL: End span
    OTEL-->>Dec: Closed
    Dec-->>App: Return or raise
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • telemetry/src/telemetry/contention.py — DB-specific SQL, fallbacks, deadlock parsing, and safe defaults.
    • telemetry/src/telemetry/system.py — instrumentation wrappers, metric creation, lifecycle and thread-safety.
    • telemetry/src/telemetry/decorators.py — transaction state handling, exception classification for contention/retries.
    • telemetry/src/telemetry_plugin.py — plugin dependency declarations and load lifecycle.
    • Examples and tests for realistic usage and graceful degradation when OpenTelemetry is absent.

Poem

🐰 I hopped into traces, swift and spry,
Counting locks and spans that flutter by.
From WAL to InnoDB, each query I spy,
I nibble contention and watch metrics fly.
A telemetry carrot — observability pie. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.09% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: introducing an OpenTelemetry plugin for database contention detection. It is specific, concise, and directly reflects the primary objective of the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/opentelemetry-db-contention-AEbmu

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (17)
telemetry/src/telemetry/test.py (1)

142-142: Consider extracting the exception message.

The error message "Test error" could be extracted to a constant for better maintainability, though this is a test-only concern.

Based on static analysis hints.

telemetry/src/telemetry_plugin.py (1)

46-58: Consider adding ClassVar annotations for mutable class attributes.

Static analysis suggests annotating these mutable class attributes with typing.ClassVar to indicate they are class-level rather than instance-level. However, this may be a Colony framework convention that doesn't require type hints.

Proposed enhancement
+from typing import ClassVar
+
 class TelemetryPlugin(colony.Plugin):
     ...
-    platforms = [colony.CPYTHON_ENVIRONMENT]
-    capabilities = ["telemetry"]
-    dependencies = [
+    platforms: ClassVar = [colony.CPYTHON_ENVIRONMENT]
+    capabilities: ClassVar = ["telemetry"]
+    dependencies: ClassVar = [
         ...
     ]
-    main_modules = ["telemetry"]
+    main_modules: ClassVar = ["telemetry"]

Based on static analysis hints.

telemetry/src/telemetry/system.py (1)

321-324: Consider restructuring the return statement.

The return statement at line 324 could be moved to an else block for better code structure, though the current implementation is functionally correct.

Proposed refactor
                 try:
                     result = original_execute_query(query, close_cursor)
                     span.set_status(Status(StatusCode.OK))
-                    return result
-
-                except Exception as e:
+                except Exception as e:
+                    # error handling...
+                    raise
+                else:
+                    return result

Based on static analysis hints.

telemetry/src/telemetry/decorators.py (5)

81-84: Bare except swallows all exceptions silently.

Using a bare except with pass hides failures when obtaining the tracer. This can mask issues like misconfigured OpenTelemetry or unexpected errors. Consider catching specific exceptions or at minimum logging a debug message.

🔎 Proposed fix
             if OTEL_AVAILABLE and trace_enabled:
                 try:
                     tracer = trace.get_tracer("colony.entity_manager")
-                except:
-                    pass
+                except Exception:
+                    # Tracer initialization failed; proceed without tracing
+                    tracer = None

91-94: Bare except when retrieving engine name.

Similar to the tracer acquisition, this bare except silently catches all exceptions. The fallback to "unknown" is reasonable, but consider catching Exception explicitly for clarity.

🔎 Proposed fix
                 try:
                     engine_name = self_value.entity_manager.engine.get_engine_name()
-                except:
+                except Exception:
                     engine_name = "unknown"

106-106: Unused ctx_token variable.

The result of trace.set_span_in_context(span) is assigned but never used. If context propagation is intended, this token should be used to activate the context. Otherwise, consider removing it. Note: start_as_current_span context manager (used in monitored_query) handles this automatically.

🔎 Proposed fix - remove if not needed
-                ctx_token = trace.set_span_in_context(span)
+                # Note: span is manually managed; context propagation not required here

150-155: retry_count is misleading—increment without retry logic.

The retry_count is incremented on contention detection and recorded as a span attribute, but the decorator does not implement actual retry logic. This could mislead users into thinking retries are handled automatically. Consider renaming to contention_detected (boolean) or documenting that retry logic is the caller's responsibility.

🔎 Proposed fix - clarify intent
-                    if is_contention:
-                        span.set_attribute("db.transaction.contention", True)
-                        span.add_event("contention_detected", {"error.type": type(e).__name__})
-                        retry_count += 1
-
-                    span.set_attribute("db.transaction.retry_count", retry_count)
+                    if is_contention:
+                        span.set_attribute("db.transaction.contention", True)
+                        span.set_attribute("db.transaction.retry_suggested", True)
+                        span.add_event("contention_detected", {"error.type": type(e).__name__})

174-187: Unused *args, **kwargs in decorator signature.

The decorator function accepts *args, **kwargs but never uses them. This appears to be a remnant of boilerplate. Remove to avoid confusion.

🔎 Proposed fix
-    def decorator(function, *args, **kwargs):
+    def decorator(function):
         """
         The decorator function for the instrumented transaction decorator.

         :type function: Function
         :param function: The function to be decorated.
         :rtype: Function
         :return: The decorator interceptor function.
         """

         decorator_interceptor_function = create_decorator_interceptor(function)
         return decorator_interceptor_function
telemetry/example_mysql.py (4)

33-40: Hardcoded credentials in example code.

The example uses hardcoded credentials (root/root). While acceptable for demonstration purposes, consider adding a comment noting this is example-only and should use environment variables in production, or use placeholder values like "your_password".

🔎 Proposed improvement
     # Create MySQL entity manager
+    # NOTE: Use environment variables for credentials in production
     entity_manager = entity_manager_plugin.load_entity_manager_properties("mysql", {
         "id": "mysql_database",
         "host": "localhost",
         "port": 3306,
-        "user": "root",
-        "password": "root",
+        "user": os.environ.get("MYSQL_USER", "root"),
+        "password": os.environ.get("MYSQL_PASSWORD", ""),
         "database": "example_db",
         "isolation": "read committed",
     })

119-119: f-string without placeholders.

Line 119 uses an f-string but contains no interpolation. Remove the f prefix.

🔎 Proposed fix
-            print(f"\nDeadlock/Block detected:")
+            print("\nDeadlock/Block detected:")

164-164: f-string without placeholders.

Same issue as above—remove the f prefix.

🔎 Proposed fix
-            print(f"\nSlow query detected:")
+            print("\nSlow query detected:")

280-318: Monitor thread lacks graceful shutdown mechanism.

The monitor function runs an infinite while True loop with no exit condition. Consider adding a shutdown flag or using threading.Event for graceful termination. The broad exception handling is acceptable here to keep the monitoring thread alive.

🔎 Proposed improvement for graceful shutdown
-def monitor_mysql_periodically(entity_manager, telemetry_plugin, interval=30):
+def monitor_mysql_periodically(entity_manager, telemetry_plugin, interval=30, stop_event=None):
     """
     Sets up periodic MySQL monitoring for production environments.
     ...
+    :type stop_event: threading.Event
+    :param stop_event: Optional event to signal shutdown.
     """

     import time
     import threading

     detector = telemetry_plugin.get_contention_detector("mysql")
+    if stop_event is None:
+        stop_event = threading.Event()

     def monitor():
         print(f"\nStarted MySQL monitoring (checking every {interval}s)")

-        while True:
+        while not stop_event.is_set():
             try:
                 # ... monitoring logic ...
-                time.sleep(interval)
+                stop_event.wait(interval)
             except Exception as e:
                 print(f"Error in monitoring thread: {e}")
-                time.sleep(interval)
+                stop_event.wait(interval)
telemetry/src/telemetry/contention.py (2)

383-385: Silent exception swallowing in fallback path.

The try-except-pass pattern silently ignores why the MySQL 8.0+ query failed. Consider logging at debug level to aid troubleshooting version compatibility issues.

🔎 Proposed fix
         except Exception:
             # Fallback to information_schema for older MySQL versions
-            pass
+            # Note: sys.innodb_lock_waits may not exist in MySQL < 8.0
+            pass  # Proceed to legacy fallback

Or if the telemetry system supports debug logging:

-        except Exception:
+        except Exception as e:
             # Fallback to information_schema for older MySQL versions
-            pass
+            self.telemetry_system.plugin.debug(
+                "sys.innodb_lock_waits unavailable, using legacy query: %s" % str(e)
+            )

512-519: Silent exception when fetching deadlock count.

The nested try-except-pass silently ignores errors fetching the deadlock count from GLOBAL_STATUS. This could hide permission issues or MySQL configuration problems.

🔎 Proposed fix
             try:
                 cursor = entity_manager.execute_query(deadlock_query)
                 row = cursor.fetchone()
                 if row:
                     stats["deadlocks"] = int(row[0]) if row[0] else 0
                 cursor.close()
-            except Exception:
-                pass
+            except Exception as e:
+                # GLOBAL_STATUS may not be accessible; deadlock count unavailable
+                stats["deadlocks"] = None
telemetry/example_usage.py (3)

32-39: Hardcoded credentials in example code.

Similar to the MySQL example, consider adding a note about using environment variables in production.

🔎 Proposed improvement
-    # Create entity manager with your configuration
+    # Create entity manager with your configuration
+    # NOTE: Use environment variables for credentials in production
     entity_manager = entity_manager_plugin.load_entity_manager_properties("pgsql", {
         "id": "example_database",
         "host": "localhost",
-        "user": "postgres",
-        "password": "postgres",
+        "user": os.environ.get("POSTGRES_USER", "postgres"),
+        "password": os.environ.get("POSTGRES_PASSWORD", ""),
         "database": "example_db",
         "isolation": "serializable",
     })

129-129: f-strings without placeholders at lines 129, 142, 164.

Multiple f-strings lack interpolation. Remove the f prefix from these lines.

🔎 Proposed fixes
-            print(f"\nBlocking situation detected:")
+            print("\nBlocking situation detected:")
-            print(f"\nLock wait detected:")
+            print("\nLock wait detected:")
-            print(f"\nSlow query detected:")
+            print("\nSlow query detected:")

229-254: Monitor thread lacks graceful shutdown mechanism.

Same issue as the MySQL example—the infinite loop has no exit condition. Consider adding a stop_event parameter for graceful termination.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 422bca6 and 89ea4a7.

📒 Files selected for processing (12)
  • requirements.txt
  • telemetry/MYSQL_SUPPORT.md
  • telemetry/QUICK_START.md
  • telemetry/README.md
  • telemetry/example_mysql.py
  • telemetry/example_usage.py
  • telemetry/src/telemetry/__init__.py
  • telemetry/src/telemetry/contention.py
  • telemetry/src/telemetry/decorators.py
  • telemetry/src/telemetry/system.py
  • telemetry/src/telemetry/test.py
  • telemetry/src/telemetry_plugin.py
🧰 Additional context used
🧬 Code graph analysis (7)
telemetry/src/telemetry/test.py (2)
telemetry/src/telemetry/decorators.py (1)
  • transaction (42-187)
telemetry/src/telemetry/contention.py (10)
  • ContentionDetector (32-88)
  • get_blocking_queries (51-62)
  • get_blocking_queries (97-166)
  • get_blocking_queries (339-426)
  • get_lock_waits (64-75)
  • get_lock_waits (168-212)
  • get_lock_waits (428-466)
  • get_transaction_stats (77-88)
  • get_transaction_stats (214-273)
  • get_transaction_stats (468-527)
telemetry/example_mysql.py (3)
data/src/entity_manager_plugin.py (1)
  • load_entity_manager_properties (78-91)
telemetry/src/telemetry/system.py (2)
  • instrument_entity_manager (226-461)
  • get_contention_detector (463-478)
telemetry/src/telemetry/contention.py (9)
  • get_blocking_queries (51-62)
  • get_blocking_queries (97-166)
  • get_blocking_queries (339-426)
  • get_lock_waits (64-75)
  • get_lock_waits (168-212)
  • get_lock_waits (428-466)
  • get_slow_queries (275-329)
  • get_slow_queries (529-581)
  • get_deadlock_info (583-625)
telemetry/src/telemetry_plugin.py (2)
telemetry/src/telemetry/system.py (4)
  • Telemetry (52-478)
  • instrument_entity_manager (226-461)
  • get_contention_detector (463-478)
  • configure (198-224)
telemetry/src/telemetry/decorators.py (1)
  • transaction (42-187)
telemetry/src/telemetry/contention.py (1)
telemetry/src/telemetry/test.py (1)
  • warning (161-162)
telemetry/src/telemetry/system.py (2)
telemetry/src/telemetry_plugin.py (1)
  • instrument_entity_manager (68-79)
telemetry/src/telemetry/contention.py (3)
  • PgSQLContentionDetector (91-329)
  • MySQLContentionDetector (332-625)
  • ContentionDetector (32-88)
telemetry/src/telemetry/decorators.py (3)
telemetry/src/telemetry/test.py (6)
  • begin (90-91)
  • begin (127-128)
  • commit (93-94)
  • commit (130-131)
  • rollback (96-97)
  • rollback (133-134)
threads/src/thread_pool/system.py (1)
  • set_status (947-955)
template_engine/src/template_engine/system.py (1)
  • end (597-598)
telemetry/src/telemetry/__init__.py (3)
telemetry/src/telemetry/contention.py (2)
  • ContentionDetector (32-88)
  • PgSQLContentionDetector (91-329)
telemetry/src/telemetry/system.py (1)
  • Telemetry (52-478)
telemetry/src/telemetry/test.py (1)
  • TelemetryTest (34-49)
🪛 GitHub Actions: Main Workflow
requirements.txt

[error] 1-1: pip install failed: No matching distribution found for opentelemetry-api>=1.20.0.

🪛 Ruff (0.14.8)
telemetry/src/telemetry/test.py

142-142: Avoid specifying long messages outside the exception class

(TRY003)

telemetry/example_mysql.py

68-68: Undefined name Warehouse

(F821)


75-75: Undefined name Warehouse

(F821)


93-93: Do not catch blind exception: Exception

(BLE001)


119-119: f-string without any placeholders

Remove extraneous f prefix

(F541)


164-164: f-string without any placeholders

Remove extraneous f prefix

(F541)


311-311: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry_plugin.py

46-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


47-47: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


48-57: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


58-58: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

telemetry/src/telemetry/contention.py

51-51: Unused method argument: entity_manager

(ARG002)


64-64: Unused method argument: entity_manager

(ARG002)


77-77: Unused method argument: entity_manager

(ARG002)


160-160: Consider moving this statement to an else block

(TRY300)


162-162: Do not catch blind exception: Exception

(BLE001)


206-206: Consider moving this statement to an else block

(TRY300)


208-208: Do not catch blind exception: Exception

(BLE001)


267-267: Consider moving this statement to an else block

(TRY300)


269-269: Do not catch blind exception: Exception

(BLE001)


323-323: Consider moving this statement to an else block

(TRY300)


325-325: Do not catch blind exception: Exception

(BLE001)


381-381: Consider moving this statement to an else block

(TRY300)


383-385: try-except-pass detected, consider logging the exception

(S110)


383-383: Do not catch blind exception: Exception

(BLE001)


420-420: Consider moving this statement to an else block

(TRY300)


422-422: Do not catch blind exception: Exception

(BLE001)


460-460: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


518-519: try-except-pass detected, consider logging the exception

(S110)


518-518: Do not catch blind exception: Exception

(BLE001)


521-521: Consider moving this statement to an else block

(TRY300)


523-523: Do not catch blind exception: Exception

(BLE001)


575-575: Consider moving this statement to an else block

(TRY300)


577-577: Do not catch blind exception: Exception

(BLE001)


619-619: Consider moving this statement to an else block

(TRY300)


621-621: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/system.py

324-324: Consider moving this statement to an else block

(TRY300)

telemetry/example_usage.py

67-67: Undefined name Person

(F821)


73-73: Undefined name Address

(F821)


89-89: Undefined name Person

(F821)


92-92: Undefined name Person

(F821)


112-112: Do not catch blind exception: Exception

(BLE001)


129-129: f-string without any placeholders

Remove extraneous f prefix

(F541)


142-142: f-string without any placeholders

Remove extraneous f prefix

(F541)


164-164: f-string without any placeholders

Remove extraneous f prefix

(F541)


248-248: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/decorators.py

83-83: Do not use bare except

(E722)


83-84: try-except-pass detected, consider logging the exception

(S110)


93-93: Do not use bare except

(E722)


106-106: Local variable ctx_token is assigned to but never used

Remove assignment to unused variable ctx_token

(F841)


174-174: Unused function argument: args

(ARG001)


174-174: Unused function argument: kwargs

(ARG001)


220-220: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (18)
telemetry/README.md (1)

1-294: Well-structured documentation.

The README provides comprehensive coverage of features, configuration, usage patterns, metrics, and troubleshooting. The examples for both PostgreSQL and MySQL are clear and helpful.

telemetry/QUICK_START.md (1)

1-184: Effective quick-start guide.

The guide provides a clear path from installation through basic usage, with practical examples and troubleshooting advice.

telemetry/MYSQL_SUPPORT.md (1)

1-328: Comprehensive MySQL support documentation.

This document provides excellent coverage of MySQL-specific features, including version compatibility, error detection, and InnoDB monitoring. The compatibility matrix and examples are particularly helpful.

telemetry/src/telemetry/__init__.py (1)

1-44: Clean module initialization.

The module structure follows Python best practices with appropriate exports for the public API.

telemetry/src/telemetry/test.py (1)

57-174: Solid test coverage for core functionality.

The tests appropriately cover plugin loading, decorator behavior with and without OpenTelemetry, exception handling with rollback, and base detector functionality using mock objects.

telemetry/src/telemetry_plugin.py (1)

60-112: Well-structured plugin implementation.

The plugin follows Colony patterns correctly and provides a clean API surface for instrumentation, configuration, and contention detection.

telemetry/src/telemetry/system.py (6)

38-49: Proper graceful degradation for optional dependency.

The try/except block correctly handles the case where OpenTelemetry packages are not installed, setting OTEL_AVAILABLE flag appropriately.


86-151: Robust default configuration with fallback options.

The initialization logic properly checks for external configuration, falls back to OTLP or console exporters, and handles missing dependencies gracefully.


153-196: Clean metrics initialization.

All metric instruments are properly created with appropriate names, descriptions, and units following OpenTelemetry conventions.


336-355: Comprehensive contention detection with MySQL support.

The error detection logic properly identifies MySQL error code 1213 (deadlock) and falls back to string matching for other databases. The contention type classification (deadlock, lock_timeout, serialization_failure, lock_wait) is well-structured.


226-461: Thorough entity manager instrumentation.

The instrumentation wraps all critical entity manager methods (begin, commit, rollback, execute_query, lock, lock_table) with proper tracing spans, metrics recording, and error handling. The implementation correctly tracks transaction lifetimes and detects slow queries.


463-478: Clean contention detector factory.

The factory method provides appropriate detector instances based on the database engine.

requirements.txt (1)

2-4: The OpenTelemetry packages specified in requirements.txt are available at the requested versions. Latest versions are 1.39.1, released in December 2025, which exceeds the >=1.20.0 constraint. No changes needed.

Likely an incorrect or invalid review comment.

telemetry/src/telemetry/decorators.py (2)

162-170: return_value may be referenced before assignment.

If an exception occurs, the finally block executes and then raise re-throws the exception. However, the return return_value on line 170 is unreachable after a raise. This is actually safe because the raise on line 160 exits before reaching line 170, but the code structure is slightly confusing.

The current flow is correct—no fix needed.


190-231: LGTM!

The monitored_query decorator is well-structured, using the start_as_current_span context manager correctly for automatic span lifecycle management. The graceful degradation when OTEL is unavailable is properly implemented.

telemetry/example_mysql.py (1)

320-347: LGTM!

The main() function provides clear guidance on MySQL-specific features and usage instructions. The documentation within example_mysql_best_practices() is informative and aligns with MySQL/InnoDB best practices.

telemetry/src/telemetry/contention.py (1)

32-89: LGTM!

The base ContentionDetector class provides a clean interface with sensible default implementations (empty list/dict). The unused entity_manager parameters in base methods are intentional to define the interface contract for subclasses.

telemetry/example_usage.py (1)

259-281: LGTM!

The main() function provides clear usage instructions and documentation references. The overall example structure is well-organized and educational.

Comment on lines +67 to +75
# Lock source warehouse (this order of locking can cause deadlocks)
self.entity_manager.lock(Warehouse, from_warehouse)

# Simulate some processing
import time
time.sleep(0.1)

# Lock destination warehouse
self.entity_manager.lock(Warehouse, to_warehouse)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Undefined Warehouse class referenced.

The Warehouse class is used but never defined or imported. For a runnable example, consider either defining a stub class or adding a comment explaining this is pseudocode.

🔎 Proposed fix - add stub or comment
+# Note: Warehouse is a placeholder entity class representing your domain model
+# In a real application, this would be an Entity subclass defined in your codebase
+
 class OrderService:
     def __init__(self, entity_manager):
         self.entity_manager = entity_manager

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.8)

68-68: Undefined name Warehouse

(F821)


75-75: Undefined name Warehouse

(F821)

🤖 Prompt for AI Agents
In telemetry/example_mysql.py around lines 67 to 75 the code references an
undefined Warehouse class when locking warehouses; either define a minimal stub
Warehouse class in this module (e.g., a simple dataclass or class with an
id/name) above these lines, or add an explicit import from the module where
Warehouse is defined, or replace the reference with a clear comment that this is
pseudocode and requires a real Warehouse model in real usage; implement one of
these fixes so the example is runnable and the reference is resolved.

Comment on lines +55 to +97
class UserService:
def __init__(self, entity_manager):
self.entity_manager = entity_manager

@decorators.transaction()
def create_user_with_address(self, user_data, address_data):
"""
Creates a user and address in a single transaction.
This will be automatically traced.
"""

# Create user entity
user = Person()
user.name = user_data["name"]
user.email = user_data["email"]
self.entity_manager.save(user)

# Create address entity
address = Address()
address.street = address_data["street"]
address.city = address_data["city"]
address.person = user
self.entity_manager.save(address)

return user

@decorators.transaction()
def update_user_email(self, user_id, new_email):
"""
Updates a user's email.
Demonstrates lock acquisition tracking.
"""

# Lock the user record
self.entity_manager.lock(Person, user_id)

# Find and update
user = self.entity_manager.get(Person, user_id)
user.email = new_email
self.entity_manager.update(user)

return user

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Undefined Person and Address classes referenced.

These entity classes are used but never defined or imported. Add a comment clarifying these are placeholder domain entities.

🔎 Proposed fix
+    # Note: Person and Address are placeholder entity classes
+    # In a real application, these would be Entity subclasses from your domain model
     class UserService:
         def __init__(self, entity_manager):
             self.entity_manager = entity_manager
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class UserService:
def __init__(self, entity_manager):
self.entity_manager = entity_manager
@decorators.transaction()
def create_user_with_address(self, user_data, address_data):
"""
Creates a user and address in a single transaction.
This will be automatically traced.
"""
# Create user entity
user = Person()
user.name = user_data["name"]
user.email = user_data["email"]
self.entity_manager.save(user)
# Create address entity
address = Address()
address.street = address_data["street"]
address.city = address_data["city"]
address.person = user
self.entity_manager.save(address)
return user
@decorators.transaction()
def update_user_email(self, user_id, new_email):
"""
Updates a user's email.
Demonstrates lock acquisition tracking.
"""
# Lock the user record
self.entity_manager.lock(Person, user_id)
# Find and update
user = self.entity_manager.get(Person, user_id)
user.email = new_email
self.entity_manager.update(user)
return user
# Note: Person and Address are placeholder entity classes
# In a real application, these would be Entity subclasses from your domain model
class UserService:
def __init__(self, entity_manager):
self.entity_manager = entity_manager
@decorators.transaction()
def create_user_with_address(self, user_data, address_data):
"""
Creates a user and address in a single transaction.
This will be automatically traced.
"""
# Create user entity
user = Person()
user.name = user_data["name"]
user.email = user_data["email"]
self.entity_manager.save(user)
# Create address entity
address = Address()
address.street = address_data["street"]
address.city = address_data["city"]
address.person = user
self.entity_manager.save(address)
return user
@decorators.transaction()
def update_user_email(self, user_id, new_email):
"""
Updates a user's email.
Demonstrates lock acquisition tracking.
"""
# Lock the user record
self.entity_manager.lock(Person, user_id)
# Find and update
user = self.entity_manager.get(Person, user_id)
user.email = new_email
self.entity_manager.update(user)
return user
🧰 Tools
🪛 Ruff (0.14.8)

67-67: Undefined name Person

(F821)


73-73: Undefined name Address

(F821)


89-89: Undefined name Person

(F821)


92-92: Undefined name Person

(F821)

🤖 Prompt for AI Agents
In telemetry/example_usage.py around lines 55 to 97, the code references
undefined domain entity classes Person and Address; add a short comment near the
top of this snippet (or import area) stating that Person and Address are
placeholder domain entities used for example purposes and should be replaced
with real model definitions or imports in production, or alternatively
import/define minimal stub classes for the example so the sample code is
self-contained and clear to readers.

Comment on lines +190 to +205
# Complex query that might be slow
query = """
SELECT u.name, u.email, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.created_at BETWEEN %s AND %s
GROUP BY u.id, u.name, u.email
ORDER BY order_count DESC
LIMIT 100
"""

cursor = self.entity_manager.execute_query(query % (start_date, end_date))
results = cursor.fetchall()
cursor.close()

return results
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

SQL injection vulnerability in example code.

The generate_user_report method uses string interpolation (%) to build the query with start_date and end_date. Even in example code, this demonstrates an insecure pattern that users might copy. Consider using parameterized queries or adding a security warning comment.

🔎 Proposed fix
             # Complex query that might be slow
+            # WARNING: Use parameterized queries in production to prevent SQL injection
             query = """
             SELECT u.name, u.email, COUNT(o.id) as order_count
             FROM users u
             LEFT JOIN orders o ON o.user_id = u.id
-            WHERE o.created_at BETWEEN %s AND %s
+            WHERE o.created_at BETWEEN ? AND ?
             GROUP BY u.id, u.name, u.email
             ORDER BY order_count DESC
             LIMIT 100
             """

-            cursor = self.entity_manager.execute_query(query % (start_date, end_date))
+            # Use parameterized query (syntax depends on your database driver)
+            cursor = self.entity_manager.execute_query(query, params=(start_date, end_date))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In telemetry/example_usage.py around lines 190 to 205, the SQL is built via
Python string interpolation which exposes an SQL injection risk; replace the
string interpolation with a parameterized query call using the DB driver's
parameter substitution (e.g., pass the query with placeholders and a tuple of
(start_date, end_date) to execute), or use the entity_manager/ORM parameter
binding API if available, and add a brief comment warning not to interpolate
user input directly into SQL.

Comment on lines +9 to +11
```bash
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update installation command with correct package versions.

This installation command will fail due to the invalid version constraint (>=1.20.0) mentioned in requirements.txt. Update once the correct versions are determined.

🤖 Prompt for AI Agents
In telemetry/QUICK_START.md around lines 9–11, the pip install example uses
unpinned packages but conflicts with the version constraints in requirements.txt
(the >=1.20.0 constraint is causing install failures); update the command to use
the concrete, compatible package versions listed in requirements.txt (or valid
semver specifiers that match those constraints), e.g. replace the current line
with a pip install that pins each opentelemetry package to the exact versions
from requirements.txt or to compatible >=/<== ranges that are valid.

Comment on lines +19 to +21
```bash
pip install opentelemetry-api opentelemetry-sdk
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update installation instructions with correct package versions.

The installation command references OpenTelemetry packages, but the version constraint in requirements.txt (>=1.20.0) appears to be invalid based on the pipeline failure. Once the correct versions are verified, ensure this documentation is updated accordingly.

🤖 Prompt for AI Agents
In telemetry/README.md around lines 19-21, the install instruction lists
opentelemetry packages without correct version constraints that conflict with
requirements.txt (the >=1.20.0 constraint appears invalid and caused pipeline
failures); verify the exact, published OpenTelemetry package versions on PyPI
that are compatible with the project, then update the README install command to
include those explicit, working version(s) that match requirements.txt (or
alternatively fix requirements.txt to a valid constraint) so docs and CI are
consistent.

Comment on lines +275 to +303
def get_slow_queries(self, entity_manager, duration_threshold="5 seconds"):
"""
Returns currently executing slow queries.

:type entity_manager: EntityManager
:param entity_manager: The entity manager to query.
:type duration_threshold: String
:param duration_threshold: PostgreSQL interval for slow query threshold.
:rtype: list
:return: List of slow query information.
"""

query = """
SELECT
pid,
usename,
application_name,
client_addr,
state,
query,
NOW() - query_start AS duration,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start IS NOT NULL
AND NOW() - query_start > interval '%s'
ORDER BY duration DESC;
""" % duration_threshold
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

SQL injection vulnerability via string interpolation.

The duration_threshold parameter is directly interpolated into the SQL query using % string formatting. If user input reaches this parameter, it could lead to SQL injection. Use parameterized queries instead.

🔎 Proposed fix

The fix depends on whether the entity manager's execute_query supports parameterized queries. If it does:

         query = """
         SELECT
             pid,
             usename,
             application_name,
             client_addr,
             state,
             query,
             NOW() - query_start AS duration,
             wait_event_type,
             wait_event
         FROM pg_stat_activity
         WHERE state != 'idle'
           AND query_start IS NOT NULL
-          AND NOW() - query_start > interval '%s'
+          AND NOW() - query_start > interval %s
         ORDER BY duration DESC;
-        """ % duration_threshold
+        """
+        # Pass duration_threshold as a parameter
+        cursor = entity_manager.execute_query(query, params=(duration_threshold,))

If parameterized queries aren't supported, validate the input:

import re
if not re.match(r'^\d+\s*(second|seconds|minute|minutes|hour|hours)$', duration_threshold, re.IGNORECASE):
    raise ValueError("Invalid duration_threshold format")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In telemetry/src/telemetry/contention.py around lines 275 to 303, the
duration_threshold is directly interpolated into the SQL string causing a SQL
injection risk; change the implementation to use a parameterized query (pass the
interval value as a bound parameter to entity_manager.execute_query) or, if the
entity manager does not support parameter binding for interval literals,
validate and whitelist the input format (e.g., allow only numeric value + unit
like "5 seconds|minutes|hours" via a strict regex) and raise ValueError on
invalid input before formatting into the SQL.

Comment on lines +529 to +556
def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
"""
Returns currently executing slow queries from processlist.

:type entity_manager: EntityManager
:param entity_manager: The entity manager to query.
:type duration_threshold_seconds: int
:param duration_threshold_seconds: Minimum query duration in seconds.
:rtype: list
:return: List of slow query information.
"""

query = """
SELECT
ID as pid,
USER as user,
HOST as host,
DB as db,
COMMAND as command,
TIME as duration_seconds,
STATE as state,
INFO as query
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
AND TIME > %d
AND INFO IS NOT NULL
ORDER BY TIME DESC;
""" % duration_threshold_seconds
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

SQL injection vulnerability via string interpolation.

Similar to the PostgreSQL detector, duration_threshold_seconds is directly interpolated into the SQL query. Although the docstring specifies int, the type is not enforced at runtime.

🔎 Proposed fix
-    def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
+    def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
+        # Ensure parameter is an integer to prevent SQL injection
+        duration_threshold_seconds = int(duration_threshold_seconds)
+
         """
         Returns currently executing slow queries from processlist.
         ...

Or use parameterized queries if supported by the entity manager.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
"""
Returns currently executing slow queries from processlist.
:type entity_manager: EntityManager
:param entity_manager: The entity manager to query.
:type duration_threshold_seconds: int
:param duration_threshold_seconds: Minimum query duration in seconds.
:rtype: list
:return: List of slow query information.
"""
query = """
SELECT
ID as pid,
USER as user,
HOST as host,
DB as db,
COMMAND as command,
TIME as duration_seconds,
STATE as state,
INFO as query
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
AND TIME > %d
AND INFO IS NOT NULL
ORDER BY TIME DESC;
""" % duration_threshold_seconds
def get_slow_queries(self, entity_manager, duration_threshold_seconds=5):
"""
Returns currently executing slow queries from processlist.
:type entity_manager: EntityManager
:param entity_manager: The entity manager to query.
:type duration_threshold_seconds: int
:param duration_threshold_seconds: Minimum query duration in seconds.
:rtype: list
:return: List of slow query information.
"""
# Ensure parameter is an integer to prevent SQL injection
duration_threshold_seconds = int(duration_threshold_seconds)
query = """
SELECT
ID as pid,
USER as user,
HOST as host,
DB as db,
COMMAND as command,
TIME as duration_seconds,
STATE as state,
INFO as query
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
AND TIME > %d
AND INFO IS NOT NULL
ORDER BY TIME DESC;
""" % duration_threshold_seconds
🤖 Prompt for AI Agents
In telemetry/src/telemetry/contention.py around lines 529 to 556, the SQL builds
a query by interpolating duration_threshold_seconds directly into the string
which allows SQL injection; change this to use a parameterized query or at
minimum enforce an integer cast before using it. Replace the string
interpolation with a parameter placeholder supported by the entity_manager (e.g.
"%s" or "?" depending on its DB API) and pass duration_threshold_seconds as a
bound parameter to entity_manager.execute; if parameterization is not possible,
explicitly validate/cast duration_threshold_seconds =
int(duration_threshold_seconds) and use that integer in the query string.

Comment on lines +583 to +625
def get_deadlock_info(self, entity_manager):
"""
Returns recent deadlock information from InnoDB status.

:type entity_manager: EntityManager
:param entity_manager: The entity manager to query.
:rtype: dict
:return: Deadlock information.
"""

query = "SHOW ENGINE INNODB STATUS"

try:
cursor = entity_manager.execute_query(query)
row = cursor.fetchone()

if row and len(row) > 2:
status_text = row[2]

# Parse the latest deadlock information
deadlock_info = {}
if "LATEST DETECTED DEADLOCK" in status_text:
# Extract deadlock section
start = status_text.find("LATEST DETECTED DEADLOCK")
end = status_text.find("------------", start + 1)
if end > start:
deadlock_section = status_text[start:end]
deadlock_info["has_recent_deadlock"] = True
deadlock_info["deadlock_text"] = deadlock_section[:500] # First 500 chars
else:
deadlock_info["has_recent_deadlock"] = False

cursor.close()
return deadlock_info

cursor.close()
return {}

except Exception as e:
self.telemetry_system.plugin.warning(
"Failed to get deadlock info: %s" % str(e)
)
return {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Deadlock parsing may miss section boundary.

The parsing logic looks for "------------" as the end marker, but InnoDB status sections can have varying delimiter counts. If the delimiter isn't found exactly, end becomes -1, causing end > start to fail, and deadlock_info remains empty (not even has_recent_deadlock: False).

🔎 Proposed fix
                 if "LATEST DETECTED DEADLOCK" in status_text:
                     # Extract deadlock section
                     start = status_text.find("LATEST DETECTED DEADLOCK")
                     end = status_text.find("------------", start + 1)
-                    if end > start:
+                    if end > start and end != -1:
                         deadlock_section = status_text[start:end]
                         deadlock_info["has_recent_deadlock"] = True
                         deadlock_info["deadlock_text"] = deadlock_section[:500]  # First 500 chars
+                    else:
+                        # Section found but couldn't determine end; still flag as detected
+                        deadlock_info["has_recent_deadlock"] = True
+                        deadlock_info["deadlock_text"] = status_text[start:start+500]
                 else:
                     deadlock_info["has_recent_deadlock"] = False
🧰 Tools
🪛 Ruff (0.14.8)

619-619: Consider moving this statement to an else block

(TRY300)


621-621: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
telemetry/src/telemetry/contention.py around lines 583 to 625: the current logic
looks for the exact string "------------" as the end of the deadlock section
which can fail (find returns -1) and leaves deadlock_info empty; update parsing
to locate the next section delimiter more robustly — after finding the "LATEST
DETECTED DEADLOCK" start, search for the next line that is a sequence of hyphens
of length >= 3 using a regex or a loop over lines, treat a not-found end as
end-of-text, set deadlock_info["has_recent_deadlock"] = True when a section is
found and always set it to False when the marker isn't present, trim the
captured section to the first 500 chars, and ensure the cursor is closed in all
paths (preferably via finally or context manager).

Adds complete SQLite support to the OpenTelemetry telemetry plugin for
database monitoring and contention detection. SQLite's file-based
architecture requires different monitoring approaches than client-server
databases.

SQLite-Specific Features:
- Database file monitoring (path, size, page statistics)
- PRAGMA-based diagnostics for database configuration
- SQLITE_BUSY error detection (database locked errors)
- Database integrity checks (PRAGMA integrity_check)
- Optimization support (VACUUM and ANALYZE)
- Lock configuration monitoring (busy_timeout, locking_mode)
- Journal mode detection (DELETE, WAL, etc.)

SQLiteContentionDetector Methods:
- get_database_info(): File path, size, PRAGMA settings
- get_lock_waits(): Lock configuration (timeout, mode)
- get_transaction_stats(): Database statistics and transaction state
- get_integrity_check(): Runs PRAGMA integrity_check
- optimize_database(): Runs VACUUM and ANALYZE
- get_blocking_queries(): Returns empty (no system tables)
- get_slow_queries(): Returns empty (use OTel traces)

Error Detection Enhancements (system.py):
- SQLite OperationalError detection for "database is locked"
- Contention type classification: "database_locked"
- Enhanced error keyword detection including "busy"
- Automatic SQLITE_BUSY error categorization

Key Differences from PostgreSQL/MySQL:
- File-based database (not client-server)
- Database-level locks (not row-level)
- No system tables for monitoring connections
- PRAGMA statements for diagnostics
- VACUUM needed for space reclamation
- WAL mode recommended for concurrency

Documentation:
- example_sqlite.py: Complete SQLite usage examples
- README.md: SQLite-specific examples and usage
- Best practices for WAL mode and optimization
- Monitoring strategies for file-based databases

Components Updated:
- contention.py: Added SQLiteContentionDetector class (338 lines)
- system.py: Enhanced error detection for SQLite
- __init__.py: Export SQLiteContentionDetector
- README.md: SQLite examples and documentation

The implementation provides appropriate monitoring for SQLite's
unique characteristics while maintaining API compatibility with
the PostgreSQL and MySQL detectors.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (3)
telemetry/src/telemetry/contention.py (3)

275-303: SQL injection vulnerability remains unaddressed.

The duration_threshold parameter is still directly interpolated into the SQL query. This was flagged in a previous review. Consider validating input format or using parameterized queries.


529-556: SQL injection vulnerability via integer interpolation remains unaddressed.

The duration_threshold_seconds parameter is interpolated without explicit type enforcement. This was flagged in a previous review.


604-616: Deadlock parsing edge case remains unaddressed.

If the "------------" delimiter isn't found after the deadlock section start, end will be -1 and no has_recent_deadlock key is set in the returned dict. This was flagged in a previous review.

🧹 Nitpick comments (1)
telemetry/SQLITE_SUPPORT.md (1)

207-222: Add language specifiers to fenced code blocks.

The trace attribute code blocks at lines 207-213 and 216-222 are missing language specifiers. Consider adding text or yaml for better rendering and markdown linter compliance.

🔎 Suggested fix
 ## 🔍 Trace Attributes for SQLite
 
 ### Transaction Spans
-```
+```text
 db.system = "sqlite"
 db.transaction.type = "required"
 db.transaction.duration_ms = 125
 db.transaction.contention = true  # If SQLITE_BUSY occurred
 db.contention.type = "database_locked"  # SQLite-specific

Query Spans

- +text
db.system = "sqlite"
db.statement = "UPDATE products SET..."
db.slow_query = true
db.lock.contention = true
db.contention.type = "database_locked"

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89ea4a7 and 8135727.

📒 Files selected for processing (6)
  • telemetry/README.md
  • telemetry/SQLITE_SUPPORT.md
  • telemetry/example_sqlite.py
  • telemetry/src/telemetry/__init__.py
  • telemetry/src/telemetry/contention.py
  • telemetry/src/telemetry/system.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • telemetry/src/telemetry/init.py
🧰 Additional context used
🧬 Code graph analysis (2)
telemetry/example_sqlite.py (3)
data/src/entity_manager_plugin.py (1)
  • load_entity_manager_properties (78-91)
telemetry/src/telemetry/system.py (1)
  • get_contention_detector (471-488)
telemetry/src/telemetry/contention.py (10)
  • get_database_info (637-738)
  • get_lock_waits (64-75)
  • get_lock_waits (168-212)
  • get_lock_waits (428-466)
  • get_lock_waits (755-788)
  • get_transaction_stats (77-88)
  • get_transaction_stats (214-273)
  • get_transaction_stats (468-527)
  • get_transaction_stats (790-858)
  • get_integrity_check (932-963)
telemetry/src/telemetry/contention.py (1)
telemetry/src/telemetry/test.py (1)
  • warning (161-162)
🪛 markdownlint-cli2 (0.18.1)
telemetry/SQLITE_SUPPORT.md

207-207: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


216-216: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


330-330: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


361-361: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


452-452: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

telemetry/README.md

299-299: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🪛 Ruff (0.14.8)
telemetry/src/telemetry/system.py

324-324: Consider moving this statement to an else block

(TRY300)

telemetry/example_sqlite.py

34-34: Probable insecure usage of temporary file or directory: "/tmp/example.db"

(S108)


64-64: Undefined name Order

(F821)


76-76: Do not catch blind exception: Exception

(BLE001)


166-166: f-string without any placeholders

Remove extraneous f prefix

(F541)


295-295: Do not catch blind exception: Exception

(BLE001)


345-345: Do not catch blind exception: Exception

(BLE001)

telemetry/src/telemetry/contention.py

51-51: Unused method argument: entity_manager

(ARG002)


64-64: Unused method argument: entity_manager

(ARG002)


77-77: Unused method argument: entity_manager

(ARG002)


160-160: Consider moving this statement to an else block

(TRY300)


162-162: Do not catch blind exception: Exception

(BLE001)


206-206: Consider moving this statement to an else block

(TRY300)


208-208: Do not catch blind exception: Exception

(BLE001)


267-267: Consider moving this statement to an else block

(TRY300)


269-269: Do not catch blind exception: Exception

(BLE001)


323-323: Consider moving this statement to an else block

(TRY300)


325-325: Do not catch blind exception: Exception

(BLE001)


381-381: Consider moving this statement to an else block

(TRY300)


383-385: try-except-pass detected, consider logging the exception

(S110)


383-383: Do not catch blind exception: Exception

(BLE001)


420-420: Consider moving this statement to an else block

(TRY300)


422-422: Do not catch blind exception: Exception

(BLE001)


460-460: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


518-519: try-except-pass detected, consider logging the exception

(S110)


518-518: Do not catch blind exception: Exception

(BLE001)


521-521: Consider moving this statement to an else block

(TRY300)


523-523: Do not catch blind exception: Exception

(BLE001)


575-575: Consider moving this statement to an else block

(TRY300)


577-577: Do not catch blind exception: Exception

(BLE001)


619-619: Consider moving this statement to an else block

(TRY300)


621-621: Do not catch blind exception: Exception

(BLE001)


662-663: try-except-pass detected, consider logging the exception

(S110)


662-662: Do not catch blind exception: Exception

(BLE001)


678-679: try-except-pass detected, consider logging the exception

(S110)


678-678: Do not catch blind exception: Exception

(BLE001)


688-689: try-except-pass detected, consider logging the exception

(S110)


688-688: Do not catch blind exception: Exception

(BLE001)


698-699: try-except-pass detected, consider logging the exception

(S110)


698-698: Do not catch blind exception: Exception

(BLE001)


709-710: try-except-pass detected, consider logging the exception

(S110)


709-709: Do not catch blind exception: Exception

(BLE001)


719-720: try-except-pass detected, consider logging the exception

(S110)


719-719: Do not catch blind exception: Exception

(BLE001)


729-730: try-except-pass detected, consider logging the exception

(S110)


729-729: Do not catch blind exception: Exception

(BLE001)


732-732: Consider moving this statement to an else block

(TRY300)


734-734: Do not catch blind exception: Exception

(BLE001)


740-740: Unused method argument: entity_manager

(ARG002)


782-782: Consider moving this statement to an else block

(TRY300)


784-784: Do not catch blind exception: Exception

(BLE001)


813-814: try-except-pass detected, consider logging the exception

(S110)


813-813: Do not catch blind exception: Exception

(BLE001)


829-830: try-except-pass detected, consider logging the exception

(S110)


829-829: Do not catch blind exception: Exception

(BLE001)


839-840: try-except-pass detected, consider logging the exception

(S110)


839-839: Do not catch blind exception: Exception

(BLE001)


849-850: try-except-pass detected, consider logging the exception

(S110)


849-849: Do not catch blind exception: Exception

(BLE001)


852-852: Consider moving this statement to an else block

(TRY300)


854-854: Do not catch blind exception: Exception

(BLE001)


860-860: Unused method argument: entity_manager

(ARG002)


860-860: Unused method argument: duration_threshold_seconds

(ARG002)


904-904: Do not catch blind exception: Exception

(BLE001)


905-905: Use explicit conversion flag

Replace with conversion flag

(RUF010)


913-913: Do not catch blind exception: Exception

(BLE001)


914-914: Use explicit conversion flag

Replace with conversion flag

(RUF010)


924-924: Consider moving this statement to an else block

(TRY300)


926-926: Do not catch blind exception: Exception

(BLE001)


953-957: Consider moving this statement to an else block

(TRY300)


959-959: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (9)
telemetry/src/telemetry/system.py (5)

249-261: LGTM!

The per-connection transaction timing tracking using id(connection) as the key is appropriate for the use case. The closure-scoped dictionary ensures isolation per instrumented entity manager.


263-303: LGTM!

The commit and rollback wrappers correctly handle timing and metrics. The finally block ensures active_transactions is decremented even if the underlying operation fails.


305-400: LGTM!

The query instrumentation is comprehensive with proper span creation, query truncation (500 chars), contention detection for multiple database engines, and configurable slow query thresholds. The error classification logic correctly identifies MySQL deadlocks (error 1213), SQLite BUSY errors, and generic lock-related errors.


401-458: LGTM!

The lock method wrappers correctly instrument both row-level and table-level locks with appropriate span attributes and duration metrics.


471-488: LGTM!

Clean factory method for creating engine-specific contention detectors with a sensible fallback to the base ContentionDetector.

telemetry/example_sqlite.py (1)

13-44: LGTM!

The setup function follows the correct Colony plugin loading pattern and demonstrates proper instrumentation of the SQLite entity manager.

telemetry/src/telemetry/contention.py (3)

637-738: LGTM!

The get_database_info method comprehensively collects SQLite database information using PRAGMA commands. The individual try-except blocks provide graceful degradation if specific PRAGMAs fail, which is appropriate for diagnostic tooling.


877-930: LGTM!

The optimize_database method correctly handles both ANALYZE and VACUUM operations with appropriate error handling. The comment noting that VACUUM cannot run inside a transaction is helpful, and failures are properly captured in the results dict.


932-963: LGTM!

The get_integrity_check method correctly implements SQLite's PRAGMA integrity_check with proper result parsing and structured return values.

Comment on lines +64 to +66
order = self.entity_manager.get(Order, order_id)
order.status = "processed"
self.entity_manager.update(order)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Order class is undefined.

The Order class is used but never imported or defined. This will raise a NameError if the example is executed. Consider either:

  1. Adding a comment explaining that Order should be replaced with the user's entity class, or
  2. Creating a placeholder class for the example.
🔎 Suggested fix
 def example_sqlite_lock_handling(entity_manager):
     """
     Example demonstrating SQLite lock detection and handling.
     """
 
+    # Note: Replace 'Order' with your actual entity class
+    # class Order:
+    #     status = None
+
     class OrderService:
         def __init__(self, entity_manager):
             self.entity_manager = entity_manager
 
         @decorators.transaction()
         def process_order(self, order_id):
             """
             Processes an order.
             If the database is locked, OpenTelemetry will detect it.
             """
 
-            # Simulate some database work
+            # Example: Replace Order with your entity class
             order = self.entity_manager.get(Order, order_id)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.8)

64-64: Undefined name Order

(F821)

🤖 Prompt for AI Agents
In telemetry/example_sqlite.py around lines 64 to 66, the example uses Order but
Order is not defined or imported causing a NameError; fix by either adding a
brief inline comment above this snippet stating "Replace Order with your entity
class" or by adding a minimal placeholder dataclass/entity definition for Order
in the example (with the fields used, e.g., id and status) and import it
appropriately so the snippet runs as-is.

print("\nRunning VACUUM and ANALYZE...")
results = detector.optimize_database(entity_manager)

print(f"\nOptimization Results:")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unnecessary f-string prefix.

The string has no placeholders, so the f prefix is unnecessary.

🔎 Suggested fix
-    print(f"\nOptimization Results:")
+    print("\nOptimization Results:")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"\nOptimization Results:")
print("\nOptimization Results:")
🧰 Tools
🪛 Ruff (0.14.8)

166-166: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In telemetry/example_sqlite.py at line 166, the print statement uses an
unnecessary f-string prefix for a plain string; remove the leading "f" so the
statement becomes a normal string literal (print("\nOptimization Results:")) to
avoid misleading use of f-strings.

Comment on lines +336 to +341
# Check integrity periodically (e.g., daily)
import datetime
if datetime.datetime.now().hour == 3: # 3 AM
integrity = detector.get_integrity_check(entity_manager)
if not integrity.get('is_ok'):
print("\n[ALERT] Database integrity check failed!")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Integrity check may run multiple times during 3 AM hour.

The condition datetime.datetime.now().hour == 3 will evaluate to True for every iteration within the 3 AM hour, causing the integrity check to run repeatedly (every interval seconds). Consider tracking whether the check was already performed today.

🔎 Suggested fix
+    last_integrity_date = None
+
     def monitor():
         print(f"\nStarted SQLite monitoring (checking every {interval}s)")
+        nonlocal last_integrity_date
 
         while True:
             try:
                 # ... existing code ...
 
                 # Check integrity periodically (e.g., daily)
                 import datetime
-                if datetime.datetime.now().hour == 3:  # 3 AM
+                today = datetime.date.today()
+                if datetime.datetime.now().hour == 3 and last_integrity_date != today:
                     integrity = detector.get_integrity_check(entity_manager)
                     if not integrity.get('is_ok'):
                         print("\n[ALERT] Database integrity check failed!")
+                    last_integrity_date = today

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In telemetry/example_sqlite.py around lines 336 to 341, the integrity check
conditional uses datetime.datetime.now().hour == 3 which will be True for every
loop iteration during the 3 AM hour and cause repeated checks; fix this by
adding a persistent tracker (e.g., last_integrity_date variable declared outside
the loop) and only run the check when the current date is different from
last_integrity_date and the hour is 3, then update last_integrity_date to today
after running the check so it only runs once per day.

Comment on lines +295 to +348
## License

Apache License, Version 2.0

#### SQLite Example

```python
# Get a SQLite contention detector
detector = telemetry_plugin.get_contention_detector("sqlite")

# Get database information (file path, size, PRAGMA settings)
info = detector.get_database_info(entity_manager)
print(f"File path: {info['file_path']}")
print(f"File size: {info['file_size_mb']} MB")
print(f"Journal mode: {info['journal_mode']}")
print(f"Synchronous mode: {info['synchronous']}")
print(f"Busy timeout: {info['busy_timeout_ms']} ms")

# Get lock configuration
lock_info = detector.get_lock_waits(entity_manager)
for info in lock_info:
print(f"Busy timeout: {info['busy_timeout_ms']} ms")
print(f"Locking mode: {info['locking_mode']}")

# Get transaction statistics
stats = detector.get_transaction_stats(entity_manager)
print(f"Database size: {stats['database_size_mb']} MB")
print(f"Page count: {stats['page_count']}")
print(f"Freelist count: {stats['freelist_count']}")
print(f"Active transaction: {stats['active_transaction']}")

# Run integrity check
integrity = detector.get_integrity_check(entity_manager)
if integrity['is_ok']:
print("✓ Database integrity OK")
else:
print("⚠️ Database integrity issues detected")

# Optimize database (VACUUM and ANALYZE)
results = detector.optimize_database(entity_manager)
print(f"VACUUM: {results['vacuum']}")
print(f"ANALYZE: {results['analyze']}")
print(f"Space saved: {results.get('space_saved_mb', 0)} MB")
```

**SQLite-Specific Notes:**
- SQLite uses database-level locks (not row-level)
- Contention detected via SQLITE_BUSY/OperationalError
- No processlist or system tables like PostgreSQL/MySQL
- Use PRAGMA statements for diagnostics
- Enable WAL mode for better concurrency

See [example_sqlite.py](example_sqlite.py) for complete SQLite examples.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

SQLite Example section is misplaced after License.

The SQLite Example section (lines 299-348) appears after the License section, which breaks the document flow. It should be placed alongside the PostgreSQL and MySQL examples in the "Contention Detection" section (around line 128).

Additionally, the heading level #### doesn't follow the expected increment from ## (should start at ###).

🔎 Suggested restructure

Move the SQLite example block (lines 299-348) to after the MySQL example (line 168), changing the heading to:

-#### SQLite Example
+### SQLite Example

Then ensure the License section remains at the end of the document.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

299-299: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🤖 Prompt for AI Agents
telemetry/README.md lines 295-348: the "SQLite Example" block is incorrectly
placed after the License and uses an incorrect heading level; cut the entire
block (lines 299-348) and paste it into the "Contention Detection" examples area
immediately after the MySQL example (around line ~168), change the heading from
"#### SQLite Example" to "### SQLite Example" so it follows the document
hierarchy, and confirm the License section stays at the end of the file.

}
```

**PRAGMACommands Used:**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo: Missing space in "PRAGMACommands".

Should be "PRAGMA Commands" for readability.

🔎 Suggested fix
-**PRAGMACommands Used:**
+**PRAGMA Commands Used:**
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
**PRAGMACommands Used:**
**PRAGMA Commands Used:**
🤖 Prompt for AI Agents
In telemetry/SQLITE_SUPPORT.md around line 64, the heading "PRAGMACommands
Used:" has a missing space; change it to "PRAGMA Commands Used:" (i.e., insert a
space between "PRAGMA" and "Commands") to improve readability and consistency
with other headings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants