Comparing changes
base repository: judoscale/judoscale-python (base: main)
head repository: mediapredict/judoscale-python (compare: main)
  • 2 commits
  • 2 files changed
  • 2 contributors

Commits on Dec 2, 2024

  1. Use acquired connection for inspecting active workers / tasks

    It appears that when no connection is passed, `inspect` ends up
    fetching a connection from the pool further down the call chain:
    
    `inspect.active()` delegates internally to `_request`:
    https://github.com/celery/celery/blob/92514ac88afc4ccdff31f3a1018b04499607ca1e/celery/app/control.py#L136-L149
    
    which in turn delegates to `control.broadcast`, passing the
    connection and some other options:
    https://github.com/celery/celery/blob/92514ac88afc4ccdff31f3a1018b04499607ca1e/celery/app/control.py#L105-L111
    
    `broadcast` documents the `connection` option:
    https://github.com/celery/celery/blob/92514ac88afc4ccdff31f3a1018b04499607ca1e/celery/app/control.py#L744-L756
    
        > connection (kombu.Connection): Custom broker connection to use,
        > if not set, a connection will be acquired from the pool.
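The fallback described in that docstring can be sketched in plain Python. The names below are illustrative stand-ins, not Celery's actual internals; the point is only the default-argument pattern: an explicit connection bypasses the pool, while omitting it triggers a pool acquisition.

```python
# Sketch of the documented kombu behavior ("if not set, a connection
# will be acquired from the pool") using stand-in classes, not
# Celery's real internals.
class Pool:
    def __init__(self):
        self.acquired = 0  # count pool acquisitions for illustration

    def acquire(self):
        self.acquired += 1
        return f"pool-conn-{self.acquired}"


def broadcast(command, connection=None, pool=None):
    # If the caller did not supply a connection, take one from the pool.
    if connection is None:
        connection = pool.acquire()
    return (command, connection)


pool = Pool()
broadcast("active", pool=pool)           # acquires from the pool
broadcast("active", connection="mine")   # reuses the caller's connection
assert pool.acquired == 1                # only the first call touched the pool
```

This is why every `inspect.active()` call made without an explicit connection puts pressure on the shared pool.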
    
    Fetching the connection from the pool appears not to work well with
    `max-tasks-per-child`, which recycles each worker once it has
    processed the configured number of tasks. In other words, with
    `max-tasks-per-child=10`, whenever a worker has processed 10 tasks,
    a new worker replaces the old one:
    https://docs.celeryq.dev/en/v5.4.0/userguide/workers.html#max-tasks-per-child-setting
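For reference, this is the worker setting in question, shown here as a CLI flag (the app name `proj` and the value `10` are just example placeholders):

```shell
# Recycle each worker process after it has handled 10 tasks
# ("proj" and the value 10 are arbitrary examples).
celery -A proj worker --max-tasks-per-child=10
```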
    
    This replacing of workers, combined with fetching new connections
    from the pool, results in timeout errors, and workers stop
    processing tasks entirely:
    
        [2024-12-02 15:38:34,568: ERROR/MainProcess]
          Timed out waiting for UP message from <ForkProcess(ForkPoolWorker-4, started daemon)>
        [2024-12-02 15:38:34,571: ERROR/MainProcess]
          Process 'ForkPoolWorker-4' pid:20235 exited with 'signal 9 (SIGKILL)'
    
    This error repeats on and on: the main process enters a loop of
    booting a new worker and failing every time, likely because it has
    to wait for a connection from the pool that never becomes
    available. My guess is that too many connections are acquired in
    between workers being replaced, exhausting the pool, which then
    gets into a locked state it cannot recover from and starts to time
    out. (The pool blocks while waiting for a connection to become
    available, and it also appears to hand out write connections by
    default, which we don't need.)
    
    We already have a read-only connection available within our
    collector that we can simply reuse, passing it down to a shared
    inspect object, so there is no need to touch the pool or risk this
    locked state. This change should fix the timeout issue above and
    let all tasks continue to be processed just fine.
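The fix amounts to threading one long-lived, read-only connection through every inspect call instead of letting each call acquire its own. A minimal sketch of that shape, using stand-in classes (these are not the actual judoscale-python or Celery classes):

```python
# Sketch of the fix's shape with stand-in classes (not the real
# judoscale-python or Celery code): the collector acquires one
# connection up front and passes it to every inspect call, so the
# broker connection pool is never touched during metric collection.
class Inspect:
    def __init__(self, connection=None, pool=None):
        # Without an explicit connection, each Inspect would hit the pool.
        self.connection = connection if connection is not None else pool.acquire()

    def active(self):
        return {"command": "active", "connection": self.connection}


class Collector:
    def __init__(self, connection):
        self.connection = connection  # acquired once, reused everywhere

    def inspect(self):
        # Pass our existing connection down instead of relying on the pool.
        return Inspect(connection=self.connection)


collector = Collector(connection="read-only-conn")
assert collector.inspect().active()["connection"] == "read-only-conn"
```

Because the shared connection is read-only and held for the collector's lifetime, repeated worker recycling under `max-tasks-per-child` no longer competes for pool connections.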
    carlosantoniodasilva committed Dec 2, 2024 · commit 6215570

Commits on Dec 4, 2024

  1. Merge pull request #1 from igorkramaric/fix-inspect-connection

    Use acquired connection for inspecting active workers / tasks
    igorkramaric authored Dec 4, 2024 · commit 4cce0fe