You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/03 21:22:26 UTC

[GitHub] [airflow] nadflinn opened a new pull request #8695: Celery worker prefetch multiplier configurable

nadflinn opened a new pull request #8695:
URL: https://github.com/apache/airflow/pull/8695


   There is some debate about whether Celery autoscale actually works, as discussed in the comments to this PR:
   https://github.com/apache/airflow/pull/3989#issuecomment-535882666
   
   This is also related to the discussion in this issue:
   https://github.com/apache/airflow/issues/8480
   
   I ran into this issue as well with Airflow (autoscale not working) and had a look at the celery code and I think the issue is that for the worker process count to grow this is dependent on the number of tasks the worker has claimed, known as the prefetch_count.  If the prefetch_count isn't above the worker process count, then the number of worker processes won't budge. It seems like a catch-22. Airflow runs into this problem because `worker_prefetch_multiplier` is set to 1 (and `task_acks_late` is set to True...setting this to False also bumps the prefetch_count).
   
   This issue can be worked around by setting the `worker_prefetch_multiplier` setting to an int greater than 1.  In this PR I included a note about the implications of this in the config and a link to relevant documentation.   Currently in airflow `worker_prefetch_multiplier` is set to 1 so a worker can't prefetch and lay claim to more tasks than it has process workers.  So in theory setting this to 2 can get you into trouble if you have worker A that has 6 processes and has grabbed 10 tasks and the 6 tasks it is working on are long running causing the other 4 tasks to be blocked.  Meanwhile worker B just finished up processing its own 6 tasks and is available to work on the 4 that are backed up on worker A but A has already claimed those tasks.  If you are running one worker, though, then this shouldn't be a problem.
   
   This PR makes `worker_prefetch_multiplier` configurable so that the user can get autoscale working if they feel that for their use case `worker_prefetch_multiplier` of greater than 1 won't be an issue.
   
   I also [opened up a Celery PR](https://github.com/celery/celery/pull/6069) with a suggested fix for this issue.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Target Github ISSUE in description if exists
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#issuecomment-624588200


   I see that your change to celery was accepted and I am wondering what will be released first: celery or airflow with your fix? If it will be celery then I think we should avoid fixing external libraries issues in our code. @kaxil @dimberman any thoughts? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422154400



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       See comment above.  I added an `example` in `config.yml` which generates
   `Example: worker_prefetch_multiplier = 1`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422529444



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       Ah, got it, yeah.  I had it commented out bc autoscale is commented out but the setting has independent value and like you said default is 1 anyways.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422156337



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       I'm not sure that you can add a `default` in `config.yml` and have it be commented out when you generate `default_airflow.cfg`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422529561



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       Missed your comment until now and given that this was just merged I'll just leave it be given that functionality is there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422152685



##########
File path: airflow/config_templates/config.yml
##########
@@ -1130,6 +1130,19 @@
       type: string
       example: 16,12
       default: ~
+    - name: worker_prefetch_multiplier
+      description: |
+        Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+        greater than 1 for autoscaling to work.  Be aware that this causes the number of
+        records that a worker prefetchs to be greater than the available worker processes.
+        If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+        one worker prefetches tasks and those tasks are blocked by long running work then another
+        worker with available process workers can't grab those tasks.
+        https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+      version_added: ~
+      type: int
+      example: ~
+      default: ~

Review comment:
       I added an `example` instead of a `default` so that `worker_prefetch_multiplier` was commented out after running `pre_commit_yaml_to_cfg.py` to generate `default_airflow.cfg`.  With a `default` the setting is not commented out.   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r421143742



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       ```suggestion
   # worker_prefetch_multiplier = 1
   ```

##########
File path: airflow/config_templates/config.yml
##########
@@ -1130,6 +1130,19 @@
       type: string
       example: 16,12
       default: ~
+    - name: worker_prefetch_multiplier
+      description: |
+        Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+        greater than 1 for autoscaling to work.  Be aware that this causes the number of
+        records that a worker prefetchs to be greater than the available worker processes.
+        If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+        one worker prefetches tasks and those tasks are blocked by long running work then another
+        worker with available process workers can't grab those tasks.
+        https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+      version_added: ~
+      type: int
+      example: ~
+      default: ~

Review comment:
       ```suggestion
         default: 1
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#discussion_r422158345



##########
File path: airflow/config_templates/default_airflow.cfg
##########
@@ -550,6 +550,15 @@ worker_concurrency = 8
 # Example: worker_autoscale = 16,12
 # worker_autoscale =
 
+# Due to an issue with Celery, worker_prefetch_multiplier must also be set to an int
+# greater than 1 for autoscaling to work.  Be aware that this causes the number of
+# records that a worker prefetchs to be greater than the available worker processes.
+# If you have multiple workers this can cause some jobs to be unnecessarily blocked if
+# one worker prefetches tasks and those tasks are blocked by long running work then another
+# worker with available process workers can't grab those tasks.
+# https://docs.celeryproject.org/en/stable/userguide/optimizing.html#prefetch-limits
+# worker_prefetch_multiplier =

Review comment:
       Yes sorry I meant, it doesn't have to be commented out so as the default is 1 :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] nadflinn commented on pull request #8695: Make celery worker_prefetch_multiplier configurable

Posted by GitBox <gi...@apache.org>.
nadflinn commented on pull request #8695:
URL: https://github.com/apache/airflow/pull/8695#issuecomment-624770541


   Yeah, I had opened this before the celery fix was accepted.  I'm ambivalent about it at this point. I wouldn't say this is necessarily "fixing" an external issue - it's just exposing `worker_prefetch_multiplier` so that it can be configured.  I added some language in the description of `worker_prefetch_multiplier` that was specific to the bug and that should probably be removed.  But it is an optimization setting that users could want to adjust independent of it being a work-around for this bug.  That said, it is an optimization ideal for lots of short running tasks where having the tasks prefetched is advantageous, which doesn't really seem to overlap with how Airflow is generally used.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org