You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/08 16:43:55 UTC

[GitHub] [airflow] FloChehab opened a new pull request #13571: Parametrized keda task concurrency in chart

FloChehab opened a new pull request #13571:
URL: https://github.com/apache/airflow/pull/13571


   Hello,
   
   This PR adds a small feature to the chart:
   
   * Rely on the config.celery.worker_concurrency value
   to determine the number of task a keda worker can take
   (vs the previous 16 that was hardcoded in the query).
   * Updated documentation accordingly
   
   The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r554194779



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       The default is 16 and this changes it to 8




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13571:
URL: https://github.com/apache/airflow/pull/13571


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r570734550



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       i wonder if a better solution would be to allow user to specify the entire query, instead of `worker_concurrency`.  ~for one, this would resolve the problem highlighted in my other comment~.  additionally though it would open the door to support for mysql.  mysql has a different query syntax but otherwise there is a keda trigger for it and we could add support for it by enabling custom query.  custom query is also more flexible in general.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r570733754



##########
File path: chart/values.yaml
##########
@@ -709,6 +709,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 16

Review comment:
       this seems potentially misleading because, if i understand correctly, it affects only the keda query but not the actual work concurrency setting.

##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       i wonder if a better solution would be to allow user to specify the entire query, instead of `worker_concurrency`.  for one, this would resolve the problem highlighted in my other comment.  additionally though it would open the door to support for mysql.  mysql has a different query syntax but otherwise there is a keda trigger for it and we could add support for it by enabling custom query.  custom query is also more flexible in general.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r554419164



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentionned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want too.

##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentioned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571467978



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       oh yes -- you are right




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r570733754



##########
File path: chart/values.yaml
##########
@@ -709,6 +709,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 16

Review comment:
       this seems potentially misleading because, if i understand correctly, it affects only the keda query but not the actual work concurrency setting.

##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       i wonder if a better solution would be to allow user to specify the entire query, instead of `worker_concurrency`.  for one, this would resolve the problem highlighted in my other comment.  additionally though it would open the door to support for mysql.  mysql has a different query syntax but otherwise there is a keda trigger for it and we could add support for it by enabling custom query.  custom query is also more flexible in general.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r554419164



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentionned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want too.

##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentioned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want too.

##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentioned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-757296105


   (i've rebase / push -f to fix conflict in the chart Readme)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571915182



##########
File path: chart/README.md
##########
@@ -349,6 +350,8 @@ helm install airflow . \
     --set workers.persistence.enabled=false
 ```
 
+You can also configure the number of tasks that each keda worker will take with the `config.celery.worker_concurrency` value.

Review comment:
       OK I see, the important point you are making here is that you need to set celery concurrency through helm in order to have KEDA scaling be consistent with celery config.  E.g. if you use airflow.cfg, orr env vars, they won't necessarily agree.
   
   I might just include the query template to make it explicit.
   
   Here's another crack at it:
   
   > KEDA will derive the desired number of celery workers by querying the metastore: <the query>.  You should set celery worker concurrency through helm value `config.celery.worker_concurrency` (i.e. instead of airflow.cfg or environment variables) so that the KEDA trigger will be consistent with worker concurrency.  
   
   
   

##########
File path: chart/README.md
##########
@@ -349,6 +350,8 @@ helm install airflow . \
     --set workers.persistence.enabled=false
 ```
 
+You can also configure the number of tasks that each keda worker will take with the `config.celery.worker_concurrency` value.

Review comment:
       OK I see, the important point you are making here is that you need to set celery concurrency through helm in order to have KEDA scaling be consistent with celery config.  E.g. if you use airflow.cfg, orr env vars, they won't necessarily agree.
   
   I might just include the query template to make it explicit.
   
   Here's another crack at it:
   
   > KEDA will derive the desired number of celery workers by querying the metastore: put query here in code block.  You should set celery worker concurrency through helm value `config.celery.worker_concurrency` (i.e. instead of airflow.cfg or environment variables) so that the KEDA trigger will be consistent with worker concurrency.  
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab edited a comment on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab edited a comment on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-757296105


   (I've rebased / push -f to fix conflict in the chart Readme)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r554419164



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       I know about this as I mentioned in the PR description:
   
   > The only impact is that with this PR the default concurrency of KEDA goes down from 16 to 8 (I thought that would be better than to set the concurrency by default from 8 to 16 for all CeleryExecutor deployments).
   
   I did this because the default value in airflow settings seems to be 8: https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg#L666
   
   I am definitely ok setting back this to 16 if you want to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571465823



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       Currently the chart does not support MySQL anyway. Agree in principle though




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-758171466


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r555002096



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       Let's keep that 16 please -- https://github.com/apache/airflow/pull/13612 (This PR will change the default setting in default_airflow.cfg) back to 16




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-775346821


   Awesome, thank you @FloChehab and @dstandish 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571945894



##########
File path: chart/README.md
##########
@@ -349,6 +350,8 @@ helm install airflow . \
     --set workers.persistence.enabled=false
 ```
 
+You can also configure the number of tasks that each keda worker will take with the `config.celery.worker_concurrency` value.

Review comment:
       Awesome, thanks for the suggestion, let's go for this!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571866427



##########
File path: chart/README.md
##########
@@ -349,6 +350,8 @@ helm install airflow . \
     --set workers.persistence.enabled=false
 ```
 
+You can also configure the number of tasks that each keda worker will take with the `config.celery.worker_concurrency` value.

Review comment:
       Agree.
   Would that be better?
   
   > KEDA scaling (ie. the number of Airflow celery workers that are spawned by KEDA) will also depend on the number of task each worker is allowed to handle. This number is the same as the `worker_concurrency` Airflow parameter and it must be tweaked with the `config.celery.worker_concurrency` value to properly propagate to the KEDA scaler configuration.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571467658



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       the chart does not support mysql _when you run the metastore on k8s along with airflow_.  however, wouldn't it work with mysql if you had an external metastore e.g. rds (which is what is recommended anyway)?  
   
   i know postgres is the better fit for airflow but some folks might use other dbs and off top of my head i think it would probably work since you just need to specify conn uri and disable postgres and pgbouncer...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571467658



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       the chart does not support mysql when you run the metastore on k8s along with airflow.  however, wouldn't it work with mysql if you had an external metastore e.g. rds (which is what is recommended anyway)?  
   
   i know postgres is the better fit for airflow but some folks might use other dbs and off top of my head i think it would probably work since you just need to specify conn uri and disable postgres and pgbouncer...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571856923



##########
File path: chart/README.md
##########
@@ -349,6 +350,8 @@ helm install airflow . \
     --set workers.persistence.enabled=false
 ```
 
+You can also configure the number of tasks that each keda worker will take with the `config.celery.worker_concurrency` value.

Review comment:
       Word choice might be a little imprecise here.
   
   They are _celery_ workers not _keda_ workers --- keda just autoscales the celery workers but does not have workers of its own.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r554194779



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       The default is 16 and this changes it to 8




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571467069



##########
File path: chart/values.yaml
##########
@@ -709,6 +709,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 16

Review comment:
       oh nice




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r555014796



##########
File path: chart/values.yaml
##########
@@ -706,6 +706,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 8

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-757296105


   (i've rebase / push -f to fix conflict in the chart Readme)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab edited a comment on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab edited a comment on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-757296105


   (I've rebased / push -f to fix conflict in the chart Readme)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571465602



##########
File path: chart/values.yaml
##########
@@ -709,6 +709,7 @@ config:
     rbac: 'True'
   celery:
     default_queue: celery
+    worker_concurrency: 16

Review comment:
       It does affect the actual concurrency setting:
   
   https://github.com/apache/airflow/blob/cddbf9c11d092422e6695d7a5a5c859fdf140753/chart/templates/configmaps/configmap.yaml#L37-L43




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] FloChehab commented on pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
FloChehab commented on pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#issuecomment-775107977


   > Can you rebase your PR on latest Master please and tag me again so we can merge it
   
   Hello @kaxil, we should be good to merge for this PR.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #13571: Parametrized keda task concurrency in chart

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #13571:
URL: https://github.com/apache/airflow/pull/13571#discussion_r571467658



##########
File path: chart/templates/workers/worker-kedaautoscaler.yaml
##########
@@ -36,13 +36,13 @@ metadata:
 spec:
   scaleTargetRef:
     deploymentName: {{ .Release.Name }}-worker
-  pollingInterval:  {{  .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
-  cooldownPeriod: {{  .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
-  maxReplicaCount: {{  .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
+  pollingInterval:  {{ .Values.workers.keda.pollingInterval }}   # Optional. Default: 30 seconds
+  cooldownPeriod: {{ .Values.workers.keda.cooldownPeriod }}    # Optional. Default: 300 seconds
+  maxReplicaCount: {{ .Values.workers.keda.maxReplicaCount }}   # Optional. Default: 100
   triggers:
     - type: postgresql
       metadata:
         targetQueryValue: "1"
         connection: AIRFLOW_CONN_AIRFLOW_DB
-        query: "SELECT ceil(COUNT(*)::decimal / 16) FROM task_instance WHERE state='running' OR state='queued'"
+        query: "SELECT ceil(COUNT(*)::decimal / {{ .Values.config.celery.worker_concurrency }}) FROM task_instance WHERE state='running' OR state='queued'"

Review comment:
       the chart does not support mysql _when you run the metastore on k8s along with airflow_.  however, wouldn't it work with mysql if you had an external metastore e.g. rds (which is what is recommended anyway)?  
   
   i know postgres is the better choice for airflow and much more widely embraced but some folks might use other dbs and off top of my head i think it would probably work since you just need to specify conn uri and disable postgres and pgbouncer...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org