Posted to commits@airflow.apache.org by "dirrao (via GitHub)" <gi...@apache.org> on 2023/12/06 16:55:55 UTC
[PR] list pods performance optimization [airflow]
dirrao opened a new pull request, #36092:
URL: https://github.com/apache/airflow/pull/36092
What happened
The _list_pods function calls the Kubernetes client functions list_namespaced_pod and list_pod_for_all_namespaces. Currently these calls fetch the entire pod spec even though we only need the pod metadata. _list_pods is used by the clear_not_launched_queued_tasks, try_adopt_task_instances and _adopt_completed_pods functions.
When we run Airflow at large scale (more than 500 worker pods), the _list_pods function takes a significant amount of time (up to 15-30 seconds with 500 worker pods) because of unnecessary data transfer (a V1PodList of up to a few tens of MB) and JSON deserialization overhead. This is blocking us from running Airflow at larger scale.
What you think should happen instead
Request only the pod metadata instead of the entire pod payload. This significantly reduces network data transfer and JSON deserialization overhead.
More details at https://github.com/apache/airflow/issues/35599
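As a rough illustration of the idea, the Kubernetes API server can return metadata-only list items when the request's Accept header asks for a PartialObjectMetadataList. The sketch below only builds the request pieces and does not contact a cluster. The helper name build_list_pods_request and its parameters are hypothetical and are not taken from the PR; how the header is passed to a particular Kubernetes client is left out.

```python
# Media type that asks the API server for metadata-only items
# (PartialObjectMetadataList) instead of full Pod objects.
PARTIAL_METADATA_ACCEPT = (
    "application/json;as=PartialObjectMetadataList;v=v1;g=meta.k8s.io"
)


def build_list_pods_request(namespace=None, label_selector=None):
    """Return (path, query, headers) for a metadata-only pod list request.

    Hypothetical helper for illustration; it does not send anything.
    """
    # Namespaced list when a namespace is given, cluster-wide list otherwise.
    if namespace:
        path = f"/api/v1/namespaces/{namespace}/pods"
    else:
        path = "/api/v1/pods"
    # Optional label selector, e.g. to restrict the list to executor pods.
    query = {"labelSelector": label_selector} if label_selector else {}
    headers = {"Accept": PARTIAL_METADATA_ACCEPT}
    return path, query, headers


if __name__ == "__main__":
    print(build_list_pods_request("test-dag", "kubernetes_executor=True"))
```

With this header the server omits each pod's spec and status from the response, which is where most of the payload size comes from.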
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [PR] list pods performance optimization [airflow]
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #36092:
URL: https://github.com/apache/airflow/pull/36092
Re: [PR] list pods performance optimization [airflow]
Posted by "dirrao (via GitHub)" <gi...@apache.org>.
dirrao commented on PR #36092:
URL: https://github.com/apache/airflow/pull/36092#issuecomment-1845121966
An example of the partial pod metadata payload follows.
` ResourceInstance[PartialObjectMetadataList]:
  apiVersion: meta.k8s.io/v1
  items:
  - apiVersion: meta.k8s.io/v1
    kind: PartialObjectMetadata
    metadata:
      annotations:
        dag_id: test-dag-20231019-221007
        run_id: scheduled__2023-10-19T21:30:00+00:00
        task_id: pod
        try_number: '2'
      creationTimestamp: '2023-10-21T20:19:40Z'
      labels:
        airflow-worker: '16726976'
        airflow_version: 2.3.3
        cluster: test-cluster
        dag_id: test-dag
        kubernetes_executor: 'True'
        role: airflow-worker
        run_id: scheduled__2023-10-19T2130000000-6ec9604e1
        sdr.appname: airflow
        task_id: pod
        try_number: '2'
      name: testdag-0f53218c20ed444a945054a2295e528a
      namespace: test-dag
      resourceVersion: '1835685630'
      uid: 78441e8f-2fca-48e1-8334-330506b74bcda`
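To show that a payload of this shape still carries what is needed to identify a task, here is a minimal sketch that reads the task identifiers from the annotations of each item. It assumes the payload has already been decoded into plain dicts. The function name task_keys_from_metadata_list is hypothetical and is not part of Airflow, and the sample dict copies only a few fields from the payload above.

```python
def task_keys_from_metadata_list(payload):
    """Return one (pod_name, dag_id, task_id, run_id, try_number) tuple per item.

    Hypothetical helper; expects a PartialObjectMetadataList decoded to dicts.
    """
    keys = []
    for item in payload.get("items", []):
        meta = item.get("metadata", {})
        # Annotations may be missing on pods not created by the executor.
        ann = meta.get("annotations") or {}
        try_number = ann.get("try_number")
        keys.append(
            (
                meta.get("name"),
                ann.get("dag_id"),
                ann.get("task_id"),
                ann.get("run_id"),
                # Annotation values are strings, so convert when present.
                int(try_number) if try_number is not None else None,
            )
        )
    return keys


# Trimmed copy of the example payload above.
sample = {
    "apiVersion": "meta.k8s.io/v1",
    "items": [
        {
            "kind": "PartialObjectMetadata",
            "metadata": {
                "annotations": {
                    "dag_id": "test-dag-20231019-221007",
                    "run_id": "scheduled__2023-10-19T21:30:00+00:00",
                    "task_id": "pod",
                    "try_number": "2",
                },
                "name": "testdag-0f53218c20ed444a945054a2295e528a",
                "namespace": "test-dag",
            },
        }
    ],
}

if __name__ == "__main__":
    print(task_keys_from_metadata_list(sample))
```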