Posted to commits@airflow.apache.org by "dirrao (via GitHub)" <gi...@apache.org> on 2023/12/06 16:55:55 UTC

[PR] list pods performance optimization [airflow]

dirrao opened a new pull request, #36092:
URL: https://github.com/apache/airflow/pull/36092

   What happened
   
   The _list_pods function calls the Kubernetes client functions list_namespaced_pod and list_pod_for_all_namespaces. Right now, these functions fetch the entire pod spec even though we are interested in the pod metadata alone. _list_pods is called from the clear_not_launched_queued_tasks, try_adopt_task_instances, and _adopt_completed_pods functions.
   
   When we run Airflow at large scale (more than 500 worker pods), the _list_pods function takes a significant amount of time (up to 15 to 30 seconds with 500 worker pods) due to unnecessary data transfer (a V1PodList of up to a few tens of MB) and JSON deserialization overhead. This is blocking us from scaling Airflow further.
   
   What you think should happen instead
   
   Request only the Pod metadata instead of the entire Pod payload. This significantly reduces network data transfer and JSON deserialization overhead.
   
   More details at https://github.com/apache/airflow/issues/35599
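
For illustration only, here is a minimal sketch of what a metadata-only request can look like at the HTTP level. It is not the code from this PR. It relies on the Kubernetes API server's content negotiation, where an Accept header of the form `application/json;as=PartialObjectMetadataList;v=v1;g=meta.k8s.io` asks for object metadata only. The function name `build_pod_metadata_request` and the API server address in the usage note are hypothetical, and the request is built but never sent.

```python
import urllib.parse
import urllib.request

# Media type asking the API server to return only object metadata
# (a PartialObjectMetadataList) instead of full Pod objects.
PARTIAL_METADATA_ACCEPT = (
    "application/json;as=PartialObjectMetadataList;v=v1;g=meta.k8s.io"
)


def build_pod_metadata_request(api_server, namespace, label_selector=None):
    """Build (but do not send) a pod list request that asks for metadata only.

    Hypothetical helper for illustration; authentication is omitted.
    """
    url = f"{api_server}/api/v1/namespaces/{namespace}/pods"
    if label_selector:
        # labelSelector is the standard list query parameter for filtering by label.
        url += "?" + urllib.parse.urlencode({"labelSelector": label_selector})
    return urllib.request.Request(url, headers={"Accept": PARTIAL_METADATA_ACCEPT})
```

Calling `build_pod_metadata_request("https://kube.example:6443", "test-dag", "airflow-worker=16726976")` yields a GET request whose only difference from an ordinary pod list call is the Accept header, which is what keeps the pod spec and status out of the response.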


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] list pods performance optimization [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #36092:
URL: https://github.com/apache/airflow/pull/36092




Re: [PR] list pods performance optimization [airflow]

Posted by "dirrao (via GitHub)" <gi...@apache.org>.
dirrao commented on PR #36092:
URL: https://github.com/apache/airflow/pull/36092#issuecomment-1845121966

   See the following example of a partial pod metadata payload:
   `  ResourceInstance[PartialObjectMetadataList]:
     apiVersion: meta.k8s.io/v1
     items:
     - apiVersion: meta.k8s.io/v1
       kind: PartialObjectMetadata
       metadata:
         annotations:
           dag_id: test-dag-20231019-221007
           run_id: scheduled__2023-10-19T21:30:00+00:00
           task_id: pod
           try_number: '2'
         creationTimestamp: '2023-10-21T20:19:40Z'
         labels:
           airflow-worker: '16726976'
           airflow_version: 2.3.3
           cluster: test-cluster
           dag_id: test-dag
           kubernetes_executor: 'True'
           role: airflow-worker
           run_id: scheduled__2023-10-19T2130000000-6ec9604e1
           sdr.appname: airflow
           task_id: pod
           try_number: '2'
         name: testdag-0f53218c20ed444a945054a2295e528a
         namespace: test-dag
         resourceVersion: '1835685630'
         uid: 78441e8f-2fca-48e1-8334-330506b74bcda`
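
As a rough sketch of how a payload like the one above might be consumed, the snippet below reads the task identity from each item's annotations. It assumes the response has already been decoded into plain Python dicts. The helper name `pod_task_keys` and the trimmed `SAMPLE` dict are hypothetical and only mirror the fields shown in the example payload; this is not the executor's actual code.

```python
# Trimmed copy of the example payload above, as a plain dict (assumption:
# the response was already decoded from JSON).
SAMPLE = {
    "apiVersion": "meta.k8s.io/v1",
    "items": [
        {
            "kind": "PartialObjectMetadata",
            "metadata": {
                "annotations": {
                    "dag_id": "test-dag-20231019-221007",
                    "run_id": "scheduled__2023-10-19T21:30:00+00:00",
                    "task_id": "pod",
                    "try_number": "2",
                },
                "name": "testdag-0f53218c20ed444a945054a2295e528a",
                "namespace": "test-dag",
            },
        }
    ],
}


def pod_task_keys(partial_metadata_list):
    """Return one dict per pod with its name and task identity from annotations."""
    keys = []
    for item in partial_metadata_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations", {})
        keys.append(
            {
                "pod_name": metadata.get("name"),
                "namespace": metadata.get("namespace"),
                "dag_id": annotations.get("dag_id"),
                "task_id": annotations.get("task_id"),
                "run_id": annotations.get("run_id"),
                # Annotation values are strings, so convert the try number.
                "try_number": int(annotations.get("try_number", 0)),
            }
        )
    return keys
```

Everything this helper needs lives under `metadata`, which is why a metadata-only response is enough for this kind of lookup.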

