You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/05 14:01:58 UTC

[GitHub] [airflow] XD-DENG opened a new pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

XD-DENG opened a new pull request #15204:
URL: https://github.com/apache/airflow/pull/15204


   closes: #15193
   
   Currently condition `if the pod is created by Airflow` is not considered. This commit fixes this.
   
   We decide if the Pod is created by Airflow via checking if it has all the labels added in [`PodGenerator.construct_pod()`](https://github.com/apache/airflow/blob/2.0.1/airflow/kubernetes/pod_generator.py#L385).
   
   Test added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#issuecomment-813587809


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r609102137



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,23 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+    ]

Review comment:
       Yea go for it, I couldn't think of any reason :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607111232



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -91,9 +91,24 @@ def cleanup_pods(args):
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
     list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
     while True:  # pylint: disable=too-many-nested-blocks
         pod_list = kube_client.list_namespaced_pod(**list_kwargs)

Review comment:
       Can we use `label_selector` parameter to do server-side filtering? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#issuecomment-815493367


   Thanks @mik-laj and @kaxil . 
   
   I have added one more commit, https://github.com/apache/airflow/pull/15204/commits/c1218d5d584111717ad9f8569dd4e2f29637b8eb, to update the help string of this CLI (to highlight we only clean up pods created by KubernetesExecutor/KubernetesPodOperator), so that the scope is clearer to users.
   
   If you have no objection about it, I will merge later today. Cheers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG merged pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
XD-DENG merged pull request #15204:
URL: https://github.com/apache/airflow/pull/15204


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607172542



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,23 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+    ]

Review comment:
       We should probably not look for **all** of these labels:
   
   We could probably simplify this to see if there is `kubernetes_pod_operator` or `kubernetes_executor` in the labels.
   
   Let me think a bit more on this to see if we are missing a case -- what happens if a **Celery Worker**  pod hangs around




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r608404225



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,23 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+    ]

Review comment:
       Hi @kaxil , a gentle ping for this :)
   
   I'm not sure how Celery Worker will be involved in this though. IMO we are focusing on "_pods started by Airflow_", not "_pods running Airflow (like worker node)_"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607138403



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,25 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
+    list_kwargs = {
+        "namespace": namespace,
+        "limit": 500,
+        "label_selector": client.V1LabelSelector(
+            match_expressions=[
+                client.V1LabelSelectorRequirement(key=label, operator="Exists")

Review comment:
       > The requirements are ANDed.
   
   This command should also work when the pod does not contain `kubernetes_executor` label i.e. it is started by KubernetetsPodOperator.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607127273



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -91,9 +91,24 @@ def cleanup_pods(args):
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
     list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
     while True:  # pylint: disable=too-many-nested-blocks
         pod_list = kube_client.list_namespaced_pod(**list_kwargs)

Review comment:
       Definitely agree. Many thanks for the suggestion!
   
   Changes pushed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607159566



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,25 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
+    list_kwargs = {
+        "namespace": namespace,
+        "limit": 500,
+        "label_selector": client.V1LabelSelector(
+            match_expressions=[
+                client.V1LabelSelectorRequirement(key=label, operator="Exists")

Review comment:
       Makes sense to me. For the same reason, if I'm not wrong, we should exclude considering `airflow-worker` as well.
   
   Addressed in https://github.com/apache/airflow/pull/15204/commits/e406a0895a173587fd0cff8354650c740e687fd1
   
   Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607111232



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -91,9 +91,24 @@ def cleanup_pods(args):
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
     list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
     while True:  # pylint: disable=too-many-nested-blocks
         pod_list = kube_client.list_namespaced_pod(**list_kwargs)

Review comment:
       Can we use `label_selector`/`field_selector` parameters to do server-side filtering? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#discussion_r607138403



##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,25 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
+    list_kwargs = {
+        "namespace": namespace,
+        "limit": 500,
+        "label_selector": client.V1LabelSelector(
+            match_expressions=[
+                client.V1LabelSelectorRequirement(key=label, operator="Exists")

Review comment:
       > The requirements are ANDed.
   
   
   This command should also work when the pod does not contain `kubernetes_executor` label i.e. it is started by KubernetetsPodOperator.

##########
File path: airflow/cli/commands/kubernetes_command.py
##########
@@ -90,7 +90,25 @@ def cleanup_pods(args):
     print('Loading Kubernetes configuration')
     kube_client = get_kube_client()
     print(f'Listing pods in namespace {namespace}')
-    list_kwargs = {"namespace": namespace, "limit": 500}
+    airflow_pod_labels = [
+        'airflow-worker',
+        'dag_id',
+        'task_id',
+        'execution_date',
+        'try_number',
+        'airflow_version',
+        'kubernetes_executor',
+    ]
+    list_kwargs = {
+        "namespace": namespace,
+        "limit": 500,
+        "label_selector": client.V1LabelSelector(
+            match_expressions=[
+                client.V1LabelSelectorRequirement(key=label, operator="Exists")

Review comment:
       > The requirements are ANDed.
   
   https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1LabelSelector.md
   
   This command should also work when the pod does not contain `kubernetes_executor` label i.e. it is started by KubernetetsPodOperator.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15204: BugFix: CLI 'kubernetes cleanup-pods' should only clean up Airflow-created Pods

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15204:
URL: https://github.com/apache/airflow/pull/15204#issuecomment-813446902


   [The Workflow run](https://github.com/apache/airflow/actions/runs/719570810) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org