You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/31 19:33:15 UTC

[GitHub] [airflow] dimberman opened a new pull request #10666: Add on_kill support for the KubernetesPodOperator

dimberman opened a new pull request #10666:
URL: https://github.com/apache/airflow/pull/10666


   This PR ensures that when a user kills a KubernetesPodOperator task
   in the airflow UI, that the associated pod is also killed using the
   on_kill method.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684917240


   > @ashb I think this should be a hard decision. If you kill a task in the UI, there's an expectation that it is killed immediately. Setting a timeout config would just add another knob on a pretty crammed panel.
   
   I don't think there's any requirement for it to be "instant", just quick. I was thinking this would be an operator paremeter, not anything in the UI.
   
   I was also though perhaps confusing Kube's gracePeriodSeconds with the docker stop-timeout-then-kill but they are different.
   
   So this is probably okay. Only question that remains then is if this needs a config option to disable. I don't know either way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786315


   @spencerschack oof good catch I'll make a quick PR to make that optional thank you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] spencerschack commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
spencerschack commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686773180


   @dimberman Sorry to just jump into the convo, but by passing 0 seconds as the grace period, you're telling k8s to send SIGKILL to the container process instead of SIGTERM and giving it a chance to clean up. This would be an incompatible change with my company's setup due to the fact that tasks need to clean up resources before they exit. At the very least the grace period should configurable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-687282389


   @ashb @spencerschack  Fix PR :) https://github.com/apache/airflow/pull/10727


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Add on_kill support for the KubernetesPodOperator

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-683986627


   cc: @jhtimmins 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684899179


   Questions:
   
   - Does this want to be an option/argument to the operator to control this behaviour, or do you think that this always makes sense?
   - Doe we want to hard-kill it, or do we want to give the pod some grace period after receiving a signal to stop what it's doing and clean up?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685183867


   > My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.
   
   is that a candidate for operator parameter or kubernetes_config?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#discussion_r481202289



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -441,3 +444,10 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
                 'Pod returned a failure: {state}'.format(state=final_state)
             )
         return final_state, result
+
+    def on_kill(self) -> None:
+        if self.pod:
+            pod: k8s.V1Pod = self.pod
+            namespace = pod.metadata.namespace
+            name = pod.metadata.name
+            self.client.delete_namespaced_pod(name=name, namespace=namespace, grace_period_seconds=0)

Review comment:
       Isn't gracePeriodSeconds something else than how long before the pod is stopped -- this controls how long the pod is kept around for in the API server, but not when the pod is terminated, isn't it?
   
   (I remember we investigated this for an EKS fix around short lived pods and increasing this didn't help.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685787218


   @jedcunningham oof sorry didn't see your comment there. We can make a separate commit to add that as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684921099


   @ashb I don't see a situation where someone would want to turn this off. That would be the equivalent of asking people "would you like us to leave behind zombie processes when airflow shuts down". 
   
   I think an operator parameter is possible... though it'd still just be kind of a weird thing to ask. "How long would you like the pod to stay up after you manually kill the task?" I imagine any form of graceful shutdown could be added to the pod spec in the form of a prestophook.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on a change in pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on a change in pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#discussion_r481206080



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -441,3 +444,10 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
                 'Pod returned a failure: {state}'.format(state=final_state)
             )
         return final_state, result
+
+    def on_kill(self) -> None:
+        if self.pod:
+            pod: k8s.V1Pod = self.pod
+            namespace = pod.metadata.namespace
+            name = pod.metadata.name
+            self.client.delete_namespaced_pod(name=name, namespace=namespace, grace_period_seconds=0)

Review comment:
       @ashb grace_period_seconds is a recommendation, raising it doesn't necessarily mean the ApiServer will wait that long. By setting it to 0 we are telling the ApiServer "Delete this before doing anything else" rather than "get around to it eventually"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685183867


   > My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.
   
   That might be a candidate for operator parameter or kubernetes_config? WDYT @dimberman ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684900462


   @ashb I think this should be a hard decision. If you kill a task in the UI, there's an expectation that it is killed immediately. Setting a timeout config would just add another knob on a pretty crammed panel. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684988060


   My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786941


   My pleasure @spencerschack :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] spencerschack commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
spencerschack commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786777


   Thanks @dimberman! Appreciate your hard work!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman merged pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.

Posted by GitBox <gi...@apache.org>.
dimberman merged pull request #10666:
URL: https://github.com/apache/airflow/pull/10666


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org