You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/31 19:33:15 UTC
[GitHub] [airflow] dimberman opened a new pull request #10666: Add on_kill support for the KubernetesPodOperator
dimberman opened a new pull request #10666:
URL: https://github.com/apache/airflow/pull/10666
This PR ensures that when a user kills a KubernetesPodOperator task
in the airflow UI, that the associated pod is also killed using the
on_kill method.
<!--
Thank you for contributing! Please make sure that your code changes
are covered with tests. And in case of new features or big changes
remember to adjust the documentation.
Feel free to ping committers for the review!
In case of existing issue, reference it using one of the following:
closes: #ISSUE
related: #ISSUE
How to write a good git commit message:
http://chris.beams.io/posts/git-commit/
-->
---
**^ Add meaningful description above**
Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684917240
> @ashb I think this should be a hard decision. If you kill a task in the UI, there's an expectation that it is killed immediately. Setting a timeout config would just add another knob on a pretty crammed panel.
I don't think there's any requirement for it to be "instant", just quick. I was thinking this would be an operator paremeter, not anything in the UI.
I was also though perhaps confusing Kube's gracePeriodSeconds with the docker stop-timeout-then-kill but they are different.
So this is probably okay. Only question that remains then is if this needs a config option to disable. I don't know either way.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786315
@spencerschack oof good catch I'll make a quick PR to make that optional thank you.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] spencerschack commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
spencerschack commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686773180
@dimberman Sorry to just jump into the convo, but by passing 0 seconds as the grace period, you're telling k8s to send SIGKILL to the container process instead of SIGTERM and giving it a chance to clean up. This would be an incompatible change with my company's setup due to the fact that tasks need to clean up resources before they exit. At the very least the grace period should configurable.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-687282389
@ashb @spencerschack Fix PR :) https://github.com/apache/airflow/pull/10727
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Add on_kill support for the KubernetesPodOperator
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-683986627
cc: @jhtimmins
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684899179
Questions:
- Does this want to be an option/argument to the operator to control this behaviour, or do you think that this always makes sense?
- Doe we want to hard-kill it, or do we want to give the pod some grace period after receiving a signal to stop what it's doing and clean up?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685183867
> My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.
is that a candidate for operator parameter or kubernetes_config?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#discussion_r481202289
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -441,3 +444,10 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
'Pod returned a failure: {state}'.format(state=final_state)
)
return final_state, result
+
+ def on_kill(self) -> None:
+ if self.pod:
+ pod: k8s.V1Pod = self.pod
+ namespace = pod.metadata.namespace
+ name = pod.metadata.name
+ self.client.delete_namespaced_pod(name=name, namespace=namespace, grace_period_seconds=0)
Review comment:
Isn't gracePeriodSeconds something else than how long before the pod is stopped -- this controls how long the pod is kept around for in the API server, but not when the pod is terminated, isn't it?
(I remember we investigated this for an EKS fix around short lived pods and increasing this didn't help.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685787218
@jedcunningham oof sorry didn't see your comment there. We can make a separate commit to add that as well.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684921099
@ashb I don't see a situation where someone would want to turn this off. That would be the equivalent of asking people "would you like us to leave behind zombie processes when airflow shuts down".
I think an operator parameter is possible... though it'd still just be kind of a weird thing to ask. "How long would you like the pod to stay up after you manually kill the task?" I imagine any form of graceful shutdown could be added to the pod spec in the form of a prestophook.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on a change in pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on a change in pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#discussion_r481206080
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -441,3 +444,10 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
'Pod returned a failure: {state}'.format(state=final_state)
)
return final_state, result
+
+ def on_kill(self) -> None:
+ if self.pod:
+ pod: k8s.V1Pod = self.pod
+ namespace = pod.metadata.namespace
+ name = pod.metadata.name
+ self.client.delete_namespaced_pod(name=name, namespace=namespace, grace_period_seconds=0)
Review comment:
@ashb grace_period_seconds is a recommendation, raising it doesn't necessarily mean the ApiServer will wait that long. By setting it to 0 we are telling the ApiServer "Delete this before doing anything else" rather than "get around to it eventually"
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil edited a comment on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-685183867
> My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.
That might be a candidate for operator parameter or kubernetes_config? WDYT @dimberman ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684900462
@ashb I think this should be a hard decision. If you kill a task in the UI, there's an expectation that it is killed immediately. Setting a timeout config would just add another knob on a pretty crammed panel.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-684988060
My 2c, I think giving pods the opportunity to gracefully exit should be the default. This could also be triggered by draining a node for maintenance, resource pressure on the node, etc, if I'm not mistaken.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786941
My pleasure @spencerschack :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] spencerschack commented on pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
spencerschack commented on pull request #10666:
URL: https://github.com/apache/airflow/pull/10666#issuecomment-686786777
Thanks @dimberman! Appreciate your hard work!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dimberman merged pull request #10666: Delete the pod launched by KubernetesPodOperator when task is cleared.
Posted by GitBox <gi...@apache.org>.
dimberman merged pull request #10666:
URL: https://github.com/apache/airflow/pull/10666
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org