You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/22 17:46:45 UTC
[GitHub] [airflow] MatthewRBruce opened a new pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
MatthewRBruce opened a new pull request #15490:
URL: https://github.com/apache/airflow/pull/15490
If a Kubernetes Pod ends in a state other than `SUCCESS` and `is_delete_operator_pod` is True, then use the `final_state` from the previous `create_new_pod_for_operator` call since the pod is already deleted and the current state can't be re-read.
closes: https://github.com/apache/airflow/issues/15456
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r625293615
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
if final_state != State.SUCCESS:
- status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+ if self.is_delete_operator_pod:
+ status = final_state
Review comment:
Okay, I've updated the code to return the full V1Pod back through from `monitor_pod`. This allows Airflow to log the full pod description in the event of a failure regardless of `is_delete_pod_operator`.
I modified the original test that checked whether `read_namespaced_pod` was called in the event of a failure, since it doesn't need to be called either way now - it just checks that the failed pod description is reported in the Exception.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651229465
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
if len(pod_list.items) == 1:
try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
- final_state, result = self.handle_pod_overlap(
+ final_state, pod_info, result = self.handle_pod_overlap(
labels, try_numbers_match, launcher, pod_list.items[0]
)
else:
self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
- final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
+ final_state, _, pod_info, result = self.create_new_pod_for_operator(labels, launcher)
Review comment:
👍
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-833841797
[The Workflow run](https://github.com/apache/airflow/actions/runs/818158865) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831620276
👍 There were legitimately were mocks I missed updating the return values for, but the issue in master may have been a problem too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651771863
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
I was poking at this last night and I didn't end up being overly keen on overwriting the original pod spec in `self.pod` with the remote pod after completion. I went back to the original idea in the PR, but I renamed all the `pod_info` to `remote_pod` - going with the `pod` variable name as you suggested ended up conflating the returned pod with the `pod` parameter passed into some functions and `self.pod` which just seemed confusing.
If you wouldn't mind taking another look at the PR it would be much appreciated :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651247330
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
The problem here is that we've already deleted the pod on like 550, so we can't call `read_pod_events` against it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651229343
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
if len(pod_list.items) == 1:
try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
- final_state, result = self.handle_pod_overlap(
+ final_state, pod_info, result = self.handle_pod_overlap(
Review comment:
It's used on line 368 if the resulting pod isn't successful, no?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831632578
[The Workflow run](https://github.com/apache/airflow/actions/runs/808528525) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651273773
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
Ah, that makes sense then. I was just going over the code and your other comments and I wonder if it wouldn't be easier and better to re-read the pod like `self.pod = launcher.read_pod(self.pod)` in the `finally` blocks of `monitor_launched_pod` and `create_new_pod_for_operator`:
https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L528
https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L550
and then just log `self.pod` here:
https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L369
Way fewer changes and a lot less passing the pod around through the various functions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825509574
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831595919
@MatthewRBruce, might be worth rebasing on master if you still have k8s test failures - there was an issue in master that is now fixed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825132018
Sure thing - Test added.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651928203
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
> Thanks for being receptive to iterating on this
No problem I think we all want a solution that everyone likes 😄
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651284179
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
That works for me, just make sure any exceptions when reading it again don't bubble up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651891874
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
Yeah, I went back and forth on that yesterday as well. There is precedence for setting `self.pod` to be the remote pod, but I think its easier to grok as it is now.
Thanks for being receptive to iterating on this 🍺
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651928203
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
> Thanks for being receptive to iterating on this
No problem I think we all want a solution that everyone likes 😄
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r650056775
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
if len(pod_list.items) == 1:
try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
- final_state, result = self.handle_pod_overlap(
+ final_state, pod_info, result = self.handle_pod_overlap(
labels, try_numbers_match, launcher, pod_list.items[0]
)
else:
self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
- final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
+ final_state, _, pod_info, result = self.create_new_pod_for_operator(labels, launcher)
Review comment:
```suggestion
final_state, _, pod, result = self.create_new_pod_for_operator(labels, launcher)
```
I think I like `pod` better than `pod_info`? Only doing this comment once, but applies everywhere.
##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -129,7 +129,7 @@ def start_pod(self, pod: V1Pod, startup_timeout: int = 120):
raise AirflowException("Pod took too long to start")
time.sleep(1)
- def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, Optional[str]]:
+ def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, V1Pod, Optional[str]]:
"""
Monitors a pod and returns the final state
Review comment:
```suggestion
Monitors a pod and returns the final state, pod, and xcom result
```
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
```suggestion
for event in launcher.read_pod_events(pod).items:
```
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
if len(pod_list.items) == 1:
try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
- final_state, result = self.handle_pod_overlap(
+ final_state, pod_info, result = self.handle_pod_overlap(
Review comment:
```suggestion
final_state, _, result = self.handle_pod_overlap(
```
Since we don't use it?
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -518,17 +517,17 @@ def create_new_pod_for_operator(self, labels, launcher) -> Tuple[State, k8s.V1Po
self.log.debug("Starting pod:\n%s", yaml.safe_dump(self.pod.to_dict()))
try:
launcher.start_pod(self.pod, startup_timeout=self.startup_timeout_seconds)
- final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
+ final_state, pod_info, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
except AirflowException:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(self.pod).items:
+ for event in pod_info.items:
self.log.error("Pod Event: %s - %s", event.reason, event.message)
raise
finally:
if self.is_delete_operator_pod:
self.log.debug("Deleting pod for task %s", self.task_id)
launcher.delete_pod(self.pod)
- return final_state, self.pod, result
+ return final_state, self.pod, pod_info, result
Review comment:
Returning the pod twice?
```suggestion
return final_state, pod_info, result
```
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -518,17 +517,17 @@ def create_new_pod_for_operator(self, labels, launcher) -> Tuple[State, k8s.V1Po
self.log.debug("Starting pod:\n%s", yaml.safe_dump(self.pod.to_dict()))
try:
launcher.start_pod(self.pod, startup_timeout=self.startup_timeout_seconds)
- final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
+ final_state, pod_info, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
except AirflowException:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(self.pod).items:
+ for event in pod_info.items:
Review comment:
```suggestion
for event in launcher.read_pod_events(self.pod).items:
```
I think we want to leave this as-is. We wont get a `pod(_info)` back, plus `read_pod_events` just needs the namespace and name, which we can always use `self.pod` for.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r619415183
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
if final_state != State.SUCCESS:
- status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+ if self.is_delete_operator_pod:
+ status = final_state
Review comment:
That's a good idea - we could modify `monitor_pod` to return a Tuple of `[State, event, result]` instead of just `[State, result]` (https://github.com/apache/airflow/blob/b844621a6f42cc20b3103bc70c774023268f8de8/airflow/providers/cncf/kubernetes/utils/pod_launcher.py#L170). Then `create_new_pod_for_operator` could pass that back through to `execute` too (or `create_new_pod_for_operator` could read the pod again in it's `finally` clause and pass that back)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r619354226
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
if final_state != State.SUCCESS:
- status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+ if self.is_delete_operator_pod:
+ status = final_state
Review comment:
I wonder if we should make this actually dump the whole pod like it would if `is_delete_operator_pod` was false, instead of the state. It would take a little refactoring to make it happen, but seems like it could be pretty helpful in that scenario?
Maybe this exception should always just include the state, then have `create_new_pod_for_operator` log the full pod on failure?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651232756
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
if len(pod_list.items) == 1:
try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
- final_state, result = self.handle_pod_overlap(
+ final_state, pod_info, result = self.handle_pod_overlap(
Review comment:
Oh yeah, missed that!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825070171
If anyone's interesting here the logs for a run pre-change:
```
[2021-04-22 14:28:55,336] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
[2021-04-22 14:28:55,336] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
[2021-04-22 14:28:56,344] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
[2021-04-22 14:28:56,345] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
[2021-04-22 14:28:57,354] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
[2021-04-22 14:28:57,354] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
[2021-04-22 14:28:58,363] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
[2021-04-22 14:28:58,364] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
[2021-04-22 14:28:59,397] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
[2021-04-22 14:28:59,397] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
[2021-04-22 14:28:59,405] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
[2021-04-22 14:28:59,405] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
[2021-04-22 14:28:59,430] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 341, in execute
status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 18446, in read_namespaced_pod
(data) = self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 18524, in read_namespaced_pod_with_http_info
return self.api_client.call_api('/api/v1/namespaces/{namespace}/pods/{name}', 'GET',
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 330, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 163, in __call_api
response_data = self.request(method, url,
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 351, in request
return self.rest_client.GET(url,
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 227, in GET
return self.request("GET", url,
File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
```
and post change:
```
[2021-04-22 15:24:16,494] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
[2021-04-22 15:24:16,494] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
[2021-04-22 15:24:17,503] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
[2021-04-22 15:24:17,503] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
[2021-04-22 15:24:18,510] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
[2021-04-22 15:24:18,511] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
[2021-04-22 15:24:19,520] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
[2021-04-22 15:24:19,520] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
[2021-04-22 15:24:20,554] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
[2021-04-22 15:24:20,554] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
[2021-04-22 15:24:20,563] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
[2021-04-22 15:24:20,563] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
[2021-04-22 15:24:20,598] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 347, in execute
raise AirflowException(f'Pod {self.pod.metadata.name} returned a failure: {status}')
airflow.exceptions.AirflowException: Pod hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 returned a failure: failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 350, in execute
raise AirflowException(f'Pod Launching failed: {ex}')
airflow.exceptions.AirflowException: Pod Launching failed: Pod hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 returned a failure: failed
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-834734996
Another rebase to fix a broken static check and we're green.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651265183
##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
:return:
"""
try:
- (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+ (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
finally:
if self.is_delete_operator_pod:
launcher.delete_pod(pod)
if final_state != State.SUCCESS:
if self.log_events_on_failure:
- for event in launcher.read_pod_events(pod).items:
+ for event in pod_info.items:
Review comment:
The pod events are still available after deleting the pod. We can't use `read_namespaced_pod`, but `read_pod_events` should still work. We also won't get `pod_info` back here in all cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`
Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651887003
##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -170,7 +170,8 @@ def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, Optional[str]]
while self.pod_is_running(pod):
self.log.info('Pod %s has state %s', pod.metadata.name, State.RUNNING)
time.sleep(2)
- return self._task_status(self.read_pod(pod)), result
+ pod_info = self.read_pod(pod)
+ return self._task_status(pod_info), pod_info, result
Review comment:
```suggestion
remote_pod = self.read_pod(pod)
return self._task_status(remote_pod), remote_pod, result
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org