You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/22 17:46:45 UTC

[GitHub] [airflow] MatthewRBruce opened a new pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

MatthewRBruce opened a new pull request #15490:
URL: https://github.com/apache/airflow/pull/15490


   If a Kubernetes Pod ends in a state other than `SUCCESS` and `is_delete_operator_pod` is True, then use the `final_state` from the previous `create_new_pod_for_operator` call since the pod is already deleted and the current state can't be re-read.
   
   closes: https://github.com/apache/airflow/issues/15456
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r625293615



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
                 self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
                 final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
             if final_state != State.SUCCESS:
-                status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+                if self.is_delete_operator_pod:
+                    status = final_state

Review comment:
       Okay, I've updated the code to return the full V1Pod back through from `monitor_pod`.  This allows Airflow to log the full pod description in the event of a failure regardless of `is_delete_pod_operator`. 
   
   I modified the original test that checked whether `read_namespaced_pod` was called in the event of a failure, since it doesn't need to be called either way now - it just checks that the failed pod description is reported in the Exception.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651229465



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
 
             if len(pod_list.items) == 1:
                 try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
-                final_state, result = self.handle_pod_overlap(
+                final_state, pod_info, result = self.handle_pod_overlap(
                     labels, try_numbers_match, launcher, pod_list.items[0]
                 )
             else:
                 self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
-                final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
+                final_state, _, pod_info, result = self.create_new_pod_for_operator(labels, launcher)

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-833841797


   [The Workflow run](https://github.com/apache/airflow/actions/runs/818158865) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831620276


   👍  There were legitimately were mocks I missed updating the return values for, but the issue in master may have been a problem too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651771863



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       I was poking at this last night and I didn't end up being overly keen on overwriting the original pod spec in `self.pod` with the remote pod after completion.  I went back to the original idea in the PR, but I renamed all the `pod_info` to `remote_pod` - going with the `pod` variable name as you suggested ended up conflating the returned pod with the `pod` parameter passed into some functions and `self.pod` which just seemed confusing.
   
   If you wouldn't mind taking another look at the PR it would be much appreciated :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651247330



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       The problem here is that we've already deleted the pod on like 550, so we can't call `read_pod_events` against it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651229343



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
 
             if len(pod_list.items) == 1:
                 try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
-                final_state, result = self.handle_pod_overlap(
+                final_state, pod_info, result = self.handle_pod_overlap(

Review comment:
       It's used on line 368 if the resulting pod isn't successful, no?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831632578


   [The Workflow run](https://github.com/apache/airflow/actions/runs/808528525) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651273773



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       Ah, that makes sense then.  I was just going over the code and your other comments and I wonder if it wouldn't be easier and better to re-read the pod like `self.pod = launcher.read_pod(self.pod)` in the `finally` blocks of `monitor_launched_pod` and `create_new_pod_for_operator`:
   https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L528
   https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L550
   
   and then just log `self.pod` here:
   https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L369
   
   Way fewer changes and a lot less passing the pod around through the various functions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825509574


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-831595919


   @MatthewRBruce, might be worth rebasing on master if you still have k8s test failures - there was an issue in master that is now fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825132018


   Sure thing - Test added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651928203



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       > Thanks for being receptive to iterating on this
   
   No problem I think we all want a solution that everyone likes 😄 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651284179



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       That works for me, just make sure any exceptions when reading it again don't bubble up.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651891874



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       Yeah, I went back and forth on that yesterday as well. There is precedence for setting `self.pod` to be the remote pod, but I think its easier to grok as it is now.
   
   Thanks for being receptive to iterating on this 🍺




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651928203



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       > Thanks for being receptive to iterating on this
   No problem I think we all want a solution that everyone likes 😄 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r650056775



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
 
             if len(pod_list.items) == 1:
                 try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
-                final_state, result = self.handle_pod_overlap(
+                final_state, pod_info, result = self.handle_pod_overlap(
                     labels, try_numbers_match, launcher, pod_list.items[0]
                 )
             else:
                 self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
-                final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
+                final_state, _, pod_info, result = self.create_new_pod_for_operator(labels, launcher)

Review comment:
       ```suggestion
                   final_state, _, pod, result = self.create_new_pod_for_operator(labels, launcher)
   ```
   
   I think I like `pod` better than `pod_info`? Only doing this comment once, but applies everywhere.

##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -129,7 +129,7 @@ def start_pod(self, pod: V1Pod, startup_timeout: int = 120):
                     raise AirflowException("Pod took too long to start")
                 time.sleep(1)
 
-    def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, Optional[str]]:
+    def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, V1Pod, Optional[str]]:
         """
         Monitors a pod and returns the final state

Review comment:
       ```suggestion
           Monitors a pod and returns the final state, pod, and xcom result
   ```

##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       ```suggestion
                   for event in launcher.read_pod_events(pod).items:
   ```

##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
 
             if len(pod_list.items) == 1:
                 try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
-                final_state, result = self.handle_pod_overlap(
+                final_state, pod_info, result = self.handle_pod_overlap(

Review comment:
       ```suggestion
                   final_state, _, result = self.handle_pod_overlap(
   ```
   Since we don't use it?

##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -518,17 +517,17 @@ def create_new_pod_for_operator(self, labels, launcher) -> Tuple[State, k8s.V1Po
         self.log.debug("Starting pod:\n%s", yaml.safe_dump(self.pod.to_dict()))
         try:
             launcher.start_pod(self.pod, startup_timeout=self.startup_timeout_seconds)
-            final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
+            final_state, pod_info, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
         except AirflowException:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(self.pod).items:
+                for event in pod_info.items:
                     self.log.error("Pod Event: %s - %s", event.reason, event.message)
             raise
         finally:
             if self.is_delete_operator_pod:
                 self.log.debug("Deleting pod for task %s", self.task_id)
                 launcher.delete_pod(self.pod)
-        return final_state, self.pod, result
+        return final_state, self.pod, pod_info, result

Review comment:
       Returning the pod twice?
   ```suggestion
           return final_state, pod_info, result
   ```

##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -518,17 +517,17 @@ def create_new_pod_for_operator(self, labels, launcher) -> Tuple[State, k8s.V1Po
         self.log.debug("Starting pod:\n%s", yaml.safe_dump(self.pod.to_dict()))
         try:
             launcher.start_pod(self.pod, startup_timeout=self.startup_timeout_seconds)
-            final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
+            final_state, pod_info, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
         except AirflowException:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(self.pod).items:
+                for event in pod_info.items:

Review comment:
       ```suggestion
                   for event in launcher.read_pod_events(self.pod).items:
   ```
   
   I think we want to leave this as-is. We wont get a `pod(_info)` back, plus `read_pod_events` just needs the namespace and name, which we can always use `self.pod` for.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r619415183



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
                 self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
                 final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
             if final_state != State.SUCCESS:
-                status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+                if self.is_delete_operator_pod:
+                    status = final_state

Review comment:
       That's a good idea - we could modify `monitor_pod` to return a Tuple of `[State, event, result]` instead of just `[State, result]` (https://github.com/apache/airflow/blob/b844621a6f42cc20b3103bc70c774023268f8de8/airflow/providers/cncf/kubernetes/utils/pod_launcher.py#L170).  Then `create_new_pod_for_operator` could pass that back through to `execute` too (or `create_new_pod_for_operator` could read the pod again in it's `finally` clause and pass that back)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r619354226



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -365,7 +365,10 @@ def execute(self, context) -> Optional[str]:
                 self.log.info("creating pod with labels %s and launcher %s", labels, launcher)
                 final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
             if final_state != State.SUCCESS:
-                status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
+                if self.is_delete_operator_pod:
+                    status = final_state

Review comment:
       I wonder if we should make this actually dump the whole pod like it would if `is_delete_operator_pod` was false, instead of the state. It would take a little refactoring to make it happen, but seems like it could be pretty helpful in that scenario?
   
   Maybe this exception should always just include the state, then have `create_new_pod_for_operator` log the full pod on failure?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651232756



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -358,15 +358,14 @@ def execute(self, context) -> Optional[str]:
 
             if len(pod_list.items) == 1:
                 try_numbers_match = self._try_numbers_match(context, pod_list.items[0])
-                final_state, result = self.handle_pod_overlap(
+                final_state, pod_info, result = self.handle_pod_overlap(

Review comment:
       Oh yeah, missed that!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-825070171


   If anyone's interesting here the logs for a run pre-change:
   ```
   [2021-04-22 14:28:55,336] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
   [2021-04-22 14:28:55,336] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
   [2021-04-22 14:28:56,344] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
   [2021-04-22 14:28:56,345] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
   [2021-04-22 14:28:57,354] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Pending
   [2021-04-22 14:28:57,354] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12
   [2021-04-22 14:28:58,363] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
   [2021-04-22 14:28:58,364] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
   [2021-04-22 14:28:59,397] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
   [2021-04-22 14:28:59,397] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
   [2021-04-22 14:28:59,405] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 had an event of type Failed
   [2021-04-22 14:28:59,405] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.0c8adcbf2e15483d8511404d771eed12 Failed
   [2021-04-22 14:28:59,430] {taskinstance.py:1482} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 341, in execute
       status = self.client.read_namespaced_pod(self.pod.metadata.name, self.namespace)
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 18446, in read_namespaced_pod
       (data) = self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/apis/core_v1_api.py", line 18524, in read_namespaced_pod_with_http_info
       return self.api_client.call_api('/api/v1/namespaces/{namespace}/pods/{name}', 'GET',
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 330, in call_api
       return self.__call_api(resource_path, method,
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 163, in __call_api
       response_data = self.request(method, url,
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 351, in request
       return self.rest_client.GET(url,
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 227, in GET
       return self.request("GET", url,
     File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 222, in request
       raise ApiException(http_resp=r)
   kubernetes.client.rest.ApiException: (404)
   Reason: Not Found
   ```
   
   and post change:
   
   ```
   [2021-04-22 15:24:16,494] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
   [2021-04-22 15:24:16,494] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
   [2021-04-22 15:24:17,503] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
   [2021-04-22 15:24:17,503] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
   [2021-04-22 15:24:18,510] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Pending
   [2021-04-22 15:24:18,511] {pod_launcher.py:113} WARNING - Pod not yet started: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55
   [2021-04-22 15:24:19,520] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
   [2021-04-22 15:24:19,520] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
   [2021-04-22 15:24:20,554] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
   [2021-04-22 15:24:20,554] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
   [2021-04-22 15:24:20,563] {pod_launcher.py:177} INFO - Event: hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 had an event of type Failed
   [2021-04-22 15:24:20,563] {pod_launcher.py:287} ERROR - Event with job id hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 Failed
   [2021-04-22 15:24:20,598] {taskinstance.py:1482} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 347, in execute
       raise AirflowException(f'Pod {self.pod.metadata.name} returned a failure: {status}')
   airflow.exceptions.AirflowException: Pod hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 returned a failure: failed
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 350, in execute
       raise AirflowException(f'Pod Launching failed: {ex}')
   airflow.exceptions.AirflowException: Pod Launching failed: Pod hello-world-pod-fail.2f52f1aa0a2c447ab187626bec759e55 returned a failure: failed
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] MatthewRBruce commented on pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
MatthewRBruce commented on pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#issuecomment-834734996


   Another rebase to fix a broken static check and we're green.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651265183



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -545,17 +544,17 @@ def monitor_launched_pod(self, launcher, pod) -> Tuple[State, Optional[str]]:
         :return:
         """
         try:
-            (final_state, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
+            (final_state, pod_info, result) = launcher.monitor_pod(pod, get_logs=self.get_logs)
         finally:
             if self.is_delete_operator_pod:
                 launcher.delete_pod(pod)
         if final_state != State.SUCCESS:
             if self.log_events_on_failure:
-                for event in launcher.read_pod_events(pod).items:
+                for event in pod_info.items:

Review comment:
       The pod events are still available after deleting the pod. We can't use `read_namespaced_pod`, but `read_pod_events` should still work. We also won't get `pod_info` back here in all cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15490: Fix unsuccessful KubernetesPod final_state call when `is_delete_operator_pod=True`

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15490:
URL: https://github.com/apache/airflow/pull/15490#discussion_r651887003



##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -170,7 +170,8 @@ def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, Optional[str]]
         while self.pod_is_running(pod):
             self.log.info('Pod %s has state %s', pod.metadata.name, State.RUNNING)
             time.sleep(2)
-        return self._task_status(self.read_pod(pod)), result
+        pod_info = self.read_pod(pod)
+        return self._task_status(pod_info), pod_info, result

Review comment:
       ```suggestion
           remote_pod = self.read_pod(pod)
           return self._task_status(remote_pod), remote_pod, result
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org