You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/06 16:39:10 UTC

[GitHub] [airflow] Sinsin1367 opened a new pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Sinsin1367 opened a new pull request #8734:
URL: https://github.com/apache/airflow/pull/8734


   …his will surface OOMKilled cases by kubernetes (#8722)
   
   Added an optional flag that logs container statuses when the pod fails. It helps when a container runs out of memory to log the OOMKilled status of it. Users then would know they need to request more memory for their tasks.
   https://github.com/apache/airflow/issues/8722
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [ x] Description above provides context of the change
   - [ x] Unit tests coverage for changes (not needed for documentation changes)
   - [ x] Target Github ISSUE in description if exists
   - [ x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [ x] Relevant documentation is updated including usage instructions.
   - [ x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-630730338


   Thanks for the tag @Sinsin1367 . Besides being in favour of flipping those settings to True, it looks good to me. 
   
   As context, we currently run this in production since it helps our users debug erroneous statuses like OOMKilled, especially since it's difficult to pull out those statuses any other way if `is_delete_pod_operator=True`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Sinsin1367 commented on a change in pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Sinsin1367 commented on a change in pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#discussion_r422179683



##########
File path: airflow/kubernetes/pod_launcher.py
##########
@@ -183,6 +183,23 @@ def read_pod_logs(self, pod: V1Pod, tail_lines: int = 10):
                 'There was an error reading the kubernetes API: {}'.format(e)
             )
 
+    @tenacity.retry(
+        stop=tenacity.stop_after_attempt(3),
+        wait=tenacity.wait_exponential(),
+        reraise=True
+    )
+    def read_pod_status(self, pod):
+        """Reads statuses from the POD"""
+        try:
+            return self._client.read_namespaced_pod_status(
+                namespace=pod.metadata.namespace,
+                name=pod.metadata.name
+            )
+        except BaseHTTPError as e:
+            raise AirflowException(

Review comment:
       @ashb It is the way all api calls are written in pod_launcher. All other api calls in methods "read_pod", "read_pod_logs", and "read_pod_events" are capturing and throwing AirflowExection. kubernetes_pod.py is also only capturing AirflowException so we have to capture and rethrow with current code structure I would say.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Sinsin1367 commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Sinsin1367 commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-644865210


   @ashb I still believe this PR is not where I should flip these flags specially the former flag. We can have a separate PR to set these defaults to True but including it in this one seems to be mixing multiple things to me. If you agree, do you mind to merge this? If now I would flip the flags.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-703189619


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Sinsin1367 commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Sinsin1367 commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-625412537


   @ashb  @chrismclennon I would appreciate if you guys take a look and share your thoughts.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on a change in pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on a change in pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#discussion_r427191009



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -177,6 +181,7 @@ def __init__(self,  # pylint: disable=too-many-arguments,too-many-locals
                  full_pod_spec: Optional[k8s.V1Pod] = None,
                  init_containers: Optional[List[k8s.V1Container]] = None,
                  log_events_on_failure: bool = False,
+                 log_container_statuses_on_failure: bool = False,

Review comment:
       In my opinion having both `log_events_on_failure` and `log_container_statuses_on_failure` be True makes more sense to me since printing out a few extra logs is close to zero cost and doesn't add that much clutter. While it looks like it's set to False to be aligned with the precedent set in code, I'm in favor of switching both to True.
   
   Although it's tiny, it does feel slightly adjacent to the immediate issue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Sinsin1367 commented on a change in pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Sinsin1367 commented on a change in pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#discussion_r422181509



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -177,6 +181,7 @@ def __init__(self,  # pylint: disable=too-many-arguments,too-many-locals
                  full_pod_spec: Optional[k8s.V1Pod] = None,
                  init_containers: Optional[List[k8s.V1Container]] = None,
                  log_events_on_failure: bool = False,
+                 log_container_statuses_on_failure: bool = False,

Review comment:
       I do think these logs are useful info and absolutely no harm in having them on by default. But I can see why they defaulted to False not to contaminate logs unless the task has a problem and you will turn it on then.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on a change in pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on a change in pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#discussion_r427191009



##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -177,6 +181,7 @@ def __init__(self,  # pylint: disable=too-many-arguments,too-many-locals
                  full_pod_spec: Optional[k8s.V1Pod] = None,
                  init_containers: Optional[List[k8s.V1Container]] = None,
                  log_events_on_failure: bool = False,
+                 log_container_statuses_on_failure: bool = False,

Review comment:
       In my opinion having both `log_events_on_failure` and `log_container_statuses_on_failure` be True makes more sense to me since printing out a few extra logs is close to zero cost and doesn't add that much clutter. While it looks like it's set to False to be aligned with the precedent set in code, I'm in favor of switching both to True.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Adil-Ibragimov commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Adil-Ibragimov commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-673360455


   This MR should closed and replaced with new one https://github.com/apache/airflow/pull/10315 (rebased version to resolve conflicts)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-670836635


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#discussion_r421989967



##########
File path: airflow/kubernetes/pod_launcher.py
##########
@@ -183,6 +183,23 @@ def read_pod_logs(self, pod: V1Pod, tail_lines: int = 10):
                 'There was an error reading the kubernetes API: {}'.format(e)
             )
 
+    @tenacity.retry(
+        stop=tenacity.stop_after_attempt(3),
+        wait=tenacity.wait_exponential(),
+        reraise=True
+    )
+    def read_pod_status(self, pod):
+        """Reads statuses from the POD"""
+        try:
+            return self._client.read_namespaced_pod_status(
+                namespace=pod.metadata.namespace,
+                name=pod.metadata.name
+            )
+        except BaseHTTPError as e:
+            raise AirflowException(

Review comment:
       We don't need to catch and rethrow this as an AirflowException. So, let's remove the try/except here

##########
File path: airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
##########
@@ -177,6 +181,7 @@ def __init__(self,  # pylint: disable=too-many-arguments,too-many-locals
                  full_pod_spec: Optional[k8s.V1Pod] = None,
                  init_containers: Optional[List[k8s.V1Container]] = None,
                  log_events_on_failure: bool = False,
+                 log_container_statuses_on_failure: bool = False,

Review comment:
       wouldn't True be more useful a default (and for log events too)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] closed pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
stale[bot] closed pull request #8734:
URL: https://github.com/apache/airflow/pull/8734


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Sinsin1367 commented on pull request #8734: Added optional logging for pod container statuses once a pod fails. T…

Posted by GitBox <gi...@apache.org>.
Sinsin1367 commented on pull request #8734:
URL: https://github.com/apache/airflow/pull/8734#issuecomment-630346426


   @ashb @chrismclennon Do you mind taking a quick look and approving this if no more comments?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org