Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/27 03:49:46 UTC

[GitHub] [airflow] chrismclennon commented on pull request #10315: Add retry_only_on_pod_launching_failure and log_container_statuses_on…

chrismclennon commented on pull request #10315:
URL: https://github.com/apache/airflow/pull/10315#issuecomment-681338417


   Chiming in since I work with Adil and we use this in our Airflow installation. Our users have found it helpful for debugging Kubernetes-related errors, such as an OOMKill or any event that prevents the pod from launching (for example, the image not being found), which otherwise go unreported in the logs.
   
   An example of what the logs look like:
   ```
   [2020-08-19 22:22:34,666] {IndeedKubernetesPodOperator.py:324} ERROR - Pod Event: Scheduled - Successfully assigned airflow--airflow/print-numbers-e3e77321 to ip-10-104-4-119.ec2.internal
   [2020-08-19 22:22:34,666] {IndeedKubernetesPodOperator.py:324} ERROR - Pod Event: Pulling - Pulling image "alpine"
   [2020-08-19 22:22:34,666] {IndeedKubernetesPodOperator.py:324} ERROR - Pod Event: Pulled - Successfully pulled image "alpine"
   [2020-08-19 22:22:34,666] {IndeedKubernetesPodOperator.py:324} ERROR - Pod Event: Created - Created container base
   [2020-08-19 22:22:34,667] {IndeedKubernetesPodOperator.py:324} ERROR - Pod Event: Started - Started container base
   [2020-08-19 22:22:34,762] {IndeedKubernetesPodOperator.py:330} ERROR - Pod has not succeeded, look for OOMKilled in containers statuses.
   [2020-08-19 22:22:34,762] {IndeedKubernetesPodOperator.py:332} ERROR - [{'container_id': 'docker://52431fc6b1f67074d92804785a73005d83cde2091cac9c3ada2edcb806cca489',
    'image': 'alpine:latest',
    'image_id': 'docker-pullable://alpine@sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321',
    'last_state': {'running': None, 'terminated': None, 'waiting': None},
    'name': 'base',
    'ready': True,
    'restart_count': 0,
    'state': {'running': {'started_at': datetime.datetime(2020, 8, 19, 22, 21, 5, tzinfo=tzlocal())},
              'terminated': None,
              'waiting': None}}]
   ```
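
For anyone who wants to see the idea in isolation, here is a rough sketch of the kind of check behind those log lines. It is not the code from this PR or from our operator: `format_pod_event`, `find_oom_killed`, and the sample `statuses` data are hypothetical names and values made up for illustration. It assumes container statuses are plain dicts shaped like the output above, and that Kubernetes reports an out-of-memory kill as a `terminated` state whose `reason` is `OOMKilled`.

```python
from typing import Any, Dict, List


def format_pod_event(reason: str, message: str) -> str:
    """Render one event in the 'Pod Event: <reason> - <message>' shape."""
    return f"Pod Event: {reason} - {message}"


def find_oom_killed(container_statuses: List[Dict[str, Any]]) -> List[str]:
    """Return names of containers terminated with reason OOMKilled.

    Checks both the current state and the last state, since a restarted
    container reports its previous termination under 'last_state'.
    """
    killed = []
    for status in container_statuses:
        for key in ("state", "last_state"):
            terminated = (status.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                killed.append(status["name"])
                break  # do not list the same container twice
    return killed


# Hypothetical sample data (not taken from a real pod).
statuses = [
    {
        "name": "base",
        "state": {
            "running": None,
            "terminated": {"reason": "OOMKilled", "exit_code": 137},
            "waiting": None,
        },
        "last_state": {"running": None, "terminated": None, "waiting": None},
    },
    {
        "name": "sidecar",
        "state": {"running": {"started_at": None}, "terminated": None, "waiting": None},
        "last_state": {"running": None, "terminated": None, "waiting": None},
    },
]

print(format_pod_event("Pulling", 'Pulling image "alpine"'))
print(find_oom_killed(statuses))
```

Working on plain dicts keeps the check independent of any particular client library version, at the cost of trusting the dict layout.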
   
   I agree, there is no need for the boolean flag. This should be the default behavior. Let's remove the flag.
   
   

