Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/08 07:28:34 UTC

[GitHub] [airflow] RyanSiu1995 opened a new issue #8780: Missing logs if the pod was killed by Kubernetes

RyanSiu1995 opened a new issue #8780:
URL: https://github.com/apache/airflow/issues/8780


   **Apache Airflow version**: 1.10.10
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): 
   ```
   > kubectl version
   Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:47:22Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
   Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.27", GitCommit:"145f9e21a4515947d6fb10819e5a336aff1b6959", GitTreeState:"clean", BuildDate:"2020-02-21T18:01:40Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
   ```
   
   **Environment**:
   - GKE
   - Linux
   - `Linux pod-name-597d575d8-xrmkn 4.14.138+ #1 SMP Tue Sep 3 02:58:08 PDT 2019 x86_64 GNU/Linux`
   - `pip install`
   
   **What happened**:
   We use the Kubernetes executor to run our pipelines with GCS remote log storage. For some DAGs, the log could not be found in the GCS bucket.
   They all fail with the same error:
   ```
   *** Unable to read remote log from gs://xxxx/zzz_test_on_hold/on_hold/2020-05-08T07:06:18.991443+00:00/1.log
   *** 404 GET https://storage.googleapis.com/download/storage/v1/b/xxxx/o/zzz_test_on_hold%2Fon_hold%2F2020-05-08T07%3A06%3A18.991443%2B00%3A00%2F1.log?alt=media: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
   ```
   We then checked the logs on stdout and found an error like this:
   ```
   Please make sure that airflow[gcp] is installed and the GCS connection exists.
   Could not write logs to gs://xxxx/zzz_test_on_hold/on_hold/2020-05-08T07:06:18.991443+00:00/1.log: 'NoneType' object has no attribute 'upload'
   ```
   
   **What you expected to happen**:
   I expected the logs to still be uploaded to GCS even if the pod was unexpectedly killed by Kubernetes.
   
   **How to reproduce it**:
   Write a DAG with the following function.
   ```python
   import logging
   import time

   logger = logging.getLogger(__name__)

   def on_hold():
       """Hold the process indefinitely."""
       while True:
           logger.warning("this is on holding now...")
           time.sleep(1000)
   ```
   Once the task pod is spawned, kill it with kubectl.
   The log will then fail to be uploaded.
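   For reference, a complete minimal DAG using this function might look like the sketch below. The DAG id and task id are taken from the log path above; the start date, schedule, and Airflow 1.10 import path are assumptions for illustration.
   ```python
   # Minimal sketch of a DAG wiring on_hold() into a task.
   # dag_id/task_id come from the log path above; dates and schedule are placeholders.
   import logging
   import time
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.python_operator import PythonOperator  # Airflow 1.10 import path

   logger = logging.getLogger(__name__)


   def on_hold():
       """Hold the process indefinitely so the pod can be killed mid-task."""
       while True:
           logger.warning("this is on holding now...")
           time.sleep(1000)


   with DAG(
       dag_id="zzz_test_on_hold",
       start_date=datetime(2020, 5, 1),
       schedule_interval=None,
   ) as dag:
       PythonOperator(task_id="on_hold", python_callable=on_hold)
   ```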
   
   **Anything else we need to know**:
   Nope





[GitHub] [airflow] RyanSiu1995 commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
RyanSiu1995 commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625729851


   I removed all the try/except blocks and got this stack trace.
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/logging/__init__.py", line 1919, in shutdown
       h.close()
     File "/data/clover_pipeline/src/airflow/airflow/utils/log/gcs_task_handler.py", line 81, in close
       self.gcs_write(log, remote_loc)
     File "/data/clover_pipeline/src/airflow/airflow/utils/log/gcs_task_handler.py", line 154, in gcs_write
       self.hook.upload(bkt, blob, tmpfile.name)
     File "/usr/local/lib/python3.6/site-packages/cached_property.py", line 35, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/data/clover_pipeline/src/airflow/airflow/utils/log/gcs_task_handler.py", line 48, in hook
       google_cloud_storage_conn_id=remote_conn_id
     File "/data/clover_pipeline/src/airflow/airflow/contrib/hooks/gcs_hook.py", line 43, in __init__
       delegate_to)
     File "/data/clover_pipeline/src/airflow/airflow/contrib/hooks/gcp_api_base_hook.py", line 131, in __init__
       self.extras = self.get_connection(self.gcp_conn_id).extra_dejson
     File "/data/clover_pipeline/src/airflow/airflow/hooks/base_hook.py", line 84, in get_connection
       conn = random.choice(list(cls.get_connections(conn_id)))
     File "/data/clover_pipeline/src/airflow/airflow/hooks/base_hook.py", line 80, in get_connections
       return secrets.get_connections(conn_id)
     File "/data/clover_pipeline/src/airflow/airflow/secrets/__init__.py", line 52, in get_connections
       conn_list = secrets_backend.get_connections(conn_id=conn_id)
     File "/data/clover_pipeline/src/airflow/airflow/utils/db.py", line 72, in wrapper
       with create_session() as session:
     File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
       return next(self.gen)
     File "/data/clover_pipeline/src/airflow/airflow/utils/db.py", line 42, in create_session
       session = settings.Session()
   TypeError: 'NoneType' object is not callable
   ```
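   For context, the last frames show that `settings.Session` is already `None` by the time `logging.shutdown()` runs, presumably because Airflow's `dispose_orm()` atexit handler fires first. A contrived, standalone sketch (not Airflow code) of the same failure mode:
   ```python
   # Standalone illustration of the failure mode in the traceback above:
   # a module-level session factory that has already been torn down is
   # called again from a late shutdown/cleanup path.
   class settings:          # stand-in for airflow.settings
       Session = None       # what dispose_orm() leaves behind


   def close_log_handler():
       # analogous to GCSTaskHandler.close() needing a DB session on shutdown
       return settings.Session()  # TypeError: 'NoneType' object is not callable


   close_log_handler()
   ```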





[GitHub] [airflow] RyanSiu1995 commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
RyanSiu1995 commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625756560


   @msumit Thank you for your reply.
   I am thinking of a change like this:
   we re-initialize the database session dynamically in this util file.
   https://github.com/apache/airflow/blob/1.10.10/airflow/utils/db.py#L38-L50
   The code would look like this.
   ```python
   @contextlib.contextmanager
   def create_session():
       """
       Contextmanager that will create and teardown a session.
       """
       have_session = settings.Session is not None
       if not have_session:
           # The ORM may already have been disposed (e.g. during interpreter
           # shutdown), so re-initialize it with a fresh session factory.
           settings.configure_orm(disable_connection_pool=True)
       session = settings.Session()
       try:
           yield session
           session.commit()
       except Exception:
           session.rollback()
           raise
       finally:
           session.close()
           if not have_session:
               settings.dispose_orm()
   ```
   This actually resolves the issue in my case, but I am not sure whether it is proper for `create_session()` to initialize the session factory dynamically like this.
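   A quick way to exercise the patched behaviour might be something like this (a sketch only, assuming a configured Airflow 1.10 environment):
   ```python
   # Sketch: simulate the shutdown-time teardown and check that the patched
   # create_session() re-initializes the ORM instead of raising a TypeError.
   from airflow import settings
   from airflow.utils.db import create_session

   settings.dispose_orm()             # what the atexit handler does on shutdown
   with create_session() as session:  # patched version reconfigures the ORM
       session.execute("SELECT 1")
   ```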





[GitHub] [airflow] msumit commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
msumit commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625768894


   @RyanSiu1995 so it means that the worker pod is trying to upload the logs to GCS when being killed, but is failing due to some issue getting a DB session?





[GitHub] [airflow] RosterIn commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
RosterIn commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-1072309308


   From time to time on Airflow 2.2.3 (Kubernetes) I see that logs are missing with a similar message.
   I wonder if this issue is the cause of it?



[GitHub] [airflow] RyanSiu1995 commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
RyanSiu1995 commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625769729


   Yes, that's exactly what happens in my case.





[GitHub] [airflow] msumit commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
msumit commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625748213


   AFAIK the task logs are uploaded to remote storage once the task is completed, be it the Kubernetes, Celery, or even the Local executor. One simple solution is to use a persistent volume that supports multiple read-write mounts (ReadWriteMany) to store these logs, so they won't be lost even if the pod is killed in between.
   Not sure if Kubernetes has some way to catch the pod kill request and execute some code before actually killing it. In that case, you could write some code to upload the logs manually before the pod goes away.
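   Kubernetes does send SIGTERM to the container (and supports `preStop` hooks) and waits for the termination grace period before sending SIGKILL, so the second idea could look roughly like the sketch below; `upload_logs_to_gcs()` is a hypothetical placeholder, not an existing Airflow call:
   ```python
   # Sketch: flush/upload the task log when the pod receives SIGTERM.
   # upload_logs_to_gcs() is a hypothetical placeholder, not an Airflow API.
   import signal
   import sys


   def upload_logs_to_gcs():
       """Placeholder: copy the local task log file to its remote GCS location."""


   def handle_sigterm(signum, frame):
       # Kubernetes sends SIGTERM and waits terminationGracePeriodSeconds
       # (30s by default) before sending SIGKILL, so do the upload here.
       upload_logs_to_gcs()
       sys.exit(128 + signum)


   signal.signal(signal.SIGTERM, handle_sigterm)
   ```
   Note this only helps for a graceful deletion; a hard SIGKILL (e.g. an OOM kill or `kubectl delete --grace-period=0 --force`) cannot be caught.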





[GitHub] [airflow] RyanSiu1995 commented on issue #8780: Missing logs if the pod was killed by Kubernetes

Posted by GitBox <gi...@apache.org>.
RyanSiu1995 commented on issue #8780:
URL: https://github.com/apache/airflow/issues/8780#issuecomment-625761275


   But, to be honest, it is pretty odd that the function `create_session()` does not actually perform any creation; it just returns the session object from `settings.py`.
   So this change makes sense to me, since it gives the creation responsibility back to `create_session()`.

