Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/08 09:10:40 UTC

[GitHub] [airflow] Javier162380 opened a new issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Javier162380 opened a new issue #11345:
URL: https://github.com/apache/airflow/issues/11345


   I upgraded my Airflow cluster from 1.10.10 to 1.10.12 and Python from 3.7 to 3.8. The cluster is deployed on EKS with the Kubernetes executor, and task logs are synced to S3. I found a repeatable bug in the UI: when I try to view the logs of a task that has only one attempt, the webserver tries to fetch them from the worker pod. Since we delete worker pods as soon as they finish (via the airflow.cfg options), the pod no longer exists, so the fetch fails and an error is shown instead of falling back to the logs in S3.
   ```
   *** Trying to get logs (last 100 lines) from worker pod hermesemaileventshermesemaileventstransationalloadredshift-7ffa ***
   
   *** Unable to fetch logs from worker pod hermesemaileventshermesemaileventstransationalloadredshift-7ffa ***
   (403)
   Reason: Forbidden
   ```
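   The behaviour I expect is roughly the following (a minimal sketch, not Airflow's actual log-reading code; the pod name, namespace, bucket and log key are placeholders):

   ```
   import boto3
   from kubernetes import client, config
   from kubernetes.client.rest import ApiException

   # Placeholders for illustration only.
   POD_NAME = "example-task-pod"
   NAMESPACE = "airflow"
   BUCKET = "my-airflow-logs"
   LOG_KEY = "logs/my_dag/my_task/2020-10-06T08:30:00+00:00/1.log"

   config.load_incluster_config()  # assumes this runs inside the cluster

   def read_task_log():
       """Try the worker pod first; if it is already gone, read the S3 copy."""
       try:
           return client.CoreV1Api().read_namespaced_pod_log(
               name=POD_NAME, namespace=NAMESPACE, tail_lines=100
           )
       except ApiException:
           # The pod was deleted after it finished, so fall back to the
           # remote log that was synced to S3.
           obj = boto3.client("s3").get_object(Bucket=BUCKET, Key=LOG_KEY)
           return obj["Body"].read().decode("utf-8")
   ```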
   When the task has two attempts, the logs are read from S3 and shown in the UI without any problem:
   
   ```
   *** Reading remote log from s3://wt-prod-euwest1-zephyr/logs/adsales/adsales-shutterstock-image-report/2020-10-06T08:30:00+00:00/8.log.
   ```
   I think this bug is related to this PR: https://github.com/apache/airflow/pull/8626
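
   For reference, this is roughly the configuration described above (a minimal sketch with 1.10.x option names; the bucket and connection id are placeholders):

   ```
   [core]
   # Task logs are shipped to S3 once a task finishes.
   remote_logging = True
   remote_base_log_folder = s3://my-airflow-logs/logs
   remote_log_conn_id = aws_default

   [kubernetes]
   # Worker pods are deleted as soon as they complete, so the UI can no
   # longer read logs directly from the pod.
   delete_worker_pods = True
   ```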
   

[GitHub] [airflow] abhishekshenoy edited a comment on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
abhishekshenoy edited a comment on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-760583797


   Hi team,

   We are facing a similar issue on GCP.
   
   Apache Airflow version: 2.0
   
   Kubernetes version (kubectl version): v1.18.2
   
   Environment:
   Cloud provider or hardware configuration: GCP
   OS (e.g. from /etc/os-release): Container Optimized OS.
   
   We use GCS to store our logs. Recently we have been seeing an issue where, while executing the Dataproc operators (CreateCluster, SubmitJob, DeleteCluster), the operator itself completes successfully, but Airflow is unable to read the logs from the pod and write them to GCS. This in turn causes the task to be marked as failed even though the underlying Dataproc operation completed. (Not every run hits this; we only see it intermittently.)
   
   This becomes a major issue: the retry runs the Spark submit job twice, producing duplicate data, and in the case below it re-spawned the stop-cluster task, which then failed because the cluster had already been stopped by the previous attempt, the one that was marked failed only because its logs could not be read.
   
   The error output of such a failed task is below:
   ```
   *** Unable to read remote log from gs://hnw-airflow-prod-ba7642cd7876f2/logs/extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log
   *** 404 GET https://storage.googleapis.com/download/storage/v1/b/hnw-airflow-prod-ba7642cd7876f2/o/logs%2Fextraction_workflow%2Fstop_cluster%2F2021-01-12T10%3A00%3A00%2B00%3A00%2F1.log?alt=media: No such object: hnw-airflow-prod-ba7642cd7876f2/logs/df_raw_file_extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
   *** Trying to get logs (last 100 lines) from worker pod dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   *** Unable to fetch logs from worker pod dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b677a80e-a9f8-4794-9085-ffb74aa2f443', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 13 Jan 2021 17:58:33 GMT', 'Content-Length': '294'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464\\" not found","reason":"NotFound","details":{"name":"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464","kind":"pods"},"code":404}\n'
   ```
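   A quick way to confirm that the 404 means the log upload never happened, rather than a permissions problem, is to check the object directly (a minimal sketch with the google-cloud-storage client; the bucket and key are copied from the error above):

   ```
   from google.cloud import storage

   # Bucket name and object key taken from the 404 error above.
   client = storage.Client()  # uses application-default credentials
   blob = client.bucket("hnw-airflow-prod-ba7642cd7876f2").blob(
       "logs/extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log"
   )
   print("log object exists in GCS:", blob.exists())
   ```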



[GitHub] [airflow] himabindu07 commented on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
himabindu07 commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-788357999


   Hi, I verified this issue using 2.0.1 and was unable to reproduce it. Can you please try with 2.0.1?
   Thanks 



[GitHub] [airflow] Reddy1990 commented on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
Reddy1990 commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-874042330


   Hi @himabindu07, on v2.0.2 I'm seeing similar issues. The pod is deleted immediately once it has run, so the error below is technically correct, but how can I retrieve the logs?

   ```
   *** Trying to get logs (last 100 lines) from worker pod filecopierfilecopier.b3d90ee2f9fc4ef7b5ee7edda63d55f4 ***

   *** Unable to fetch logs from worker pod filecopierfilecopier.b3d90ee2f9fc4ef7b5ee7edda63d55f4 ***
   (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 05 Jul 2021 11:31:14 GMT', 'Content-Length': '274'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"filecopierfilecopier.b3d90ee2f9fc4ef7b5ee7edda63d55f4\\" not found","reason":"NotFound","details":{"name":"filecopierfilecopier.b3d90ee2f9fc4ef7b5ee7edda63d55f4","kind":"pods"},"code":404}\n'
   ```
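
   For context, these are the knobs I am looking at (a minimal sketch with 2.0.x option names; the bucket and connection id are placeholders and the values are examples, not a recommendation):

   ```
   [logging]
   # With remote logging configured, the UI should read the task log from
   # the remote store once the pod is gone.
   remote_logging = True
   remote_base_log_folder = s3://my-airflow-logs/logs
   remote_log_conn_id = aws_default

   [kubernetes]
   # delete_worker_pods = False keeps every worker pod around, so its logs
   # stay reachable with `kubectl logs`; with delete_worker_pods = True,
   # delete_worker_pods_on_failure = False still preserves failed pods.
   delete_worker_pods = False
   ```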
    



[GitHub] [airflow] github-actions[bot] commented on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-1036815958


   This issue has been closed because it has not received a response from the issue author.



[GitHub] [airflow] github-actions[bot] closed issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #11345:
URL: https://github.com/apache/airflow/issues/11345


   



[GitHub] [airflow] eladkal commented on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-1005973300


   Is this issue reproducible in the latest Airflow version and the latest Kubernetes provider?




[GitHub] [airflow] github-actions[bot] commented on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-1030449112


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.


