Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/15 01:31:11 UTC

[GitHub] [airflow] abhishekshenoy edited a comment on issue #11345: Fix Sync Logs from S3 in Airflow 1.10.12 with the Kubernetes Executor

abhishekshenoy edited a comment on issue #11345:
URL: https://github.com/apache/airflow/issues/11345#issuecomment-760583797


   Hi Team,

   We are facing a similar issue on GCP.
   
   Apache Airflow version: 2.0
   
   Kubernetes version (kubectl version): v1.18.2
   
   Environment:
   Cloud provider or hardware configuration: GCP
   OS (e.g. from /etc/os-release): Container Optimized OS.
   
   We use GCS to store our logs. Recently we have been seeing an intermittent issue while executing the Dataproc operators (CreateCluster, SubmitJob, DeleteCluster): although the operator itself executes successfully, Airflow is unable to read the logs from the pod and write them to GCS, which in turn marks the task as failed even though the actual Dataproc operation completed successfully in the background. (Not every run hits this; it only shows up intermittently.)
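   For context, our remote logging is configured along these lines in airflow.cfg (the connection id here is illustrative, not necessarily our exact value):
   
   ```
   [logging]
   # Upload task logs to GCS when the task finishes
   remote_logging = True
   remote_base_log_folder = gs://hnw-airflow-prod-ba7642cd7876f2/logs
   remote_log_conn_id = google_cloud_default
   ```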
   
   This becomes a major issue: the retry runs SparkSubmitJob a second time, producing duplicate data. In the case below, Airflow also respawned another task to stop the cluster, and the new task failed because the cluster had already been stopped by the previous task, which itself had been marked as failed only because its logs could not be read.
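   One way to at least contain the duplicate-data side effect would be to disable retries on the non-idempotent submit task, so that a log-read failure cannot re-trigger the Spark job. A minimal sketch (the project, cluster, and job spec below are hypothetical; newer google provider releases rename the `location` argument to `region`):
   
   ```
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
   
   # Hypothetical Spark job spec, following the Dataproc v1 Job shape.
   SPARK_JOB = {
       "reference": {"project_id": "my-project"},
       "placement": {"cluster_name": "my-cluster"},
       "spark_job": {
           "main_class": "org.example.Main",
           "jar_file_uris": ["gs://my-bucket/app.jar"],
       },
   }
   
   with DAG("extraction_workflow_sketch", start_date=datetime(2021, 1, 1),
            schedule_interval=None) as dag:
       submit_job = DataprocSubmitJobOperator(
           task_id="submit_spark_job",
           project_id="my-project",
           location="us-central1",
           job=SPARK_JOB,
           retries=0,  # a failed log upload then cannot re-run the job
       )
   ```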
   
   The exception stack trace of such a failed task is below:
   ```
   *** Unable to read remote log from gs://hnw-airflow-prod-ba7642cd7876f2/logs/extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log
   *** 404 GET https://storage.googleapis.com/download/storage/v1/b/hnw-airflow-prod-ba7642cd7876f2/o/logs%2Fextraction_workflow%2Fstop_cluster%2F2021-01-12T10%3A00%3A00%2B00%3A00%2F1.log?alt=media: No such object: hnw-airflow-prod-ba7642cd7876f2/logs/df_raw_file_extraction_workflow/stop_cluster/2021-01-12T10:00:00+00:00/1.log: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
   *** Trying to get logs (last 100 lines) from worker pod dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   *** Unable to fetch logs from worker pod dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464 ***
   (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b677a80e-a9f8-4794-9085-ffb74aa2f443', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 13 Jan 2021 17:58:33 GMT', 'Content-Length': '294'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464\\" not found","reason":"NotFound","details":{"name":"dfrawfileextractionworkflowstopcluster-6f91dc0d7c26450c980c5464","kind":"pods"},"code":404}\n'
   ```
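   
   The "pod not found" part suggests the worker pod had already been cleaned up before the executor tried to fetch its logs. As a debugging aid we could keep finished pods around by flipping the stock Kubernetes executor option from its default (at the cost of leftover pods to clean up manually):
   
   ```
   [kubernetes]
   # Keep completed worker pods so their logs can still be inspected
   delete_worker_pods = False
   ```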


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org