Posted to commits@airflow.apache.org by "Allison Wang (JIRA)" <ji...@apache.org> on 2017/10/07 05:25:00 UTC

[jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs

    [ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195581#comment-16195581 ] 

Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:24 AM:
----------------------------------------------------------------

I agree that we shouldn't rely on the logging module's close to upload the log, since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method in handlers that handles any additional cleanup or other operations upon task completion. This change requires modifying only a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. Elasticsearch) in real time.
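
To make the post_task_run idea concrete, here is a minimal sketch of what such an explicit hook could look like. It is not the actual Airflow handler code: `RemoteTaskHandler`, `upload_fn`, and `run_task` are hypothetical names used only for illustration, and the upload is a plain callable standing in for an S3/GCS client.

```python
import logging


class RemoteTaskHandler(logging.FileHandler):
    """Hypothetical handler: logs to a local file, uploads only when told to."""

    def __init__(self, filename, upload_fn):
        super().__init__(filename)
        # upload_fn(path, text) is an assumed stand-in for a remote storage client.
        self.upload_fn = upload_fn
        self.uploaded = False

    def post_task_run(self):
        """Hypothetical hook invoked explicitly once the task has finished."""
        self.flush()  # make sure buffered records are on disk first
        with open(self.baseFilename) as f:
            self.upload_fn(self.baseFilename, f.read())
        self.uploaded = True


def run_task(task_fn, handler, logger_name="task"):
    """Run task_fn with the handler attached, then trigger the upload hook."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        task_fn(logger)
    finally:
        # The upload no longer depends on when logging.shutdown() calls close().
        handler.post_task_run()
        logger.removeHandler(handler)
```

Because the hook runs in a `finally` block, the log would be uploaded whether the task succeeds or fails, even if the worker process stays alive afterwards.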


was (Author: allisonwang):
I agree that we shouldn't rely on the logging module's close to upload the log, since we have no control over when it's called. Instead of calling close, we could explicitly invoke a post_task_run method that handles any additional cleanup or other operations upon task completion. This change requires modifying only a small amount of current code. I am not exactly sure how to upload the log to remote storage like S3/GCS periodically during task execution, but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g. Elasticsearch) in real time.

> Remote log handlers don't upload logs
> -------------------------------------
>
>                 Key: AIRFLOW-1667
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: logging
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log handlers (S3TaskHandler and GCSTaskHandler) only upload on close (flush is left at the default implementation provided by `logging.FileHandler`). A handler will be closed on process exit by `logging.shutdown()`, but depending on the Executor used, worker processes may not shut down regularly and can very likely persist between tasks. This means that during normal execution log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner without hitting the target resources unnecessarily.
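
One possible way to meet the requirement quoted above (timely uploads without hitting remote storage on every record) is to throttle uploads inside `flush`. The following is only a sketch under stated assumptions: `ThrottledUploadHandler`, `upload_fn`, `min_interval`, and `clock` are hypothetical names, not part of Airflow, and the upload is a plain callable standing in for an S3/GCS client.

```python
import logging
import time


class ThrottledUploadHandler(logging.FileHandler):
    """Hypothetical handler: uploads the log at most once per min_interval seconds."""

    def __init__(self, filename, upload_fn, min_interval=30.0, clock=time.monotonic):
        super().__init__(filename)
        self.upload_fn = upload_fn      # assumed callable(text) wrapping remote storage
        self.min_interval = min_interval
        self.clock = clock              # injectable so the timing can be tested
        self._last_upload = clock()     # the first upload waits one full interval
        self._dirty = False             # True when there are records not yet uploaded

    def emit(self, record):
        self._dirty = True
        super().emit(record)            # StreamHandler.emit calls self.flush()

    def flush(self):
        super().flush()
        now = self.clock()
        if self._dirty and now - self._last_upload >= self.min_interval:
            self._upload(now)

    def _upload(self, now):
        with open(self.baseFilename) as f:
            self.upload_fn(f.read())
        self._last_upload = now
        self._dirty = False

    def close(self):
        # Final upload of anything written since the last throttled upload.
        if self.stream is not None and self._dirty:
            super().flush()
            self._upload(self.clock())
        super().close()
```

With this shape, a long-lived worker would still push logs roughly every `min_interval` seconds while records are being written, and `close` only has to cover the tail end. Re-uploading the whole file each time is a simplification; a real handler would probably need to consider file size and upload cost.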


