Posted to commits@airflow.apache.org by "Allison Wang (JIRA)" <ji...@apache.org> on 2017/07/21 21:13:00 UTC

[jira] [Updated] (AIRFLOW-1325) Make Airflow Logging Backed By Elasticsearch

     [ https://issues.apache.org/jira/browse/AIRFLOW-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allison Wang updated AIRFLOW-1325:
----------------------------------
    Description: 
Currently, Airflow uses S3/GCS as the log storage backend. While executing a task, workers flush logs into local files. When the task completes, those log files are uploaded to a remote storage system such as S3 or GCS. This approach makes log streaming and analysis difficult. Also, if a worker server goes down while executing a task, the entire task log is lost until the worker server is recovered. It is also considered bad practice for the Airflow webserver to communicate directly with worker servers.

This change adds the ability to use a customized logging backend. Users can configure a logging backend that supports streaming logs and more advanced queries. Currently, an Elasticsearch logging backend is implemented.

This feature will also be backward compatible: if logging_backend_url is not set, users are directed to the old logging flow. A new UI will be created to support the above features, and the old page won't be modified.
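
To make the intent concrete, below is a rough sketch of how a webserver-side read path could query Elasticsearch for task logs instead of pulling log files from workers or from S3/GCS. The class interface, index name, and document fields are illustrative assumptions for this ticket, not the actual implementation; only the standard elasticsearch-py client calls are real.

{code:python}
# Illustrative sketch only: reads task-log lines that workers have shipped
# to Elasticsearch. Index name, field names, and the reader interface are
# assumptions, not the committed Airflow implementation.
from elasticsearch import Elasticsearch


class ElasticsearchLogReader:
    """Fetches task logs from Elasticsearch for display in the webserver."""

    def __init__(self, logging_backend_url, index="airflow-task-logs"):
        self.client = Elasticsearch([logging_backend_url])
        self.index = index

    def read(self, dag_id, task_id, execution_date, offset=0, size=1000):
        # Each log line is assumed to be indexed as one document carrying a
        # monotonically increasing "offset" field, so the UI can poll for new
        # lines (streaming) by passing back the last offset it has seen.
        query = {
            "size": size,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"dag_id": dag_id}},
                        {"term": {"task_id": task_id}},
                        {"term": {"execution_date": execution_date}},
                        {"range": {"offset": {"gt": offset}}},
                    ]
                }
            },
            "sort": [{"offset": {"order": "asc"}}],
        }
        response = self.client.search(index=self.index, body=query)
        hits = response["hits"]["hits"]
        lines = [hit["_source"]["message"] for hit in hits]
        next_offset = hits[-1]["_source"]["offset"] if hits else offset
        return lines, next_offset
{code}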

  was:
Currently, Airflow uses S3/GCS as the log storage backend. While executing a task, workers flush logs into local files. When the task completes, those log files are uploaded to a remote storage system such as S3 or GCS. This approach makes log streaming and analysis difficult. Also, if a worker server goes down while executing a task, the entire task log is lost until the worker server is recovered. It is also considered bad practice for the Airflow webserver to communicate directly with worker servers.

This change adds the ability to use a customized logging backend. Users can configure a logging backend that supports streaming logs and more advanced queries. Currently, an Elasticsearch logging backend is implemented.

Having Elasticsearch as the logging backend enables the development of more advanced logging-related features. The following features will be implemented in the future:
- Streaming logs without refreshing the page
- Separating logs by attempt
- Filtering logs by excluded phrases

This feature will also be backward compatible: if logging_backend_url is not set, users are directed to the old logging flow. A new UI will be created to support the above features, and the old page won't be modified.


> Make Airflow Logging Backed By Elasticsearch
> --------------------------------------------
>
>                 Key: AIRFLOW-1325
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1325
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>
> Currently, Airflow uses S3/GCS as the log storage backend. While executing a task, workers flush logs into local files. When the task completes, those log files are uploaded to a remote storage system such as S3 or GCS. This approach makes log streaming and analysis difficult. Also, if a worker server goes down while executing a task, the entire task log is lost until the worker server is recovered. It is also considered bad practice for the Airflow webserver to communicate directly with worker servers.
> This change adds the ability to use a customized logging backend. Users can configure a logging backend that supports streaming logs and more advanced queries. Currently, an Elasticsearch logging backend is implemented.
> This feature will also be backward compatible: if logging_backend_url is not set, users are directed to the old logging flow. A new UI will be created to support the above features, and the old page won't be modified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)