Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/02/21 21:50:44 UTC

[jira] [Commented] (AIRFLOW-880) Fix remote log functionality inconsistencies for Webservers

    [ https://issues.apache.org/jira/browse/AIRFLOW-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876790#comment-15876790 ] 

ASF subversion and git services commented on AIRFLOW-880:
---------------------------------------------------------

Commit 974b75e93ec827b5f45c273f141fee9e188d46ee in incubator-airflow's branch refs/heads/master from [~aoen]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=974b75e ]

[AIRFLOW-880] Make webserver serve logs in a sane way for remote logs

Right now log functionality is not consistent when
it comes to remote logs. There are two major
problems:
1. Lack of Complete Logs: The remote log should be
the default rather than a fallback that is only
loaded when the local log is not present, because
the remote log has the logs for all of the tries
of a task, whereas the local log is only
guaranteed to have the most recent one.
2. Lack of Consistency: The logs returned should
always be the same from all the webservers. Right
now different logs can be returned depending on
whether a webserver has a local log, and the logs
can differ between webservers that do have one.

This PR addresses these issues by ALWAYS reading
from remote logs and then also reading logs from
the worker host if the task is currently running
(to get in-flight logs). The one issue with this
PR is that if a task is running on a worker it
already ran on, then you will get duplicate logs
for all of the previous runs of the task that
already completed (delimited by something like
"*** Getting remote logs" and "*** Getting logs on
local worker"). This can be fixed later (either by
streaming logs to the log server or by creating a
proper abstraction for multiple task instance
runs), and is still better than the current
behavior (duplicate info is better than omitting
previous task instance logs from the webserver
log).
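
The read order described above can be sketched
roughly as follows. This is an illustration only,
not the code from the commit: collect_task_log,
read_remote, read_worker and task_is_running are
hypothetical names, the two "Getting ..." delimiters
follow the "something like" wording above, and the
"No remote log found" and "Worker log unavailable"
markers are placeholders invented for the sketch.

```python
from typing import Callable, Optional


def collect_task_log(
    read_remote: Callable[[], Optional[str]],
    read_worker: Callable[[], Optional[str]],
    task_is_running: bool,
) -> str:
    """Assemble the log text shown for one task instance (hypothetical helper)."""
    sections = []

    # Always consult the remote log first: it holds every completed try,
    # so every webserver returns the same history.
    sections.append("*** Getting remote logs")
    remote = read_remote()
    sections.append(remote if remote is not None else "*** No remote log found")

    # Only a running task can have in-flight lines that are not uploaded yet,
    # so the worker is queried only in that case.
    if task_is_running:
        sections.append("*** Getting logs on local worker")
        worker = read_worker()
        sections.append(worker if worker is not None else "*** Worker log unavailable")

    return "\n".join(sections)


if __name__ == "__main__":
    # Stand-in readers; a real setup would fetch from remote storage and
    # from the worker host instead.
    print(collect_task_log(lambda: "try 1 output", lambda: "try 2 output", True))
```

If the worker's local file still contains earlier
tries, those lines show up under both delimiters,
which is the duplication the message accepts as a
known limitation.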

Testing Done:
Tested on staging cluster:
- Task instance doesn't exist
- Task instance is running and has previous remote
log
- Task instance is running for first time
- Task instance has completed and has remote log

Closes #2086 from aoen/ddavydov/fix_s3_logging


> Fix remote log functionality inconsistencies for Webservers
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-880
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-880
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>            Reporter: Dan Davydov
>            Assignee: Dan Davydov
>
> Right now log functionality is not consistent when it comes to remote logs.
> 1. Lack of Complete Logs: The remote log should be the default rather than a fallback that is only loaded when the local log is not present, because the remote log has the logs for all of the tries of a task, whereas the local log is only guaranteed to have the most recent one.
> 2. Lack of Consistency: The logs returned should always be the same from all the webservers. Right now different logs can be returned depending on whether a webserver has a local log, and the logs can differ between webservers that do have one.


