Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/04/24 15:08:00 UTC

[jira] [Updated] (AIRFLOW-3449) Airflow DAG parsing logs aren't written when using S3 logging

     [ https://issues.apache.org/jira/browse/AIRFLOW-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor updated AIRFLOW-3449:
---------------------------------------
    Fix Version/s: 1.10.4

> Airflow DAG parsing logs aren't written when using S3 logging
> -------------------------------------------------------------
>
>                 Key: AIRFLOW-3449
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3449
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: logging, scheduler
>    Affects Versions: 1.10.0, 1.10.1
>            Reporter: James Meickle
>            Assignee: Ash Berlin-Taylor
>            Priority: Critical
>             Fix For: 1.10.4
>
>
> The default Airflow logging config outputs some logs to stdout, some to "task" folders, and some to "processor" folders (the latter generated during DAG parsing). The 1.10.0 logging update broke this, but only for users who are also using S3 logging. This is because of this feature in the default logging config file:
> {code:python}
> if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
>     DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
> {code}
> That replaces this functioning handlers block:
> {code:python}
>         'task': {
>             'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
>             'formatter': 'airflow',
>             'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
>             'filename_template': FILENAME_TEMPLATE,
>         },
>         'processor': {
>             'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
>             'formatter': 'airflow',
>             'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
>             'filename_template': PROCESSOR_FILENAME_TEMPLATE,
>         },
> {code}
> With this non-functioning block:
> {code:python}
>         'task': {
>             'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
>             'formatter': 'airflow',
>             'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
>             's3_log_folder': REMOTE_BASE_LOG_FOLDER,
>             'filename_template': FILENAME_TEMPLATE,
>         },
>         'processor': {
>             'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
>             'formatter': 'airflow',
>             'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
>             's3_log_folder': REMOTE_BASE_LOG_FOLDER,
>             'filename_template': PROCESSOR_FILENAME_TEMPLATE,
>         },
> {code}
> The key issue is that both "task" and "processor" are given the "S3TaskHandler" class to use for logging. But that is not a generic S3 class; it is actually a subclass of FileTaskHandler! https://github.com/apache/incubator-airflow/blob/1.10.1/airflow/utils/log/s3_task_handler.py#L26
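> For anyone verifying this, a quick sketch (assuming an affected 1.10.x install; the set_context signatures below are paraphrased from the linked source, not quoted):
> {code:python}
> # Confirm the subclass relationship from a Python shell:
> from airflow.utils.log.file_task_handler import FileTaskHandler
> from airflow.utils.log.s3_task_handler import S3TaskHandler
>
> assert issubclass(S3TaskHandler, FileTaskHandler)  # not a generic S3 handler
>
> # The mismatch: the two handler families take different set_context args.
> # FileTaskHandler.set_context(ti)            - expects a TaskInstance
> # FileProcessorHandler.set_context(filename) - expects a DAG file path (str)
> {code}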
> Since the template variables don't match the template string, the log path evaluates to garbage, and the handler then silently fails to log anything at all. It is likely that anyone using a default-like logging config, plus the remote S3 logging feature, stopped getting DAG parsing logs (either locally *or* in S3) as of 1.10.0.
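> A minimal illustration of that garbage path (a standalone jinja2 sketch; the template strings are the stock airflow.cfg defaults, assumed unmodified): rendering the processor's template with task-style variables leaves {{ filename }} undefined, which Jinja silently renders as an empty string:
> {code:python}
> from jinja2 import Template
>
> # Stock templates, as shipped in airflow.cfg:
> task_tpl = Template('{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log')
> proc_tpl = Template('{{ filename }}.log')
>
> # The S3TaskHandler renders with task-style context, so `filename` is
> # undefined and becomes '' with no error raised:
> print(proc_tpl.render(ts='2018-12-04T00:00:00', try_number=1))
> # -> '.log'  (a useless path; nothing readable ever lands there)
> {code}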
> Commenting out the DAG parsing section of the S3 block fixed this on my instance.
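> For reference, here is that workaround expressed as a custom logging config, a sketch only: it assumes logging_config_class in airflow.cfg points at this module, and the folder/template values shown are the stock defaults, so adjust to taste.
> {code:python}
> import os
> from copy import deepcopy
>
> from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG
>
> LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
>
> # Keep the S3 handler for task logs, but put DAG-parsing logs back on the
> # local FileProcessorHandler that the S3 block clobbered:
> LOGGING_CONFIG['handlers']['processor'] = {
>     'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
>     'formatter': 'airflow',
>     'base_log_folder': os.path.expanduser('~/airflow/logs/scheduler'),
>     'filename_template': '{{ filename }}.log',
> }
> {code}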


