You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2017/12/19 20:05:00 UTC

[jira] [Resolved] (AIRFLOW-1916) S3 Task logs end up duplicated in the file.

     [ https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin resolved AIRFLOW-1916.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request #2880
[https://github.com/apache/incubator-airflow/pull/2880]

> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>                 Key: AIRFLOW-1916
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>             Fix For: 1.9.1, 2.0.0
>
>
> If using the S3TaskHandler logger the contents of the log file in the S3 bucket end up duplicated - once from when `airflow run --raw` finalizes the task, and again from when `airflow run --local` finalizes it's logger.
> Log from the UI included below. The file on disk does not have the repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not meant to upload, but something is.
> {noformat}
> *** Reading remote log from s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)