Posted to commits@airflow.apache.org by "Soeren Laursen (Jira)" <ji...@apache.org> on 2020/02/14 08:11:00 UTC

[jira] [Closed] (AIRFLOW-2195) Task gets terminated before timeout reached

     [ https://issues.apache.org/jira/browse/AIRFLOW-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Soeren Laursen closed AIRFLOW-2195.
-----------------------------------
    Resolution: Fixed

> Task gets terminated before timeout reached
> -------------------------------------------
>
>                 Key: AIRFLOW-2195
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2195
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>         Environment: linux, ubuntu 16.04
>            Reporter: Soeren Laursen
>            Priority: Major
>
> We have a task that runs for more than 6 hours. It is a backup job that uses dirvish.
>  
> After precisely 6 hours it gets terminated:
> [2018-03-07 11:55:31,928] \{base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,928] \{bash_operator.py:80} INFO - Temporary script location: /tmp/airflowtmpwx57pf7q//tmp/airflowtmpwx57pf7q/Backup_of_arch-fcoo-getm-ns1cdvo33ldi
> [2018-03-07 11:55:31,928] \{base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,928] \{bash_operator.py:88} INFO - Running command: sudo /backup/dirvish/scripts/airflow-dirvish.sh arch-fcoo-getm-ns1c
> [2018-03-07 11:55:31,933] \{base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,933] \{bash_operator.py:97} INFO - Output:
> [2018-03-07 17:57:56,378] \{cli.py:374} INFO - Running on host storage-bck02
> [2018-03-07 17:57:56,421] \{models.py:1190} INFO - Dependencies not met for <TaskInstance: Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM.Backup_of_arch-fcoo-getm-ns1c 2018-03-06 07:00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2018-03-07 10:55:30.946641.
> [2018-03-07 17:57:56,421] \{models.py:1190} INFO - Dependencies not met for <TaskInstance: Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM.Backup_of_arch-fcoo-getm-ns1c 2018-03-06 07:00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
> [2018-03-07 17:58:05,261] \{helpers.py:233} INFO - Terminating descendant processes of ['/usr/bin/python3 /usr/local/bin/airflow run Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM Backup_of_arch-fcoo-getm-ns1c 2018-03-06T07:00:00 --job_id 32501 --raw -sd /home/airflow/airflow/airflow/dags/dirvish_on_storage-bck02.py'] PID: 5169
> [2018-03-07 17:58:05,261] \{helpers.py:237} INFO - Terminating descendant process ['bash', '/tmp/airflowtmpwx57pf7q/Backup_of_arch-fcoo-getm-ns1cdvo33ldi'] PID: 5180
> [2018-03-07 17:58:05,268] \{helpers.py:195} ERROR - b''
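The elapsed time between the subtask start and the termination in the log above can be checked with a short script (timestamps copied from the same log clock); the result, just over 21600 seconds, is what points at a 6-hour cutoff:

```python
from datetime import datetime

# Timestamps taken from the log above (same log clock):
start = datetime(2018, 3, 7, 11, 55, 31)   # bash_operator started the command
killed = datetime(2018, 3, 7, 17, 58, 5)   # descendant processes terminated

elapsed = (killed - start).total_seconds()
print(elapsed)  # 21754.0 seconds, i.e. 6 h 2 min 34 s
```

21754 seconds is 21600 seconds (exactly 6 hours) plus a small redelivery/scheduling delay, which is consistent with a 6-hour limit being hit.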
> The dirvish script continues in the background and finishes as it should, but the tasks that depend on the backup job stop.
> Even if:
> execution_timeout=None
> I have made a small test DAG (a bash script that uses sleep) to test execution_timeout, and it works as expected: tasks get stopped when they reach the timeout.
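The behaviour the test DAG verified can be illustrated without Airflow or Celery at all. This is only an analogy, not the reporter's DAG: a sleeping bash child killed when it exceeds a hard timeout, which is the same effect execution_timeout produced in the test:

```python
import subprocess

# A bash-style "sleep" child process, killed when it exceeds a
# 1-second hard timeout (standing in for execution_timeout).
try:
    subprocess.run(["sleep", "5"], timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True

print(timed_out)  # True: the long-running child was terminated
```

The point of the email is that the 6-hour kill happened even though execution_timeout was None, so a mechanism other than this one must have terminated the process.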
> My colleague has found a reference:
> https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_celery.py
> There, visibility_timeout is set to 21600.
> In the default airflow.cfg it is described as:
> [celery_broker_transport_options]
> # The visibility timeout defines the number of seconds to wait for the worker
> # to acknowledge the task before the message is redelivered to another worker.
> # Make sure to increase the visibility timeout to match the time of the longest
> # ETA you're planning to use. Especially important in case of using Redis or SQS
> visibility_timeout = 21600
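If the Celery broker's visibility timeout really is what redelivers (and then kills) the task, one workaround is to raise it in airflow.cfg well above the runtime of the longest task. A sketch, with 43200 s (12 h) as an assumed value that covers the 6+ hour backup run:

```ini
[celery_broker_transport_options]
# Must exceed the longest expected task runtime.
# 43200 s = 12 h; pick a value that covers your longest backup job.
visibility_timeout = 43200
```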
> Could the problem be that our tasks are somehow not acknowledged in Celery?
> best regards



--
This message was sent by Atlassian Jira
(v8.3.4#803005)