Posted to commits@airflow.apache.org by "Michal TOMA (JIRA)" <ji...@apache.org> on 2016/05/30 13:46:12 UTC
[jira] [Created] (AIRFLOW-194) Task hangs in up_for_retry state for very long
Michal TOMA created AIRFLOW-194:
-----------------------------------
Summary: Task hangs in up_for_retry state for very long
Key: AIRFLOW-194
URL: https://issues.apache.org/jira/browse/AIRFLOW-194
Project: Apache Airflow
Issue Type: Bug
Components: scheduler
Affects Versions: Airflow 1.7.0
Environment: Airflow 1.7.0 on RHEL 7 and OpenSuse 13.2
Reporter: Michal TOMA
I have observed this problem on two separate Airflow installations.
The symptoms are:
- One (and only one) task stays in the up_for_retry state even when the last of its retries finished with an OK status.
- It is yellow in the tree view.
- Execution somehow resumes automatically several hours later.
- It seems (though I am not certain) to be related to periods when task execution is "lagging" behind the schedule.
Here is an example of a task that should run every hour ("0 * * * *"):
Current date: 2016-05-30T15:31:00+0200
----- Run 1 ------
Run ID: 2016-05-05T21:00:00
Task start: 2015-05-30T07:38:XX.XXX
Task end: 2015-05-30T08:23:XX.XXX
Marked as success
----- Run 2 ------
Run ID: 2016-05-05T22:00:00
Task start: 2015-05-30T11:10:XX.XXX
Task end: 2015-05-30T11:56:XX.XXX
Marked as success
----- Run 3 ------
Run ID: 2016-05-05T23:00:00
Task start: 2015-05-30T11:56:XX.XXX
Task end: 2015-05-30T12:41:XX.XXX
Marked as success
----- Run 4 ------
Run ID: 2016-05-06T00:00:00
Task start: 2015-05-30T15:12:XX.XXX
Task end: (Still running now)
Marked as success
There are nearly 3 hours (about 2 h 47 min) between the end of Run 1 and the start of Run 2, and about 2.5 hours (about 2 h 31 min) between the end of Run 3 and the start of Run 4.
Only Run 3 starts immediately after the end of Run 2, which is the expected behavior since the runs are far behind schedule (the run ID is 2016-05-06 while the current date is 2016-05-30).
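The idle time between consecutive runs can be computed directly from the start and end times listed above. This is a minimal Python sketch. It assumes the elided seconds ("XX") are :00, so each gap is only accurate to about a minute. It also assumes all four runs fall on the same day, as the timestamps show, so only the clock time is used. The names `runs` and `idle_gaps` are hypothetical helpers for this illustration, not part of Airflow.

```python
from datetime import datetime

# Clock times copied from the runs listed above. Seconds are elided in the
# report, so ":00" is assumed; all runs are on the same day, so the date is
# dropped. Run 4 is still running and has no end time.
runs = [
    ("Run 1", "07:38", "08:23"),
    ("Run 2", "11:10", "11:56"),
    ("Run 3", "11:56", "12:41"),
    ("Run 4", "15:12", None),
]


def parse(hhmm):
    """Parse an HH:MM clock time into a datetime (date part is irrelevant)."""
    return datetime.strptime(hhmm, "%H:%M")


def idle_gaps(runs):
    """Return (previous run, next run, idle minutes) for each consecutive pair.

    Idle time is measured from the end of one run to the start of the next.
    """
    gaps = []
    for (prev_name, _, prev_end), (next_name, next_start, _) in zip(runs, runs[1:]):
        minutes = int((parse(next_start) - parse(prev_end)).total_seconds() // 60)
        gaps.append((prev_name, next_name, minutes))
    return gaps


for prev_name, next_name, minutes in idle_gaps(runs):
    print(f"{prev_name} -> {next_name}: {minutes // 60} h {minutes % 60:02d} min idle")
```

For the data above this reports about 2 h 47 min of idle time before Run 2, none before Run 3, and about 2 h 31 min before Run 4.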
This is a high-priority issue for our setup. I could try to dig deeper into this problem, but I have no idea where to start debugging it.
Any pointers would be more than welcome.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)