You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Jacob Ward (Jira)" <ji...@apache.org> on 2019/11/06 17:06:00 UTC

[jira] [Updated] (AIRFLOW-5859) Tasks locking and heartbeat warnings

     [ https://issues.apache.org/jira/browse/AIRFLOW-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Ward updated AIRFLOW-5859:
--------------------------------
    Description: 
Having two potentially related issues.

Issue 1:

Some of my tasks (has only been PythonOperators so far) are starting and then doing nothing. I had a task that usually executes in 10-30 minutes running for over 24hrs without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.

So far this has only happened to tasks inside sub-dags, not sure if that's to do with it?

The logs for the sub-dag show: 


{code:}
[2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
[2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
{code}
repeated during that 24hr period.


Issue 2:

In all of the logs this warning message is printed every 5 seconds:


{code:}
[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s {code}


  was:
Having two potentially related issues.

Issue 1:

Some of my tasks (has only been PythonOperators so far) are starting and then doing nothing. I had a task that usually executes in 10-30 minutes running for over 24hrs without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.

So far this has only happened to tasks inside sub-dags, not sure if that's to do with it?

The logs for the sub-dag show: 


{code:java}
[2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
[2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
[2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
{code}
repeated during that 24hr period.


Issue 2:

In all of the logs this warning message is printed every 5 seconds:

{{}}
{code:java}
[2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s {code}
{{}}


> Tasks locking and heartbeat warnings
> ------------------------------------
>
>                 Key: AIRFLOW-5859
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5859
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.10.6
>         Environment: Airflow using LocalExecutor and Postgres
>            Reporter: Jacob Ward
>            Priority: Major
>
> Having two potentially related issues.
> Issue 1:
> Some of my tasks (has only been PythonOperators so far) are starting and then doing nothing. I had a task that usually executes in 10-30 minutes running for over 24hrs without any error messages in the logs (other than the heartbeat warnings shown below) and without failing.
> So far this has only happened to tasks inside sub-dags, not sure if that's to do with it?
> The logs for the sub-dag show: 
> {code:}
> [2019-11-05 16:41:34,364] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,364] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:34,831] {logging_mixin.py:112} INFO - [2019-11-05 16:41:34,830] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986693 s
> [2019-11-05 16:41:39,376] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,376] {backfill_job.py:363} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 7 | succeeded: 5 | running: 1 | failed: 0 | skipped: 1 | deadlocked: 0 | not ready: 7
> [2019-11-05 16:41:39,859] {logging_mixin.py:112} INFO - [2019-11-05 16:41:39,859] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.986141 s
> {code}
> repeated during that 24hr period.
> Issue 2:
> In all of the logs this warning message is printed every 5 seconds:
> {code:}
> [2019-11-06 14:25:29,466] {logging_mixin.py:112} INFO - [2019-11-06 14:25:29,465] {local_task_job.py:124} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.987354 s {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)