You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "aoi-lucario (JIRA)" <ji...@apache.org> on 2018/12/10 10:54:00 UTC
[jira] [Created] (AIRFLOW-3491) job.latest_heartbeat become to the
same value as job.start_date when job has done
aoi-lucario created AIRFLOW-3491:
------------------------------------
Summary: job.latest_heartbeat become to the same value as job.start_date when job has done
Key: AIRFLOW-3491
URL: https://issues.apache.org/jira/browse/AIRFLOW-3491
Project: Apache Airflow
Issue Type: Bug
Components: database
Affects Versions: 1.8.1, 1.7.1
Environment: # airflow.cfg
[scheduler]
JOB_HEARTBEAT_SEC = 5
Reporter: aoi-lucario
At table {{job}} in DB of airflow, when a job has done, this job's {{latest_heartbeat}} will become to the same value as {{start_date}}
It a bug, or just by design?
h3. Detail:
in DB of airflow, it's always like this:
h5. when job is running, latest_heartbeat is refreshed normally:
{code:java}
SELECT id, state, start_date, end_date, latest_heartbeat, (latest_heartbeat - start_date) as diff_sec
FROM job
WHERE state='RUNNING'
ORDER BY start_date LIMIT 5;{code}
{code:java}
| id | state | start_date | end_date | latest_heartbeat | diff_sec |
+---------+---------+---------------------+----------+---------------------+----------+
| 1837272 | running | 2018-12-10 18:23:57 | NULL | 2018-12-10 18:37:02 | 1345 |
| 1837270 | running | 2018-12-10 18:23:57 | NULL | 2018-12-10 18:37:02 | 1345 |
| 1837273 | running | 2018-12-10 18:25:58 | NULL | 2018-12-10 18:37:03 | 1145 |
| 1837275 | running | 2018-12-10 18:25:58 | NULL | 2018-12-10 18:37:03 | 1145 |
| 1837279 | running | 2018-12-10 18:25:59 | NULL | 2018-12-10 18:37:04 | 1145 |
{code}
h5. but when job has done, latest_heartbeat will return to the init value (same as start_date) :
{code:java}
SELECT id, state, start_date, end_date, latest_heartbeat, (latest_heartbeat - start_date) as diff_sec
FROM job
WHERE id IN (1837272, 1837270, 1837273, 1837275, 1837279)
ORDER BY start_date;{code}
{code:java}
| id | state | start_date | end_date | latest_heartbeat | diff_sec |
+---------+---------+---------------------+----------+---------------------+----------+
| 1837270 | success | 2018-12-10 18:23:57 | 2018-12-10 18:39:07 | 2018-12-10 18:23:57 | 0 |
| 1837272 | success | 2018-12-10 18:23:57 | 2018-12-10 18:39:07 | 2018-12-10 18:23:57 | 0 |
| 1837273 | success | 2018-12-10 18:25:58 | 2018-12-10 18:41:08 | 2018-12-10 18:25:58 | 0 |
| 1837275 | success | 2018-12-10 18:25:58 | 2018-12-10 18:41:08 | 2018-12-10 18:25:58 | 0 |
| 1837279 | success | 2018-12-10 18:25:59 | 2018-12-10 18:41:09 | 2018-12-10 18:25:59 | 0 |
{code}
------------------
here method heartbeat() just update latest_heartbeat at db, but not update object him self:
[https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L180-L184]
and when job done, session will merge the objects of him self which has init value of latest_heartbeat:
[https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L194-L215|https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L202-L215]
then latest_heartbeat return to init value at DB.
I think {{latest_heartbeat}} is a very important information on debugging where some enviroment's logging level be set on INFO.
So I don't understand why do this. could someone explain to it?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)