You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "aoi-lucario (JIRA)" <ji...@apache.org> on 2018/12/10 10:54:00 UTC

[jira] [Created] (AIRFLOW-3491) job.latest_heartbeat become to the same value as job.start_date when job has done

aoi-lucario created AIRFLOW-3491:
------------------------------------

             Summary: job.latest_heartbeat become to the same value as job.start_date when job has done
                 Key: AIRFLOW-3491
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3491
             Project: Apache Airflow
          Issue Type: Bug
          Components: database
    Affects Versions: 1.8.1, 1.7.1
         Environment: # airflow.cfg
[scheduler]
JOB_HEARTBEAT_SEC = 5
            Reporter: aoi-lucario


At table {{job}} in DB of airflow, when a job has done, this job's {{latest_heartbeat}} will become  to the same value as {{start_date}}
 It a bug, or just by design?
h3. Detail:

in DB of airflow, it's always like this:
h5. when job is running, latest_heartbeat is refreshed normally:
{code:java}
SELECT id, state, start_date, end_date, latest_heartbeat, (latest_heartbeat - start_date) as diff_sec
FROM job
WHERE state='RUNNING'
ORDER BY start_date LIMIT 5;{code}
{code:java}
| id      | state   | start_date          | end_date | latest_heartbeat    | diff_sec |
+---------+---------+---------------------+----------+---------------------+----------+
| 1837272 | running | 2018-12-10 18:23:57 | NULL | 2018-12-10 18:37:02 | 1345 |
| 1837270 | running | 2018-12-10 18:23:57 | NULL | 2018-12-10 18:37:02 | 1345 |
| 1837273 | running | 2018-12-10 18:25:58 | NULL | 2018-12-10 18:37:03 | 1145 |
| 1837275 | running | 2018-12-10 18:25:58 | NULL | 2018-12-10 18:37:03 | 1145 |
| 1837279 | running | 2018-12-10 18:25:59 | NULL | 2018-12-10 18:37:04 | 1145 |
{code}
h5. but when job has done, latest_heartbeat will return to the init value (same as start_date) :
{code:java}
SELECT id, state, start_date, end_date, latest_heartbeat, (latest_heartbeat - start_date) as diff_sec
FROM job
WHERE id IN (1837272, 1837270, 1837273, 1837275, 1837279)
ORDER BY start_date;{code}
{code:java}
| id      | state   | start_date          | end_date | latest_heartbeat    | diff_sec |
+---------+---------+---------------------+----------+---------------------+----------+
| 1837270 | success | 2018-12-10 18:23:57 | 2018-12-10 18:39:07 | 2018-12-10 18:23:57 | 0 |
| 1837272 | success | 2018-12-10 18:23:57 | 2018-12-10 18:39:07 | 2018-12-10 18:23:57 | 0 |
| 1837273 | success | 2018-12-10 18:25:58 | 2018-12-10 18:41:08 | 2018-12-10 18:25:58 | 0 |
| 1837275 | success | 2018-12-10 18:25:58 | 2018-12-10 18:41:08 | 2018-12-10 18:25:58 | 0 |
| 1837279 | success | 2018-12-10 18:25:59 | 2018-12-10 18:41:09 | 2018-12-10 18:25:59 | 0 |
{code}
------------------

here method heartbeat() just update latest_heartbeat at db, but not update object him self:

[https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L180-L184]

and when job done, session will merge the objects of him self which has init value of latest_heartbeat:
 [https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L194-L215|https://github.com/apache/incubator-airflow/blob/e947c6c034238ede29a6c8f51307458d3e40c1b5/airflow/jobs.py#L202-L215]

then latest_heartbeat return to init value at DB.

I think {{latest_heartbeat}} is a very important information on debugging where some enviroment's logging level be set on INFO.
 So I don't understand why do this. could someone explain to it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)