Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/17 19:11:11 UTC

[GitHub] [airflow] rafidka edited a comment on issue #13026: Scheduler heartbeat adds one second if it is configured to be 5 or less

rafidka edited a comment on issue #13026:
URL: https://github.com/apache/airflow/issues/13026#issuecomment-747639539


   I repeated the experiment on Airflow 2.0.0RC2 and produced more statistical data which you can find in the Excel sheet below, but here is a summary:
   
   scheduler_heartbeat_sec value (seconds)|Average interval between scheduler_heartbeat metric emissions (seconds)
   -|-
   1|2.52057172
   2|3.85960122
   3|4.76818162
   5|6.49515588
   10|11.36376658
   30|30.89824894
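
To make the gap easier to see, here is a small sketch that computes the extra delay for each row of the table above. The numbers are copied from the table; the names `observed` and `overhead_rows` are made up for this illustration and are not part of Airflow.

```python
# Measurements copied from the table above:
# configured scheduler_heartbeat_sec -> observed average interval (seconds).
observed = {
    1: 2.52057172,
    2: 3.85960122,
    3: 4.76818162,
    5: 6.49515588,
    10: 11.36376658,
    30: 30.89824894,
}


def overhead_rows(measurements):
    """Return (configured, observed, extra seconds, extra percent) per setting."""
    rows = []
    for configured, average in sorted(measurements.items()):
        extra = average - configured  # absolute delay beyond the configured value
        rows.append((configured, average, extra, 100.0 * extra / configured))
    return rows


if __name__ == "__main__":
    for configured, average, extra, percent in overhead_rows(observed):
        print(f"{configured:>3}s configured: {average:.2f}s observed, "
              f"+{extra:.2f}s ({percent:.0f}%)")
```

The extra delay stays between roughly 0.9 and 1.9 seconds across all settings, so its relative size shrinks from about 150% at 1 second to about 3% at 30 seconds.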
   
   I ran this on the same machine I mentioned above (an Amazon r5.4xlarge), which is fairly powerful, and I confirmed that CPU load was low (below 10%, mostly from Airflow itself). I can repeat the experiment on a personal laptop if you don't have strong confidence in results generated from a single machine (admittedly, I share that concern).
   
   I cannot tell whether this is just a metrics issue or not, but I did look at the code and I do feel it is an actual scheduling issue, not just metrics (though I must admit my understanding of the Airflow code base is still limited). In my opinion, this justifies some investigation to see what is going on. In particular, I would like to suggest:
   
   1. Investigate whether this is just a metrics issue or indeed an issue with the scheduler.
   2. If it is a metric issue, I think it is important to fix. The scheduler_heartbeat is an important metric that can be used to judge the health of the system.
   3. If it is not just a metric issue, then that's probably even more important 😊 
   4. Admittedly, the higher the `scheduler_heartbeat_sec` value, the more accurate the metric is (which suggests this is an actual scheduling issue, not just metrics). So if no fix is intended, then at least the default value in airflow.cfg should be updated to, say, 30 seconds. In fact, if no fix is intended, I would even suggest enforcing a minimum value for the `scheduler_heartbeat_sec` config, or at least logging a warning if the user specifies a low value; there isn't much point in allowing the user to specify 5 seconds when they will get an average of 6.5 instead.
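
The warning suggested in point 4 could look roughly like the sketch below. This is only an illustration: `check_heartbeat_interval` and `MIN_RECOMMENDED_HEARTBEAT_SEC` are hypothetical names, the threshold of 10 seconds is an arbitrary assumption, and none of this reflects existing Airflow code.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical threshold, not an Airflow setting; chosen only for illustration.
MIN_RECOMMENDED_HEARTBEAT_SEC = 10


def check_heartbeat_interval(value_sec, minimum=MIN_RECOMMENDED_HEARTBEAT_SEC):
    """Return True if value_sec is at least `minimum`; otherwise warn and return False."""
    if value_sec < minimum:
        log.warning(
            "scheduler_heartbeat_sec=%s is below %s; the observed heartbeat "
            "interval may be noticeably longer than configured.",
            value_sec,
            minimum,
        )
        return False
    return True
```

For example, `check_heartbeat_interval(5)` would log the warning and return `False`, while `check_heartbeat_interval(30)` would return `True` silently.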
   
   I can help with this investigation if you agree that it is important (though I probably won't be able to do so before the new year). Otherwise, feel free to resolve the issue (though I still think at least point 4 above is important if we decide that scheduler interval accuracy is not important).
   
   ## Statistical Data for Different Runs
   
   Below is a snapshot of the Excel sheet I mentioned above. I can upload the Excel sheet file itself if you like, in which case please advise where I should upload it to.
   
   ![image](https://user-images.githubusercontent.com/442447/102529079-44c31c00-4054-11eb-81c5-08a4a17efac1.png)
   
   

