
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/04/06 05:37:59 UTC

[GitHub] [airflow] KevinYang21 commented on issue #5050: [AIRFLOW-4251] Instrument DagRun schedule delay

KevinYang21 commented on issue #5050: [AIRFLOW-4251] Instrument DagRun schedule delay
URL: https://github.com/apache/airflow/pull/5050#issuecomment-480476229

IMO this can be useful when we're analyzing delays, so we know where the delay comes from: the scheduler or the executor. On the other hand, I think the story would be more comprehensive if we had task instance level metrics. A DAG with 1k tasks now gets only 1 data point per DAG run, just like a DAG with 1 task, which makes it less representative. In the end, people may be more interested in task instance delays than in DAG run delays.
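To make the data-point argument concrete, here is a minimal Python sketch. `TaskRecord`, `RunRecord`, `dagrun_delays` and `task_delays` are hypothetical stand-ins, not Airflow's models. Which timestamps to compare (when a run was due, when a task became ready) is also an assumption about how such delays might be measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


# Hypothetical records for illustration only; these are NOT Airflow's models.
@dataclass
class TaskRecord:
    task_id: str
    ready_at: datetime    # assumed: when the task's dependencies were met
    started_at: datetime  # when the task actually started running


@dataclass
class RunRecord:
    dag_id: str
    scheduled_for: datetime  # assumed: when the run was due to start
    started_at: datetime     # when the run actually started
    tasks: List[TaskRecord]


def dagrun_delays(run: RunRecord) -> List[float]:
    """One data point per DAG run, no matter how many tasks it contains."""
    return [(run.started_at - run.scheduled_for).total_seconds()]


def task_delays(run: RunRecord) -> List[float]:
    """One data point per task instance."""
    return [(t.started_at - t.ready_at).total_seconds() for t in run.tasks]
```

With a 1,000-task run, `dagrun_delays` returns a single value while `task_delays` returns 1,000, which is the imbalance described above.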

As for performance, I think it is not too bad to have it here, since we do it in the DAG parsing subprocess, so the cost is effectively O(# DAGs / # subprocesses). If TI-level stats produce too many data points, maybe we can try random sampling? Also, a TI-level stat would intuitively live in the main scheduler loop, where performance matters more.
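A minimal sketch of the random-sampling idea, assuming a generic `emit(name, value)` callback for whatever metrics backend is in use. `maybe_emit` and any metric names passed to it are hypothetical. Accepting an explicit `random.Random` keeps the sampling reproducible in tests.

```python
import random
from typing import Callable, Optional


def maybe_emit(emit: Callable[[str, float], None], name: str, value: float,
               sample_rate: float, rng: Optional[random.Random] = None) -> bool:
    """Emit the metric with probability `sample_rate`; return True if emitted.

    Hypothetical helper: with sample_rate=0.1, roughly 1 in 10 calls
    actually reaches the backend, cutting metric volume about tenfold.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    source = rng if rng is not None else random
    # random() returns a float in [0, 1), so rate 1.0 always emits
    # and rate 0.0 never does.
    if source.random() < sample_rate:
        emit(name, value)
        return True
    return False
```

StatsD-style clients usually accept a sample rate directly and tag the packet so the server can scale counts back up; a wrapper like this is only needed if the backend does not.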

All this reminds me of an old discussion we had earlier. If we do want to start building a story around scheduling performance, we might need to account for the parsing time of DAG files and even exclude it from our metrics. Otherwise, if I as a user introduce a large number of large DAG files, your metrics will spike.
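One way to keep DAG parsing out of a scheduling-delay metric, sketched under the assumption that both the raw delay and the parse duration of the relevant DAG file are already known. `scheduling_delay_excluding_parse` is a hypothetical helper, not an Airflow function.

```python
from datetime import timedelta


def scheduling_delay_excluding_parse(raw_delay: timedelta,
                                     parse_duration: timedelta) -> timedelta:
    """Subtract DAG-file parse time from an observed scheduling delay.

    Hypothetical helper. The result is clamped at zero so the metric
    never reports a negative delay.
    """
    adjusted = raw_delay - parse_duration
    return max(adjusted, timedelta(0))
```

With this adjustment, adding many large DAG files raises the parse duration but should not, by itself, inflate the reported delay.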

Just some random thoughts around this topic :D

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services