Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/13 19:11:58 UTC

[GitHub] [airflow] vitaly-krugl opened a new issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

vitaly-krugl opened a new issue #12348:
URL: https://github.com/apache/airflow/issues/12348


   Regarding the FAQ note "How to reduce airflow dag scheduling latency in production" -
   Per https://github.com/apache/airflow/blame/f097ae39a7243bd25d4d26664bc259981b2ba217/docs/faq.rst#L209:
   > User should consider to increase ``scheduler_heartbeat_sec`` config to a higher value (e.g 60 secs) which controls how frequent the airflow scheduler gets the heartbeat and updates the job's entry in database.
   
   However, since `scheduler_heartbeat_sec` is a duration in seconds (not a heartbeats-per-second rate), increasing it to 60 (from the default of 5 sec) would actually make scheduling more sluggish, increasing latency rather than reducing it.
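
To make the direction of the effect concrete, here is a minimal Python sketch. It assumes the setting is simply the number of seconds between consecutive heartbeats; the helper name `heartbeats_per_minute` is made up for illustration and is not part of Airflow.

```python
def heartbeats_per_minute(interval_sec: float) -> float:
    """Convert a heartbeat interval (a duration in seconds) into a rate.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return 60.0 / interval_sec


# A longer interval means fewer heartbeats, not more.
print(heartbeats_per_minute(5))   # default of 5 sec -> 12.0
print(heartbeats_per_minute(60))  # FAQ suggestion of 60 sec -> 1.0
```

Under that assumption, following the FAQ's advice would cut the heartbeat rate by a factor of 12 instead of raising it.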
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vitaly-krugl edited a comment on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl edited a comment on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727083189


   Hi @ashb - The reason I was looking at `scheduler_heartbeat_sec` is that I am trying to improve the performance of system-level tests in my Airflow-based app.
   
   Even under very low utilization, Airflow adds 4-5 seconds of latency to each task's execution. I looked through the airflow.cfg options for something to tune away this per-task latency on my test setup, but no combination I tried brought it below 4-5 seconds.
   
   **Any suggestions about how to eliminate these Airflow latencies for my testbed?**
   
   My test setup: one DAG with two tasks, Task A and Task B, with an `A >> B` relationship. The implementation is in Python. Each Python callback does minimal (almost no-op) work, which the logs show taking < 1 sec. However, each dagrun takes upwards of 9 seconds, and the timestamps (dagrun, Task A start/end, and Task B start/end) show 3-5 second gaps between the dagrun and the Task A start, as well as between the Task A end and the Task B start.
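
The gap measurement described above can be sketched in plain Python. This is only an illustration: the function name `scheduling_gaps` is hypothetical, the timestamps are invented to fall in the reported 3-5 second range (they are not taken from real logs), and the dagrun timestamp is assumed to mark the start of the run.

```python
from datetime import datetime


def scheduling_gaps(dagrun_start, task_spans):
    """Return the idle seconds before each task.

    The first gap is measured from the dagrun timestamp; each later gap
    is measured from the end of the previous task. task_spans is a list
    of (start, end) datetime pairs in execution order.
    """
    gaps = []
    previous_end = dagrun_start
    for start, end in task_spans:
        gaps.append((start - previous_end).total_seconds())
        previous_end = end
    return gaps


# Invented timestamps for illustration; each task runs for 0.5 sec.
dagrun_start = datetime(2020, 11, 13, 19, 0, 0)
task_a = (datetime(2020, 11, 13, 19, 0, 4), datetime(2020, 11, 13, 19, 0, 4, 500000))
task_b = (datetime(2020, 11, 13, 19, 0, 9), datetime(2020, 11, 13, 19, 0, 9, 500000))

gaps = scheduling_gaps(dagrun_start, [task_a, task_b])
print(gaps)  # [4.0, 4.5]
```

With these made-up values, the two tasks do one second of work in total while the run spans 9.5 seconds, so almost all of the elapsed time is waiting between steps.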





[GitHub] [airflow] vitaly-krugl edited a comment on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl edited a comment on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727103987


   Thanks @ashb. What is "AIP-15"? Is the latency fix available only in 2.0? Not in 1.x.x?





[GitHub] [airflow] vitaly-krugl edited a comment on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl edited a comment on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727083189


   @ashb - The reason I was looking at `scheduler_heartbeat_sec` is that I am trying to improve the performance of system-level tests in my Airflow-based app.
   
   Even under very low utilization, Airflow adds 4-5 seconds of latency to each task's execution. I looked through the airflow.cfg options for something to tune away this per-task latency on my test setup, but no combination I tried brought it below 4-5 seconds.
   
   **Any suggestions about how to eliminate these latencies?**
   
   My test setup: one DAG with two tasks, Task A and Task B, with an `A >> B` relationship. The implementation is in Python. Each Python callback does minimal (almost no-op) work, which the logs show taking < 1 sec. However, each dagrun takes upwards of 9 seconds, and the timestamps (dagrun, Task A start/end, and Task B start/end) show 3-5 second gaps between the dagrun and the Task A start, as well as between the Task A end and the Task B start.





[GitHub] [airflow] vitaly-krugl commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727083189


   @ashb - The reason I was looking at `scheduler_heartbeat_sec` is that I am trying to improve the performance of system-level tests in my Airflow-based app.
   
   Even under very low utilization, Airflow adds 4-5 seconds of latency to each task's execution. I looked through the airflow.cfg options for something to tune away this per-task latency on my test setup, but no combination I tried brought it below 4-5 seconds.
   
   Any suggestions about how to eliminate these latencies?
   
   My test setup: one DAG with two tasks, Task A and Task B, with an `A >> B` relationship. The implementation is in Python. Each Python callback does minimal (almost no-op) work, which the logs show taking < 1 sec. However, each dagrun takes upwards of 9 seconds, and the timestamps (dagrun, Task A start/end, and Task B start/end) show 3-5 second gaps between the dagrun and the Task A start, as well as between the Task A end and the Task B start.





[GitHub] [airflow] vitaly-krugl commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727080574


   I am running Airflow 1.10.11 and have just confirmed empirically that `scheduler_heartbeat_sec` has no effect on task latency.
   
   I examined `BaseJob.heartbeat()` and see that it only updates the job's heartbeat timestamp in the database and checks for SHUTDOWN.
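
A rough Python sketch of that behaviour follows. It is a stand-in written for this thread, not Airflow's actual `BaseJob` code: the class `SketchJob`, the exception `ShutdownRequested`, and every attribute name are assumptions, and an in-memory field replaces the database row.

```python
from datetime import datetime, timezone


class ShutdownRequested(Exception):
    """Hypothetical signal raised when the job has been asked to stop."""


class SketchJob:
    """Illustrative stand-in for a job whose heartbeat only records liveness."""

    def __init__(self):
        self.state = "running"
        self.latest_heartbeat = None  # stands in for the timestamp stored in the db
        self.tasks_scheduled = 0      # heartbeat() never touches this

    def heartbeat(self):
        # Check whether a shutdown was requested.
        if self.state == "shutdown":
            raise ShutdownRequested("job was asked to shut down")
        # Record that the job is still alive; no scheduling work happens here.
        self.latest_heartbeat = datetime.now(timezone.utc)


job = SketchJob()
job.heartbeat()
print(job.latest_heartbeat is not None, job.tasks_scheduled)  # True 0
```

If the real method is limited to these two steps, as observed above, then changing how often it runs would not be expected to change how quickly tasks are picked up.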





[GitHub] [airflow] vitaly-krugl edited a comment on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl edited a comment on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727083189


   Hi @ashb - The reason I was looking at `scheduler_heartbeat_sec` is that I am trying to improve the performance of system-level tests in my Airflow-based app.
   
   Even under very low utilization, Airflow adds 4-5 seconds of latency to each task's execution. I looked through the airflow.cfg options for something to tune away this per-task latency on my test setup, but no combination I tried brought it below 4-5 seconds.
   
   **Any suggestions about how to eliminate these latencies?**
   
   My test setup: one DAG with two tasks, Task A and Task B, with an `A >> B` relationship. The implementation is in Python. Each Python callback does minimal (almost no-op) work, which the logs show taking < 1 sec. However, each dagrun takes upwards of 9 seconds, and the timestamps (dagrun, Task A start/end, and Task B start/end) show 3-5 second gaps between the dagrun and the Task A start, as well as between the Task A end and the Task B start.





[GitHub] [airflow] ashb commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727038929


   Yeah, and also in 2.0 that is basically not true anymore. Don't think it was true for 1.10.4+ either!





[GitHub] [airflow] ryw closed issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
ryw closed issue #12348:
URL: https://github.com/apache/airflow/issues/12348


   





[GitHub] [airflow] turbaszek commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727038350


   @vitaly-krugl I think you are right. @ashb ?





[GitHub] [airflow] vitaly-krugl commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
vitaly-krugl commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727103987


   Thanks @ashb. What is "AIP-15"? Is the latency fix available only in 2.0?





[GitHub] [airflow] ashb commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727094176


   Yes. The task lag was fixed by AIP-15 and in my tests the delay between tasks is down to 0.18s
   
   Try out 2.0.0beta2?





[GitHub] [airflow] ashb commented on issue #12348: faq.rst provides incorrect instructions for reducing scheduling latency

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12348:
URL: https://github.com/apache/airflow/issues/12348#issuecomment-727186951


   AIP = Airflow Improvement Proposal.
   
   The most important bit of AIP-15 was this: https://github.com/apache/airflow/pull/10956
   
   And given how big a change that was, no, it's not available in 1.10.x, sorry.

