Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/15 00:59:03 UTC

[GitHub] [airflow] GagandeepS edited a comment on issue #19548: Option to include subdir in trigger_dag to not make scheduler scan the whole dag folder

GagandeepS edited a comment on issue #19548:
URL: https://github.com/apache/airflow/issues/19548#issuecomment-968409201


   We have a use case where multiple dynamically generated DAGs are added to the DAG folder. I believe there will always be some latency between dropping a new DAG file into the folder and the operator being able to confirm, via trigger_dag, that the path/record of that new DAG exists in the table.
   
   So, to give the scheduler as much time as it needs to insert the record into the table, we trigger the new DAG and check whether it gets triggered without error. If there is an error (usually 'Dag xxx does not exists'), the operator retries after some time. So far so good, except under peak load, when tens of DAGs are generated dynamically and saved to the DAG folder. In that case the scheduler slows down because it needs to insert multiple records, and so trigger_dag (because of the retries) takes 3-10 minutes. I want to reduce this 3-10 minute delay.
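
   For reference, a minimal sketch of the retry behaviour described above. It deliberately does not call Airflow: `trigger` is a placeholder for whatever actually triggers the DAG, and `DagNotFoundYet`, `trigger_with_retry`, and the delay values are hypothetical names and numbers chosen for illustration, not our real configuration. The exponential backoff is one possible choice, not what the operator does today.

```python
import time
from typing import Callable


class DagNotFoundYet(Exception):
    """Hypothetical error standing in for the 'Dag xxx does not exists' failure."""


def trigger_with_retry(
    trigger: Callable[[], None],
    max_attempts: int = 10,
    base_delay: float = 5.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call trigger() until it succeeds; return the number of attempts used.

    The wait doubles after every failed attempt and is capped at max_delay,
    so early retries are quick while a busy scheduler is not polled too often.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            trigger()
            return attempt
        except DagNotFoundYet:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

   With a fixed retry interval the worst-case wait is roughly one full interval after the record finally appears; starting with short delays reduces that, but it does not address the scheduler being slow to insert the records in the first place.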
   
   Proposed solution: potentially add a table to the Airflow backend data model, or use an index, bulk insert, or something similar, so that scheduler performance is not hampered while it inserts the new records and looking up the new DAG gets faster.
   
   Just a thought: maybe Airflow could offer a way to change the type of its backend database, so that instead of Postgres we could use a NoSQL store with indexes matching those currently in Postgres. I am hoping this would make inserts and lookups faster, at the cost of slower updates, which we could live with.

