Posted to commits@airflow.apache.org by "sunank200 (via GitHub)" <gi...@apache.org> on 2024/02/26 13:32:01 UTC

[PR] Optimize DAG run scheduling based on dataset triggers and batching [airflow]

sunank200 opened a new pull request, #37707:
URL: https://github.com/apache/airflow/pull/37707

   This PR optimises the dags_needing_dagruns method in the DagModel class by adding batch processing, so that large sets of DAG IDs are handled efficiently. The motivation is to address the performance issues that arise when processing a large number of DAGs, which can cause significant memory usage and slow down the scheduler.
   
   Changes:
   - Batch Processing: The method now processes DatasetDagRunQueue records in batches, reducing memory usage and improving efficiency. This approach minimizes the overhead of loading and processing large numbers of DAGs simultaneously.
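   The batching idea above can be illustrated with a minimal, standalone sketch. It is not the code from this PR and does not use Airflow's real DagModel or DatasetDagRunQueue APIs: chunked, dags_ready_in_batches and the fetch_queue_records callable are hypothetical names, and queue records are modelled as plain dicts mapping a DAG ID to the set of dataset URIs queued for it.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def dags_ready_in_batches(
    queued_dag_ids: Iterable[str],
    fetch_queue_records: Callable[[List[str]], Dict[str, Set[str]]],
    required_datasets: Dict[str, Set[str]],
    batch_size: int = 1000,
) -> Set[str]:
    """Return the DAG IDs whose required datasets have all been queued.

    fetch_queue_records is a hypothetical stand-in for a database query that
    loads queue records (dag_id -> queued dataset URIs) for one batch of IDs.
    """
    ready: Set[str] = set()
    for batch in chunked(queued_dag_ids, batch_size):
        # Only the records for the current batch are held in memory here.
        records = fetch_queue_records(batch)
        for dag_id in batch:
            required = required_datasets.get(dag_id, set())
            queued = records.get(dag_id, set())
            # A DAG is ready once every dataset it depends on has been queued.
            if required and required <= queued:
                ready.add(dag_id)
    return ready


if __name__ == "__main__":
    queue = {"a": {"s3://x"}, "b": {"s3://x"}, "c": {"s3://y", "s3://z"}}
    required = {"a": {"s3://x"}, "b": {"s3://x", "s3://y"}, "c": {"s3://y", "s3://z"}}
    calls: List[List[str]] = []

    def fake_fetch(batch: List[str]) -> Dict[str, Set[str]]:
        calls.append(list(batch))
        return {dag_id: queue[dag_id] for dag_id in batch if dag_id in queue}

    print(sorted(dags_ready_in_batches(["a", "b", "c"], fake_fetch, required, batch_size=2)))
    # prints ['a', 'c']
    print(calls)
    # prints [['a', 'b'], ['c']]
```

   In this sketch fetch_queue_records only ever receives one batch of IDs, so the queue records held at any moment are bounded by batch_size rather than by the total number of queued DAGs, at the cost of issuing one query per batch. That extra round-trip cost is one reason batching does not automatically translate into a faster scheduler loop.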
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Optimize DAG run scheduling based on dataset triggers and batching [airflow]

Posted by "sunank200 (via GitHub)" <gi...@apache.org>.
sunank200 commented on PR #37707:
URL: https://github.com/apache/airflow/pull/37707#issuecomment-2009128304

   Based on the conversation with @vatsrahul1001, this approach did not improve performance, so I am not going ahead with it.


Re: [PR] Optimize DAG run scheduling based on dataset triggers and batching [airflow]

Posted by "sunank200 (via GitHub)" <gi...@apache.org>.
sunank200 closed pull request #37707: Optimize DAG run scheduling based on dataset triggers and batching
URL: https://github.com/apache/airflow/pull/37707


Re: [PR] Optimize DAG run scheduling based on dataset triggers and batching [airflow]

Posted by "sunank200 (via GitHub)" <gi...@apache.org>.
sunank200 commented on PR #37707:
URL: https://github.com/apache/airflow/pull/37707#issuecomment-1969493458

   Based on a conversation with @dstandish, we can park this PR and work on more pressing issues first.

