Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/25 16:57:53 UTC

[GitHub] [airflow] SamWheating opened a new pull request #22531: Remove dag parsing from `airflow db init` command

SamWheating opened a new pull request #22531:
URL: https://github.com/apache/airflow/pull/22531


   When running `airflow db init`, Airflow parses all of the DAGs in the configured DAG folder sequentially in a single process. When a large number of DAGs are present, this can _significantly_ increase the time the `db init` command takes to run.
   
   In my opinion, initializing the DB and populating it with data are separate tasks and shouldn't be combined into a single function. The background DAG processor is also much faster at parsing files and populating the DB because it uses multiprocessing.
   
   I propose splitting the bootstrapping of the DagBag out into a separate function (so as to not introduce any changes to the test setup / teardown process) and removing it from the `db init` and `db reset` commands.
   
   Let me know if I'm missing anything here, such as a reason why DAGs need to be parsed at this point.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #22531: Remove dag parsing from `airflow db init` command

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #22531:
URL: https://github.com/apache/airflow/pull/22531#issuecomment-1084640421


   The PR most likely needs to run a full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] potiuk merged pull request #22531: Remove dag parsing from `airflow db init` command

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #22531:
URL: https://github.com/apache/airflow/pull/22531


   





[GitHub] [airflow] potiuk commented on pull request #22531: Remove dag parsing from `airflow db init` command

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #22531:
URL: https://github.com/apache/airflow/pull/22531#issuecomment-1084635790


   Nice one! @kaxil @ashb @ephraimbuddy -> WDYT? 

