Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/02 07:37:46 UTC

[GitHub] [airflow] yansfil opened a new issue #17372: Dag Serialization Speed is so slow

yansfil opened a new issue #17372:
URL: https://github.com/apache/airflow/issues/17372


   **Apache Airflow version**:
   2.1.0
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   1.20
   
   **Environment**:
   gke
   
   **What happened**:
   When first initializing an Airflow project on 2.1.0, serializing the DAGs is very slow.
   It takes 30 minutes to serialize all DAGs (600 DAGs in my project), so I have to wait that long before the DAGs can run for the first time.
   DAG parsing in Airflow v1 was faster because serialized DAGs were not loaded into the database. How can I make it faster?
   
   **What you expected to happen**:
   DAG serialization should be faster than it is now.
   
   
   **How to reproduce it**:
   I assigned scheduler resources of 1g for CPU and 1g for memory,
   and I configured airflow.cfg as below:
   
   AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL: "5"
   AIRFLOW__CORE__MIN_SERIALIZED_DAG_FETCH_INTERVAL: "5"
   AIRFLOW__SCHEDULER__PARSING_PROCESSES: "8"
   AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: "0" 
   AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "10"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #17372: Dag Serialization Speed is so slow

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17372:
URL: https://github.com/apache/airflow/issues/17372#issuecomment-890881685


   This problem is already solved. You should run multiple schedulers.
   
   One of the reasons Airflow HA scheduling was implemented is to allow the serialization part to scale. You can read more about the HA scheduler, how it works, and its scalability characteristics in this blog post from Astronomer:
   
   https://www.astronomer.io/blog/airflow-2-scheduler#:~:text=As%20part%20of%20Apache%20Airflow,once%20their%20dependencies%20are%20met.
   
   Please try the same with multiple schedulers and let us know your experience.
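
As a rough back-of-the-envelope illustration of the numbers in this thread (600 DAGs, about 30 minutes), the sketch below computes the average time per DAG and an idealized estimate for several schedulers. The function names are hypothetical, and the linear-scaling model is an assumption made only for illustration; nothing in this thread confirms how serialization time actually scales with the number of schedulers.

```python
def per_dag_seconds(total_minutes: float, dag_count: int) -> float:
    """Average wall-clock seconds spent per DAG."""
    return total_minutes * 60 / dag_count


def ideal_minutes(total_minutes: float, schedulers: int) -> float:
    """Idealized total time if the work split evenly across schedulers.

    ASSUMPTION: perfect linear scaling, which real deployments
    are unlikely to reach (shared database, uneven DAG files, etc.).
    """
    if schedulers < 1:
        raise ValueError("schedulers must be >= 1")
    return total_minutes / schedulers


if __name__ == "__main__":
    # Figures reported in the issue: 600 DAGs, 30 minutes.
    print(f"average per DAG: {per_dag_seconds(30, 600)} s")
    for n in (1, 2, 4):
        print(f"{n} scheduler(s): about {ideal_minutes(30, n)} min (idealized)")
```

Measuring the real serialization time before and after adding schedulers is the only way to see how close a given deployment comes to this idealized estimate.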
   
   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #17372: Dag Serialization Speed is so slow

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17372:
URL: https://github.com/apache/airflow/issues/17372#issuecomment-890796386


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk closed issue #17372: Dag Serialization Speed is so slow

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #17372:
URL: https://github.com/apache/airflow/issues/17372


   

