Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/26 16:13:41 UTC

[GitHub] [airflow] anitakar edited a comment on issue #15306: Support Serialized DAGs on CLI Commands

anitakar edited a comment on issue #15306:
URL: https://github.com/apache/airflow/issues/15306#issuecomment-848862444


   At least in Airflow 1.10.15, not using serialization leads to very inefficient task execution: the whole DagBag is parsed locally before a task is executed.
   Here is the code path that supports this:
   1. The `airflow worker` command starts celery_executor (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L1554)
   2. The worker then executes `airflow tasks run`, as set by the scheduler (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L596)
   3. There, `get_dag` is called without specifying `store_serialized_dags`, which is false by default (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L618)
   4. In that call, the whole DAGs directory is parsed locally on the worker (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L164)
   
   It seems very inefficient to parse all dags before each task execution.
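
To make the cost difference concrete, here is a minimal toy model in plain Python. It is not Airflow code: `get_dag_by_parsing`, `build_serialized_store`, and `get_dag_from_store` are hypothetical names invented for this sketch, and a "DAG file" is just a dictionary entry. It only illustrates why parsing every file per task run scales with the size of the DAGs folder, while a lookup in a pre-built store does not.

```python
from typing import Dict, List, Tuple


def get_dag_by_parsing(dag_files: Dict[str, List[str]], dag_id: str) -> Tuple[str, int]:
    """Hypothetical model of the non-serialized path: every file is parsed.

    `dag_files` maps a file path to the DAG ids defined in that file.
    Returns the path that defines `dag_id` and the number of files parsed.
    """
    parsed = 0
    found = None
    for path, dag_ids in dag_files.items():
        parsed += 1  # the whole folder is walked, even after a match
        if dag_id in dag_ids:
            found = path
    if found is None:
        raise KeyError(dag_id)
    return found, parsed


def build_serialized_store(dag_files: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical one-time step: index every DAG id by its source path."""
    return {dag_id: path for path, dag_ids in dag_files.items() for dag_id in dag_ids}


def get_dag_from_store(store: Dict[str, str], dag_id: str) -> Tuple[str, int]:
    """Hypothetical model of the serialized path: a direct lookup, no parsing."""
    return store[dag_id], 0


if __name__ == "__main__":
    files = {"a.py": ["dag_a"], "b.py": ["dag_b", "dag_c"], "c.py": ["dag_d"]}
    print(get_dag_by_parsing(files, "dag_a"))
    print(get_dag_from_store(build_serialized_store(files), "dag_a"))
```

In this model the parsing path touches all three files to find one DAG, whereas the store lookup touches none at task-run time; the indexing cost is paid once, outside the task execution path.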
   
   I have committed a few fixes to DAG serialization. I would be happy to fix at least the path for task execution within the worker.
   
   Sorry, my mistake. The dag is pickled: https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L622


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org