Posted to dev@airflow.apache.org by "ahuynh@symphonyrm.com" <ah...@symphonyrm.com> on 2017/11/27 19:11:37 UTC

Ignore Processing DAG Definition Python Files for Paused DAGs

Hi all,

I wanted to gauge community interest in an idea we have. We are currently running a modified version of Airflow 1.9 RC3 that skips processing DAG definition Python files for paused DAGs. By default, list_py_file_paths traverses the dags subdirectory looking for Python files, and the scheduler processes all of these files regardless of whether the DAGs they define are paused. Our proposed modification is to query the fileloc column of the dag table, filtering on is_paused=1 and is_active=1, to get a list of file paths for paused DAGs. We then exclude these files from known_file_paths so that the scheduler does not process them. The feature can be turned on and off via a scheduler config variable.
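A minimal sketch of the filtering step described above, in Python. It is not Airflow's code. The function names `paused_dag_file_paths` and `filter_known_file_paths` and the `skip_paused` flag (standing in for the unnamed scheduler config variable) are hypothetical. An in-memory SQLite table with `dag_id`, `fileloc`, `is_paused` and `is_active` columns stands in for Airflow's metadata database. One refinement is our assumption, not part of the proposal: a single Python file can define several DAGs, so a file is excluded only if none of its active DAGs is unpaused.

```python
import sqlite3
from typing import Iterable, List, Set


def paused_dag_file_paths(conn: sqlite3.Connection) -> Set[str]:
    """Return file paths whose active DAGs are all paused (hypothetical helper)."""
    # The proposal's query: files of active DAGs that are paused.
    paused = {
        row[0]
        for row in conn.execute(
            "SELECT fileloc FROM dag WHERE is_paused = 1 AND is_active = 1"
        )
    }
    # Assumed refinement: keep any file that also defines an active, unpaused DAG.
    unpaused = {
        row[0]
        for row in conn.execute(
            "SELECT fileloc FROM dag WHERE is_paused = 0 AND is_active = 1"
        )
    }
    return paused - unpaused


def filter_known_file_paths(
    known_file_paths: Iterable[str],
    conn: sqlite3.Connection,
    skip_paused: bool,
) -> List[str]:
    """Drop paused-DAG files from the scheduler's file list when enabled."""
    paths = list(known_file_paths)
    if not skip_paused:  # hypothetical stand-in for the scheduler config toggle
        return paths
    excluded = paused_dag_file_paths(conn)
    return [p for p in paths if p not in excluded]


# Demo data: a stand-in "dag" table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag (dag_id TEXT, fileloc TEXT, is_paused INTEGER, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?, ?)",
    [
        ("etl_daily", "/dags/etl.py", 1, 1),
        ("report", "/dags/report.py", 0, 1),
        ("mixed_a", "/dags/mixed.py", 1, 1),
        ("mixed_b", "/dags/mixed.py", 0, 1),
    ],
)
files = ["/dags/etl.py", "/dags/report.py", "/dags/mixed.py"]
result = filter_known_file_paths(files, conn, skip_paused=True)
print(result)  # ['/dags/report.py', '/dags/mixed.py']
```

Here `/dags/mixed.py` is kept because `mixed_b` is unpaused; excluding every file returned by the `is_paused=1` query alone would have stopped the scheduler from picking up that DAG.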

If anyone is interested, we already have the code written, so we'd be happy to package up our changes and create a PR.

Thanks!
-Andy

Re: Ignore Processing DAG Definition Python Files for Paused DAGs

Posted by Alek Storm <al...@gmail.com>.
What's the advantage of this change? Performance?

Alek
