Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/08/03 16:15:00 UTC

[jira] [Commented] (AIRFLOW-5100) Airflow scheduler does not respect safe mode setting

    [ https://issues.apache.org/jira/browse/AIRFLOW-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899481#comment-16899481 ] 

Ash Berlin-Taylor commented on AIRFLOW-5100:
--------------------------------------------

Thanks, yes, something like that looks right.

 

And yes, the Agent etc. classes are a bit twisty right now. I've got a half-written blog post that describes how it all hangs together, what runs in which process, and so on.

 

> Airflow scheduler does not respect safe mode setting
> ----------------------------------------------------
>
>                 Key: AIRFLOW-5100
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5100
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.3
>            Reporter: Jonathan Lange
>            Priority: Major
>
> We recently disabled safe mode in our Airflow 1.10.3 deployment and then removed some needless comments from our DAGs that mentioned "airflow" and "DAG".
> After deploying (and after several days!), we found that although these DAGs still appeared in the UI, they were not running. They didn't have "squares" in the tree view indicating that they should be run. 
> We restored the words "airflow" and "DAG" to these jobs, and they were scheduled again.
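> For context, here is a rough sketch of the kind of content check safe mode appears to apply when listing DAG files (the function name and details here are illustrative, not the actual Airflow implementation):
> {code:python}
> def might_contain_dag(file_path, safe_mode=True):
>     """Heuristic pre-filter: with safe mode on, only files whose raw
>     contents mention both "airflow" and "DAG" are handed to the parser."""
>     if not safe_mode:
>         return True
>     with open(file_path, 'rb') as f:
>         content = f.read()
>     return all(s in content for s in (b'DAG', b'airflow'))
> {code}
> Under a heuristic like that, deleting the comments that contained those two words makes a file invisible to DAG discovery, even though it still defines a perfectly valid DAG.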
> After digging into the code, it looks like the {{SchedulerJob}} calls {{list_py_file_paths}} without specifying {{safe_mode}}, and {{list_py_file_paths}} defaults to {{safe_mode=True}}, rather than consulting the configuration as it does for {{include_examples}}:
> [https://github.com/apache/airflow/blob/master/airflow/jobs/scheduler_job.py#L1278]
> [https://github.com/apache/airflow/blob/master/airflow/utils/dag_processing.py#L291-L304]
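> To illustrate the mismatch (the call site below is a simplified assumption, not the exact scheduler code): because the scheduler omits the argument, the hard-coded default wins and the {{[core] dag_discovery_safe_mode}} setting is never consulted:
> {code:python}
> # In the scheduler (simplified): no safe_mode argument is passed...
> known_file_paths = list_py_file_paths(self.subdir)
>
> # ...so this default applies unconditionally, regardless of what
> # dag_discovery_safe_mode is set to in airflow.cfg:
> def list_py_file_paths(directory, safe_mode=True, include_examples=None):
>     ...
> {code}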
> I suggest the following change, so that {{list_py_file_paths}} handles {{safe_mode}} the same way it already handles {{include_examples}}:
> {code:python}
> modified   airflow/utils/dag_processing.py
> @@ -287,7 +287,7 @@ def correct_maybe_zipped(fileloc):
>  COMMENT_PATTERN = re.compile(r"\s*#.*")
>  
>  
> -def list_py_file_paths(directory, safe_mode=True,
> +def list_py_file_paths(directory, safe_mode=None,
>                         include_examples=None):
>      """
>      Traverse a directory and look for Python files.
> @@ -299,6 +299,8 @@ def list_py_file_paths(directory, safe_mode=True,
>      :return: a list of paths to Python files in the specified directory
>      :rtype: list[unicode]
>      """
> +    if safe_mode is None:
> +        safe_mode = conf.getboolean('core', 'DAG_DISCOVERY_SAFE_MODE')
>      if include_examples is None:
>          include_examples = conf.getboolean('core', 'LOAD_EXAMPLES')
>      file_paths = []
> {code}
> I tried to find a way to write tests for this, but I couldn't figure it out. I sort of expected a function that looks at a bunch of files and returns a collection of DAGs, but I couldn't find one, and couldn't really work out the design behind {{DagFileProcessorAgent}} and friends.
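> For what it's worth, here is a rough sketch of one way such a test might look, calling {{list_py_file_paths}} directly on a temporary directory. This is an untested sketch; the assertions assume the safe-mode heuristic only keeps files mentioning both "airflow" and "DAG":
> {code:python}
> import os
> import tempfile
>
> from airflow.utils.dag_processing import list_py_file_paths
>
>
> def test_list_py_file_paths_respects_safe_mode():
>     with tempfile.TemporaryDirectory() as tmp_dir:
>         # One file that mentions the magic words, one that does not.
>         with open(os.path.join(tmp_dir, 'with_keywords.py'), 'w') as f:
>             f.write("# airflow DAG\n")
>         with open(os.path.join(tmp_dir, 'without_keywords.py'), 'w') as f:
>             f.write("x = 1\n")
>
>         safe = list_py_file_paths(tmp_dir, safe_mode=True, include_examples=False)
>         unsafe = list_py_file_paths(tmp_dir, safe_mode=False, include_examples=False)
>
>         assert len(safe) == 1    # only the file containing "airflow" and "DAG"
>         assert len(unsafe) == 2  # safe mode off: every .py file is returned
> {code}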
>  
> I haven't tried to produce a minimal example of this error, and have not confirmed that the above patch fixes the problem.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)