Posted to commits@airflow.apache.org by "Andrey Klochkov (Jira)" <ji...@apache.org> on 2019/12/10 02:50:00 UTC

[jira] [Comment Edited] (AIRFLOW-6171) airflow ignore file with .* located in a subdirectory ignores dags in other dirs

    [ https://issues.apache.org/jira/browse/AIRFLOW-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992134#comment-16992134 ] 

Andrey Klochkov edited comment on AIRFLOW-6171 at 12/10/19 2:49 AM:
--------------------------------------------------------------------

This happens because of a defect in {{dag_processing.list_py_file_paths}}. In the loop that walks through subdirectories, the same {{patterns}} object is written to the dictionary {{patterns_by_dir}} under different keys. When the loop processes the top-level dags directory, it stores that one {{patterns}} list under the key of each subdirectory. When the loop later reaches those subdirectories, it fetches the same list from the dictionary, so an .airflowignore file present in one subdirectory is effectively applied to every other subdirectory processed after it.

The fix is to add ".copy()" as shown here:
{code:python}
    # We want patterns defined in a parent folder's .airflowignore to
    # apply to subdirs too
    for d in dirs:
        patterns_by_dir[os.path.join(root, d)] = patterns.copy()
{code}
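
The aliasing can be reproduced without Airflow. The following is a minimal sketch, not the actual {{list_py_file_paths}} code: the function {{collect_patterns}}, its parameters, and the in-memory stand-in for the .airflowignore files are hypothetical, and it assumes the walk extends the fetched list in place.

```python
def collect_patterns(visit_order, ignore_lines_by_dir, copy_per_dir):
    """Return the ignore patterns in effect for each subdirectory.

    visit_order: subdirectory names in the order the walk reaches them.
    ignore_lines_by_dir: hypothetical stand-in for each directory's
        .airflowignore contents (directory name -> list of lines).
    copy_per_dir: True applies the proposed fix (one list per subdirectory).
    """
    # The top-level dags directory has no .airflowignore in this sketch.
    root_patterns = []

    # Top-level pass: register the parent's patterns for every subdirectory.
    patterns_by_dir = {}
    for d in visit_order:
        if copy_per_dir:
            patterns_by_dir[d] = root_patterns.copy()  # independent list
        else:
            patterns_by_dir[d] = root_patterns  # same object under every key

    # Subdirectory passes: fetch the list and extend it in place
    # (assumed behaviour of the surrounding loop).
    effective = {}
    for d in visit_order:
        patterns = patterns_by_dir[d]
        patterns += ignore_lines_by_dir.get(d, [])
        effective[d] = list(patterns)  # snapshot of what applies to d
    return effective


order = ["z", "x", "y"]  # z happens to be walked first
ignore = {"z": [".*"]}   # only z/ contains an .airflowignore

buggy = collect_patterns(order, ignore, copy_per_dir=False)
fixed = collect_patterns(order, ignore, copy_per_dir=True)
print("shared list:", buggy)
print("with copy():", fixed)
```

With the shared list, the pattern from z/ also ends up applied to x/ and y/ because they are visited after z/. A directory visited before z/ would escape it, which would be consistent with the report that only some dags are ignored.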



> airflow ignore file with .* located in a subdirectory ignores dags in other dirs
> --------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6171
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6171
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, DAG
>    Affects Versions: 1.10.5, 1.10.6
>         Environment: Ubuntu 18.04
>            Reporter: Andrey Kateshov
>            Priority: Major
>
> I have an Airflow dags directory that looks like this: x/... y/... z/..., i.e. all dags are placed in subdirectories.
> If I place an .airflowignore containing the single line .* in directory z/, the dags in other directories (e.g. x/ and y/) are also ignored, which is already a big issue. Stranger still, only some of them are ignored, which can mask the effects of this behaviour.
> Worse, you won't see that these dags are now disabled in the Airflow UI unless you completely restart it (possibly together with the scheduler; we restarted both and didn't check whether restarting only the UI is enough).
> This issue was not present in 1.10.3 but appears in 1.10.5. I didn't test 1.10.4.
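
To illustrate why a leaked .* is so damaging, here is a small sketch. It assumes that each .airflowignore line is treated as a regular expression searched against the file path; the helper {{is_ignored}} and the file names are hypothetical and are not Airflow's API.

```python
import re


def is_ignored(path, pattern_lines):
    # Assumption: a file is skipped if any pattern matches anywhere in its path.
    return any(re.search(line, path) for line in pattern_lines)


paths = ["x/dag_a.py", "y/dag_b.py", "z/dag_c.py"]  # hypothetical dag files
leaked = [".*"]  # the pattern meant for z/ only

# ".*" matches any string, so every path is treated as ignored.
ignored = [p for p in paths if is_ignored(p, leaked)]
print(ignored)
```

Because .* matches every path, any directory that inherits the pattern by mistake loses all of its dags, not just a few files.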


