Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/03/28 21:17:00 UTC

[jira] [Commented] (AIRFLOW-1729) Ignore whole directories in .airflowignore

    [ https://issues.apache.org/jira/browse/AIRFLOW-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418126#comment-16418126 ] 

ASF subversion and git services commented on AIRFLOW-1729:
----------------------------------------------------------

Commit 721bc09271856b0a52e22fbcb7bb8232eae800d3 in incubator-airflow's branch refs/heads/master from [~abhishek0812]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=721bc09 ]

[AIRFLOW-1729] improve dagBag time

Closes #3171 from q2w/master


> Ignore whole directories in .airflowignore
> ------------------------------------------
>
>                 Key: AIRFLOW-1729
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1729
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: Airflow 2.0
>            Reporter: Cedric Hourcade
>            Assignee: Kamil Sambor
>            Priority: Minor
>
> The .airflowignore file lets you exclude files from the DAG scan. But even if a whole directory is blacklisted, {{os.walk}} still descends into it, no matter how deep it is, and skips its files one by one. This can be an issue when you keep big .git or virtualenv directories around.
> I suggest adding something like:
> {code}
> dirs[:] = [d for d in dirs if not any([re.findall(p, os.path.join(root, d)) for p in patterns])]
> {code}
> to prune the directories here: https://github.com/apache/incubator-airflow/blob/cfc2f73c445074e1e09d6ef6a056cd2b33a945da/airflow/utils/dag_processing.py#L208-L209 and in {{list_py_file_paths}}
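
The pruning idea in the description can be sketched as a standalone function. This is a minimal illustration, not the code from the commit: the name walk_pruned and its parameters are hypothetical, the patterns are assumed to be regular expressions as in the snippet above, and re.search stands in for re.findall since only a yes/no match is needed.

```python
import os
import re


def walk_pruned(top, patterns):
    """Yield .py file paths under `top` without descending into ignored directories.

    `patterns` is a list of regular expressions (hypothetical input, standing in
    for the lines of an .airflowignore file).
    """
    compiled = [re.compile(p) for p in patterns]
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list that os.walk holds, so the pruned
        # subtrees are never visited. Rebinding `dirs` would have no effect.
        dirs[:] = [
            d for d in dirs
            if not any(c.search(os.path.join(root, d)) for c in compiled)
        ]
        for name in files:
            path = os.path.join(root, name)
            # Individual files can still be ignored by the same patterns.
            if name.endswith(".py") and not any(c.search(path) for c in compiled):
                yield path
```

Pruning only works with topdown=True (the default), because os.walk reads the dirs list after yielding it to decide where to descend next. Note that patterns are matched against the full joined path, so a pattern can also match a parent directory name.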



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)