You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/26 10:57:03 UTC

[GitHub] [airflow] j-y-matsubara opened a new pull request #9531: Support .airflowignore for plugins : .pluginingore

j-y-matsubara opened a new pull request #9531:
URL: https://github.com/apache/airflow/pull/9531


   
   This PR is to support .pluginignore. 
   Specifies intentionally files / directories in the PlUGINS_FOLDER to ignore by Airflow.
   
   Issues Link:
   #9413 
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara edited a comment on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara edited a comment on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-651678532


   I think the one failure of tests doesn't seem to have anything to do with this PR.....
   The same error occurs in other PRs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466908



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()
+
+        for file in files:  # type: ignore
+            if file == ignore_list_file:
+                continue
+            file_path = os.path.join(root, str(file))
+            if any([re.findall(p, file_path) for p in patterns]):

Review comment:
       Yes.
   I fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449633646



##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,96 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        os.mkdir(os.path.join(self.test_dir, "test_ignore"))
+        with open(os.path.join(self.plugin_folder_path, "test_load.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, ".airflowignore"), "w") as file:
+            file.write("#ignore test\nnot\nsubdir2")
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir1"))
+        with open(os.path.join(self.plugin_folder_path, "subdir1/.airflowignore"), "w") as file:
+            file.write("#ignore test\nnone")
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_load_sub1.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')

Review comment:
       Is content of those files important? There's a lot of repeated code so I would opt for some loop like:
   ```python
   for file_path, content in files_content:
       with open(file_path) as f:
           f.wrtie(content)
   ```
   Do you think it will make the code clearer? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446508436



##########
File path: airflow/plugins_manager.py
##########
@@ -164,8 +163,28 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
     # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+    for root, dirs, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+        ignore_file = os.path.join(root, '.pluginignore')
+
+        if os.path.isfile(ignore_file):
+            with open(ignore_file, 'r') as file:

Review comment:
       You are right.
   In fact, this is pretty much the same code.
   
    I made a generator. What do you think of this? 
   (Perhaps we can make the generator func share with file.py but....)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446609058



##########
File path: docs/concepts.rst
##########
@@ -1465,3 +1465,13 @@ would not be scanned by Airflow at all. This improves efficiency of DAG finding)
 The scope of a ``.airflowignore`` file is the directory it is in plus all its subfolders.
 You can also prepare ``.airflowignore`` file for a subfolder in ``DAG_FOLDER`` and it
 would only be applicable for that subfolder.
+
+
+.pluginignore
+''''''''''''''
+
+A ``.pluginignore`` file specifies the directories or files in ``PLUGINS_FOLDER``

Review comment:
       Why the new file name? In my opinion, you can use `.airflowignore` and it will be easier to use.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446210550



##########
File path: airflow/plugins_manager.py
##########
@@ -164,8 +163,28 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
     # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+    for root, dirs, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+        ignore_file = os.path.join(root, '.pluginignore')
+
+        if os.path.isfile(ignore_file):
+            with open(ignore_file, 'r') as file:

Review comment:
       Hello.
   This code looks very similar to:
   https://github.com/apache/airflow/blob/master/airflow/utils/file.py#L125
   I think you can make a generator that will take the starting path and yield for each file to load. 
   Best regards,
   Kamil




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-652534766


   There was an error.
   I will fix it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448452969



##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"

Review comment:
       It's not particularly necessary.
   I fixed!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448459428



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]

Review comment:
       It is not necessary.
   I fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-651678532


   I think the one failure of tests doesn't seem to have anything to do with this PR.....


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448993794



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,47 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.

Review comment:
       I'm sorry.
   It's a simple mistake on my part.
   I fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-653371144


   The same k8s error that has nothing to do with this PR as last time.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448366593



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()

Review comment:
       ```suggestion
   
           patterns_by_dir  = {os.path.join(root, sd): patterns.copy() for sd in dirs}
   ```
   WDYT? Also, do we have to create copy of `patterns` each time?

##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"

Review comment:
       Should we make it a constant?

##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()
+
+        for file in files:  # type: ignore
+            if file == ignore_list_file:
+                continue
+            file_path = os.path.join(root, str(file))
+            if any([re.findall(p, file_path) for p in patterns]):

Review comment:
       ```suggestion
               if any(re.findall(p, file_path) for p in patterns):
   ```
   Should work also

##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue
+
+            log.info('Importing plugin module %s', file_path)
+
+            loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
+            spec = importlib.util.spec_from_loader(mod_name, loader)
+            mod = importlib.util.module_from_spec(spec)
+            sys.modules[spec.name] = mod
+            loader.exec_module(mod)
+            for mod_attr_value in list(mod.__dict__.values()):
+                if is_valid_plugin(mod_attr_value):
+                    plugin_instance = mod_attr_value()
+                    plugins.append(plugin_instance)

Review comment:
       ```suggestion
               for mod_attr_value in (m for m in mod.__dict__.values() if is_valid_plugin(m)):
                   plugin_instance = mod_attr_value()
                   plugins.append(plugin_instance)
   ```
   No strong opinion here but in this way we may be able to fix the `# pylint: disable=too-many-nested-blocks`

##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue

Review comment:
       Does this part have to be in `try` clause? 

##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):

Review comment:
       Is this cast to `str` necessary?

##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]

Review comment:
       Should we compile this only once?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       These files were used to files that should not be ignored.
   ( `self.assertEqual(detected_files, should_not_ignore_files)` Line 87 (now 95)of the test_plugin_ignore.py. )
   
   But....
   I fixed these files and ".airflowignore" files to be generated by test_plugin_ignore.py, and delete pull files!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-652898769


   .... I think the two failure of k8s tests doesn't seem to have anything to do with this PR. The same error occurs in other PRs. :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara removed a comment on pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara removed a comment on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-650793451


   Many errors will be fixed at a later date.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446609259



##########
File path: airflow/plugins_manager.py
##########
@@ -164,8 +163,28 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
     # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+    for root, dirs, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+        ignore_file = os.path.join(root, '.pluginignore')
+
+        if os.path.isfile(ignore_file):
+            with open(ignore_file, 'r') as file:

Review comment:
       I like this code now. :-)
   
   Moving this method to airflow.utils.file is a good idea. This will allow us to delete the duplicate code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara removed a comment on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara removed a comment on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-651621009


   I think the one failure of tests doesn't seem to have anything to do with this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449419305



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       What is the purpose of those files? I think I don't see it tests




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-652901310


   Please ignore it. It is not related. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448453473



##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):

Review comment:
       It is not necessary.
   I fixed.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449418389



##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), self.plugin_folder_path)
+        file = open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        file = open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        file = open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        self.mock_plugins_folder = patch.object(
+            settings, 'PLUGINS_FOLDER', return_value=self.plugin_folder_path
+        )
+
+    def tearDown(self):
+        """
+        Delete tmp folder
+        """
+        shutil.rmtree(self.test_dir)
+
+    def test_find_not_should_ignore_path(self):
+        """
+        Test that the .airflowignore work and whether the file is properly ignored.
+        """
+
+        detected_files = set()
+        should_ignore_files = {
+            'test_notload.py',
+            'test_notload_sub.py',
+            'test_noneload_sub1.py',
+            'test_shouldignore.py'
+        }
+        should_not_ignore_files = {
+            'test_load.py',
+            'test_load_sub1.py'
+        }
+        ignore_list_file = ".airflowignore"
+        for file_path in find_path_from_directory(self.plugin_folder_path, ignore_list_file):
+            if not os.path.isfile(file_path):
+                continue
+            _, file_ext = os.path.splitext(os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue
+            detected_files.add(os.path.basename(file_path))
+        self.assertEqual(detected_files, should_not_ignore_files)
+        for path in detected_files:
+            self.assertNotIn(path, should_ignore_files)

Review comment:
       ```suggestion
           self.assertEqual(detected_files & should_ignore_files, set())
   ```
   WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448456168



##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue

Review comment:
       No....
   I fixed!

##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue

Review comment:
       No....
   I fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-650781887


   > Could you add some tests? For plugin managers testing, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. A perfect test can create the required files in a temporary directory (NamedTemporaryFile) and then check if the plugins have been loaded.
   
   Thank you for your advice.
   I added a test.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448905152



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,47 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.

Review comment:
       ```suggestion
           ignore_file_name: str) -> Generator[str, None, None]:
       """
       Search the file and return the path of the file that should not be ignored.
       :param base_dir_path: the base path to be searched for.
       :param ignore_file_name: the file name in which specifies a regular expression pattern is written.
   ```
   WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-653675958


   @j-y-matsubara would you mind a rebase? The k8s tests should be fixed now 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448993794



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,47 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.

Review comment:
       I'm sorry.
   It's my simple mistake.
   I fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466461



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()

Review comment:
       This is necessary.
   A canonical pattern that is evaluated in a parent directory must also be evaluated in its parent's child directories. 
   At least that's how .airflowignore (selection of dag) is currently specificated.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara edited a comment on pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara edited a comment on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-650781887


   > Could you add some tests? For plugin managers testing, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. A perfect test can create the required files in a temporary directory (NamedTemporaryFile) and then check if the plugins have been loaded.
   
   Thank you for your advice.
   I added a test.
   And moved method to airflow.utils.file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449419305



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       What is the purpose of those files? I think I don't see in tests




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615262



##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), self.plugin_folder_path)
+        file = open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        file = open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        file = open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()

Review comment:
       Of course!
   I fixed.

##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), self.plugin_folder_path)
+        file = open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        file = open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        file = open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        self.mock_plugins_folder = patch.object(
+            settings, 'PLUGINS_FOLDER', return_value=self.plugin_folder_path
+        )
+
+    def tearDown(self):
+        """
+        Delete tmp folder
+        """
+        shutil.rmtree(self.test_dir)
+
+    def test_find_not_should_ignore_path(self):
+        """
+        Test that the .airflowignore work and whether the file is properly ignored.
+        """
+
+        detected_files = set()
+        should_ignore_files = {
+            'test_notload.py',
+            'test_notload_sub.py',
+            'test_noneload_sub1.py',
+            'test_shouldignore.py'
+        }
+        should_not_ignore_files = {
+            'test_load.py',
+            'test_load_sub1.py'
+        }
+        ignore_list_file = ".airflowignore"
+        for file_path in find_path_from_directory(self.plugin_folder_path, ignore_list_file):
+            if not os.path.isfile(file_path):
+                continue
+            _, file_ext = os.path.splitext(os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue
+            detected_files.add(os.path.basename(file_path))
+        self.assertEqual(detected_files, should_not_ignore_files)
+        for path in detected_files:
+            self.assertNotIn(path, should_ignore_files)

Review comment:
       It's efficient!
   I fixed!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446508436



##########
File path: airflow/plugins_manager.py
##########
@@ -164,8 +163,28 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
     # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+    for root, dirs, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
+
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+        ignore_file = os.path.join(root, '.pluginignore')
+
+        if os.path.isfile(ignore_file):
+            with open(ignore_file, 'r') as file:

Review comment:
       You are right.
   In fact, this is pretty much the same code.
   
    I make a generator. What do you think of this? 
   (Perhaps we can make the generator func share with file.py but....)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449417007



##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        shutil.copytree(os.path.join(settings.PLUGINS_FOLDER, 'test_ignore'), self.plugin_folder_path)
+        file = open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        file = open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        file = open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w')
+        file.write('raise Exception("This file should have been ignored!")')
+        file.close()

Review comment:
       Can you please use the ctx manager?
   ```python
   with open(...) as file:
       file.write()
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448457433



##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,34 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    ignore_list_file = ".airflowignore"
+
+    for file_path in find_path_from_directory(  # pylint: disable=too-many-nested-blocks
+            str(settings.PLUGINS_FOLDER), str(ignore_list_file)):
+
+        try:
+            if not os.path.isfile(file_path):
+                continue
+            mod_name, file_ext = os.path.splitext(
+                os.path.split(file_path)[-1])
+            if file_ext != '.py':
+                continue
+
+            log.info('Importing plugin module %s', file_path)
+
+            loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
+            spec = importlib.util.spec_from_loader(mod_name, loader)
+            mod = importlib.util.module_from_spec(spec)
+            sys.modules[spec.name] = mod
+            loader.exec_module(mod)
+            for mod_attr_value in list(mod.__dict__.values()):
+                if is_valid_plugin(mod_attr_value):
+                    plugin_instance = mod_attr_value()
+                    plugins.append(plugin_instance)

Review comment:
       Beautiful code!
   I reflect this. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-651621009


   I think the one failure of tests doesn't seem to have anything to do with this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449667794



##########
File path: tests/plugins/test_plugin_ignore.py
##########
@@ -0,0 +1,96 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import os
+import shutil
+import tempfile
+import unittest
+from unittest.mock import patch
+
+from airflow import settings  # type: ignore
+from airflow.utils.file import find_path_from_directory  # type: ignore
+
+
+class TestIgnorePluginFile(unittest.TestCase):
+    """
+    Test that the .airflowignore work and whether the file is properly ignored.
+    """
+
+    def setUp(self):
+        """
+        Make tmp folder and files that should be ignored. And set base path.
+        """
+        self.test_dir = tempfile.mkdtemp()
+        self.test_file = os.path.join(self.test_dir, 'test_file.txt')
+        self.plugin_folder_path = os.path.join(self.test_dir, 'test_ignore')
+        os.mkdir(os.path.join(self.test_dir, "test_ignore"))
+        with open(os.path.join(self.plugin_folder_path, "test_load.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, ".airflowignore"), "w") as file:
+            file.write("#ignore test\nnot\nsubdir2")
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir1"))
+        with open(os.path.join(self.plugin_folder_path, "subdir1/.airflowignore"), "w") as file:
+            file.write("#ignore test\nnone")
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_load_sub1.py"), "w") as file:
+            file.write("#Should not be ignored file")
+        with open(os.path.join(self.plugin_folder_path, "test_notload_sub.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir1/test_noneload_sub1.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        os.mkdir(os.path.join(self.plugin_folder_path, "subdir2"))
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')
+        with open(os.path.join(self.plugin_folder_path, "subdir2/test_shouldignore.py"), 'w') as file:
+            file.write('raise Exception("This file should have been ignored!")')

Review comment:
       Yes!
   The notation you suggest is better than the existing my code.
   
   I fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r446663722



##########
File path: docs/concepts.rst
##########
@@ -1465,3 +1465,13 @@ would not be scanned by Airflow at all. This improves efficiency of DAG finding)
 The scope of a ``.airflowignore`` file is the directory it is in plus all its subfolders.
 You can also prepare ``.airflowignore`` file for a subfolder in ``DAG_FOLDER`` and it
 would only be applicable for that subfolder.
+
+
+.pluginignore
+''''''''''''''
+
+A ``.pluginignore`` file specifies the directories or files in ``PLUGINS_FOLDER``

Review comment:
       I agree with your opinion. 
   >In my opinion, you can use .airflowignore and it will be easier to use.
   
   It was fixed from `.pluginignore` to `.airflowignore` . :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448459428



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]

Review comment:
       It is not necessary.
   And I have consolidated the wasteful looping process into one by making modifications!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-653715178


   > @j-y-matsubara would you mind a rebase? The k8s tests should be fixed now
   
   Sure.
   
   All tests were passed!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       These files were used to files that should not be ignored.
   ( `self.assertEqual(detected_files, should_not_ignore_files)` Line 87 (now 95)of the test_plugin_ignore.py. )
   
   But....
   I fixed these files to be generated by test_plugin_ignore.py, and delete pull files!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-652396856


   @turbaszek Can I ask for review?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       These files were used to files that should not be ignored.
   ( `self.assertEqual(detected_files, should_not_ignore_files)` Line 87 (now 95)of the test_plugin_ignore.py. )
   
   But....
   I fixed these files to be generated by test_plugin_ignore.py, and delete these files!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448633985



##########
File path: airflow/plugins_manager.py
##########
@@ -164,34 +165,31 @@ def load_plugins_from_plugin_directory():
     global plugins  # pylint: disable=global-statement
     log.debug("Loading plugins from directory: %s", settings.PLUGINS_FOLDER)
 
-    # Crawl through the plugins folder to find AirflowPlugin derivatives
-    for root, _, files in os.walk(settings.PLUGINS_FOLDER, followlinks=True):  # noqa # pylint: disable=too-many-nested-blocks
-        for f in files:
-            filepath = os.path.join(root, f)
-            try:
-                if not os.path.isfile(filepath):
-                    continue
-                mod_name, file_ext = os.path.splitext(
-                    os.path.split(filepath)[-1])
-                if file_ext != '.py':
-                    continue
-
-                log.debug('Importing plugin module %s', filepath)
-
-                loader = importlib.machinery.SourceFileLoader(mod_name, filepath)
-                spec = importlib.util.spec_from_loader(mod_name, loader)
-                mod = importlib.util.module_from_spec(spec)
-                sys.modules[spec.name] = mod
-                loader.exec_module(mod)
-                for mod_attr_value in list(mod.__dict__.values()):
-                    if is_valid_plugin(mod_attr_value):
-                        plugin_instance = mod_attr_value()
-                        plugins.append(plugin_instance)
-            except Exception as e:  # pylint: disable=broad-except
-                log.exception(e)
-                path = filepath or str(f)
-                log.error('Failed to import plugin %s', path)
-                import_errors[path] = str(e)
+    for file_path in find_path_from_directory(
+            settings.PLUGINS_FOLDER, ".airflowignore"):
+
+        if not os.path.isfile(file_path):
+            continue
+        mod_name, file_ext = os.path.splitext(os.path.split(file_path)[-1])
+        if file_ext != '.py':
+            continue

Review comment:
       ```suggestion
   ```
   Should this not be part of the find_path_from_directory function? I would like to see that there is no code in the plugins_manager.py file that is responsible for the file selection. Plugins_manager should only load the module. WDYT?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ldacey commented on pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
ldacey commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-985688307


   Is it better to use a .airflowignore in the dags folder _and_ the plugins folder?
   
   My plugins folder is like this:
   
   ```
   /plugins
     - /operators
     - /hooks
     - /sensors
     - /sql
     - /tests
   ```
   
   Airflow needs access to the hooks and operators (my DAGs import them). It does not need access to the tests folder, and maybe not the SQL folder (some DAGs/operators read the SQL scripts from this path).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-650793451


   Many errors will be fixed at a later date.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9531: Support .airflowignore for plugins : .pluginingore

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#issuecomment-650706759


   Could you add some tests? For plugin managers testing, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. A perfect test can create the required files in a temporary directory (NamedTemporaryFile) and then check if the plugins have been loaded.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       These files were used to files that should not be ignored.
   ( `self.assertEqual(detected_files, should_not_ignore_files)` Line 87 (now 95)of the test_plugin_ignore.py. )
   
   But....
   I fixed these files to be generated by test_plugin_ignore.py.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #9531:
URL: https://github.com/apache/airflow/pull/9531


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r448466461



##########
File path: airflow/utils/file.py
##########
@@ -90,6 +90,48 @@ def open_maybe_zipped(fileloc, mode='r'):
         return io.open(fileloc, mode=mode)
 
 
+def find_path_from_directory(
+        base_dir_path: str,
+        ignore_list_file: str) -> Generator[str, None, None]:
+    """
+    Search the file and return the path of the file that should not be ignored.
+    :param base_dir_path: the base path to be searched for.
+    :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+
+    :return : file path not to be ignored
+    """
+
+    patterns_by_dir: Dict[str, List[Pattern[str]]] = {}
+
+    for root, dirs, files in os.walk(str(base_dir_path), followlinks=True):
+        patterns: List[Pattern[str]] = patterns_by_dir.get(root, [])
+
+        ignore_list_file_path = os.path.join(root, ignore_list_file)
+        if os.path.isfile(ignore_list_file_path):
+            with open(ignore_list_file_path, 'r') as file:
+                lines_no_comments = [re.compile(r"\s*#.*").sub("", line) for line in file.read().split("\n")]
+                patterns += [re.compile(line) for line in lines_no_comments if line]
+                patterns = list(set(patterns))
+
+        dirs[:] = [
+            subdir
+            for subdir in dirs
+            if not any(p.search(
+                os.path.join(os.path.relpath(root, str(base_dir_path)), subdir)) for p in patterns)
+        ]
+
+        for subdir in dirs:
+            patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()

Review comment:
       This is necessary.
   A canonical pattern that is evaluated in a parent directory must also be evaluated in its parent's child directories. At least that's how .airflowignore (selection of dag) is currently specificated.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] j-y-matsubara commented on a change in pull request #9531: Support .airflowignore for plugins

Posted by GitBox <gi...@apache.org>.
j-y-matsubara commented on a change in pull request #9531:
URL: https://github.com/apache/airflow/pull/9531#discussion_r449615325



##########
File path: tests/plugins/test_ignore/subdir1/test_load_sub1.py
##########
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Import module"""
+from airflow.models.baseoperator import BaseOperator  # type: ignore
+from airflow.utils.decorators import apply_defaults  # type: ignore
+
+
+class Sub1TestLoadOperator(BaseOperator):
+    """
+    Test load operator
+    """
+    @apply_defaults
+    def __init__(
+            self,
+            *args,
+            **kwargs):
+        super(Sub1TestLoadOperator, self).__init__(*args, **kwargs)
+
+    def execute(self, context):
+        pass

Review comment:
       These files were used to files that should not be ignored.
   ( `self.assertEqual(detected_files, should_not_ignore_files)` Line 87 of the test_plugin_ignore.py. )
   
   But....
   I fixed these files to be generated by test_plugin_ignore.py.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org