You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/16 17:28:06 UTC

[GitHub] [airflow] potiuk opened a new pull request #12383: Adds mechanism for provider package discovery.

potiuk opened a new pull request #12383:
URL: https://github.com/apache/airflow/pull/12383


   This is a simple mechanism that will allow us to dynamically
   discover and register all provider packages in the Airflow core.
   
   Closes: #11422
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729044887


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368511513) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728996243


   @ashb -> any more comments on this one?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525076896



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary
+                self._provider_directory[provider_info['package-name']] = provider_info
+            except FileNotFoundError:
+                # This is OK - this is not a provider package
+                pass
+            except TypeError as e:
+                if "is not a package" not in str(e):
+                    log.warning("Error when loading 'provider.yaml' file from %s:%s}", module_info.name, e)
+                # Otherwise this is OK - this is likely a module
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when loading 'provider.yaml' file from %s:%s", module_info.name, e)
+
+    def __init__(self):

Review comment:
       Sure. No problem :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728211000


   This is a much better/faster/more versatile version of provider manager - which will allow for discovery and registration of provider packages! It is based on the provider.yaml files that we have in each provider for documentation but we should be able also to extend it to contain the classes that should be registered.
   
   Let me know your comments!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r524964573



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(pkg_resources.read_text(imported_module, 'provider.yaml'))
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary
+                self._provider_directory[provider_info['package-name']] = provider_info
+            except FileNotFoundError:
+                # This is OK - this is not a provider package
+                pass
+            except TypeError as e:
+                if "is not a package" not in str(e):
+                    log.warning("Error when loading 'provider.yaml' file from %s:%s}", module_info.name, e)
+                # Otherwise this is OK - this is likely a module
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when loading 'provider.yaml' file from %s:%s", module_info.name, e)
+
+    def __init__(self):
+        self._provider_directory = {}
+        try:
+            from airflow import providers
+        except ImportError as e:
+            log.warning("No providers are present or error when importing them! :%s", e)
+            return
+        self.__find_all_providers(providers.__path__)
+
+    # TODO(potiuk) - add refresh mechanism to re-import providers and update the dictionary

Review comment:
       Yeah. that's why I not even looked at it yet, I have not thought that through if we actually need it or not. It's certainly not something for 2.0, so I remove this comment indeed. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728907430


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368121038) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r524535879



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:

Review comment:
       How about moving this in to `airflow/providers/manager.py` -- with the namespace packages this won't cause any problems to installing other providers.

##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):

Review comment:
       So I think this is my only real issue with this approach: this only works for packages installed in/under airflow.providers, which won't always be the case with third party providers (i.e. https://github.com/great-expectations/airflow-provider-great-expectations)
   
   That said, this approach is "Good Enough" for 2.0, so happy to discuss this later.

##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(pkg_resources.read_text(imported_module, 'provider.yaml'))
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary
+                self._provider_directory[provider_info['package-name']] = provider_info
+            except FileNotFoundError:
+                # This is OK - this is not a provider package
+                pass
+            except TypeError as e:
+                if "is not a package" not in str(e):
+                    log.warning("Error when loading 'provider.yaml' file from %s:%s}", module_info.name, e)
+                # Otherwise this is OK - this is likely a module
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when loading 'provider.yaml' file from %s:%s", module_info.name, e)
+
+    def __init__(self):
+        self._provider_directory = {}
+        try:
+            from airflow import providers
+        except ImportError as e:
+            log.warning("No providers are present or error when importing them! :%s", e)
+            return
+        self.__find_all_providers(providers.__path__)
+
+    # TODO(potiuk) - add refresh mechanism to re-import providers and update the dictionary

Review comment:
       I don't think we really need this -- with multiple webserver worker processes any refresh would lead to confusing state (some on new list, some on old, and no way to tell which is which) unless we have some way to make _all_ workers perform it.

##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources

Review comment:
       ```suggestion
       import importlib.resources as importlib_resources
   except ImportError:
       # Try backported to PY<37 `importlib_resources`.
       import importlib_resources as importlib_resources
   ```
   
   Let's _not_ call it pkg_resources as that is the old quasi-deprecated name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525078051



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       I think those provider.yaml should be verified at commit time (pre-commit) rather than in runtime




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525151225



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       At some point we're just going to end up running our entire test suite in pre-commit :grin: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729046551


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368514083) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525072040



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       Can you add JSON schema validation? 
   ![image](https://user-images.githubusercontent.com/12058428/99383072-ab2a1280-28cd-11eb-8d37-708358c87f7c.png)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729013882






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525082615



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       If we also want to support [third parties](https://github.com/apache/airflow/pull/12383#discussion_r524970486), we cannot expect these files to have the correct structure. The user can always damage this file. Besides, there are Airflow forks that do no ru pre-commit on  CI and can also create a damaged package.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728213703


   [The Workflow run](https://github.com/apache/airflow/actions/runs/366498365) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729042702


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368504158) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r524965824



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources

Review comment:
       Yep.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729094057


   I'd merge it now and I can address any more comments in the follow-up PRs where I add those discovery "actions"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525134530



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       Agreee. It is now both pre-commit and runtime :). Better safe than sorry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728774245


   Comments applied @ashb 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728907603


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368121748) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525112929



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary

Review comment:
       What Kamil said.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r524970486



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):

Review comment:
       Yeah. This is non-goal for 2.0 and we can certainly and easily change it in the future. 
   
   However, I'd say we **CAN** put such constraint on the providers that they are all in "airflow.providers" package. There is nothing to prevent anyone from issuing their own packages providing - say - "airflow.providers.great-expectations" package. I think there are no limits/ownership problems in PyPI, nor ASF rules preventing from doing it (but I might be wrong here).
   
   Even if we would not like to share "airflow.providers" with 3rd-parties, we can propose another package if we want to separate it out as a convention. I believe everything for airflow (including 3rd-party packages) should begin with 'airflow". For example "airflow-3rd-party.providers" or "airflow-contrib.providers" could do the job and adding several different packages to provider manager should not be a problem.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r524963780



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as pkg_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources as pkg_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:

Review comment:
       I think that might work but it will complicate the whole selection of when we install providers from sources and when as packages (the setup.py selection criteria will have to be much more complex, because you will always want to install airflow/providers/manager.py but not instal anything else there.
   
   Also In this case we will not be able to detect the case "no providers  installed" easily. Right now I log a warning if "airflow.providers" cannot be imported.
   
   Also it will complicate #11435. One of the checks we want to do is to make sure none of the code in core is importing my of the "airflow.providers" stuff. I plan to use that class in webserver, CLI and API which would mean that we will have to manually filter out those imports if we move it to "airflow.providers"
   
   I think "providers" package should be purely for provider implementation (see below as well), not for provider's management - similarly as we do 'plugin_manger,py". You even commented that in my POC that it should be moved to where 'pliugin" management is. I think it's a good place where it is.now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #12383:
URL: https://github.com/apache/airflow/pull/12383


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728247155


   Let me know WDYT


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728845274


   The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728244324


   Hey @ashb @kaxil  @dimberman  @turbaszek  @mik-laj  - we discussed it at the 2.0 call, I  know there was my POC done differently and there were some ideas with pluggr, but I think using the metadata from `provider.yaml` that @mik-laj added is a super-simple and very powerful.
   
   I added the provider discovery mechanism - I made sure it works with implicit packages and supports multiple location of installation of provider packages (so you can have provider packages installed in several locations as well as locally and it should work). Also I am reading the provider.yaml using importlb.resources (from 3.7 but I added optional backport for 3.6). 
   
   This means that this provider.yaml discovery should work regardless of the installation method of the packages (so it will even work if the packages are installed as .zip files, which is theoretically supported). 
   
   This is very versatile and simple and it should allow us to do things like:
   
   * developing API to discover which providers are installed  - including description/version/ even all the information that is now part of the documentation
   * displaying installed providers in the UI (including all the extra information)
   * CLI to inspect the providers installed as well
   * refreshing list of providers on signal or webserver reload or periodically (it is super-fast)
   
   Providing that the idea will be ok for others, I think I can have most of the discovery and registration done by tomorrow.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728244324


   Hey @ashb @kaxil  @dimberman  @turbaszek  - we discussed it at the 2.0 call, I  know there was my POC done differently and there were some ideas with pluggr, but I think using the metadata from `provider.yaml` that @mik-laj added is a super-simple and a very powerfulidea. 
   
   I added the provider discovery mechanism - I made sure it works with implicit packages and supports multiple location of installation of provider packages (so you can have provider packages installed in several locations as well as locally and it should work). Also I am reading the provider.yaml using importlb.resources (from 3.7 but I added optional backport for 3.6). 
   
   This means that this provider.yaml discovery should work regardless of the installation method of the packages (so it will even work if the packages are installed as .zip files, which is theoretically supported). 
   
   This is very versatile and simple and it should allow us to do things like:
   
   * developing API to discover which providers are installed  - including description/version/ even all the information that is now part of the documentation
   * displaying installed providers in the UI (including all the extra information)
   * CLI to inspect the providers installed as well
   * refreshing list of providers on signal or webserver reload or periodically (it is super-fast)
   
   Providing that the idea will be ok for others, I think I can have most of the discovery and registration done by tomorrow.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-728244324


   Hey @ashb @kaxil  @dimberman  @turbaszek  @mik-laj  - we discussed it at the 2.0 call, I  know there was my POC done differently and there were some ideas with pluggr, but I think using the metadata from `provider.yaml` that @mik-laj added is a super-simple and a very powerfulidea. 
   
   I added the provider discovery mechanism - I made sure it works with implicit packages and supports multiple location of installation of provider packages (so you can have provider packages installed in several locations as well as locally and it should work). Also I am reading the provider.yaml using importlb.resources (from 3.7 but I added optional backport for 3.6). 
   
   This means that this provider.yaml discovery should work regardless of the installation method of the packages (so it will even work if the packages are installed as .zip files, which is theoretically supported). 
   
   This is very versatile and simple and it should allow us to do things like:
   
   * developing API to discover which providers are installed  - including description/version/ even all the information that is now part of the documentation
   * displaying installed providers in the UI (including all the extra information)
   * CLI to inspect the providers installed as well
   * refreshing list of providers on signal or webserver reload or periodically (it is super-fast)
   
   Providing that the idea will be ok for others, I think I can have most of the discovery and registration done by tomorrow.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#discussion_r525072740



##########
File path: airflow/providers_manager.py
##########
@@ -0,0 +1,78 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Manages all providers."""
+import importlib
+import logging
+import pkgutil
+import traceback
+
+import yaml
+
+try:
+    import importlib.resources as importlib_resources
+except ImportError:
+    # Try backported to PY<37 `importlib_resources`.
+    import importlib_resources
+
+
+log = logging.getLogger(__name__)
+
+
+class ProvidersManager:
+    """Manages all provider packages."""
+
+    def __find_all_providers(self, paths: str):
+        def onerror(_):
+            exception_string = traceback.format_exc()
+            log.warning(exception_string)
+
+        for module_info in pkgutil.walk_packages(paths, prefix="airflow.providers.", onerror=onerror):
+            try:
+                imported_module = importlib.import_module(module_info.name)
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when importing %s:%s", module_info.name, e)
+                continue
+            try:
+                provider_info = yaml.safe_load(
+                    importlib_resources.read_text(imported_module, 'provider.yaml')
+                )
+                # TODO(potiuk): map to a class maybe? Or we might stick to a dictionary
+                self._provider_directory[provider_info['package-name']] = provider_info
+            except FileNotFoundError:
+                # This is OK - this is not a provider package
+                pass
+            except TypeError as e:
+                if "is not a package" not in str(e):
+                    log.warning("Error when loading 'provider.yaml' file from %s:%s}", module_info.name, e)
+                # Otherwise this is OK - this is likely a module
+            except Exception as e:  # noqa pylint: disable=broad-except
+                log.warning("Error when loading 'provider.yaml' file from %s:%s", module_info.name, e)
+
+    def __init__(self):

Review comment:
       It would be great if this was the first method in the class.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729014002






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12383: Adds mechanism for provider package discovery.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12383:
URL: https://github.com/apache/airflow/pull/12383#issuecomment-729016266


   [The Workflow run](https://github.com/apache/airflow/actions/runs/368425751) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org