Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/29 07:50:33 UTC

[GitHub] [airflow] potiuk opened a new pull request #17304: Optimize providers manager lazy loading

potiuk opened a new pull request #17304:
URL: https://github.com/apache/airflow/pull/17304




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r679008413



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()

Review comment:
       I disagree -- the intent here is to preload the hooks and links. That we need to load the list first in order to do that is, to me, an implementation detail that we don't strictly care about.
   
   But I don't feel all that strongly about it.







[GitHub] [airflow] potiuk commented on pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#issuecomment-888991656


   Transient errors only. It seems local_task_job became flaky after the recent "QUEUED" state introduction, which might be worth looking at, @ephraimbuddy. I will create issues for those quickly (and quarantine them).





[GitHub] [airflow] edwardwang888 commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
edwardwang888 commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697919342



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what
 # they contain.

Review comment:
       Thanks! I did not expect my first contribution to be this quick 😂 







[GitHub] [airflow] github-actions[bot] commented on pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#issuecomment-888980642


   The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.








[GitHub] [airflow] potiuk merged pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #17304:
URL: https://github.com/apache/airflow/pull/17304


   





[GitHub] [airflow] edwardwang888 commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
edwardwang888 commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697797596



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what

Review comment:
       @potiuk Was this line meant to be removed?







[GitHub] [airflow] potiuk commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697912460



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what
 # they contain.

Review comment:
       Merged! Quickest contribution EVER!







[GitHub] [airflow] edwardwang888 commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
edwardwang888 commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697911224



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what
 # they contain.

Review comment:
       @potiuk Thanks! I opened #17884 to fix it.







[GitHub] [airflow] ashb commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r678991756



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()

Review comment:
       ```suggestion
   ```
   







[GitHub] [airflow] edwardwang888 commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
edwardwang888 commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697797801



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what
 # they contain.

Review comment:
       @potiuk Was this line meant to be removed? (Sorry if I don't fully understand this PR, the comment just seemed strange as I was looking at the code.)
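
For context on the code comment being asked about, here is a rough sketch of the general pattern it describes: a module resolves some names lazily through a module-level `__getattr__`, and a branch that never runs at import time lets static analyzers see the real symbols. This sketch uses `typing.TYPE_CHECKING`, which is a swapped-in technique and not necessarily what `airflow/__init__.py` does; `demo_pkg`, `_LAZY` and `lazy_getattr` are hypothetical names, and `json.JSONDecoder` is only a stand-in for a lazily exposed class.

```python
import importlib
import types
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Never executed at runtime; type checkers and IDEs still read this
    # import, so they know what the lazily resolved name refers to.
    from json import JSONDecoder  # noqa: F401

# Hypothetical mapping: attribute name -> module that really defines it.
_LAZY = {"JSONDecoder": "json"}


def lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): import the target on first access."""
    module_path = _LAZY.get(name)
    if module_path is None:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    return getattr(importlib.import_module(module_path), name)


# Stand-in for a package's __init__.py so the sketch runs as a single file.
demo = types.ModuleType("demo_pkg")
demo.__getattr__ = lazy_getattr

print(demo.JSONDecoder.__name__)
```

In a real package the function would simply be named `__getattr__` at module level; the comment lines in the diff document why the never-executed block exists, which is why dropping one of them makes the remaining comment read oddly.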







[GitHub] [airflow] kaxil commented on pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#issuecomment-889014577


   Should this be in 2.1.3 vs 2.2?





[GitHub] [airflow] potiuk commented on pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#issuecomment-889115510


   It is an easy cherry-pick and rather safe. I added it to 2.1.3.





[GitHub] [airflow] potiuk commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r697832013



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()
+    manager.initialize_providers_hooks()
+    manager.initialize_providers_extra_links()
 
 
 # This is never executed, but tricks static analyzers (PyDev, PyCharm,)
-# into knowing the types of these symbols, and what
 # they contain.

Review comment:
       Not at all - you can add a PR to bring it back :).










[GitHub] [airflow] potiuk commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r678986950



##########
File path: airflow/providers_manager.py
##########
@@ -112,30 +115,57 @@ def __init__(self):
         self._customized_form_fields_schema_validator = (
             _create_customized_form_field_behaviours_schema_validator()
         )
-        self._initialized = False
+        self._providers_list_initialized = False
+        self._providers_hooks_initialized = False
+        self._providers_extra_links_initialized = False
 
-    def initialize_providers_manager(self):
-        """Lazy initialization of provider data."""
+    def initialize_providers_list(self):
+        """Lazy initialization of providers list."""
         # We cannot use @cache here because it does not work during pytest, apparently each test
         # runs it it's own namespace and ProvidersManager is a different object in each namespace
-        # even if it is singleton but @cache on the initialize_providers_manager message still works in the
+        # even if it is singleton but @cache on the initialize_providers_*  still works in the
         # way that it is called only once for one of the objects (at least this is how it looks like
         # from running tests)
-        if self._initialized:
+        if self._providers_list_initialized:
             return
+        start_time = perf_counter()
+        self.log.debug("Initializing Providers Manager list")
         # Local source folders are loaded first. They should take precedence over the package ones for
         # Development purpose. In production provider.yaml files are not present in the 'airflow" directory
         # So there is no risk we are going to override package provider accidentally. This can only happen
         # in case of local development
         self._discover_all_airflow_builtin_providers_from_local_sources()
         self._discover_all_providers_from_packages()
-        self._discover_hooks()
         self._provider_dict = OrderedDict(sorted(self._provider_dict.items()))
+        self.log.debug(f"Initialization of Providers Manager list took {perf_counter() - start_time} seconds")

Review comment:
       ah yeah. changed it from info :)

##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()

Review comment:
       right. 

##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()

Review comment:
       I think it is better to make it explicit here. It will of course still work if we remove it, but I think this case is special: we know we want to initialize everything, and the explicit call shows the reader our intent better.










[GitHub] [airflow] mik-laj commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r678967590



##########
File path: airflow/providers_manager.py
##########
@@ -112,30 +115,57 @@ def __init__(self):
         self._customized_form_fields_schema_validator = (
             _create_customized_form_field_behaviours_schema_validator()
         )
-        self._initialized = False
+        self._providers_list_initialized = False
+        self._providers_hooks_initialized = False
+        self._providers_extra_links_initialized = False
 
-    def initialize_providers_manager(self):
-        """Lazy initialization of provider data."""
+    def initialize_providers_list(self):
+        """Lazy initialization of providers list."""
         # We cannot use @cache here because it does not work during pytest, apparently each test
         # runs it it's own namespace and ProvidersManager is a different object in each namespace
-        # even if it is singleton but @cache on the initialize_providers_manager message still works in the
+        # even if it is singleton but @cache on the initialize_providers_*  still works in the
         # way that it is called only once for one of the objects (at least this is how it looks like
         # from running tests)
-        if self._initialized:
+        if self._providers_list_initialized:
             return
+        start_time = perf_counter()
+        self.log.debug("Initializing Providers Manager list")
         # Local source folders are loaded first. They should take precedence over the package ones for
         # Development purpose. In production provider.yaml files are not present in the 'airflow" directory
         # So there is no risk we are going to override package provider accidentally. This can only happen
         # in case of local development
         self._discover_all_airflow_builtin_providers_from_local_sources()
         self._discover_all_providers_from_packages()
-        self._discover_hooks()
         self._provider_dict = OrderedDict(sorted(self._provider_dict.items()))
+        self.log.debug(f"Initialization of Providers Manager list took {perf_counter() - start_time} seconds")

Review comment:
       Can you use %s formatting here? For logs, it is recommended. 
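
A minimal sketch of why `%s`-style arguments are usually recommended for log calls: with an f-string the message is built before the logger checks whether the level is enabled, while with `%s` the formatting is deferred and skipped entirely when the record is filtered out. `CountingValue`, `log_both_ways` and the logger name are hypothetical and exist only for this illustration.

```python
import logging

logger = logging.getLogger("providers_demo")
logger.setLevel(logging.INFO)  # DEBUG records are filtered out


class CountingValue:
    """Counts how many times it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "0.123"


def log_both_ways(value):
    # f-string: the message is formatted eagerly, even though DEBUG is disabled.
    logger.debug(f"Initialization took {value} seconds")
    # %s style: the argument is passed through and only formatted if the
    # record is actually emitted, which does not happen at INFO level.
    logger.debug("Initialization took %s seconds", value)


value = CountingValue()
log_both_ways(value)
print(value.renders)
```

Only the f-string call renders the value. For a cheap float such as a `perf_counter()` difference the saving is small, but the deferred form also keeps the message template constant, which some log aggregation tools rely on for grouping.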
















[GitHub] [airflow] ashb commented on a change in pull request #17304: More optimized lazy-loading of provider information

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17304:
URL: https://github.com/apache/airflow/pull/17304#discussion_r678968079



##########
File path: airflow/__init__.py
##########
@@ -74,11 +74,13 @@ def __getattr__(name):
 if not settings.LAZY_LOAD_PROVIDERS:
     from airflow import providers_manager
 
-    providers_manager.ProvidersManager().initialize_providers_manager()
+    manager = providers_manager.ProvidersManager()
+    manager.initialize_providers_list()

Review comment:
       Rather than having to "remember" to call this everywhere we need it, I think it would be best if init hooks and init extra_links called this themselves.
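
One possible reading of this suggestion, sketched under assumptions: each dependent initializer calls the list initializer itself, and a per-stage flag makes repeated calls no-ops. `LazyManager`, its attributes and the sample data are hypothetical stand-ins and are not the actual `ProvidersManager` implementation.

```python
import logging
from time import perf_counter

log = logging.getLogger(__name__)


class LazyManager:
    """Hypothetical two-stage lazy initializer with per-stage guard flags."""

    def __init__(self):
        self._list_initialized = False
        self._hooks_initialized = False
        self.providers = {}
        self.hooks = {}
        self.list_runs = 0  # only here to show the work happens once

    def initialize_list(self):
        if self._list_initialized:
            return
        start = perf_counter()
        # Stand-in for the real discovery work.
        self.providers = {"demo-provider": ["demo_hook"]}
        self.list_runs += 1
        self._list_initialized = True
        log.debug("List initialization took %s seconds", perf_counter() - start)

    def initialize_hooks(self):
        # The dependent stage triggers its prerequisite itself, so callers
        # do not have to remember the ordering.
        self.initialize_list()
        if self._hooks_initialized:
            return
        self.hooks = {
            hook: provider
            for provider, hooks in self.providers.items()
            for hook in hooks
        }
        self._hooks_initialized = True


manager = LazyManager()
manager.initialize_hooks()  # implicitly initializes the list first
manager.initialize_list()   # an explicit call remains valid and is a no-op
print(manager.hooks, manager.list_runs)
```

With this shape, a caller that wants to preload everything can still list every stage explicitly to document intent, and the guard flags keep the extra calls harmless.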



