Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/03 07:03:21 UTC

[GitHub] [airflow] ferruzzi opened a new pull request, #27823: Amazon Provider Package user agent

ferruzzi opened a new pull request, #27823:
URL: https://github.com/apache/airflow/pull/27823

   Adding a couple of fields to the boto3 user agent will allow the AWS team to better understand which services and operators to focus improvements on in the future.  This is similar to the user agent fields added by [Databricks](https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks_base.py#L141), [Google](https://github.com/apache/airflow/blob/main/airflow/providers/google/common/hooks/base_google.py#L322), [Yandex](https://github.com/apache/airflow/blob/main/airflow/providers/yandex/hooks/yandex.py#L93), and others.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1332599124

   @uranusjr @eladkal Any other thoughts or suggestions on this one?  I'm going to be away on vacation for the holidays in a couple of weeks and would love to get this sorted out before I go.




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1338368799

   Tried to run them locally to see if the changes from @Taragolis fixed it, but it looks like breeze is having a hiccup now :P  
   
   If I run `breeze testing tests --test-type providers[amazon]` it fails to run the tests with the message `Only 'Providers' test type can specify actual tests with [\]`.  If I capitalize Providers and run `breeze testing tests --test-type Providers[amazon]` it fails to run the tests and says:
   
   ```
   ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
   pytest: error: unrecognized arguments: --output=/files/warnings-Providers-sqlite.txt
     inifile: /opt/airflow/pytest.ini
     rootdir: /opt/airflow
   
   No stopped containers
   ```
   
   In both cases it fails before the tests are actually run.




[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029015437


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]
+        provider = manager.providers[provider_name]
+        return provider.version
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # While it is generally considered poor form to catch "Exception", under
+            # no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key():
+        dag_id = os.getenv("AIRFLOW_CTX_DAG_ID", "default_dag_id")
+        # The Object Identifier (OID) namespace is used to salt the dag_id value.
+        # That salted value is used to generate a SHA-1 hash which, by definition,
+        # can not (reasonably) be reversed.  No personal data can be inferred or
+        # extracted from the resulting UUID.
+        return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))

Review Comment:
   Just an idea: if the DAG_ID context is not set, might it return the nil UUID `00000000-0000-0000-0000-000000000000` instead?
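   A minimal sketch of that suggestion (the function name and fallback behavior here are illustrative, not the PR's final code), assuming the nil UUID is returned whenever the DAG id environment variable is absent:
   
   ```python
   import os
   import uuid
   
   NIL_UUID = str(uuid.UUID(int=0))  # "00000000-0000-0000-0000-000000000000"
   
   def generate_dag_key() -> str:
       """Return a salted, non-reversible key for the current DAG id."""
       dag_id = os.getenv("AIRFLOW_CTX_DAG_ID")
       if dag_id is None:
           # Suggested behavior: no DAG context set -> nil UUID, all bits zero.
           return NIL_UUID
       # uuid5 is a SHA-1 hash of the namespace UUID plus the name,
       # so the OID namespace effectively salts the dag_id.
       return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
   ```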





[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1322718524

   Two failing S3-related tests are also failing in main.




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336105773

   Two tests are failing in `providers/amazon/aws/hooks/test_s3.py` but they are also failing in main on my side and shouldn't be related to any changes I've made.  




[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028636765


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   `ProvidersManager().hooks` is an `Optional[HookInfo]`, so MyPy will complain that
    ```
   error: Item "None" of "Optional[HookInfo]" has no attribute "package_name"  [union-attr]
   ```
    
    
    I could do something like this instead.  It feels more confusing, but it does catch the (unlikely?) case where ProvidersManager breaks:
   
   
   ```
       @classmethod
       def _get_provider_version(cls) -> str:
           """Checks the Providers Manager for the package version."""
           manager = ProvidersManager()
           hook = manager.hooks[cls.conn_type]
           if not hook:
               return "Unknown"
           provider = manager.providers[hook.provider_name]
           return provider.version
   ```





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029661905


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   > I thought we also might add/merge config in AwsCredentialHelper rather than in hook
   
   What would be the advantage of adding it there instead?





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029768308


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   Right now, AwsCredentialHelper resolves the connection configuration between hook arguments and Connection parameters,
   so I assume that is a good place to keep it.





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043574991


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   The nil UUID is a special form of UUID that does not include any specific data; see: https://datatracker.ietf.org/doc/html/rfc4122.html#section-4-1-7
   
   So it is a good convention: if any exception happens, all bits are 0.
   
   ```python
   from uuid import UUID
   
   nil = "00000000-0000-0000-0000-000000000000"
   assert UUID(int=0) == UUID(nil)
   ```





[GitHub] [airflow] potiuk merged pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #27823:
URL: https://github.com/apache/airflow/pull/27823




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1342254484

   Sorry for the holdup; this should pass now.  All tests are passing on my side in Breeze.
   
   Here's what I came up with: on [L414 here](https://github.com/apache/airflow/pull/27823/commits/47b69dfada6daa2440c498e38f54c6f8262b1542#diff-4e7fd0a0e31b60e534f7cab71ee4d3591c0758c1e76054f2403b6669546cb2d2L414) I was defaulting to an empty Config(), thinking that it would be 'falsy' [here](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/utils/connection_wrapper.py#L231) in the Connection wrapper, but it isn't, so it overrides the user-provided values if the user creates a Connection.  So I moved the default to the BaseAws [config property](https://github.com/apache/airflow/pull/27823/commits/47b69dfada6daa2440c498e38f54c6f8262b1542#diff-4e7fd0a0e31b60e534f7cab71ee4d3591c0758c1e76054f2403b6669546cb2d2R532) and removed the option for that to be None.




[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043599564


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   I didn't know that one of the bits is actually a version until you pointed that out `¯\_(ツ)_/¯`  👍 





[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1331099938

   Any other thoughts on this?




[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1036976868


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   @ferruzzi Sorry, I missed the notification.
   
   This is more about preventing circular imports.  Right now, if a user tries to obtain _config_ from a Secrets Backend ([with the `_secret` suffix](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html#setting-configuration-options)), it actually happens before some Airflow modules are properly imported, and in that case a local import might not help.
   
   It should not affect users who get _variables_ and _connections_ from Secrets Backends.
   
   That means it is currently fine to keep these imports inside methods of AwsGenericHook/AwsBaseHook.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028718476


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Alright.   I'm grabbing dinner, I'll make the change ~tomorrow~ soon.   Thanks for catching that.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029635094


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]
+        provider = manager.providers[provider_name]
+        return provider.version
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # While it is generally considered poor form to catch "Exception", under
+            # no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key():
+        dag_id = os.getenv("AIRFLOW_CTX_DAG_ID", "default_dag_id")
+        # The Object Identifier (OID) namespace is used to salt the dag_id value.
+        # That salted value is used to generate a SHA-1 hash which, by definition,
+        # can not (reasonably) be reversed.  No personal data can be inferred or
+        # extracted from the resulting UUID.
+        return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))

Review Comment:
   Yeah, that's reasonable.  To be honest, it doesn't really matter.  A hash of two known/fixed values would make a known fixed value, but I suppose all zeroes would be even more obvious.  If I change the provider version to return "Unknown" per the discussion above with [uranusjr](https://github.com/uranusjr), then maybe having this default to "Unknown" or all zeroes is a better option as well.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1031868722


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Sorry for the delay, just got back from vacation and it took a little longer to get back into gear.  I ended up wrapping it in a try/except as mentioned and dropped the `if not hook`.  If `hook` is falsy, then it'll error out on the next line at `hook.package_name` anyway and get caught by the `except`.
   
   Addressed in https://github.com/apache/airflow/pull/27823/commits/a5fc3bc4a39855b9f3c7fc5ac26505709d191a98
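   A rough sketch of the pattern described above (the manager class here is a stand-in, not the real ProvidersManager API): the explicit None check is dropped, and the `AttributeError` raised by `hook.package_name` when the hook is falsy simply falls into the blanket `except`:
   
   ```python
   class _FakeManager:
       """Stand-in for ProvidersManager, with one hook entry that
       resolved to None to simulate a broken lookup."""
       hooks = {"aws": None}
       providers: dict = {}
   
   def get_provider_version(conn_type: str) -> str:
       try:
           manager = _FakeManager()
           hook = manager.hooks[conn_type]
           # No `if not hook` guard: if hook is None, the attribute access
           # below raises AttributeError and lands in the except clause.
           provider = manager.providers[hook.package_name]
           return provider.version
       except Exception:
           # Under no condition should an error here affect the user.
           return "Unknown"
   ```
   
   A missing `conn_type` key raises `KeyError` and takes the same fallback path.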





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1036336566


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   @Taragolis What do you think?   Is this alright or do you still want that change?





[GitHub] [airflow] uranusjr commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028678124


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Simply checking `if not hook` (or `if hook is None`) makes sense and looks simple enough to me.








[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336217469

   >  so there must be something in your changes that cause it
   
   I don't see how that's possible.   If I check out main, git pull --rebase, and run the tests, they fail.  My code is all in a different branch.
   
   ```
   (airflow-env) ferruzzi:~/workplace/airflow (ferruzzi/boto-user-agent)
   $ git checkout main
   Switched to branch 'main'
   Your branch is up to date with 'apache/main'.
   (airflow-env) ferruzzi:~/workplace/airflow (main)
   $ git pull --rebase
   Already up to date.
   Current branch main is up to date.
   (airflow-env) ferruzzi:~/workplace/airflow (main)
   $ pytest tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id -W ignore::DeprecationWarning -W  ignore::FutureWarning
   =========================================================================== test session starts ===========================================================================
   platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/bin/python
   cachedir: .pytest_cache
   rootdir: /home/ANT.AMAZON.COM/ferruzzi/workplace/airflow, configfile: pytest.ini
   plugins: anyio-3.6.2, xdist-2.5.0, forked-1.4.0, requests-mock-1.9.3, flaky-3.7.0, instafail-0.4.2, rerunfailures-9.1.1, cov-3.0.0, timeouts-1.2.1, httpx-0.15.0, asyncio-0.16.0
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 1 item                                                                                                                                                          
   
   tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id FAILED                                    [100%]
   
   ================================================================================ FAILURES =================================================================================
   _______________________________________________ TestAwsS3HookNoMock.test_check_for_bucket_raises_error_with_invalid_conn_id _______________________________________________
   
   self = <tests.providers.amazon.aws.hooks.test_s3.TestAwsS3HookNoMock object at 0x7f7d4a39c610>, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f7d4a39ca30>
   
       def test_check_for_bucket_raises_error_with_invalid_conn_id(self, monkeypatch):
           monkeypatch.delenv("AWS_PROFILE", raising=False)
           monkeypatch.delenv("AWS_ACCESS_KEY_ID", raising=False)
           monkeypatch.delenv("AWS_SECRET_ACCESS_KEY", raising=False)
           hook = S3Hook(aws_conn_id="does_not_exist")
           # We're mocking all actual AWS calls and don't need a connection. This
           # avoids an Airflow warning about connection cannot be found.
           hook.get_connection = lambda _: None
           with pytest.raises(NoCredentialsError):
   >           hook.check_for_bucket("test-non-existing-bucket")
   E           Failed: DID NOT RAISE <class 'botocore.exceptions.NoCredentialsError'>
   
   tests/providers/amazon/aws/hooks/test_s3.py:63: Failed
   -------------------------------------------------------------------------- Captured stdout setup --------------------------------------------------------------------------
   ========================= AIRFLOW ==========================
   Home of the user: /home/ANT.AMAZON.COM/ferruzzi
   Airflow home /home/ANT.AMAZON.COM/ferruzzi/airflow
   Skipping initializing of the DB as it was initialized already.
   You can re-initialize the database by adding --with-db-init flag when running tests.
   -------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
   [2022-12-03 10:47:34,076] {base_aws.py:121} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   [2022-12-03 10:47:34,087] {credentials.py:1251} INFO - Found credentials in shared credentials file: ~/.aws/credentials
   [2022-12-03 10:47:34,593] {s3.py:221} INFO - Bucket "test-non-existing-bucket" does not exist
   ---------------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------------
   INFO     airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory:base_aws.py:121 No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials
   INFO     airflow.providers.amazon.aws.hooks.s3.S3Hook:s3.py:221 Bucket "test-non-existing-bucket" does not exist
   ============================================================================ warnings summary =============================================================================
   ../../.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
     /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: asyncio_mode
     
       self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ========================================================================= short test summary info =========================================================================
   FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id - Failed: DID NOT RAISE <class 'botocor...
   ====================================================================== 1 failed, 1 warning in 1.73s =======================================================================
   (airflow-env) ferruzzi:~/workplace/airflow (main)
   $ 
   ```
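The "DID NOT RAISE" failure above happens because botocore falls back to the shared credentials file, as the captured log ("Found credentials in shared credentials file: ~/.aws/credentials") shows. One common way to isolate such a test from ambient credentials is to also monkeypatch the file-based sources; this is a sketch of that idea, not the fix the thread ended up with, and the choice of `os.devnull` as the "nonexistent" file path is an assumption.

```python
import os

AWS_CREDENTIAL_ENV_VARS = (
    "AWS_PROFILE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)

def hide_aws_credentials(monkeypatch):
    """Hide every ambient AWS credential source from botocore.

    `monkeypatch` is pytest's fixture; any object exposing delenv/setenv
    with the same signatures works.
    """
    for var in AWS_CREDENTIAL_ENV_VARS:
        monkeypatch.delenv(var, raising=False)
    # Point the file-based sources at os.devnull so the developer's
    # ~/.aws/credentials file is never consulted by the test.
    monkeypatch.setenv("AWS_SHARED_CREDENTIALS_FILE", os.devnull)
    monkeypatch.setenv("AWS_CONFIG_FILE", os.devnull)
```

With this in place, the test fails (or passes) the same way on a laptop with populated `~/.aws/credentials` as it does in CI.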




[GitHub] [airflow] potiuk commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336234191

   I am afraid you are one of the few unlucky ones who will have to deal with side-effects.
   
   It is very likely that those tests that fail rely on side effects from other tests.  It's also possible that your tests are clearing/removing those side effects - that's why those tests start to fail.
   
   Did you run all tests from providers? 
   
   `breeze testing tests --test-type providers[amazon]`
   
   Those should run full "amazon providers" suite in the same sequence they are run in CI - run them on `main` and see they don't fail. 
   
   If you run those tests individually and they fail - it means they are relying on side effects from other tests. This is because most of our unit tests rely on dags/dagruns/connections etc. created in the unit test DB and the DB is re-used across all the tests.
   
   The only way to solve it for now is to investigate and fix those tests that indeed rely on side effects.
   
   That might involve writing a fixture or setUp method that restores the expected state of the DB records that the test expects - many tests have those, but some of them not and rely on the DB being populated from previous tests - and this is the problem.
   
   Yes. It's not your fault. But also yes - it, unfortunately, falls on your shoulders essentially if your newly added tests or the modified ones interfere with it. 
   
   Ideally by finding and resolving the side-effects in the other tests so that the situation gets improved for the future.
   
   It happens rarely enough that it is not a "common" problem, but unfortunately, you likely fell victim to it.
   
   And yes it is NOT how it should be. But - unfortunately - we are not living in a perfect world and it is what it is. And hopefully one day we will be able to solve it better and get rid even of the possibility of the side effects happening, but that would likely require a lot of investment into completely refactoring how 100s if not 1000s of tests are implemented, so it's totally not feasible to solve it now once and for all, I am afraid.
   
   We can complain about it (I do) but this is - unfortunately - quite an effort to fix. And if anyone has an idea how to solve it quickly - that would be fantastic. We have tried in the past to just clean the db for every unit test, but with many thousands of them this slows down the whole suite to a crawl. 
   
   Hopefully some day we will figure how to fix it better. But I am afraid quoting King Theoden - "this is not that day".
   
   BTW. This is actually a good idea for an overall improvement to our test suite that would fix this problem permanently. 




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1338334941

   > Hey @ferruzzi - please rebase now
   
   Pushing now.




[GitHub] [airflow] shubham22 commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
shubham22 commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1030738579


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   1. I agree that if you already know a list of known dags, you can figure it out. If there is a better suggestion that makes the salt unknown but keeps it the same across DAG runs, I think that would be preferred.
   
   2. I like the idea that we are not putting this config on the user. But does it solve the problem we started with, i.e. providing more privacy to the users? Would we end up adding this config for every provider out there? I still think it is not needed.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029660736


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   Moving to code comments so these can be threaded discussions:
   
   @Taragolis :
   > Some end users might be unhappy about the fact that we actually started collecting additional telemetry like dag_id and ClassName, even if it is only hashes. So it might be better to collect by default only the Airflow and Amazon Provider versions and, if additional metadata is required, call an optional callable controlled by config, e.g. `[aws] extra_botocore_user_agent_callable` with a fully qualified path to a callable, e.g. `some_vendor.safe.get_user_agent`
   
   Yeah, it is possible that this may raise an eyebrow, and that is something I put a lot of thought into.  In the end I decided that Databricks is already adding the calling method to their user agent field [see link in the PR description], so I felt it was already approved by the community.  And as you mentioned, the Dag ID is a hashed value that never contained any kind of secret/personal/private information even if it could somehow be "decoded".  I am definitely open to hearing what others think, but I think we're safe here (obviously, or I wouldn't have submitted it)
   
   In the long term I was thinking it would be neat to consolidate the various user agent collection/building methods that different providers are doing and make some kind of contribution to core that vends these values.  That would make the "let the user opt in to what they are willing to share" concept much easier: they could see in one place what they are sharing universally across providers, and providers who wish to add these values to their agent wouldn't all have to rewrite the book.  I basically copy/pasted the "get package version" method from Yandex [link in PR description], and if another provider wants to include it in their user agent, that's yet another bowl of copypasta.  But a central provider/builder is outside the scope of this PR.  At least three other providers are doing this their own way already; I'd like to get AWS added to that before we start trying to sort out how to standardize everything across providers.
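The `extra_botocore_user_agent_callable` idea quoted above - a config value holding a dotted path like `some_vendor.safe.get_user_agent` - would typically be resolved with `importlib`. A minimal sketch of that resolution step follows; the config key and the vendor callable are hypothetical, and `math.sqrt` is used only as a stand-in import target.

```python
import importlib

def resolve_dotted_path(path: str):
    """Import "pkg.module.attr" and return the attribute (e.g. a callable).

    Assumes the path contains at least one dot separating the module
    from the attribute name.
    """
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)

# e.g. a vendor-supplied user agent callable configured as a string:
extra_ua_callable = resolve_dotted_path("math.sqrt")  # stand-in target
print(extra_ua_callable(4.0))  # 2.0
```

The provider would then call the resolved function and append its return value to the botocore user agent string.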





[GitHub] [airflow] Taragolis commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1323376936

   @ferruzzi find also new circular imports, same as previously fixed in #26784
   
   It only happens if the end user stores the secrets configuration in a secrets backend.
   And unfortunately it can't be unit tested (in a general way) because during unit tests most of the imports are already imported
   
   ```console
   ❯ export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
   ❯ export AIRFLOW__SECRETS__BACKEND_KWARGS='{"config_prefix": "/airflow/config", "region_name": "eu-west-1"}'
   ❯ export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET=boom
   ❯ airflow version
   Traceback (most recent call last):
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 449, in _get_secret
       response = self.client.get_secret_value(
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/functools.py", line 993, in __get__
       val = self.func(instance)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 170, in client
       from airflow.providers.amazon.aws.hooks.base_aws import SessionFactory
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/hooks/base_aws.py", line 49, in <module>
       from airflow import __version__ as airflow_version
   ImportError: cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 123, in _get_config_value_from_secret_backend
       return secrets_client.get_config(config_key)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 434, in get_config
       return self._get_secret(self.config_prefix, key)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 453, in _get_secret
       except self.client.exceptions.ResourceNotFoundException:
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/functools.py", line 993, in __get__
       val = self.func(instance)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 170, in client
       from airflow.providers.amazon.aws.hooks.base_aws import SessionFactory
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/hooks/base_aws.py", line 49, in <module>
       from airflow import __version__ as airflow_version
   ImportError: cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/Users/taragolis/.pyenv/versions/3.9.10/envs/airflow-dev-env-39/bin/airflow", line 33, in <module>
       sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
     File "/Users/taragolis/.pyenv/versions/3.9.10/envs/airflow-dev-env-39/bin/airflow", line 25, in importlib_load_entry_point
       return next(matches).load()
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/importlib/metadata.py", line 77, in load
       module = import_module(match.group('module'))
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/importlib/__init__.py", line 127, in import_module
       return _bootstrap._gcd_import(name[level:], package, level)
     File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
     File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
     File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
     File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
     File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
     File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
     File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
     File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
     File "<frozen importlib._bootstrap_external>", line 850, in exec_module
     File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
     File "/Users/taragolis/Projects/common/airflow/airflow/__init__.py", line 34, in <module>
       from airflow import settings
     File "/Users/taragolis/Projects/common/airflow/airflow/settings.py", line 37, in <module>
       from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 1685, in <module>
       conf.validate()
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 349, in validate
       self._validate_config_dependencies()
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 449, in _validate_config_dependencies
       is_sqlite = "sqlite" in self.get("database", "sql_alchemy_conn")
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 578, in get
       option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 657, in _get_environment_variables
       option = self._get_env_var_option(section, key)
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 502, in _get_env_var_option
       return _get_config_value_from_secret_backend(os.environ[env_var_secret_path])
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 125, in _get_config_value_from_secret_backend
       raise AirflowConfigException(
   airflow.exceptions.AirflowConfigException: Cannot retrieve config from alternative secrets backend. Make sure it is configured properly and that the Backend is accessible.
   cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   ```
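The traceback shows `base_aws.py` doing `from airflow import __version__` at module import time, while `airflow/__init__.py` is still only partially initialized. The usual remedy - a sketch, not necessarily the fix the PR adopted - is to defer the import into the function that needs it, so it runs only when a session is actually built:

```python
def user_agent_airflow_version() -> str:
    """Resolve the Airflow version lazily to sidestep the circular import.

    Importing inside the function means the lookup happens the first time
    this is called - long after airflow/__init__.py has finished
    initializing - instead of at module import time, which is what the
    traceback above shows failing.
    """
    try:
        from airflow import __version__  # deferred, not module-level
        return __version__
    except ImportError:
        # Fallback keeps the user agent usable even in odd environments.
        return "Unknown"
```

This keeps `import airflow.providers.amazon.aws.hooks.base_aws` free of any top-level dependency on the `airflow` package's own initialization order.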





[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1324043073

   > I thought we also might add/merge config in AwsCredentialHelper rather than in hook
   
   What would be the advantage of adding it there instead?




[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1030111141


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   1. I can't say that a hash with a known algorithm and salt is "completely private". It can't easily be reverted, but it can easily be identified if one has a list of known dags, e.g. "5ee762d8-7bac-5131-9887-03dcab3b7111" stands for "example_s3".
   We could talk about this kind of stuff for ages, and the result would be that more people would want to make/buy a "tin foil hat".
   
   2. This config is intended for use by vendors of Airflow-as-a-service, not end users. It also doesn't have to be a general Airflow config; it might be just an environment variable "AIRFLOW_BOTOCORE_ADDITIONAL_UA". So this callable could return any vendor-specific info, for example:
      - Version of MWAA
      - Version of local runner
      - Version of Google Composer
      - Version of Astronomer
      - Any DAG/Callable related stuff
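The point in item 1 - that a hash with a known algorithm and salt is identifiable by dictionary lookup even though it is not invertible - can be illustrated with a short sketch. The salt value and the use of SHA-256 here are illustrative assumptions, not the PR's actual hashing scheme.

```python
import hashlib
from typing import List, Optional

KNOWN_SALT = "fixed-salt"  # assumed known to the observer

def hash_dag_id(dag_id: str) -> str:
    """One-way hash of a DAG id, as it might appear in a user agent."""
    return hashlib.sha256(f"{KNOWN_SALT}:{dag_id}".encode()).hexdigest()

def identify(observed_hash: str, candidates: List[str]) -> Optional[str]:
    """Dictionary attack: hash every candidate DAG id and look for a match."""
    table = {hash_dag_id(d): d for d in candidates}
    return table.get(observed_hash)

# A hash of a well-known example DAG id is immediately identifiable:
print(identify(hash_dag_id("example_s3"), ["example_s3", "example_emr"]))
```

So the hash hides arbitrary private DAG names, but any DAG id drawn from a small public list (such as the provider's example DAGs) is trivially matched.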
   
   





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1038750746


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   No concern from my side





[GitHub] [airflow] potiuk commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1338289139

   Hey @ferruzzi - please rebase now. https://github.com/apache/airflow/pull/28129 that @Taragolis implemented added extra protection for your local env files and moved the test elsewhere.
   
   the problem was real, but you likely have not seen it locally, because you have AWS environment variables in your breeze configuration




[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1030890863


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   I was just sharing my ideas and suggestions. As for me, I'm the kind of person who usually checks "Send Anonymous Statistics to <CompanyName> to improve <Software Name>" and allows all cookies on every website 😆 .
   
   In general it might be a problem to decide what actually counts as PII and what is allowed to be collected, and usually this is regulated differently in different countries/regions. 
   
   So idea if some additional data for user agent required than vendor decide what should be enabled depend on endusers regions. It just my assumption because I'm not a lawyer





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028718476


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Alright.   I'm grabbing dinner, I'll make the change tomorrow.   Thanks for catching that.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1032706825


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   Sorry for the wait, I wanted to think this through and make sure I understood the implications.  If I am understanding correctly: doing it the way it is currently written means that we do not get any data from when Airflow itself connects to SecretsManager.  That was somewhat accidental, but I consider that an absolute win.  I/We don't want to be "snooping" on traffic where Airflow is retrieving secrets.  The main goal here is to understand which services and endpoints are being used in DAGs so we can prioritize adding new hooks and operators accordingly.





[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336100390

   CI build timed out.   Going to bump it.




[GitHub] [airflow] ferruzzi closed pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi closed pull request #27823: Amazon Provider Package user agent
URL: https://github.com/apache/airflow/pull/27823




[GitHub] [airflow] uranusjr commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028613990


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   What is this ignoring?





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043588007


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   Actually, I thought we do not need any regex at all. We could just pass the value to a `UUID` object and compare its version:
   
   ```python
   from uuid import UUID
   
   random = "80eb85a0-5aef-45b6-abbb-f16d62d3db42"
   uuid_v5 = "bf428e1d-f221-55de-a77f-a61755a4d727"
   nil = "00000000-0000-0000-0000-000000000000"
   
   assert UUID(random).version == 4
   assert UUID(uuid_v5).version == 5
   assert UUID(nil).version is None
   ```
   
   Also, the nil UUID is not possible to get from `uuid5`. Strictly speaking, `00000000-0000-5000-0000-000000000000` is not a possible `uuid5` output either: `uuid5` always sets the RFC 4122 variant bits, so the 17th hex digit of a real result is always 8, 9, a, or b, which makes the all-zero-with-version-5 string a safe sentinel.
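
For illustration, the hashing helper under discussion can be reduced to a few lines; the function name here is hypothetical, but the `uuid5` behaviour (determinism, version bits, and the non-RFC-4122 sentinel) is standard library behaviour:

```python
import uuid

def generate_dag_key(dag_id: str) -> str:
    # Salt the dag_id with the OID namespace and hash it (uuid5 uses SHA-1);
    # the result is deterministic but cannot reasonably be reversed.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))

key = generate_dag_key("example_dag")
# uuid5 is deterministic: the same dag_id always yields the same key.
assert key == generate_dag_key("example_dag")
# A genuine uuid5 result carries version 5 (plus the RFC 4122 variant bits).
assert uuid.UUID(key).version == 5
# The all-zero sentinel is not an RFC 4122 UUID, so its version is None,
# which is why no regex is needed to tell it apart from a real hash.
assert uuid.UUID("00000000-0000-5000-0000-000000000000").version is None
```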





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043659232


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   and then realized that without mocking the environment variable that generates the UUID, it always returns the exception case, so I've parameterized the test to cover both cases.  Tests should be done in a sec.
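
A minimal sketch of such a two-case test, using `unittest.mock` to control the environment variable rather than pytest's parameterization (the helper below is a simplified copy of the hook's `_generate_dag_key`):

```python
import os
import uuid
from unittest import mock

DEFAULT_KEY = "00000000-0000-5000-0000-000000000000"

def generate_dag_key() -> str:
    # Simplified version of the hook helper: hash the dag_id from the
    # task-context environment variable, falling back to a fixed sentinel.
    try:
        dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
        return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
    except Exception:
        return DEFAULT_KEY

# Case 1: the env var is present, so a real version-5 UUID is produced.
with mock.patch.dict(os.environ, {"AIRFLOW_CTX_DAG_ID": "my_dag"}):
    assert uuid.UUID(generate_dag_key()).version == 5

# Case 2: the env var is absent, so the sentinel fallback is returned.
with mock.patch.dict(os.environ, clear=True):
    assert generate_dag_key() == DEFAULT_KEY
```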





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029801751


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   Ohhh... and I just figured out that the Amazon secrets backends do not use `AwsBaseHook`; they use the helper and session factory directly.





[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029794228


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   About the calling method/function, I mean something like this (just a simplified version):
   
   ```python
   from airflow.configuration import conf
   
   
   def user_agent() -> dict:
       # Construct some generic part, like airflow version and provider version
       ua = {
           "airflow": "3.0.0",
           "amazon-provider": "15.0.0"
       }
       
       fn = conf.getimport("aws", "extra_botocore_user_agent_callable", fallback=None)
       if fn and callable(fn):
           # Vendor user-agent
           ua.update(fn())
       
       return ua
   ```
   
   Something similar to what the snowflake provider has:
   
   https://github.com/apache/airflow/blob/035315f5cf331a0fc5b1800f708e2f83fcfb2d7a/airflow/providers/snowflake/hooks/snowflake.py#L233-L234
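
For illustration, a dict like the one returned above would then be flattened into botocore's space-separated `key/value` user agent form; the helper name and version strings below are hypothetical:

```python
def format_user_agent_extra(values: dict) -> str:
    # botocore appends user_agent_extra verbatim to the User-Agent header,
    # so a space-separated "key/value" form keeps it parseable server-side.
    return " ".join(f"{key}/{value}" for key, value in values.items())

ua = format_user_agent_extra({"Airflow": "2.5.0", "AmazonProviderPackage": "6.1.0"})
assert ua == "Airflow/2.5.0 AmazonProviderPackage/6.1.0"
```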





[GitHub] [airflow] shubham22 commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
shubham22 commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1030926167


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   Hahaha really appreciate the help here. Let me consult with our legal counsel to ensure that we are not crossing any legal boundaries here with respect to particular countries/regions :)
   
   And I haven't met a person who doesn't click "accept all cookies" yet. No idea what purpose that annoying as hell pop-up ever achieved. 





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1038663173


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   cool, thanks @shubham22 
   
   @Taragolis  - Are we good to proceed?





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029661905


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   > I thought we also might add/merge config in AwsCredentialHelper rather than in hook
   
   @Taragolis  What would be the advantage of adding it there instead?





[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1343187709

   @Taragolis and @potiuk  Thank you both for your help on this one.  :+1: 




[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1036337355


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   @shubham22 @Taragolis  - Did you guys work this out?  Can I resolve this thread or is there still some discussion to work through here?





[GitHub] [airflow] shubham22 commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
shubham22 commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1037558888


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   I confirmed with the managed service legal counsel that we aren't crossing any legal boundaries here. I think we can close it.





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1031869533


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,11 +46,13 @@
 from dateutil.tz import tzlocal
 from slugify import slugify
 
+from airflow import __version__ as airflow_version

Review Comment:
   I think I addressed this in https://github.com/apache/airflow/pull/27823/commits/a5fc3bc4a39855b9f3c7fc5ac26505709d191a98 by moving it to a local import in the helper method.





[GitHub] [airflow] shubham22 commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
shubham22 commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029876812


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -25,9 +25,13 @@
 from __future__ import annotations
 

Review Comment:
   >     1. Someone of end users might be unhappy the fact that we actually started collect additional telemetry like.
   >        dag_id and ClassName even if it only hashes. So might be better collect by default only Airflow and Amazon Provider version and if required additional metadata then call optional callable which could controlled by config e.g. `[aws] extra_botocore_user_agent_callable` with full qualified path to callable e.g. `some_vendor.safe.get_user_agent`
   
   While I understand where you're coming from, I disagree with this on multiple aspects - 
   1. Hashed Id doesn't really give us any information about what a user could be naming. It is completely private, and doesn't incur any cost. It rather allows us to make better decisions on our end to serve the users better. If it wasn't hashed, I would agree with your suggestion.
   
   2. No more additional configs, please. We already give a lot of configs to the users, and we should try to minimize the overuse. 
   





[GitHub] [airflow] shubham22 commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
shubham22 commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1324312284

   >     1. Someone of end users might be unhappy the fact that we actually started collect additional telemetry like.
   >        dag_id and ClassName even if it only hashes. So might be better collect by default only Airflow and Amazon Provider version and if required additional metadata then call optional callable which could controlled by config e.g. `[aws] extra_botocore_user_agent_callable` with full qualified path to callable e.g. `some_vendor.safe.get_user_agent`
   
   While I understand where you're coming from, I disagree with this on multiple aspects - 
   1. Hashed Id doesn't really give us any information about what a user could be naming. It is completely private, and doesn't incur any cost. It rather allows us to make better decisions on our end to serve the users better. If it wasn't hashed, I would agree with your suggestion.
   
   2. No more additional configs, please. We already give a lot of configs to the users, and we should try to minimize the overuse. 
   
   




[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1324041939

   > Someone of end users might be unhappy the fact that we actually started collect additional telemetry like.
   > dag_id and ClassName even if it only hashes. So might be better collect by default only Airflow and Amazon Provider version and if required additional metadata then call optional callable which could controlled by config e.g. [aws] extra_botocore_user_agent_callable with full qualified path to callable e.g. some_vendor.safe.get_user_agent
   
   Yeah, it is possible that this may raise an eyebrow and that is something I put a lot of thought into.  In the end I decided that Databricks is already adding the calling method to their user agent field [see link in the PR description] so I felt it was already approved by the community.  And as you mentioned, the Dag ID is a hashed value that never contained any kind of secret/personal/private information even if it could somehow be "decoded".  I am definitely open to hear what others think, but I think we're safe here (obviously, or I wouldn't have submitted it)
   
   In the long term I was thinking it would be neat to try to consolidate these various user agent collection/building methods that various different providers are doing and make some kind of contribution to core that vends these values.  That would make the "let the user opt in to what they are willing to share" concept much easier.  They could easily see what they are sharing universally across providers in one place, and providers who wish to add these values to their agent don't all have to rewrite the book.   I basically copy/pasted the "get package version" method from Yandex [link in PR description] and if another provider wants to include it in their user agent, that's yet another bowl of copypasta.  But a central provider/builder is outside the scope of this PR.  At least three other providers are doing this their own way already, I'd like to get AWS added to that before we start trying to sort out how to standardize everything across providers.




[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1041523771


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -451,22 +542,36 @@ def get_session(self, region_name: str | None = None) -> boto3.session.Session:
             conn=self.conn_config, region_name=region_name, config=self.config
         ).create_session()
 
+    def _get_config(self, config: Config | None = None) -> Config:
+        """
+        No AWS Operators use the config argument to this method.
+        Keep backward compatibility with other users who might use it
+        """
+        if config is None:
+            config = deepcopy(self.config)
+
+        # ignore[union-attr] is required for this block to appease MyPy
+        # because the user_agent_extra field is generated at runtime.
+        user_agent_config = Config(
+            user_agent_extra=self._generate_user_agent_extra_field(
+                existing_user_agent_extra=config.user_agent_extra  # type: ignore[union-attr]

Review Comment:
   After a while trying to figure out the problem with the S3 test, this is the root cause.   If I remove the default value on line 414 above, then the S3 test passes, but then `config` here is possibly a NoneType and causes issues.  
   
   I think the solution is to find a better default value for L414 which passes the values from Connection() in.
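
The NoneType hazard being discussed can be sketched without botocore; the `Config` class below is a minimal illustrative stand-in for `botocore.config.Config`, and the helper name and appended version string are hypothetical:

```python
from copy import deepcopy
from typing import Optional

class Config:
    """Minimal illustrative stand-in for botocore.config.Config."""

    def __init__(self, user_agent_extra: Optional[str] = None):
        self.user_agent_extra = user_agent_extra

def get_config(hook_config: Optional[Config], config: Optional[Config] = None) -> Config:
    # Copy the hook-level config, or fall back to an empty default, so the
    # attribute access below never hits a NoneType -- the failure mode
    # described in the comment above.
    if config is None:
        config = deepcopy(hook_config) if hook_config is not None else Config()
    existing = config.user_agent_extra or ""
    return Config(user_agent_extra=(existing + " Airflow").strip())

assert get_config(None).user_agent_extra == "Airflow"
assert get_config(Config("vendor/1.0")).user_agent_extra == "vendor/1.0 Airflow"
```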





[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028636765


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   `manager.hooks` is an `Optional[HookInfo]`, so MyPy will complain that
    ```
   error: Item "None" of "Optional[HookInfo]" has no attribute "package_name"  [union-attr]
   ```
    
    
    I could do something like this instead.  It feels more confusing, but it does catch the (unlikely??) exception case where ProvidersManager breaks?
   
   
   ```
       @classmethod
       def _get_provider_version(cls) -> str:
           """Checks the Providers Manager for the package version."""
           manager = ProvidersManager()
           hook = manager.hooks[cls.conn_type]
           if not hook:
               return "Unknown"
           provider = manager.providers[hook.provider_name]
           return provider.version
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028645709


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Or, following the precedent in `_get_caller`, since this should NEVER cause an issue to bubble up to the user, maybe this is better?
   
   ```
       @classmethod
       def _get_provider_version(cls) -> str:
           """Checks the Providers Manager for the package version."""
           try:
               manager = ProvidersManager()
               hook = manager.hooks[cls.conn_type]
               if not hook:
                   raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
               provider = manager.providers[hook.package_name]
               return provider.version
           except Exception:
               # While it is generally considered poor form to catch "Exception", under
               # no condition should an error here ever cause an issue for the user.
               return "Unknown"
   ```            



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1028651854


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -405,9 +411,68 @@ def __init__(
         self.resource_type = resource_type
 
         self._region_name = region_name
-        self._config = config
+        self._config = config or botocore.config.Config()
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        manager = ProvidersManager()
+        provider_name = manager.hooks[cls.conn_type].package_name  # type: ignore[union-attr]

Review Comment:
   Both of those solutions feel a bit unnecessarily complicated, but I can make the change if you have a preference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1323920887

   These are my personal thoughts:
   
   1. Some end users might be unhappy that we have actually started collecting additional telemetry such as
   dag_id and ClassName, even if only as hashes. It might be better to collect only the Airflow and Amazon Provider versions by default and, if additional metadata is required, call an optional callable controlled by config, e.g. `[aws] extra_botocore_user_agent_callable` with a fully qualified path to a callable such as `some_vendor.safe.get_user_agent`
   
   2. I thought we might also add/merge the config in AwsCredentialHelper rather than in the hook:
   https://github.com/apache/airflow/blob/093345cb61cea623bffe6af5a41d6ba417dfcf6f/airflow/providers/amazon/aws/utils/connection_wrapper.py#L229-L235
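   For what it's worth, the proposed `extra_botocore_user_agent_callable` option is hypothetical, and resolving its fully qualified path would only need a small helper; a sketch:

```python
from importlib import import_module


def resolve_callable(dotted_path: str):
    """Resolve a fully qualified name like 'json.dumps' to the object it names."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attr_name)


dumps = resolve_callable("json.dumps")
print(dumps({"a": 1}))  # {"a": 1}
```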


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043606996


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   Alright, I made a small tweak and am rerunning the tests locally, but what I ended up with is `assert UUID(dag_run_key).version in {5, None}`.  Constructing the UUID also verifies that it is in a valid format, so that will catch a poorly formed UUID or anything that is a valid UUID but not v5 or nil.
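   As an illustration of that assertion (using the fallback value from the diff above; the `generate_dag_key` helper below is a standalone stand-in for the hook method), both the v5 hash and the fallback pass the format check:

```python
import uuid

FALLBACK_KEY = "00000000-0000-5000-0000-000000000000"


def generate_dag_key(dag_id):
    """UUIDv5 of the dag_id, or the fixed fallback when it is unavailable."""
    if not dag_id:
        return FALLBACK_KEY
    return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))


for key in (generate_dag_key("my_dag"), generate_dag_key(None)):
    # .version is 5 for uuid5 output and None for the fallback (its variant
    # bits are not RFC 4122); UUID() raises ValueError on malformed input.
    assert uuid.UUID(key).version in {5, None}
```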



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043592808


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   I'm just running the static checks locally and I'll push the no-regex version.  I actually didn't know about the UUID().version check, that's very handy. :+1: 
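   As an aside, the stack walk in `_find_class_name` quoted above can be exercised standalone; a minimal sketch with a hypothetical operator class:

```python
import inspect


def find_class_name(target_function_name: str) -> str:
    """Return the class name of the most recent frame running the named method."""
    stack = inspect.stack()
    # Find the most recent frame executing the named function, then read
    # the class of the ``self`` local in that frame.
    index = [frame.function for frame in stack].index(target_function_name)
    return stack[index][0].f_locals["self"].__class__.__name__


class DemoOperator:
    def execute(self):
        # Anything called from execute() can discover which class ran it.
        return find_class_name("execute")


print(DemoOperator().execute())  # DemoOperator
```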



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1029094848


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,11 +46,13 @@
 from dateutil.tz import tzlocal
 from slugify import slugify
 
+from airflow import __version__ as airflow_version

Review Comment:
   @ferruzzi I also found new circular imports, same as previously fixed in #26784
   
   It only happens if the end user stores the secrets configuration in a secrets backend.
   Unfortunately it can't be unit tested (in a general way) because during unit tests most of the imports are already imported.
   
   ```python
   ❯ export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
   ❯ export AIRFLOW__SECRETS__BACKEND_KWARGS='{"config_prefix": "/airflow/config", "region_name": "eu-west-1"}'
   ❯ export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET=boom
   ❯ airflow version
   Traceback (most recent call last):
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 449, in _get_secret
       response = self.client.get_secret_value(
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/functools.py", line 993, in __get__
       val = self.func(instance)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 170, in client
       from airflow.providers.amazon.aws.hooks.base_aws import SessionFactory
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/hooks/base_aws.py", line 49, in <module>
       from airflow import __version__ as airflow_version
   ImportError: cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 123, in _get_config_value_from_secret_backend
       return secrets_client.get_config(config_key)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 434, in get_config
       return self._get_secret(self.config_prefix, key)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 453, in _get_secret
       except self.client.exceptions.ResourceNotFoundException:
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/functools.py", line 993, in __get__
       val = self.func(instance)
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/secrets/secrets_manager.py", line 170, in client
       from airflow.providers.amazon.aws.hooks.base_aws import SessionFactory
     File "/Users/taragolis/Projects/common/airflow/airflow/providers/amazon/aws/hooks/base_aws.py", line 49, in <module>
       from airflow import __version__ as airflow_version
   ImportError: cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/Users/taragolis/.pyenv/versions/3.9.10/envs/airflow-dev-env-39/bin/airflow", line 33, in <module>
       sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
     File "/Users/taragolis/.pyenv/versions/3.9.10/envs/airflow-dev-env-39/bin/airflow", line 25, in importlib_load_entry_point
       return next(matches).load()
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/importlib/metadata.py", line 77, in load
       module = import_module(match.group('module'))
     File "/Users/taragolis/.pyenv/versions/3.9.10/lib/python3.9/importlib/__init__.py", line 127, in import_module
       return _bootstrap._gcd_import(name[level:], package, level)
     File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
     File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
     File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
     File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
     File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
     File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
     File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
     File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
     File "<frozen importlib._bootstrap_external>", line 850, in exec_module
     File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
     File "/Users/taragolis/Projects/common/airflow/airflow/__init__.py", line 34, in <module>
       from airflow import settings
     File "/Users/taragolis/Projects/common/airflow/airflow/settings.py", line 37, in <module>
       from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 1685, in <module>
       conf.validate()
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 349, in validate
       self._validate_config_dependencies()
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 449, in _validate_config_dependencies
       is_sqlite = "sqlite" in self.get("database", "sql_alchemy_conn")
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 578, in get
       option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 657, in _get_environment_variables
       option = self._get_env_var_option(section, key)
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 502, in _get_env_var_option
       return _get_config_value_from_secret_backend(os.environ[env_var_secret_path])
     File "/Users/taragolis/Projects/common/airflow/airflow/configuration.py", line 125, in _get_config_value_from_secret_backend
       raise AirflowConfigException(
   airflow.exceptions.AirflowConfigException: Cannot retrieve config from alternative secrets backend. Make sure it is configured properly and that the Backend is accessible.
   cannot import name '__version__' from partially initialized module 'airflow' (most likely due to a circular import) (/Users/taragolis/Projects/common/airflow/airflow/__init__.py)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1032600892


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,11 +46,13 @@
 from dateutil.tz import tzlocal
 from slugify import slugify
 
+from airflow import __version__ as airflow_version

Review Comment:
   Yep, this should help. Secrets Backends do not use Hook for communicate with SSM Parameter Store / Secrets Manager: https://github.com/apache/airflow/pull/27823#discussion_r1029801751
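   One way to sidestep this class of cycle entirely (a sketch, not the fix applied in the PR) is to read the version from the installed distribution's metadata instead of importing the package at module load time:

```python
from importlib import metadata


def package_version(dist_name: str) -> str:
    """Look up an installed distribution's version without importing it,
    so module import order cannot create a cycle."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Mirrors the PR's policy: never let this bubble up to the user.
        return "Unknown"


print(package_version("definitely-not-installed"))  # Unknown
```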



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336249438

   @ferruzzi `tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id` fails locally because you have credentials in the shared credentials file under the `default` profile.
   We actually need to move this test into AwsBaseHook, if we need it at all. If I understand correctly, it tries to verify that boto3/botocore will raise an error if no credentials are provided.
   
   `TestAwsS3Hook.test_create_bucket_no_region_regional_endpoint`: this test checks that if a user wants to use the regional S3 endpoint for `us-east-1` but does not specify any region, bucket creation will fail (Airflow prevents this) because in that case the region would be set to `aws-global`.
   
   It might be caused by one of two things:
   1. Something changed in a recent version of `botocore` so that this setting is ignored or no longer returns `aws-global` as the region.
   2. The setting `{"config_kwargs": {"s3": {"us_east_1_regional_endpoint": "regional"}}}` is somehow overwritten by the changes in this PR. If so, other `config_kwargs` might be overwritten too, e.g. the retry strategy.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1338280134

   > Did you run all tests from providers?
   > breeze testing tests --test-type providers[amazon]
   > Those should run full "amazon providers" suite in the same sequence they are run in CI - run them on main and see they don't fail.
   
   But they do; that's what I was saying.  There are two tests in hooks/s3.py that fail when I run them locally, even when I run them using breeze from main without my code.  I can work on figuring out why and fixing them, but if they fail in breeze on main then I don't see how the code in this PR is the issue.  Fixing them should be a separate PR.
   
   
   **FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id**
   
   (Note, this one is no longer failing in the CI and I have not caught up on my messages from yesterday or this morning, this may already be fixed)
   > /test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id test fail locally because you have credentials in shared credentials file in default profile.
   
   Yup, that looks right.  I propose that if that test expects not to find credentials, then it should mock the check for them and return false.  Does that sound like the right fix to that one to you two?  If we like that solution I can make the fix, but I think it belongs in a different PR.
   
   
   **FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3Hook::test_create_bucket_no_region_regional_endpoint**
   
   When run directly (not in breeze) locally, it returns an exception that the bucket already exists; when run in breeze it fails with the message `botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records.`  I want to say that sounds like it is not mocking the S3 connection and is actually hitting the live S3 API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1338477249

   Looks like that commit from @Taragolis fixed one of them.  Looking at the other one, it's throwing `botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records.` which sounds like it's not mocking the connection to S3.  I'll poke at it and see what I can figure out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043547508


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   The 5 is a fixed digit in a UUID that indicates the version, and the unit tests check for it to confirm the format, so either it has to stay or the unit tests need to be changed.  I'd like to keep it, but if you have a strong opinion here or a reason it should be changed, I'll update the unit tests too.
   
   For the exception, yeah, you are right, but I did it for consistency since the others all follow the policy that "nothing should possibly bubble up".  I can change it to IndexError if you still want; I just figured I would explain my reasoning.
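   For reference, the fixed version digit is easy to see in isolation (the dag_id below is hypothetical, used only for illustration):
   
   ```python
   import uuid
   
   # "example_dag" is a hypothetical dag_id for illustration only.
   dag_key = str(uuid.uuid5(uuid.NAMESPACE_OID, "example_dag"))
   
   # In the canonical form xxxxxxxx-xxxx-5xxx-xxxx-xxxxxxxxxxxx, the digit
   # after the second dash encodes the UUID version; for uuid5 it is always 5,
   # which is why the fallback "00000000-0000-5000-0000-000000000000" mimics it.
   print(dag_key[14])  # -> 5
   ```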



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1043132982


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -409,6 +414,92 @@ def __init__(
         self._config = config
         self._verify = verify
 
+    @classmethod
+    def _get_provider_version(cls) -> str:
+        """Checks the Providers Manager for the package version."""
+        try:
+            manager = ProvidersManager()
+            hook = manager.hooks[cls.conn_type]
+            if not hook:
+                # This gets caught immediately, but without it MyPy complains
+                # Item "None" of "Optional[HookInfo]" has no attribute "package_name"
+                # on the following line and static checks fail.
+                raise ValueError(f"Hook info for {cls.conn_type} not found in the Provider Manager.")
+            provider = manager.providers[hook.package_name]
+            return provider.version
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _find_class_name(target_function_name: str) -> str:
+        """
+        Given a frame off the stack, return the name of the class which made the call.
+        Note: This method may raise a ValueError or an IndexError, but the calling
+        method is catching and handling those.
+        """
+        stack = inspect.stack()
+        # Find the index of the most recent frame which called the provided function name.
+        target_frame_index = [frame.function for frame in stack].index(target_function_name)
+        # Pull that frame off the stack.
+        target_frame = stack[target_frame_index][0]
+        # Get the local variables for that frame.
+        frame_variables = target_frame.f_locals["self"]
+        # Get the class object for that frame.
+        frame_class_object = frame_variables.__class__
+        # Return the name of the class object.
+        return frame_class_object.__name__
+
+    def _get_caller(self, target_function_name: str = "execute") -> str:
+        """Given a function name, walk the stack and return the name of the class which called it last."""
+        try:
+            caller = self._find_class_name(target_function_name)
+            if caller == "BaseSensorOperator":
+                # If the result is a BaseSensorOperator, then look for whatever last called "poke".
+                return self._get_caller("poke")
+            return caller
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "Unknown"
+
+    @staticmethod
+    def _generate_dag_key() -> str:
+        """
+        The Object Identifier (OID) namespace is used to salt the dag_id value.
+        That salted value is used to generate a SHA-1 hash which, by definition,
+        can not (reasonably) be reversed.  No personal data can be inferred or
+        extracted from the resulting UUID.
+        """
+        try:
+            dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
+            return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"

Review Comment:
   Small nitpick:
   
   1. It seems like only a `KeyError` can happen here.
   2. Some fixes to the nil UUID.
   
   
   
   ```suggestion
           except KeyError:
               # Under no condition should an error here ever cause an issue for the user.
               return "00000000-0000-0000-0000-000000000000"
   ```
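   Pulled out as a standalone sketch (the function name is illustrative, not the hook's actual method), the suggested variant looks like this:
   
   ```python
   import os
   import uuid
   
   def generate_dag_key() -> str:
       # os.environ[...] can only raise KeyError, so that is all we catch;
       # the fallback is the canonical nil UUID.
       try:
           dag_id = os.environ["AIRFLOW_CTX_DAG_ID"]
           return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))
       except KeyError:
           return "00000000-0000-0000-0000-000000000000"
   ```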





[GitHub] [airflow] potiuk commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336122059

   > Two tests are failing in `providers/amazon/aws/hooks/test_s3.py` but they are also failing in main on my side and shouldn't be related to any changes I've made.
   
   They do not seem to fail on "canary builds" in CI for main, so there must be something in your changes that causes it: https://github.com/apache/airflow/actions/runs/3607606417 




[GitHub] [airflow] potiuk commented on pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27823:
URL: https://github.com/apache/airflow/pull/27823#issuecomment-1336254238

   > @ferruzzi `tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id` fails locally because you have credentials in the shared credentials file under the `default` profile. Actually, we need to move this test into AwsBaseHook, if we need it at all. If I understand correctly, we are trying to test that boto3/botocore will raise an error if no credentials are provided.
   > 
   > `TestAwsS3Hook.test_create_bucket_no_region_regional_endpoint`: in this test we check that if a user wants to use the regional S3 endpoint for `us-east-1` but does not specify any region, then bucket creation will fail (Airflow prevents this) because in that case the region would be set to `aws-global`
   > 
   > This might happen for one of two reasons:
   > 
   > 1. Something changed in a recent version of `botocore`, and this setting is ignored or no longer returns `aws-global` as the region.
   > 2. The setting `{"config_kwargs": {"s3": {"us_east_1_regional_endpoint": "regional"}}}` is somehow overwritten after the changes in this PR, so other `config_kwargs` might be overwritten too, e.g. the retry strategy.
   
   I think in this case the side-effect is different than DB:
   
   ```
   INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials
   ```
   
   Looks like the tests you added are creating (and not deleting) ~/.aws/credentials - and the failing test expects a "No Credentials" error to be raised. 
   
   If my hypothesis is right, the fix is twofold:
   
   * make sure your new tests clean up ~/.aws/credentials in tearDown/fixture
   * make sure to clean up ~/.aws/credentials in setup before the `test_check_for_bucket_raises_error_with_invalid_conn_id` test 
    
   On one hand, the test did not have a proper "setup" step resetting the state to what the test expected; on the other hand, it would have been difficult to foresee, when the test was written, that someone else would create and leave such a credentials file behind. Hard to blame anyone in particular, other than that it's simply not feasible to run thousands of tests, each in a completely isolated and pristine environment, without side effects like that.
   
   This is another example of a side effect you might stumble upon, and it is something that will unfortunately happen occasionally. I can't imagine how to prevent those kinds of problems.
   
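   One way to reduce this class of side effect is to compute a scrubbed environment for credential-sensitive tests. This helper is hypothetical (not part of the PR) and only illustrates the isolation idea:
   
   ```python
   import os
   
   def scrub_aws_env(env: dict) -> dict:
       # Hypothetical helper: drop ambient AWS settings so tests that expect a
       # "No Credentials" error cannot pick up ~/.aws/credentials or env vars.
       cleaned = {k: v for k, v in env.items() if not k.startswith("AWS_")}
       # Point botocore at an empty file instead of the default ~/.aws/credentials.
       cleaned["AWS_SHARED_CREDENTIALS_FILE"] = os.devnull
       return cleaned
   ```
   
   A test fixture could pass `scrub_aws_env(os.environ.copy())` to the subprocess or patch it in, so leftover files from other tests cannot leak in.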
   




[GitHub] [airflow] ferruzzi commented on a diff in pull request #27823: Amazon Provider Package user agent

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #27823:
URL: https://github.com/apache/airflow/pull/27823#discussion_r1041523771


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -451,22 +542,36 @@ def get_session(self, region_name: str | None = None) -> boto3.session.Session:
             conn=self.conn_config, region_name=region_name, config=self.config
         ).create_session()
 
+    def _get_config(self, config: Config | None = None) -> Config:
+        """
+        No AWS Operators use the config argument to this method.
+        Keep backward compatibility with other users who might use it
+        """
+        if config is None:
+            config = deepcopy(self.config)
+
+        # ignore[union-attr] is required for this block to appease MyPy
+        # because the user_agent_extra field is generated at runtime.
+        user_agent_config = Config(
+            user_agent_extra=self._generate_user_agent_extra_field(
+                existing_user_agent_extra=config.user_agent_extra  # type: ignore[union-attr]

Review Comment:
   After a while spent trying to figure out the problem with the S3 test, this is the root cause.  If I remove the default value on line 414 above, then the S3 test passes, but then `config` here is possibly a NoneType and causes issues.  
   
   I think the solution is to find a better default value for L414, one which passes the values from Connection() through.
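   The merge being discussed can be sketched without botocore; the helper and field names below are illustrative, not the PR's actual implementation:
   
   ```python
   from __future__ import annotations
   
   def merge_user_agent_extra(existing: str | None, generated: str) -> str:
       # A caller-supplied user_agent_extra must survive, so the generated
       # fields are appended to it rather than replacing it.
       return f"{existing} {generated}" if existing else generated
   ```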


