You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/22 22:28:39 UTC

[GitHub] [airflow] confusedpublic opened a new pull request, #28545: Fix Provider Amazon Test Connection when using custom endpoint url

confusedpublic opened a new pull request, #28545:
URL: https://github.com/apache/airflow/pull/28545

   Original `test_connection` behaviour did not account for the use of custom endpoint urls.
   This would lead to all attempts to test such connections to fail, as the calls to 
   `sts.get_caller_identity` would be sent to the actual AWS endpoint, rather than the custom 
   endpoint defined by the `endpoint_url` provided by the connection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] closed pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url
URL: https://github.com/apache/airflow/pull/28545


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1363405163

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] noheton commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
noheton commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1386802509

   Is this related to using S3-compatible object stores?
   Currently our custom X-Com interface using a local minio instance does no longer work.
   
   ```python
   import os
   from airflow.models.connection import Connection
   
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   conn = Connection(
       conn_id="aws_default",
       conn_type="aws",
       login="<REDACTED>",  # Reference to AWS Access Key ID
       password="<REDACTED>",  # Reference to AWS Secret Access Key
       extra={
           # Specify extra parameters here
           "region_name": "eu-central-1",
           "endpoint_url": "http://localhost:9000",
       },
   )
   
   # Generate Environment Variable Name and Connection URI
   env_key = f"AIRFLOW_CONN_{conn.conn_id.upper()}"
   conn_uri = conn.get_uri()
   print(f"{env_key}={conn_uri}")
   # Test connection
   os.environ[env_key] = conn_uri
   print(conn.test_connection())
   ```
   
   This fails with 
   ```
   [2023-01-18 11:08:08,825] {connection_wrapper.py:303} INFO - AWS Connection (conn_id='aws_default', conn_type='aws') credentials retrieved from login and password.
   (False, 'An error occurred (InvalidParameterValue) when calling the GetCallerIdentity operation: Unsupported action GetCallerIdentity')
   ```
   I get the same message when I try to setup the connection in airflow directly.
   
   Currently we're looking into a workaround using boto3 directly, which still works fine with minio, but we would prefer to use a more integrated solution.
   
   Thanks for any advice.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1386839687

   @noheton, AFAIK minio doesn't support STS Get Caller Identity, so you can't test connection.
   
   Please create separate discussion


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] confusedpublic commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by "confusedpublic (via GitHub)" <gi...@apache.org>.
confusedpublic commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1435943597

   Sorry, I've been busy. I can look at it again this week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28545:
URL: https://github.com/apache/airflow/pull/28545#discussion_r1055917750


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -754,7 +754,9 @@ def test_connection(self):
         """
         try:
             session = self.get_session()
-            conn_info = session.client("sts").get_caller_identity()
+            conn_info = session.client(
+                "sts", endpoint_url=self.conn_config.endpoint_url
+            ).get_caller_identity()

Review Comment:
   In case if custom endpoint set in connection then this connection only would work with STS API, otherwise it will raise an error and vice versa if user provide something like `https://elasticmapreduce.eu-west-1.amazonaws.com` then connection test will fail
   
   Unfortunetly this is a limitation of current implementation: only one endpoint for connection.
   
   So this should be fixed on connection level, for example add ability provide endpoint on service level with some backward compatibility
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] confusedpublic commented on a diff in pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
confusedpublic commented on code in PR #28545:
URL: https://github.com/apache/airflow/pull/28545#discussion_r1056686843


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -754,7 +754,9 @@ def test_connection(self):
         """
         try:
             session = self.get_session()
-            conn_info = session.client("sts").get_caller_identity()
+            conn_info = session.client(
+                "sts", endpoint_url=self.conn_config.endpoint_url
+            ).get_caller_identity()

Review Comment:
   So you think we need to implement something like the following on the connection extra details..
   
   ```json
   {
       "service_endpoints": {
           "s3": "https://s3.eu-west-2.amazonaws.com",
           "emr": "https://elasticmapreduce.eu-west-1.amazonaws.com"
       },
       "endpoint_url": "http://localstack:4566"
   }
   ```
   
   where the connection should do something like the following... (very pseudo code):
   
   ```python
   
   def get_client(client_type):
       
       service_endpoint = self.extra.service_endpoints.get("client_type", self.endpoint_url)
       return self.client(client_type, endpoint_url=service_endpoint
   ```
   
   where we first attempt to get a service level endpoint url, falling back to a custom/overall endpoint url, which can default to `None` if not set, therefore allowing the default AWS endpoint url to be used?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28545:
URL: https://github.com/apache/airflow/pull/28545#discussion_r1056694827


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -754,7 +754,9 @@ def test_connection(self):
         """
         try:
             session = self.get_session()
-            conn_info = session.client("sts").get_caller_identity()
+            conn_info = session.client(
+                "sts", endpoint_url=self.conn_config.endpoint_url
+            ).get_caller_identity()

Review Comment:
   I worked on the same feature, my implementation something like that for `AwsConnectionWrapper` method
   
   ```python
       def get_service_endpoint(self, service_name: str, *, sts_use_default: bool = False) -> str | None:
           """
           Return endpoint_url for specific service.
   
           :param service_name: The name of a service, e.g. 's3' or 'ec2'.
           :param sts_use_default: Is associate default endpoint url from config with sts service.
               For historical reason AwsBaseHook never use endpoint url from
               Connection because this could break some connections, e.g. obtained by Assume Role.
           """
           if not self._endpoint_url:
               return None
   
           try:
               return self._endpoint_url[service_name]
           except KeyError:
               default = self._endpoint_url.get("_default")
               if service_name != "sts" or sts_use_default:
                   return default
               elif default:
                   warnings.warn(
                       "Default endpoint URL disabled for 'sts' service, please set it explicit.",
                       UserWarning,
                       stacklevel=3,
                   )
               return None
   ```
   
   And some extending in connection:
   1. Allow set `endpoint_url` as dictionary
   2. If `endpoint_url` set as non empty string then convert it to `{'_default': endpoint_url}`
   
   This just a sample, I do not have strong opinion about final implementation. 
   My only point about the current limitation of `endpoint_url` and why we do not provide it into STS client



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1435948533

   > Sorry, I've been busy. 
   
   No problem, it's just a clarification 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1499780831

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] confusedpublic commented on a diff in pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
confusedpublic commented on code in PR #28545:
URL: https://github.com/apache/airflow/pull/28545#discussion_r1056686843


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -754,7 +754,9 @@ def test_connection(self):
         """
         try:
             session = self.get_session()
-            conn_info = session.client("sts").get_caller_identity()
+            conn_info = session.client(
+                "sts", endpoint_url=self.conn_config.endpoint_url
+            ).get_caller_identity()

Review Comment:
   So you think we need to implement something like the following on the connection extra details..
   
   ```json
   {
       "service_endpoints": {
           "s3": "https://s3.eu-west-2.amazonaws.com",
           "emr": "https://elasticmapreduce.eu-west-1.amazonaws.com"
       },
       "endpoint_url": "http://localstack:4566"
   }
   ```
   
   where the connection should do something like the following... (very pseudo code):
   
   ```python
   
   def get_client(client_type):
       
       service_endpoint = self.service_endpoints.get("client_type", self.endpoint_url)
       return self.client(client_type, endpoint_url=service_endpoint
   ```
   
   where we first attempt to get a service level endpoint url, falling back to a custom/overall endpoint url, which can default to `None` if not set, therefore allowing the default AWS endpoint url to be used?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28545:
URL: https://github.com/apache/airflow/pull/28545#discussion_r1056692271


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -754,7 +754,9 @@ def test_connection(self):
         """
         try:
             session = self.get_session()
-            conn_info = session.client("sts").get_caller_identity()
+            conn_info = session.client(
+                "sts", endpoint_url=self.conn_config.endpoint_url
+            ).get_caller_identity()

Review Comment:
   Yep, something like that. With some additional logic most related to STS client and historical reason.
   If STS client defined internally in `AwsGenericHook` and `BaseSessionFactory` it should not fallback to default `endpoint_url`. 
   
   Usual users need some additional time for change their connections and if we can not break their connections and workflow it would be nice. Additional logic + warnings it is better approach.
   
   There is two places where we use STS client internally
   1. Check credentials by call STS GetCallerIdentity
   2. Assume role
   
   https://github.com/apache/airflow/blob/b3e26560c7fd835570a0b3a9d65670c87c8cfe0a/airflow/providers/amazon/aws/hooks/base_aws.py#L184-L186
   
   In case if user use [StsHook](https://github.com/apache/airflow/blob/b3e26560c7fd835570a0b3a9d65670c87c8cfe0a/airflow/providers/amazon/aws/hooks/sts.py#L22) or `AwsBaseHook(client_type="sts")` then we should fallback to default `endpoint_url`. The same how it works now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28545: Fix Provider Amazon Test Connection when using custom endpoint url

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on PR #28545:
URL: https://github.com/apache/airflow/pull/28545#issuecomment-1435759264

   @confusedpublic, do you have a plan to continue work on this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org