Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/03 20:36:54 UTC

[GitHub] [airflow] RachitSharma2001 opened a new pull request, #28706: Update provide_bucket_name() decorator to handle new conn_type

RachitSharma2001 opened a new pull request, #28706:
URL: https://github.com/apache/airflow/pull/28706

   Addresses issue #27587. 
   
   The way I approached this is to require the user, when they create an `aws` conn type, to specify the bucket name in the extras field like so:
   ```json
   {
       "s3_bucket_name": "[bucket name]"
   }
   ```
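   
   As a minimal sketch (assuming this top-level extras key; the `conn_id` and bucket name are illustrative), such a connection could be defined as:
   
   ```python
   from airflow.models.connection import Connection
   
   # Hypothetical connection using the top-level extras key proposed here
   conn = Connection(
       conn_id="aws_default",  # illustrative
       conn_type="aws",
       extra='{"s3_bucket_name": "my-example-bucket"}',
   )
   print(conn.extra_dejson["s3_bucket_name"])  # my-example-bucket
   ```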
   
   I updated the `test_provide_bucket_name` test, as it depended on the old `provide_bucket_name()` decorator behavior, which is no longer valid (due to the removal of the `s3` conn_type), as elaborated in the issue.
   
   I have also updated the documentation to inform the user of the `s3_bucket_name` field.




[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1063303149


##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema

Review Comment:
   We also need to provide backward compatibility with the current version.
   Basically we need to check `bucket_name` in both places:
   1. In `service_config.s3.bucket_name`
   2. Fall back to `conn_config.schema` and emit a deprecation warning
   
   This could also be done in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76)
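   
   A rough sketch of that fallback order, assuming the attribute names used elsewhere in this thread (`service_config`, `conn_config`); the warning text is illustrative, not the final wording:
   
   ```python
   import warnings
   
   # Inside the provide_bucket_name wrapper, roughly:
   if "bucket_name" not in bound_args.arguments:
       self = args[0]
       if "bucket_name" in self.service_config:
           # Preferred: per-service setting from service_config.s3.bucket_name
           bound_args.arguments["bucket_name"] = self.service_config["bucket_name"]
       elif self.conn_config and self.conn_config.schema:
           # Legacy fallback: the connection's schema field, with a warning
           warnings.warn(
               "Getting the S3 bucket name from the connection schema is deprecated; "
               "set service_config.s3.bucket_name in the connection extras instead.",
               DeprecationWarning,
               stacklevel=2,
           )
           bound_args.arguments["bucket_name"] = self.conn_config.schema
   ```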





[GitHub] [airflow] o-nikolas merged pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
o-nikolas merged PR #28706:
URL: https://github.com/apache/airflow/pull/28706




[GitHub] [airflow] Taragolis commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1383860516

   @RachitSharma2001 You need to rebase your branch to resolve issues related to removed packages, e.g. #28962
   
   But everything else looks good to me




[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049945


##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema
+            if self.aws_conn_id:
+                conn_args = self.get_connection(self.aws_conn_id).extra_dejson

Review Comment:
   That is a good point. I have added the `service_config` property to `AwsGenericHook`, and now the conditional logic within `provide_bucket_name` is much simpler and utilizes caching! Thank you for the suggestion.





[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1397437544

   Sounds good! I have committed those suggestions.




[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1379424059

   Sorry about that, the tests were failing the `static_checks` before. I have now fixed them.




[GitHub] [airflow] Taragolis commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1379437757

   @RachitSharma2001 awesome! 
   
   Would you also like to update the information in the documentation: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#configuring-the-connection ?
   
   Some general information about `service_config`, plus info about `bucket_name` in extra under [Examples for the Extra field](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#examples-for-the-extra-field), or we could even create a separate section describing the available options for different clients.




[GitHub] [airflow] o-nikolas commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1080784133


##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -67,6 +67,8 @@ Extra (optional)
     Specify the extra parameters (as json dictionary) that can be used in AWS
     connection. All parameters are optional.
 
+    * ``service_config``: json used for specifying information about different AWS services, such as S3 or STS.

Review Comment:
   ```suggestion
       * ``service_config``: json used to specify configuration/parameters for different AWS services, such as S3 or STS.
   ```



##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -269,6 +271,23 @@ This assumes all other Connection fields eg **AWS Access Key ID** or **AWS Secre
       "assume_role_kwargs": { "something":"something" }
     }
 
+5. Using **service_config** to specify information about services such as S3, STS, and EMR

Review Comment:
   ```suggestion
   5. Using **service_config** to specify configuration for services such as S3, STS, and EMR
   ```





[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1063300999


##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema
+            if self.aws_conn_id:
+                conn_args = self.get_connection(self.aws_conn_id).extra_dejson

Review Comment:
   `get_connection` is quite an expensive call; there is no reason to call it every time the decorated method is called.
   
   IMHO better:
   1. Create a new property/attribute in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76) which stores this dictionary
   
   2. Add a method `get_service_config(service_name: str)` to AwsConnectionWrapper which returns the config for a specific service
   ```python
   def get_service_config(self, service_name: str) -> dict:
       # Return this service's section of the service_config dictionary
       return self.service_config.get(service_name, {})
   ```
   
   3. Add a property to [AwsGenericHook](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/hooks/base_aws.py#L379) which is associated with the selected `service_name`
   
   ```python
   @property
   def service_config(self) -> dict:
       # client_type/resource_type identify the AWS service this hook uses
       service_name = self.client_type or self.resource_type
       return self.conn_config.get_service_config(service_name)
   ```
   
   In this case we only call `get_connection` once, because `AwsGenericHook.conn_config` is a cached property.
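   
   Hypothetical usage under this design (names as sketched above; the `aws_conn_id` is illustrative):
   
   ```python
   hook = S3Hook(aws_conn_id="aws_default")
   # First access resolves conn_config (a single get_connection call, cached);
   # later accesses reuse the cached wrapper, so no repeated connection lookups.
   bucket_name = hook.service_config.get("bucket_name")
   ```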



##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema
+            if self.aws_conn_id:
+                conn_args = self.get_connection(self.aws_conn_id).extra_dejson

Review Comment:
   This also makes integration with other services easier



##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema
+            if self.aws_conn_id:
+                conn_args = self.get_connection(self.aws_conn_id).extra_dejson

Review Comment:
   And then in the decorator you only need something like this
   
   ```python
   
   if "bucket_name" in self.service_config:
       ...
   
   ```



##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -466,16 +466,26 @@ def test_delete_bucket_if_not_bucket_exist(self, s3_bucket):
             assert mock_hook.delete_bucket(bucket_name=s3_bucket, force_delete=True)
         assert ctx.value.response["Error"]["Code"] == "NoSuchBucket"
 
-    @mock.patch.object(S3Hook, "get_connection", return_value=Connection(schema="test_bucket"))
+    from airflow.providers.amazon.aws.utils.connection_wrapper import AwsConnectionWrapper

Review Comment:
   Imports should be defined at the top level, unless we have a strong reason to import here



##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema

Review Comment:
   We also need to provide backward compatibility with the current version.
   Basically we need to check `bucket_name` in both places:
   1. In `service_config.s3.bucket_name`
   2. Fall back to `conn_config.schema` and emit a deprecation warning
   
   This could also be done in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76)





[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1062824388


##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -66,6 +66,8 @@ Extra (optional)
     Specify the extra parameters (as json dictionary) that can be used in AWS
     connection. All parameters are optional.
 
+    * ``s3_bucket_name``: Name of the S3 bucket that the hook should reference
+

Review Comment:
   I like the idea of moving this setting into a separate extra rather than using `schema`.
   However, I don't think a service-level parameter should be stored at the top level of the AWS Connection Extra.
   
   The reason is simple: `boto3` provides access to 300+ AWS services, and if we store it at the top level, we have a huge chance of ending up with a mess in the future, like the mess we have now with credentials and assuming roles
   
   ```python
   import boto3, botocore
   
   botocore.__version__
   Out[3]: '1.29.43'
   
   boto3.__version__
   Out[4]: '1.26.43'
   
   session = boto3.session.Session(region_name="us-east-1")
   len(session.get_available_services())
   Out[6]: 336
   ```
   
   I'd suggest moving it into a separate parameter (a dictionary) which stores settings per service, e.g.
   
   ```json
   {
     "service_config": {
       "s3": {
         "bucket_name": "awesome-bucket"
       },
       "sts": {
         "endpoint_url": "https://example.org"
       },
       "emr": {
         "job_flow_overrides": {"Name": "PiCalc", "ReleaseLabel": "emr-6.7.0"},
         "endpoint_url": "https://emr.example.org"
       }
     }
   }
   ```
   
   This change might also help with this PR: https://github.com/apache/airflow/pull/28545 (cc: @confusedpublic)
   And in the long term it would allow us to discontinue the [Amazon Elastic MapReduce (EMR) Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/emr.html#amazon-elastic-mapreduce-emr-connection)
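   
   For illustration, the nested layout above could be read from the connection extras roughly like so (a sketch; `extra_dejson` is the standard Airflow accessor, and in practice the parsed result should be cached as discussed elsewhere in this thread):
   
   ```python
   # Sketch: pull per-service settings out of the extras layout shown above.
   extras = hook.get_connection(hook.aws_conn_id).extra_dejson
   service_config = extras.get("service_config", {})
   s3_bucket = service_config.get("s3", {}).get("bucket_name")  # "awesome-bucket"
   sts_endpoint = service_config.get("sts", {}).get("endpoint_url")
   ```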
    





[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1383220176

   Sounds good! I thought a good idea would be to first inform the user of the `service_config` json field in general, then add an example of how to use it to specify the S3 bucket name, plus information pertaining to other services like STS and EMR. You can see how it looks [here](https://github.com/apache/airflow/blob/14194bb6186e1b6a358ffd22dd56b734d3dac5ed/docs/apache-airflow-providers-amazon/connections/aws.rst). Let me know how this looks, and if there is anything I should add or change.




[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1062926098


##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -66,6 +66,8 @@ Extra (optional)
     Specify the extra parameters (as json dictionary) that can be used in AWS
     connection. All parameters are optional.
 
+    * ``s3_bucket_name``: Name of the S3 bucket that the hook should reference
+

Review Comment:
   That sounds good. I have updated airflow/providers/amazon/aws/hooks/s3.py to reflect the requested changes. With these changes, the user can specify the S3 bucket name in the "extra" parameter when creating an aws connection, by entering the following json:
   ```json
   {
     "service_config": {
       "s3": {
         "bucket_name": "[bucket name]"
       }
     }
   }
   ```
   Let me know if this is what you had in mind, or if there are any other changes that are needed. 
   Also, I was not sure how to update the documentation to reflect this option of indicating the S3 bucket name. Would I just specify in `docs/apache-airflow-providers-amazon/connections/aws.rst` that the user should enter `"service_config.s3.bucket_name"` in the Extra field in order to specify the bucket name, or is there a better way to indicate this?





[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049641


##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -466,16 +466,26 @@ def test_delete_bucket_if_not_bucket_exist(self, s3_bucket):
             assert mock_hook.delete_bucket(bucket_name=s3_bucket, force_delete=True)
         assert ctx.value.response["Error"]["Code"] == "NoSuchBucket"
 
-    @mock.patch.object(S3Hook, "get_connection", return_value=Connection(schema="test_bucket"))
+    from airflow.providers.amazon.aws.utils.connection_wrapper import AwsConnectionWrapper

Review Comment:
   Good catch, sorry about that. I have removed this.





[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049714


##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
 
         if "bucket_name" not in bound_args.arguments:
             self = args[0]
-            if self.conn_config and self.conn_config.schema:
-                bound_args.arguments["bucket_name"] = self.conn_config.schema

Review Comment:
   That makes sense. I have committed changes to include the backward compatibility. Should I also include a test in `test_s3.py` to make sure the backward-compatible behavior still works?
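   
   For reference, such a test might look roughly like this (a hypothetical sketch reusing the `mock.patch.object` pattern from the removed line above; the test name and the expected `DeprecationWarning` are assumptions, depending on the final fallback code):
   
   ```python
   from unittest import mock
   
   import pytest
   
   from airflow.models import Connection
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook, provide_bucket_name
   
   
   @mock.patch.object(S3Hook, "get_connection", return_value=Connection(schema="test_bucket"))
   def test_bucket_name_falls_back_to_schema(mock_get_connection):
       # A decorated stand-in method: bucket_name is omitted at the call site,
       # so the decorator should fall back to the connection's schema field.
       @provide_bucket_name
       def fake_method(self, bucket_name=None):
           return bucket_name
   
       with pytest.warns(DeprecationWarning):  # assumed warning type
           assert fake_method(S3Hook(aws_conn_id="aws_default")) == "test_bucket"
   ```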


