Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/03 20:36:54 UTC
[GitHub] [airflow] RachitSharma2001 opened a new pull request, #28706: Update provide_bucket_name() decorator to handle new conn_type
RachitSharma2001 opened a new pull request, #28706:
URL: https://github.com/apache/airflow/pull/28706
Addresses issue #27587.
The way I approached this is by requiring the user, when they create an `aws` conn type, to specify the bucket name in the extras field like so:
```json
{
    "s3_bucket_name": "[bucket name]"
}
```
I updated the `test_provide_bucket_name` test, as it depended on the old `provide_bucket_name()` behavior, which is no longer valid (due to the removal of the `s3` conn_type), as elaborated in the issue.
I have also updated the documentation to inform the user of the `s3_bucket_name` field.
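As a rough sketch of the behavior described above (the fake hook, the method names, and the decorator internals are illustrative assumptions, not the PR's actual diff):

```python
import functools

def provide_bucket_name(func):
    """If no bucket_name was passed, fall back to one stored in the
    connection extras (sketch; the extras key mirrors the description above)."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if kwargs.get("bucket_name") is None:
            extras = self.get_connection(self.aws_conn_id).extra_dejson
            if "s3_bucket_name" in extras:
                kwargs["bucket_name"] = extras["s3_bucket_name"]
        return func(self, *args, **kwargs)
    return wrapper

# Minimal stand-ins for an Airflow connection and S3 hook, just to
# show the decorator in action.
class FakeConnection:
    extra_dejson = {"s3_bucket_name": "my-bucket"}

class FakeS3Hook:
    aws_conn_id = "aws_default"

    def get_connection(self, conn_id):
        return FakeConnection()

    @provide_bucket_name
    def head_object(self, key, bucket_name=None):
        return bucket_name

resolved = FakeS3Hook().head_object("some/key")
print(resolved)  # falls back to the bucket name from the extras
```

An explicitly passed `bucket_name=...` keyword still wins over the connection-level default.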
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1063303149
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
Review Comment:
We also need to provide backward compatibility with the current version.
Basically, we need to check `bucket_name` in both places:
1. In `service_config.s3.bucket_name`
2. Fall back to `conn_config.schema` with a deprecation warning
This also could be done in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76)
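A minimal sketch of the fallback order being suggested here (the helper name, `service_config` shape, and warning text are assumptions):

```python
import warnings
from typing import Optional

def resolve_bucket_name(service_config: dict, schema: Optional[str]) -> Optional[str]:
    """Prefer service_config["s3"]["bucket_name"]; fall back to the
    legacy connection schema with a deprecation warning (sketch)."""
    bucket = service_config.get("s3", {}).get("bucket_name")
    if bucket:
        return bucket
    if schema:
        warnings.warn(
            "Using the connection schema field for the S3 bucket name is "
            "deprecated; set service_config.s3.bucket_name in extras instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return schema
    return None

# New-style config wins; the legacy schema is only a fallback.
print(resolve_bucket_name({"s3": {"bucket_name": "awesome-bucket"}}, "legacy"))
```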
--
[GitHub] [airflow] o-nikolas merged pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
o-nikolas merged PR #28706:
URL: https://github.com/apache/airflow/pull/28706
--
[GitHub] [airflow] Taragolis commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1383860516
@RachitSharma2001 You need to rebase your branch to resolve the issues related to removed packages, e.g.: #28962
But everything else looks good to me
--
[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049945
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
+ if self.aws_conn_id:
+ conn_args = self.get_connection(self.aws_conn_id).extra_dejson
Review Comment:
That is a good point. I have added the property `service_config` to `AwsGenericHook`, and now the conditional logic within `provide_bucket_name` is a lot simpler, and it utilizes caching! Thank you for the suggestion.
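The caching mentioned here would typically come from something like `functools.cached_property`; the following is a minimal illustration (with assumed names, not the PR's actual code) of why a cached property avoids repeated `get_connection` calls:

```python
from functools import cached_property

class FakeHook:
    """Stand-in hook illustrating the caching pattern."""
    calls = 0  # counts how often the "expensive" lookup actually runs

    def get_connection(self):
        # Stand-in for the expensive Airflow metadata-DB lookup.
        FakeHook.calls += 1
        return {"service_config": {"s3": {"bucket_name": "cached-bucket"}}}

    @cached_property
    def conn_config(self):
        return self.get_connection()

    @property
    def service_config(self):
        return self.conn_config["service_config"].get("s3", {})

hook = FakeHook()
for _ in range(3):
    hook.service_config  # repeated access...
print(FakeHook.calls)  # ...but get_connection ran only once
```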
--
[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1397437544
Sounds good! I have committed those suggestions.
--
[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1379424059
Sorry about that, the tests were failing `static_checks` before. I have now fixed those tests.
--
[GitHub] [airflow] Taragolis commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1379437757
@RachitSharma2001 awesome!
Would you also like to update the information in the documentation: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#configuring-the-connection ?
Some general information about `service_config`, plus info about `bucket_name` in the extra under [Examples for the Extra field](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html#examples-for-the-extra-field), or we could even create a separate section describing the available options for the different clients.
--
[GitHub] [airflow] o-nikolas commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1080784133
##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -67,6 +67,8 @@ Extra (optional)
Specify the extra parameters (as json dictionary) that can be used in AWS
connection. All parameters are optional.
+ * ``service_config``: json used for specifying information about different AWS services, such as S3 or STS.
Review Comment:
```suggestion
* ``service_config``: json used to specify configuration/parameters for different AWS services, such as S3 or STS.
```
##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -269,6 +271,23 @@ This assumes all other Connection fields eg **AWS Access Key ID** or **AWS Secre
"assume_role_kwargs": { "something":"something" }
}
+5. Using **service_config** to specify information about services such as S3, STS, and EMR
Review Comment:
```suggestion
5. Using **service_config** to specify configuration for services such as S3, STS, and EMR
```
--
[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1063300999
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
+ if self.aws_conn_id:
+ conn_args = self.get_connection(self.aws_conn_id).extra_dejson
Review Comment:
`get_connection` is quite an expensive call; there is no reason to call it every time the decorated method is called.
IMHO better:
1. Create a new property/attribute in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76) which stores this dictionary
2. Add a method `get_service_config(service_name: str)` to AwsConnectionWrapper which returns the config for a specific service
```python
def get_service_config(self, service_name: str) -> dict:
    return self.service_config.get(service_name, {})
```
3. Add a property to [AwsGenericHook](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/hooks/base_aws.py#L379) which is associated with the selected `service_name`
```python
@property
def service_config(self) -> dict:
    service_name = self.client_type or self.resource_type
    return self.conn_config.get_service_config(service_name)
```
In this case we only call `get_connection` once, because `AwsGenericHook.conn_config` is a cached property.
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
+ if self.aws_conn_id:
+ conn_args = self.get_connection(self.aws_conn_id).extra_dejson
Review Comment:
This also makes integration with other services easier
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
+ if self.aws_conn_id:
+ conn_args = self.get_connection(self.aws_conn_id).extra_dejson
Review Comment:
And in the decorator you then only need something like this
```python
if "bucket_name" in self.service_config:
...
```
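Putting the pieces of this review together, the decorator body might end up looking roughly like this (a sketch under the assumptions of this review thread, not the merged implementation; the fake hook exists only to exercise it):

```python
import functools
from inspect import signature

def provide_bucket_name(func):
    """Inject bucket_name from the hook's per-service config when the
    caller did not supply one (sketch of the reviewed approach)."""
    function_signature = signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound_args = function_signature.bind(*args, **kwargs)
        if "bucket_name" not in bound_args.arguments:
            self = args[0]
            if "bucket_name" in self.service_config:
                bound_args.arguments["bucket_name"] = self.service_config["bucket_name"]
        return func(*bound_args.args, **bound_args.kwargs)

    return wrapper

# Stand-in hook whose service_config mimics the cached property above.
class FakeS3Hook:
    service_config = {"bucket_name": "bucket-from-conn"}

    @provide_bucket_name
    def list_keys(self, bucket_name=None, prefix=""):
        return bucket_name

print(FakeS3Hook().list_keys())  # bucket-from-conn
```

Binding the arguments via `inspect.signature` means an explicitly passed `bucket_name`, positional or keyword, is never overridden.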
##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -466,16 +466,26 @@ def test_delete_bucket_if_not_bucket_exist(self, s3_bucket):
assert mock_hook.delete_bucket(bucket_name=s3_bucket, force_delete=True)
assert ctx.value.response["Error"]["Code"] == "NoSuchBucket"
- @mock.patch.object(S3Hook, "get_connection", return_value=Connection(schema="test_bucket"))
+ from airflow.providers.amazon.aws.utils.connection_wrapper import AwsConnectionWrapper
Review Comment:
Imports should be defined at the top level, unless we have a strong reason to import here
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
Review Comment:
We also need to provide backward compatibility with the current version.
Basically, we need to check `bucket_name` in both places:
1. In `service_config.s3.bucket_name`
2. Fall back to `conn_config.schema` with a deprecation warning
This also could be done in [AwsConnectionWrapper](https://github.com/apache/airflow/blob/8290ade26deba02ca6cf3d8254981b31cf89ee5b/airflow/providers/amazon/aws/utils/connection_wrapper.py#L76)
--
[GitHub] [airflow] Taragolis commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1062824388
##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -66,6 +66,8 @@ Extra (optional)
Specify the extra parameters (as json dictionary) that can be used in AWS
connection. All parameters are optional.
+ * ``s3_bucket_name``: Name of the S3 bucket that the hook should reference
+
Review Comment:
I like the idea of moving this setting into a separate extra rather than using `schema`.
However, I don't think a service-level parameter should be stored at the top level of the AWS Connection Extra.
The reason is simple: `boto3` provides access to 300+ AWS services, and if we store it at the top level then we have a huge chance of ending up with a mess in the future, like the mess we have now with credentials and assuming roles:
```python
Python 3.9.10 (main, Feb 25 2022, 16:54:01)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
IPython 8.7.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import boto3, botocore

In [2]: botocore.__version__
Out[2]: '1.29.43'

In [3]: boto3.__version__
Out[3]: '1.26.43'

In [4]: session = boto3.session.Session(region_name="us-east-1")

In [5]: len(session.get_available_services())
Out[5]: 336
```
I'd suggest moving it into a separate parameter (a dictionary) which stores settings per service, e.g.:
```json
{
    "service_config": {
        "s3": {
            "bucket_name": "awesome-bucket"
        },
        "sts": {
            "endpoint_url": "https://example.org"
        },
        "emr": {
            "job_flow_overrides": {"Name": "PiCalc", "ReleaseLabel": "emr-6.7.0"},
            "endpoint_url": "https://emr.example.org"
        }
    }
}
```
This change might also help with this PR: https://github.com/apache/airflow/pull/28545 (cc: @confusedpublic)
And in the long term it could allow us to discontinue the [Amazon Elastic MapReduce (EMR) Connection](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/emr.html#amazon-elastic-mapreduce-emr-connection)
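Reading the proposed layout back out of a connection's extras is a one-liner per service; a sketch (the helper name is an assumption, and the values come from the example JSON above):

```python
import json

# The proposed extras layout, as a parsed dictionary.
extras = json.loads("""
{
  "service_config": {
    "s3": {"bucket_name": "awesome-bucket"},
    "sts": {"endpoint_url": "https://example.org"}
  }
}
""")

def get_service_config(extras: dict, service_name: str) -> dict:
    """Return the per-service config block, or an empty dict (sketch)."""
    return extras.get("service_config", {}).get(service_name, {})

print(get_service_config(extras, "s3"))   # {'bucket_name': 'awesome-bucket'}
print(get_service_config(extras, "emr"))  # {} (unset services fall back safely)
```

Returning an empty dict for unconfigured services keeps caller code free of `None` checks.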
--
[GitHub] [airflow] RachitSharma2001 commented on pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28706:
URL: https://github.com/apache/airflow/pull/28706#issuecomment-1383220176
Sounds good! I thought a good approach would be to first inform the user about the `service_config` json field in general, then add an example of how to use it to specify the S3 bucket name, along with information pertaining to other services like STS and EMR. You can see how it looks [here](https://github.com/apache/airflow/blob/14194bb6186e1b6a358ffd22dd56b734d3dac5ed/docs/apache-airflow-providers-amazon/connections/aws.rst). Let me know how this looks, and whether there is anything I should add or change.
--
[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1062926098
##########
docs/apache-airflow-providers-amazon/connections/aws.rst:
##########
@@ -66,6 +66,8 @@ Extra (optional)
Specify the extra parameters (as json dictionary) that can be used in AWS
connection. All parameters are optional.
+ * ``s3_bucket_name``: Name of the S3 bucket that the hook should reference
+
Review Comment:
That sounds good. I have updated airflow/providers/amazon/aws/hooks/s3.py to reflect the requested changes. With these changes, the user can specify the S3 bucket name in the "extra" parameter when creating an aws connection by entering the following json:
```json
{
    "service_config": {
        "s3": {
            "bucket_name": "[bucket name]"
        }
    }
}
```
Let me know if this is what you had in mind, or if there are any other changes that are needed.
Also, I was not sure how to update the documentation to reflect this option for indicating the S3 bucket name. Would I just specify in `docs/apache-airflow-providers-amazon/connections/aws.rst` that the user should enter `"service_config.s3.bucket_name"` in `Extra` in order to specify the bucket name, or is there a better way to indicate this?
--
[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049641
##########
tests/providers/amazon/aws/hooks/test_s3.py:
##########
@@ -466,16 +466,26 @@ def test_delete_bucket_if_not_bucket_exist(self, s3_bucket):
assert mock_hook.delete_bucket(bucket_name=s3_bucket, force_delete=True)
assert ctx.value.response["Error"]["Code"] == "NoSuchBucket"
- @mock.patch.object(S3Hook, "get_connection", return_value=Connection(schema="test_bucket"))
+ from airflow.providers.amazon.aws.utils.connection_wrapper import AwsConnectionWrapper
Review Comment:
Good catch, sorry about that. I have removed this.
--
[GitHub] [airflow] RachitSharma2001 commented on a diff in pull request #28706: Update provide_bucket_name() decorator to handle new conn_type
Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on code in PR #28706:
URL: https://github.com/apache/airflow/pull/28706#discussion_r1064049714
##########
airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -57,8 +57,14 @@ def wrapper(*args, **kwargs) -> T:
if "bucket_name" not in bound_args.arguments:
self = args[0]
- if self.conn_config and self.conn_config.schema:
- bound_args.arguments["bucket_name"] = self.conn_config.schema
Review Comment:
That makes sense. I have committed changes to include the backward-compatible behavior. Should I also include a test in `test_s3.py` to make sure the backward compatibility still works?
--