Posted to commits@airflow.apache.org by "ferruzzi (via GitHub)" <gi...@apache.org> on 2023/02/17 01:53:01 UTC

[GitHub] [airflow] ferruzzi commented on a diff in pull request #29452: Add support of a different AWS connection for DynamoDB

ferruzzi commented on code in PR #29452:
URL: https://github.com/apache/airflow/pull/29452#discussion_r1109190072


##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -78,16 +85,20 @@ class DynamoDBToS3Operator(BaseOperator):
         :ref:`howto/transfer:DynamoDBToS3Operator`
 
     :param dynamodb_table_name: Dynamodb table to replicate data from
+    :param source_aws_conn_id: The Airflow connection used for AWS credentials
+        to access DynamoDB. If this is None or empty then the default boto3
+        behaviour is used. If running Airflow in a distributed manner and
+        source_aws_conn_id is None or empty, then default boto3 configuration
+        would be used (and must be maintained on each worker node).
     :param s3_bucket_name: S3 bucket to replicate data to
     :param file_size: Flush file to s3 if file size >= file_size
     :param dynamodb_scan_kwargs: kwargs pass to <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan>
     :param s3_key_prefix: Prefix of s3 object key
     :param process_func: How we transforms a dynamodb item to bytes. By default we dump the json
-    :param aws_conn_id: The Airflow connection used for AWS credentials.
-        If this is None or empty then the default boto3 behaviour is used. If
-        running Airflow in a distributed manner and aws_conn_id is None or
-        empty, then default boto3 configuration would be used (and must be
-        maintained on each worker node).
+    :param dest_aws_conn_id: The Airflow connection used for AWS credentials
+        to access S3. If this is no set then the source_aws_conn_id connection is used.

Review Comment:
   ```suggestion
       :param dest_aws_conn_id: The Airflow connection used for AWS credentials
           to access S3. If this is not set then the source_aws_conn_id connection is used.
   ```
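
   For reference, a DAG would wire the two parameters up roughly like this (a quick sketch based on the docstring in this diff, not tested; the connection ids are made up):

   ```python
   from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

   backup_table = DynamoDBToS3Operator(
       task_id="backup_dynamodb_table",
       dynamodb_table_name="my_table",
       s3_bucket_name="my-backup-bucket",
       s3_key_prefix="dynamodb/backups/",
       source_aws_conn_id="aws_dynamodb_account",  # credentials used for the DynamoDB scan
       dest_aws_conn_id="aws_s3_account",  # credentials used for the S3 upload
   )
   ```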



##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -32,11 +33,17 @@
 from airflow.models import BaseOperator
 from airflow.providers.amazon.aws.hooks.dynamodb import DynamoDBHook
 from airflow.providers.amazon.aws.hooks.s3 import S3Hook
+from airflow.utils.types import NOTSET, ArgNotSet
 
 if TYPE_CHECKING:
     from airflow.utils.context import Context
 
 
+_DEPRECATION_MSG = (
+    "The aws_conn_id parameter has been deprecated. You should pass instead the source_aws_conn_id parameter."

Review Comment:
   ```suggestion
       "The aws_conn_id parameter has been deprecated. Use the source_aws_conn_id parameter instead."
   ```
   
   I'm still not convinced that dropping aws_conn_id is the right answer here and would prefer to see it left in, but if consensus is to drop it then I guess source_ is a suitable name.
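
   For anyone updating existing DAGs, the rename would look roughly like this (sketch only, assuming the deprecation lands as proposed; table and bucket names are placeholders):

   ```python
   from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

   # Before: still works, but would emit a DeprecationWarning once this change ships.
   DynamoDBToS3Operator(
       task_id="backup_table",
       dynamodb_table_name="my_table",
       s3_bucket_name="my-backup-bucket",
       aws_conn_id="aws_default",
   )

   # After: same behaviour, new parameter name.
   DynamoDBToS3Operator(
       task_id="backup_table",
       dynamodb_table_name="my_table",
       s3_bucket_name="my-backup-bucket",
       source_aws_conn_id="aws_default",
   )
   ```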



##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -118,10 +131,18 @@ def __init__(
         self.dynamodb_scan_kwargs = dynamodb_scan_kwargs
         self.s3_bucket_name = s3_bucket_name
         self.s3_key_prefix = s3_key_prefix
-        self.aws_conn_id = aws_conn_id
+        if aws_conn_id is not NOTSET:
+            warnings.warn(_DEPRECATION_MSG, DeprecationWarning, stacklevel=3)
+            self.source_aws_conn_id = aws_conn_id
+        else:
+            self.source_aws_conn_id = source_aws_conn_id
+        if dest_aws_conn_id is NOTSET:
+            self.dest_aws_conn_id = self.source_aws_conn_id
+        else:
+            self.dest_aws_conn_id = dest_aws_conn_id

Review Comment:
   You'd have to test it, but this may be easier as 
   
   ```suggestion
       self.dest_aws_conn_id = dest_aws_conn_id or self.source_aws_conn_id
   ```
   
   I'm pretty sure (but not positive) that NOTSET evaluates as falsy.
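
   One quick way to check that without spinning up Airflow (using a stand-in class that mirrors airflow.utils.types.ArgNotSet, assuming the real one doesn't override __bool__):

   ```python
   class ArgNotSet:
       """Stand-in for airflow.utils.types.ArgNotSet: a bare sentinel class."""


   NOTSET = ArgNotSet()

   # A plain instance with no __bool__/__len__ is truthy in Python, so if the real
   # sentinel looks like this, `dest_aws_conn_id or self.source_aws_conn_id` only
   # falls back once the parameter default is something falsy such as None.
   print(bool(NOTSET))  # True for this stand-in
   ```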



##########
airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py:
##########
@@ -118,10 +131,18 @@ def __init__(
         self.dynamodb_scan_kwargs = dynamodb_scan_kwargs
         self.s3_bucket_name = s3_bucket_name
         self.s3_key_prefix = s3_key_prefix
-        self.aws_conn_id = aws_conn_id
+        if aws_conn_id is not NOTSET:
+            warnings.warn(_DEPRECATION_MSG, DeprecationWarning, stacklevel=3)
+            self.source_aws_conn_id = aws_conn_id
+        else:
+            self.source_aws_conn_id = source_aws_conn_id

Review Comment:
   Should there be an exception if `aws_conn_id` and `source_aws_conn_id` are both provided?  If not, I think it should be made pretty obvious which has priority.  Maybe the fact you are deprecating `aws_conn_id` is enough of a hint, but maybe not.  Thoughts?
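
   Something along these lines would make the priority explicit (a hypothetical standalone sketch, assuming source_aws_conn_id defaults to None; the error wording is just illustrative):

   ```python
   from __future__ import annotations

   import warnings

   from airflow.utils.types import ArgNotSet

   _DEPRECATION_MSG = "The aws_conn_id parameter has been deprecated. Use the source_aws_conn_id parameter instead."


   def resolve_source_conn_id(aws_conn_id: str | ArgNotSet, source_aws_conn_id: str | None) -> str | None:
       """Hypothetical helper: reject the ambiguous case and document which parameter wins."""
       if isinstance(aws_conn_id, ArgNotSet):
           return source_aws_conn_id
       if source_aws_conn_id is not None:
           # Raising keeps users from silently relying on an undocumented priority.
           raise ValueError("Pass either the deprecated aws_conn_id or source_aws_conn_id, not both.")
       warnings.warn(_DEPRECATION_MSG, DeprecationWarning, stacklevel=2)
       return aws_conn_id
   ```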


