You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/03 20:28:59 UTC

[GitHub] [airflow] RachitSharma2001 opened a new pull request, #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

RachitSharma2001 opened a new pull request, #28705:
URL: https://github.com/apache/airflow/pull/28705

   As shown in this issue #27957, the current documentation for the S3ToRedshift Operator seems to indicate that only one key from S3 can be transferred to Redshift. However, as elaborated [here](https://stackoverflow.com/questions/30869340/aws-redshift-copy-data-from-s3-with-wildcard) and in the aws docs here,  the COPY command from S3 to Redshift automatically looks for **all** keys that matches the given prefix, and then copies all of them to redshift. 
   
   For this PR, I wanted to update the docs to make this clear to Airflow users. I also added a system test that displays this functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28705:
URL: https://github.com/apache/airflow/pull/28705#discussion_r1061173156


##########
airflow/providers/amazon/aws/transfers/s3_to_redshift.py:
##########
@@ -42,7 +42,7 @@ class S3ToRedshiftOperator(BaseOperator):
     :param schema: reference to a specific schema in redshift database
     :param table: reference to a specific table in redshift database
     :param s3_bucket: reference to a specific S3 bucket
-    :param s3_key: reference to a specific S3 key
+    :param s3_key: reference either to a specific S3 key or a set of keys or folders sharing that prefix

Review Comment:
   What does “a set of keys” mean? Does it has to be a Python set, or is the term being used more liberally? If the latter case I think _collection_ is a more common term.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on a diff in pull request #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #28705:
URL: https://github.com/apache/airflow/pull/28705#discussion_r1062672973


##########
tests/system/providers/amazon/aws/example_redshift_s3_transfers.py:
##########
@@ -207,6 +208,18 @@ def delete_security_group(sec_group_id: str, sec_group_name: str):
     )
     # [END howto_transfer_s3_to_redshift]
 
+    # [START howto_transfer_s3_to_redshift_multiple_keys]
+    transfer_s3_to_redshift_multiple = S3ToRedshiftOperator(

Review Comment:
   Dont forget to add it to the `chain` command



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] irudson commented on a diff in pull request #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

Posted by GitBox <gi...@apache.org>.
irudson commented on code in PR #28705:
URL: https://github.com/apache/airflow/pull/28705#discussion_r1061885226


##########
airflow/providers/amazon/aws/transfers/s3_to_redshift.py:
##########
@@ -42,7 +42,7 @@ class S3ToRedshiftOperator(BaseOperator):
     :param schema: reference to a specific schema in redshift database
     :param table: reference to a specific table in redshift database
     :param s3_bucket: reference to a specific S3 bucket
-    :param s3_key: reference to a specific S3 key
+    :param s3_key: reference either to a specific S3 key or a set of keys or folders sharing that prefix

Review Comment:
   I suppose a better wording would be:
   ```
   :param s3_key: key prefix that selects single or multiple objects from S3
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] RachitSharma2001 commented on pull request #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

Posted by GitBox <gi...@apache.org>.
RachitSharma2001 commented on PR #28705:
URL: https://github.com/apache/airflow/pull/28705#issuecomment-1372815972

   Thank you all for the suggestions for the changes. I have updated the wording of the documentation in [airflow/providers/amazon/aws/transfers/s3_to_redshift.py](https://github.com/apache/airflow/pull/28705/files/00906f44d2a2f83e1d79962f94566993ab5657a5#diff-b1fb9ee27a02c9eb122ba0e5c5a578c6a8ebb9c37777894d6765208a740f700a), and have added the requested change to the system test. Let me know if any other changes are needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #28705: Update S3ToRedshift Operator docs to indicate multiple key functionality

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #28705:
URL: https://github.com/apache/airflow/pull/28705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org