Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/11/21 06:13:48 UTC

[GitHub] [airflow] Lee2532 opened a new pull request, #27812: 27811 Add s3 object get

Lee2532 opened a new pull request, #27812:
URL: https://github.com/apache/airflow/pull/27812



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] shubham22 commented on pull request #27812: Add `S3GetObjectOperator`

Posted by GitBox <gi...@apache.org>.
shubham22 commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1347936961

   @Lee2532 - I'm wondering if you're still looking into this. Let us know if you need any help. Also, if you get a chance, can you clarify the use case as @vincbeck asked?
   




[GitHub] [airflow] vincbeck commented on pull request #27812: Add `S3GetObejectOperator`

Posted by GitBox <gi...@apache.org>.
vincbeck commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1323890243

   Please also add documentation for this new operator. You likely need to update this [file](https://github.com/apache/airflow/blob/main/docs/apache-airflow-providers-amazon/operators/s3.rst)




[GitHub] [airflow] potiuk closed pull request #27812: Add `S3GetObjectOperator`

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #27812: Add `S3GetObjectOperator`
URL: https://github.com/apache/airflow/pull/27812




[GitHub] [airflow] vincbeck commented on a diff in pull request #27812: Add `S3GetObejectOperator`

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #27812:
URL: https://github.com/apache/airflow/pull/27812#discussion_r1029564951


##########
docs/apache-airflow-providers-amazon/operators/s3.rst:
##########
@@ -190,6 +190,21 @@ You can specify a ``prefix`` to filter the objects whose name begins with such p
     :start-after: [START howto_operator_s3_list]
     :end-before: [END howto_operator_s3_list]
 
+
+Get Amazon S3 one object
+======================

Review Comment:
   ```suggestion
   Get Amazon S3 object
   ======================
   ```



##########
docs/apache-airflow-providers-amazon/operators/s3.rst:
##########
@@ -190,6 +190,21 @@ You can specify a ``prefix`` to filter the objects whose name begins with such p
     :start-after: [START howto_operator_s3_list]
     :end-before: [END howto_operator_s3_list]
 
+

Review Comment:
   ```suggestion
   .. _howto/operator:S3GetObjectOperator:
   
   ```



##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -757,3 +757,36 @@ def execute(self, context: Context):
         )
 
         return hook.list_prefixes(bucket_name=self.bucket, prefix=self.prefix, delimiter=self.delimiter)
+
+class S3GetObejectOperator(BaseOperator):

Review Comment:
   ```suggestion
   class S3GetObjectOperator(BaseOperator):
   ```



##########
docs/apache-airflow-providers-amazon/operators/s3.rst:
##########
@@ -190,6 +190,21 @@ You can specify a ``prefix`` to filter the objects whose name begins with such p
     :start-after: [START howto_operator_s3_list]
     :end-before: [END howto_operator_s3_list]
 
+
+Get Amazon S3 one object
+======================
+
+To one Amazon S3 object within an Amazon S3 bucket you can use

Review Comment:
   ```suggestion
   To read one Amazon S3 object within an Amazon S3 bucket you can use
   ```



##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -757,3 +757,36 @@ def execute(self, context: Context):
         )
 
         return hook.list_prefixes(bucket_name=self.bucket, prefix=self.prefix, delimiter=self.delimiter)
+
+class S3GetObejectOperator(BaseOperator):
+    """
+    Get object from `key-path` as string.
+

Review Comment:
   ```suggestion
   
   .. seealso::
           For more information on how to use this operator, take a look at the guide:
           :ref:`howto/operator:S3GetObjectOperator`
    
   ```





[GitHub] [airflow] potiuk commented on pull request #27812: Add `S3GetObjectOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1327914223

   Would be great to fix all the comments/typos.




[GitHub] [airflow] uranusjr commented on pull request #27812: 27811 Add s3 object get

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1321512155

   Please follow the contributor guides, run linters locally with pre-commit, and fix the errors.




[GitHub] [airflow] Taragolis commented on pull request #27812: 27811 Add s3 object get

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1321760619

   @Lee2532 You need to add tests, because it looks like this operator wouldn't work with the default XCom backend:
   - `boto3.resources.factory.s3.Object` is not JSON serializable
   - `boto3.resources.factory.s3.Object` is not picklable
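
The two failure modes listed above can be reproduced without AWS access. The following is a minimal sketch, not the PR's code: `make_dynamic_instance`, `survives_json`, and `survives_pickle` are hypothetical helpers, and the stand-in class only imitates the fact that boto3 builds its resource classes at runtime. It does not use boto3 itself.

```python
import json
import pickle


def make_dynamic_instance():
    """Return an instance of a class created at runtime (hypothetical stand-in).

    Assumption: this only imitates a runtime-built resource class. It is not
    a boto3 object and does not talk to S3.
    """
    dynamic_cls = type("s3.Object", (), {})
    obj = dynamic_cls()
    obj.key = "path/to/file.txt"
    return obj


def survives_json(value) -> bool:
    """Return True if the value can be encoded by the standard JSON encoder."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


def survives_pickle(value) -> bool:
    """Return True if the value can be pickled."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle cannot locate a runtime-built class by its qualified name.
        return False


resource_like = make_dynamic_instance()
json_ok = survives_json(resource_like)
pickle_ok = survives_pickle(resource_like)

# A plain string, such as the decoded object body, passes both checks.
text_ok = survives_json("file contents") and survives_pickle("file contents")
```

A test for the operator could assert the same properties on the value returned by `execute`, which is what gets pushed to XCom.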




[GitHub] [airflow] Lee2532 commented on pull request #27812: Add `S3GetObjectOperator`

Posted by GitBox <gi...@apache.org>.
Lee2532 commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1347941602

   I've been thinking about how to avoid sharing a lot of data through XCom, but I can't think of a good way to do it. If I find one, I'll open an issue again. Thank you.




[GitHub] [airflow] potiuk commented on pull request #27812: Add `S3GetObjectOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1348947353

   Closing then




[GitHub] [airflow] vincbeck commented on pull request #27812: Add `S3GetObejectOperator`

Posted by GitBox <gi...@apache.org>.
vincbeck commented on PR #27812:
URL: https://github.com/apache/airflow/pull/27812#issuecomment-1323953817

   I actually should have asked this question before commenting on this PR, but what is your use case here? This operator returns the object specified by `s3_bucket` and `s3_key`, but can you describe a DAG where this operator would be used? An S3 file can be big, and XCom should not be used to share large amounts of data between operators. If what you are trying to achieve is reading data from an S3 bucket and sending it to another destination, a transfer operator may be more suitable for this use case.
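
One common way to act on the concern above is to pass a small reference to the object through XCom and let the downstream task fetch the data itself. The sketch below illustrates that idea with the standard library only. `s3_reference`, `fits_in_xcom`, and `MAX_XCOM_BYTES` are hypothetical names, and the threshold is an arbitrary illustration, not a limit defined by Airflow.

```python
import json

# Hypothetical threshold for illustration only; Airflow does not define this value.
MAX_XCOM_BYTES = 48 * 1024


def s3_reference(bucket: str, key: str) -> dict:
    """Build a small, JSON-serializable pointer to an S3 object."""
    return {"bucket": bucket, "key": key, "uri": f"s3://{bucket}/{key}"}


def fits_in_xcom(value, limit: int = MAX_XCOM_BYTES) -> bool:
    """Return True if the JSON-encoded value is no larger than the limit."""
    return len(json.dumps(value).encode("utf-8")) <= limit


# The reference stays tiny regardless of how large the object is.
ref = s3_reference("my-bucket", "data/file.csv")
ref_is_small = fits_in_xcom(ref)

# Returning the contents directly can exceed the threshold.
contents_are_small = fits_in_xcom("x" * (MAX_XCOM_BYTES + 1))
```

With this pattern the upstream task returns `ref`, and the downstream task uses the bucket and key to read or transfer the data, which keeps the XCom payload independent of the object size.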

