Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/05 17:57:05 UTC

[GitHub] [airflow] vincbeck opened a new pull request, #22758: Add S3CreateObjectOperator

vincbeck opened a new pull request, #22758:
URL: https://github.com/apache/airflow/pull/22758

   Add a new operator, `S3CreateObjectOperator`, to create a new object in S3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849820953


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   @ferruzzi The difference is that in your PythonOperator example `requests.get(DATA_URL)` is inside the python callable, and thus executed only when the task is executed, while in the `S3CreateObjectOperator` example the `requests.get(DATA_URL)` will be executed every time the DAG is parsed.
   
   What you wrote should in fact be `HttpToS3Operator`, which is a transfer operator. The `S3CreateObjectOperator` as shown here is not a transfer operator.
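   The parse-time pitfall described above can be sketched in plain Python, with `fake_requests_get` and the URL as hypothetical stand-ins (no Airflow or `requests` imports):

   ```python
   # Module level mimics the body of a DAG file: this line runs on every parse.
   calls = []

   def fake_requests_get(url):
       """Stand-in for requests.get; records each invocation."""
       calls.append(url)
       return f"payload from {url}"

   # Passing the *result* as an operator argument forces the call at parse time.
   data_eager = fake_requests_get("https://example.com/data")
   assert len(calls) == 1  # already executed, before any task ran

   # Wrapping the call in a callable (as PythonOperator does) defers it
   # until the task actually executes.
   def python_callable():
       return fake_requests_get("https://example.com/data")

   assert len(calls) == 1  # defining the callable triggered nothing
   python_callable()       # simulates task execution
   assert len(calls) == 2
   ```

   This is why a top-level `requests.get(...)` fed into an operator argument would fire on every scheduler parse, while the same call inside a `python_callable` fires only at execution.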





[GitHub] [airflow] vincbeck commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r844031450


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   I updated the comment.
   
   Well the common use case would be I want to upload a file to S3 (JSON, CSV, ...) and specify its content. Example:
   - I pull some data from an API (e.g.) and I want to store the result into S3





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r844141528


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   No, the example dag in the PR just shows how to set up the operator.
   I want to see how you think the operator is to be used within an actual workflow.
   
   I think there may be a wrong assumption here.
   Tasks do not share data between them (other than metadata via Xcom).
   
   What I fail to understand is how this operator will get the given string (assuming the string is not hard-coded).
   Would the string be retrieved from another operator? How?





[GitHub] [airflow] o-nikolas commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849992018


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   I'm not sure if you saw my reply @eladkal, but in those three cases I linked, such an HTTP-to-S3 operator would not help.
   
   Another example, not of the hardcoded variety, that I've run into recently was with the Step Functions operator. There is an operator to run a Step Functions workflow and another to fetch the JSON result which is returned. It would be convenient to have an S3 operator that consumes the output from Xcom and writes it to S3, instead of having to use the python operator and write boilerplate code that uses the hook directly to create the object, or create a tempfile and use the local-to-S3 operator.
   
   I'm not sure what the grave concern is about adding this operator, but I'd sure like to have it, and would make use of it.
   
   Maybe we can flip the question around and hear what concerns, risks, oversights and pitfalls you see about including this operator? Thanks!
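   The Step Functions pipeline described here can be sketched with Airflow's XCom and templating mechanics simulated in plain Python (the task id, payload, and template string are hypothetical; no real Airflow or AWS calls):

   ```python
   # In-memory stand-in for Airflow's XCom backend.
   xcom_store = {}

   def get_execution_output():
       """Stand-in for the operator that fetches the Step Functions JSON
       result: it pushes the workflow output to XCom under its task id."""
       xcom_store["get_output"] = '{"status": "SUCCEEDED", "rows": 42}'

   def render(template):
       """Crude stand-in for Jinja rendering of a templated operator field."""
       return template.replace(
           "{{ ti.xcom_pull(task_ids='get_output') }}",
           xcom_store["get_output"],
       )

   get_execution_output()
   # A hypothetical S3CreateObjectOperator(data=...) field would render to:
   data = render("{{ ti.xcom_pull(task_ids='get_output') }}")
   assert data == '{"status": "SUCCEEDED", "rows": 42}'
   ```

   The point of the sketch is that the operator only ever sees the rendered string at execution time; the upstream task's output travels through XCom, not through shared state.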





[GitHub] [airflow] o-nikolas commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r848932625


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   > What I fail to understand is how this operator will get the given string? (Assuming the string is not hard coded)?
   
   From within Airflow, the use cases I know of (see [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_s3_to_redshift.py#L36), [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_athena.py#L44) and [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_glue.py#L75)) are hard-coded string/data in the dag file.
   
   From user dags, I've seen both hardcoded and runtime.
   
   > But for that to work it means that by some other way someone stored data in Xcom and then you want to create S3 object from the data stored in xcom. This may encourage bad practices.
   
   What is bad practice about a workflow which consumes the output of one operation and then writes that data to S3? Taking some output (whether that be json, text, csv, etc) from one operation and persisting it to object storage is a very common pipeline workflow. In the S3 case you must either write an unnecessary temporary file to disk or use the S3Hook directly. I've used both, and the latter is much cleaner, but not as convenient as a dedicated operator would be.
   
   But that's just my 2c
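   The "use the S3Hook directly" boilerplate mentioned here looks roughly like the following; the hook is replaced by a stand-in class so the sketch is self-contained (`load_string` mirrors the real S3Hook method name, everything else is hypothetical):

   ```python
   class FakeS3Hook:
       """Stand-in for airflow.providers.amazon.aws.hooks.s3.S3Hook."""

       def __init__(self):
           self.store = {}

       def load_string(self, string_data, key, bucket_name, replace=False):
           if not replace and (bucket_name, key) in self.store:
               raise ValueError(f"{key} already exists in {bucket_name}")
           self.store[(bucket_name, key)] = string_data

   def upload_result(data, hook):
       """The python_callable users write today in place of a dedicated
       operator; `data` plays the role of ti.xcom_pull(...)."""
       hook.load_string(data, key="out/result.json",
                        bucket_name="my-bucket", replace=True)

   hook = FakeS3Hook()
   upload_result('{"rows": 42}', hook)
   assert hook.store[("my-bucket", "out/result.json")] == '{"rows": 42}'
   ```

   A dedicated operator would fold `upload_result` away entirely; the callable plus hook wiring is exactly the boilerplate being discussed.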





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r845540146


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   But for that to work it means that by some other way someone stored data in Xcom, and then you want to create an S3 object from the data stored in Xcom. This may encourage bad practices.
   
   This is why I asked to see an actual DAG example using this operator as part of a complete workflow.
   
   Let's see what others think...








[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r852288469


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   If you can bake one of these examples into the example dag of this operator, that's enough for me :)







[GitHub] [airflow] vincbeck commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r845536666


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   What I meant was just to expose a parameter to get data from Xcom and upload that data into S3





[GitHub] [airflow] vincbeck commented on pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on PR #22758:
URL: https://github.com/apache/airflow/pull/22758#issuecomment-1102971266

   I updated the example which was already there to have some data which looks more like a "real world" example. I don't know if it's enough. Usually we do not mix up services in our sample dags; that's why I did not include any Xcom example. If you think it is really needed to have an example which uses another service and Xcom to push the results into S3, I'll do that
   




[GitHub] [airflow] vincbeck commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849849776


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   I guess from this discussion we can close this PR and instead create another to add `HttpToS3Operator`?





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r844039214


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   Can you please share some DAG example (can be only with operator names)?





[GitHub] [airflow] o-nikolas commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r852295181


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   +1 to showing usage in the example dag for users.
   
   Another way to show its usage would be updating some of the code I linked which uses the `load_string` hook method directly.





[GitHub] [airflow] eladkal merged pull request #22758: Add `S3CreateObjectOperator`

Posted by GitBox <gi...@apache.org>.
eladkal merged PR #22758:
URL: https://github.com/apache/airflow/pull/22758




[GitHub] [airflow] ferruzzi commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849827842


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   `HttpToS3Operator` doesn't exist (yet) but I can see how that would be a solution.  Your correction about the timing of the execution is right though, and appreciated.  Sorry about that.





[GitHub] [airflow] vincbeck commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r845521556


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   That's a good point. What I can propose is to add a parameter to retrieve data from Xcom. Here is an example of an operator using that technique: https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/transfers/google_api_to_s3.py#L57
   
   What do you think?





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r851990135


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   > Maybe we can flip the question around and here what concerns, risks, oversights and pitfalls you see over including this operator? Thanks!
   
   None.
   
   I said that my concern is the use case of how this operator is to be used. I'm concerned that users will use this differently than intended - and we actually discovered during this discussion that the original intention was for `HttpToS3Operator`. The process we went through here is not something users are likely to do. I am more worried that we are steering users toward a bad practice even if we don't mean to.
   
   I'd be more comfortable with adding an example dag which emphasises how the operator is expected to be used (e.g. an actual example that shows the data expected to be passed is metadata).
   
   That said - I am not going to block this operator. We were just having a discussion.
   
   If you still think differently I'd be happy to resolve this discussion and move to the code review part.





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r852289181


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   I'm resolving this thread; let's move to code review.
   If @vincbeck is able to create such an example dag, that would be great.





[GitHub] [airflow] github-actions[bot] commented on pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22758:
URL: https://github.com/apache/airflow/pull/22758#issuecomment-1102806291

   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.




[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849831534


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   So @ferruzzi can you describe/show a use case of using the operator as implemented in this PR?







[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r843298615


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   Given from where?
   Can you describe a use case where this operator is needed?





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r852700747


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from `data` as string or bytes.
+
+    .. seealso::
+        For more information on how to use this operator, take a look at the guide:
+        :ref:`howto/operator:S3CreateObjectOperator`
+
+    :param s3_bucket: Name of the S3 bucket where to save the object. (templated)
+        It should be omitted when `s3_key` is provided as a full s3:// url.
+    :param s3_key: The key of the object to be created. (templated)
+        It can be either full s3:// style url or relative path from root level.
+        When it's specified as a full s3:// url, please omit `s3_bucket`.
+    :param data: string or bytes to save as content.
+    :param replace: If True, it will overwrite the key if it already exists
+    :param encrypt: If True, the file will be encrypted on the server-side
+        by S3 and will be stored in an encrypted form while at rest in S3.
+    :param acl_policy: String specifying the canned ACL policy for the file being
+        uploaded to the S3 bucket.
+    :param encoding: The string to byte encoding.
+        It should be specified only when `data` is provided as string.
+    :param compression: Type of compression to use, currently only gzip is supported.
+        It can be specified only when `data` is provided as string.
+    :param aws_conn_id: Connection id of the S3 connection to use
+    :param verify: Whether or not to verify SSL certificates for S3 connection.
+        By default SSL certificates are verified.
+
+        You can provide the following values:
+
+        - False: do not validate SSL certificates. SSL will still be used,
+                 but SSL certificates will not be
+                 verified.
+        - path/to/cert/bundle.pem: A filename of the CA cert bundle to use.
+                 You can specify this argument if you want to use a different
+                 CA cert bundle than the one used by botocore.
+
+    """
+
+    template_fields: Sequence[str] = ('s3_bucket', 's3_key')

Review Comment:
   ```suggestion
       template_fields: Sequence[str] = ('s3_bucket', 's3_key', 'data')
   ```
   
   For the use case described you also need to load metadata from Xcom
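   The effect of the suggested change can be sketched by re-implementing Airflow's template-field rendering loop in plain Python: only attributes listed in `template_fields` get rendered (the string substitution below is a crude stand-in for Jinja, and all field values are hypothetical):

   ```python
   class FakeOperator:
       """Minimal stand-in for an operator with the suggested change applied."""
       template_fields = ("s3_bucket", "s3_key", "data")

       def __init__(self, s3_bucket, s3_key, data):
           self.s3_bucket, self.s3_key, self.data = s3_bucket, s3_key, data

   def render_template_fields(op, context):
       # Mimics what Airflow does before execute(): render each listed field.
       for field in op.template_fields:
           value = getattr(op, field)
           for key, repl in context.items():
               value = value.replace("{{ %s }}" % key, repl)
           setattr(op, field, value)

   op = FakeOperator("my-bucket", "out/{{ ds }}.json", "{{ xcom_result }}")
   render_template_fields(op, {"ds": "2022-04-05",
                               "xcom_result": '{"ok": true}'})
   assert op.s3_key == "out/2022-04-05.json"
   assert op.data == '{"ok": true}'  # rendered only because 'data' is listed
   ```

   Without `'data'` in `template_fields`, the literal template string would be uploaded instead of the XCom-pulled value.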





[GitHub] [airflow] eladkal commented on pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #22758:
URL: https://github.com/apache/airflow/pull/22758#issuecomment-1103527264

   > Usually we do not mix up services in our sample dags
   
   It doesn't have to be. Example dags may serve more than one purpose - in this case my goal was to show that the data is actually metadata, but I'm OK with leaving it as is.




[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r845531831


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   I don't think it's the same...
   It allows retrieving configuration for the API call from Xcom. That is considered metadata.
   The operator you shared is `GoogleApiToS3Operator`: it reads from the API and writes to S3.
   The Xcom part you were referencing happens prior to the API call... it's just about the configs





[GitHub] [airflow] vincbeck commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
vincbeck commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r844062731


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   There is one DAG example in this PR; please take a look at `example_s3.py`. I personally do not really like `LocalFilesystemToS3Operator` because it requires having a local file. In the same example I stated, it means I would need to create a local file after fetching the data from the API and before storing it in S3, and I would also need to remove that file afterwards.
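   To make the contrast above concrete, here is a hedged pure-Python sketch of the extra bookkeeping the local-file route forces. The `upload` callback is a stand-in for the actual S3 upload (e.g. `S3Hook.load_string` or `LocalFilesystemToS3Operator`); function names are hypothetical, not from the PR.

```python
import os
import tempfile


def upload_via_local_file(data: str, upload) -> None:
    # LocalFilesystemToS3Operator-style flow: the data must first land on
    # disk, then be uploaded, then the temp file must be cleaned up.
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        upload(path)
    finally:
        os.remove(path)


def upload_directly(data: str, upload) -> None:
    # S3CreateObjectOperator-style flow: hand the string straight to S3,
    # no intermediate file to create or remove.
    upload(data)
```

   Both end up with the same bytes in S3; the first variant simply adds a create/cleanup step around the upload.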





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r845540146


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   But for that to work, it means that somewhere else someone stored data in XCom, and then you want to create the S3 object from the data stored in XCom. This may encourage bad practices.
   
   This is why I asked to see an actual DAG example using this operator as part of a complete workflow.
   
   Let's see what others think...
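   To make the XCom concern above concrete, here is a hedged pure-Python stand-in (the dict replaces Airflow's XCom backend; the task id and payload are hypothetical) for the flow being discussed: an upstream task pushes data, and a templated `data` field such as `"{{ ti.xcom_pull(task_ids='fetch_metadata') }}"` would resolve to it at run time.

```python
# Minimal stand-in for Airflow's XCom table: upstream pushes, downstream pulls.
xcom_backend = {}


def xcom_push(task_id: str, value: str) -> None:
    xcom_backend[task_id] = value


def xcom_pull(task_ids: str) -> str:
    # Mirrors what a templated field like
    # data="{{ ti.xcom_pull(task_ids='fetch_metadata') }}" resolves to.
    return xcom_backend[task_ids]


# Upstream task stores a (possibly large) payload in XCom -- the practice
# being cautioned against above for anything beyond small metadata.
xcom_push("fetch_metadata", '{"rows": 42}')
data_field = xcom_pull("fetch_metadata")
```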





[GitHub] [airflow] eladkal commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r844039214


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   Can you please share a DAG example? (it can be only with operator names)
   From your description it feels more like:
   https://github.com/apache/airflow/blob/a07bb9b93699beae253a21695d7be66540038a5c/airflow/providers/amazon/aws/transfers/local_to_s3.py#L27





[GitHub] [airflow] ferruzzi commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r849805095


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   FWIW, I just ran into a similar situation working with the SageMaker operators where this would be handy.  Use case:  Given a URL, fetch the csv located there and upload it to S3.  I can either use the S3Hook().load_string() as in Niko's examples, or maybe write the data to a tempfile and use the LocalToS3 transfer; both should work fine, but the operator would be convenient.
   
   ```
   @task
   def load_data_to_S3():
        dataset = requests.get(DATA_URL).content
        s3_hook = S3Hook()
        s3_hook.load_string(dataset, f'{S3_KEY}/input_data.csv', S3_BUCKET, replace=True)
   ```
   
   could become
   
   ```
   load_data_to_S3 = S3CreateObjectOperator(
       task_id='load_data_to_S3',
       bucket=S3_BUCKET,
       key=f'{S3_KEY}/input_data.csv',
       data=requests.get(DATA_URL).content,
       replace=True,
   )
   ```
    
    I guess in the end maybe neither is strictly "better" but I'd say it isn't unreasonable for a user to expect to find an operator to do that.
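   A hedged caveat on the operator form above (a point also raised elsewhere in this thread): `data=requests.get(DATA_URL).content` sits in the operator's keyword arguments at the top level of the DAG file, so it is evaluated every time the scheduler parses the DAG, while the `@task` version defers the fetch to run time. A counter standing in for the HTTP call illustrates the difference:

```python
calls = {"n": 0}


def fetch() -> bytes:
    # Stand-in for requests.get(DATA_URL).content in the snippet above.
    calls["n"] += 1
    return b"csv,data"


# Eager: evaluated when the DAG file is parsed (the operator-kwarg form).
eager_data = fetch()


# Deferred: only evaluated when the task actually runs
# (the @task / python-callable form).
def run_task() -> bytes:
    return fetch()


assert calls["n"] == 1  # parsing alone already triggered one fetch
run_task()              # the deferred fetch happens only at task run time
assert calls["n"] == 2
```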





[GitHub] [airflow] o-nikolas commented on a diff in pull request #22758: Add S3CreateObjectOperator

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r852286408


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   >  and we actually discovered during this discussion that the original intention was for HttpToS3Operator
   
   
   I think the conversation got a bit muddled: one example was given at the top, and then an assumption was made that this example is the only applicable use case ever possible. I have mentioned, in my previous two comments, use cases for which the HttpToS3Operator would _not_ help and this operator would. But, as always, I'm happy to disagree and commit; if you deem it dangerous or reckless to merge this, then we can continue to use the hook directly.


