Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 22:54:22 UTC

[GitHub] [airflow] o-nikolas commented on a diff in pull request #22758: Add S3CreateObjectOperator

o-nikolas commented on code in PR #22758:
URL: https://github.com/apache/airflow/pull/22758#discussion_r848932625


##########
airflow/providers/amazon/aws/operators/s3.py:
##########
@@ -318,6 +318,94 @@ def execute(self, context: 'Context'):
         )
 
 
+class S3CreateObjectOperator(BaseOperator):
+    """
+    Creates a new object from a given string or bytes.

Review Comment:
   > What I fail to understand is how this operator will get the given string? (Assuming the string is not hard coded)?
   
   The use cases within Airflow that I know of (see [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_s3_to_redshift.py#L36), [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_athena.py#L44) and [here](https://github.com/apache/airflow/blob/2400de2c5ece644cadb870baeea28907fa4dcf58/airflow/providers/amazon/aws/example_dags/example_glue.py#L75)) all use hard-coded strings/data in the DAG file.
   
   In user DAGs, I've seen both hard-coded data and data produced at runtime.
   
   > But for that to work it means that by some other way someone stored data in Xcom and then you want to create S3 object from the data stored in xcom. This may encourage bad practices.
   
   What is bad practice about a workflow that consumes the output of one operation and then writes that data to S3? Taking some output (whether JSON, text, CSV, etc.) from one operation and persisting it to object storage is a very common pipeline workflow. In the S3 case, without a dedicated operator, you must either write an unnecessary temporary file to disk or use the S3Hook directly. I've used both, and the latter is much cleaner, but not as convenient as a dedicated operator would be.
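   
   For illustration, here is a minimal sketch of the "use the S3Hook directly" option. The helper name `write_to_s3`, its `hook` parameter, and the `aws_default` connection id are assumptions made for this example, not part of the PR. It relies only on the existing `S3Hook.load_string` and `S3Hook.load_bytes` methods.

```python
from typing import Any, Optional, Union


def write_to_s3(
    data: Union[str, bytes],
    key: str,
    bucket_name: str,
    hook: Optional[Any] = None,
    replace: bool = False,
) -> None:
    """Persist in-memory data to S3 without writing a temporary file.

    `hook` is a hypothetical injection point so the helper can be
    exercised without AWS credentials; by default a real S3Hook is built.
    """
    if hook is None:
        # Imported lazily so the module loads even where Airflow's
        # Amazon provider is not installed (assumption for this sketch).
        from airflow.providers.amazon.aws.hooks.s3 import S3Hook

        hook = S3Hook(aws_conn_id="aws_default")

    if isinstance(data, str):
        # Text payloads (JSON, CSV, plain text) go through load_string.
        hook.load_string(
            string_data=data, key=key, bucket_name=bucket_name, replace=replace
        )
    else:
        # Raw bytes go through load_bytes.
        hook.load_bytes(
            bytes_data=data, key=key, bucket_name=bucket_name, replace=replace
        )
```

   Inside a Python task callable this would be called with the upstream output, e.g. `write_to_s3(payload, key="out/data.csv", bucket_name="my-bucket")` (bucket and key are placeholders). A dedicated operator would wrap roughly this logic so DAG authors don't have to write the callable themselves.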
   
   But that's just my 2c


