
Posted to issues@arrow.apache.org by "André Kelpe (Jira)" <ji...@apache.org> on 2022/06/03 08:10:00 UTC

[jira] [Created] (ARROW-16746) S3 tag support on write

André Kelpe created ARROW-16746:
-----------------------------------

             Summary: S3 tag support on write
                 Key: ARROW-16746
                 URL: https://issues.apache.org/jira/browse/ARROW-16746
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: André Kelpe


S3 allows tagging data to better organize one's data (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html). We use this for efficient downstream processes/inventory management.

Currently arrow/pyarrow does not allow tags to be added on write. This forces us to scan the bucket and re-apply the tags after a pyarrow-based process has run.
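
For illustration, the workaround described above looks roughly like the sketch below. It assumes a boto3 S3 client is passed in as s3_client; the function name retag_prefix and the bucket/prefix values in the usage note are hypothetical, not our actual code. Note that put_object_tagging replaces the whole tag set of an object.

```python
def retag_prefix(s3_client, bucket, prefix, tags):
    """Re-apply `tags` to every object under `prefix`.

    `s3_client` is assumed to be a boto3 S3 client (boto3.client("s3")).
    Returns the number of objects that were tagged.
    """
    # put_object_tagging expects a list of {"Key": ..., "Value": ...} dicts.
    tag_set = [{"Key": key, "Value": value} for key, value in tags.items()]
    paginator = s3_client.get_paginator("list_objects_v2")
    tagged = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects carry no "Contents" entry.
        for obj in page.get("Contents", []):
            s3_client.put_object_tagging(
                Bucket=bucket,
                Key=obj["Key"],
                Tagging={"TagSet": tag_set},
            )
            tagged += 1
    return tagged
```

It would be called after the pyarrow job finishes, e.g. retag_prefix(boto3.client("s3"), "my-bucket", "datasets/run-1/", {"team": "data"}). This costs one extra request per object, which is what tag support on write would avoid.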

I looked through the code and think that it could potentially be done via the metadata mechanism.

The tags need to be added to the CreateMultipartUploadRequest here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156

See also

http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da
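
As far as I can tell, the Tagging field of that request takes the tag set encoded as URL query parameters (key1=value1&key2=value2). A minimal sketch of how a dict of tags could be turned into that string on the Python side is below; the helper name encode_s3_tagging is hypothetical and not an existing pyarrow API, and the check for 10 tags reflects the documented per-object S3 limit.

```python
from urllib.parse import quote, urlencode

# S3 documents a limit of 10 tags per object.
MAX_TAGS_PER_OBJECT = 10


def encode_s3_tagging(tags):
    """Encode a dict of tags as URL query parameters.

    Hypothetical helper: the result is the kind of string the Tagging
    field of a CreateMultipartUpload request is documented to accept.
    """
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"S3 allows at most {MAX_TAGS_PER_OBJECT} tags per object, got {len(tags)}"
        )
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return urlencode(tags, quote_via=quote)
```

For example, encode_s3_tagging({"project": "arrow", "stage": "raw data"}) gives "project=arrow&stage=raw%20data". How the tags would get from Python to the C++ request (for instance via the metadata mechanism) is still open.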



--
This message was sent by Atlassian Jira
(v8.20.7#820007)