Posted to issues@arrow.apache.org by "phofl (via GitHub)" <gi...@apache.org> on 2024/03/08 10:16:33 UTC

[I] [Python/C++] Make upload size per part configurable when uploading to S3 [arrow]

phofl opened a new issue, #40420:
URL: https://github.com/apache/arrow/issues/40420

   ### Describe the enhancement requested
   
   Currently, the size of every part in a multipart upload is hard-coded here: https://github.com/apache/arrow/blob/d2970e1d047f1bd31c31995c35450a7e5bfce3c0/cpp/src/arrow/filesystem/s3fs.cc#L1394-L1400
   
   We've run into issues where the request rate to S3 gets too high when uploading from a larger cluster, and S3 returns errors such as:
   
   ```
   OSError("When completing multiple part upload for key '***' in bucket '***': AWS Error SLOW_DOWN during CompleteMultipartUpload operation: Please reduce your request rate.")
   ```
   
   We've tried alleviating this with different S3 bucket prefixes, but that didn't solve the problem completely.
   
   Would you be open to exposing an option that makes the part size configurable, so that we can tune the chunk size ourselves?
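   
   To make the request concrete, a user-facing knob might look roughly like the sketch below on the Python side; `part_upload_size` is a hypothetical name used for illustration only, not an existing `S3FileSystem` option.
   
   ```python
   import pyarrow.fs as fs
   
   # Hypothetical sketch: `part_upload_size` is NOT an existing option; it only
   # illustrates the kind of knob this issue is asking for.
   s3 = fs.S3FileSystem(
       region="us-east-1",
       # part_upload_size=64 * 1024 * 1024,  # hypothetical: 64 MiB parts
   )
   
   # Larger parts mean fewer UploadPart requests per object, lowering the
   # overall request rate to the bucket when many workers write concurrently.
   with s3.open_output_stream("my-bucket/path/to/file.parquet") as out:
       out.write(b"...")
   ```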
   
   ### Component(s)
   
   C++, Python




Re: [I] [Python/C++] Make upload size per part configurable when uploading to S3 [arrow]

Posted by "behcetm (via GitHub)" <gi...@apache.org>.
behcetm commented on issue #40420:
URL: https://github.com/apache/arrow/issues/40420#issuecomment-1996794094

   Hi @phofl,
   
   Thanks for bringing up this topic. The fixed part size also prevents uploading objects larger than about 100 GB: S3 caps a multipart upload at 10,000 parts, so 10,000 parts of 10 MiB each is roughly 100 GB, which is quite low compared to the maximum object size (5 TB) supported by AWS S3. I would be very interested in a solution to this issue.
   
   My preferred solution would be to make the part size configurable per output stream. I guess that could still work for Cloudflare R2, which requires all parts except the last to be the same size.
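   
   For instance, a per-stream knob could look roughly like the hypothetical sketch below (Python side shown; `part_upload_size` is an illustrative name, not an existing keyword of `open_output_stream`):
   
   ```python
   import pyarrow.fs as fs
   
   s3 = fs.S3FileSystem(region="us-east-1")
   
   # Hypothetical per-stream knob (illustrative only), e.g.:
   #   s3.open_output_stream("my-bucket/huge-object.bin",
   #                         part_upload_size=100 * 1024 * 1024)
   # With 100 MiB parts and S3's 10,000-part cap per multipart upload, a single
   # object could reach roughly 1 TB instead of the ~100 GB reachable with
   # fixed 10 MiB parts.
   with s3.open_output_stream("my-bucket/huge-object.bin") as out:
       out.write(b"...")
   ```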
   
   Behcet

