Posted to github@arrow.apache.org by "StuartHadfield (via GitHub)" <gi...@apache.org> on 2023/04/28 10:59:51 UTC

[GitHub] [arrow] StuartHadfield commented on issue #34892: [C++] Mechanism for throttling remote filesystems to avoid rate limiting

StuartHadfield commented on issue #34892:
URL: https://github.com/apache/arrow/issues/34892#issuecomment-1527388677

   Chiming in here as a pyarrow user having immense difficulty with this. Food for thought:
   
   Imagine a scenario with a nearly continuous influx of data that you need to render into Parquet and store on S3. A backoff strategy works well for a single write, but with lots of data incoming, if you get rate limited and back off, you risk falling so far behind that it becomes very difficult to catch up.
   
   This is, of course, hypothetical, but it illustrates that while throttling and retry with backoff would be *very* useful for 90% of use cases (and I would certainly appreciate them; I just don't have the programming skill to implement them here :( ), there are some niche circumstances where we may need to consider batching writes more efficiently.
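   For concreteness, the retry-with-backoff workaround that users currently have to roll themselves might look like the sketch below. This is not part of pyarrow's API; `with_backoff` and `flaky_write` are hypothetical names, and the rate-limit error is simulated rather than coming from S3:

   ```python
   import random
   import time

   def with_backoff(fn, max_retries=5, base_delay=0.01, jitter=0.01):
       """Call fn, retrying with exponential backoff on failure.

       Hypothetical helper, not part of pyarrow. In real use, fn would be
       something like a pyarrow.parquet.write_table call against an S3
       filesystem, and only throttling errors (e.g. 503 SlowDown) should
       be retried.
       """
       for attempt in range(max_retries):
           try:
               return fn()
           except RuntimeError:
               if attempt == max_retries - 1:
                   raise
               # Exponential backoff with a little jitter to spread retries.
               time.sleep(base_delay * (2 ** attempt) + random.uniform(0, jitter))

   # Simulated rate-limited write: fails twice, then succeeds.
   calls = {"n": 0}

   def flaky_write():
       calls["n"] += 1
       if calls["n"] < 3:
           raise RuntimeError("503 SlowDown")
       return "ok"

   print(with_backoff(flaky_write))
   ```

   The problem described above is visible even in this toy: each retry adds sleep time, so under sustained load the producer keeps paying backoff delays while new data keeps arriving, which is why client-side throttling or batching writes may be needed in addition to retries.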

