Posted to common-dev@hadoop.apache.org by "Colin Marc (JIRA)" <ji...@apache.org> on 2015/08/13 01:29:46 UTC

[jira] [Created] (HADOOP-12319) S3AFastOutputStream has no ability to apply backpressure

Colin Marc created HADOOP-12319:
-----------------------------------

             Summary: S3AFastOutputStream has no ability to apply backpressure
                 Key: HADOOP-12319
                 URL: https://issues.apache.org/jira/browse/HADOOP-12319
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
    Affects Versions: 2.7.0
            Reporter: Colin Marc
            Priority: Critical


Currently, users of {{S3AFastOutputStream}} can control memory usage with a few settings: {{fs.s3a.threads.core}} and {{fs.s3a.threads.max}}, which control the number of active uploads (they are passed as the core and maximum pool sizes of a {{ThreadPoolExecutor}}), and {{fs.s3a.max.total.tasks}}, which controls the capacity of the queue feeding that {{ThreadPoolExecutor}}.
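
For reference, a rough sketch of how those three settings fit together. The class and method names below are invented for illustration only and do not reflect the actual wiring inside {{S3AFastOutputStream}}:

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: parameter names stand in for the values read
// from the Hadoop Configuration, not the actual fields in S3AFastOutputStream.
public class UploadExecutorSketch {
  public static ThreadPoolExecutor build(int coreThreads, int maxThreads,
      int maxTotalTasks) {
    // The bounded queue in front of the pool holds buffered upload parts;
    // its capacity is what fs.s3a.max.total.tasks effectively controls.
    return new ThreadPoolExecutor(
        coreThreads, maxThreads,              // fs.s3a.threads.core / .max
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(maxTotalTasks));
  }
}
{code}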

However, a user is almost *guaranteed* a crash if the writing job produces data faster than the achievable S3 upload throughput, because there is never any backpressure or blocking on calls to {{write}}.

If {{fs.s3a.max.total.tasks}} is set high (the default is 1000), then {{write}} calls keep adding buffered parts to the queue, eventually exhausting the heap and triggering an {{OutOfMemoryError}}. If the user sets it lower instead, writes fail once the queue is full: the {{ThreadPoolExecutor}} rejects the part with a {{java.util.concurrent.RejectedExecutionException}}.
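
A standalone snippet (not S3A code) reproduces the second failure mode: with the default {{AbortPolicy}}, submitting to a saturated pool whose queue is full throws immediately rather than waiting.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstration only: a single upload thread and a one-slot queue make the
// rejection easy to hit; S3A's real pool just hits the same limit later.
public class RejectionDemo {
  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(1));   // tiny queue for the demo

    Runnable slowUpload = () -> {
      try {
        Thread.sleep(10_000);                    // simulate a slow S3 upload
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };

    pool.execute(slowUpload);                    // occupies the only thread
    pool.execute(slowUpload);                    // fills the queue
    try {
      pool.execute(slowUpload);                  // rejected immediately
    } catch (RejectedExecutionException e) {
      System.out.println("write() would fail here: " + e);
    }
    pool.shutdownNow();
  }
}
{code}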

Ideally, calls to {{write}} should *block, not fail* when the queue is full, so as to apply backpressure to the writing process.
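
One possible shape for this, sketched below and by no means the only option: replace the default {{AbortPolicy}} with a {{RejectedExecutionHandler}} that blocks the submitting thread until the queue has room.

{code:java}
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of one possible fix: block the caller instead of aborting, so the
// pressure propagates back up to write() and the buffered data stays bounded.
public class BlockWhenQueueFull implements RejectedExecutionHandler {
  @Override
  public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
    if (executor.isShutdown()) {
      throw new RejectedExecutionException("Executor already shut down");
    }
    try {
      // put() blocks until a slot opens, so the caller of write() waits
      // instead of failing with RejectedExecutionException or OOMing.
      executor.getQueue().put(r);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RejectedExecutionException(
          "Interrupted while waiting to enqueue upload part", e);
    }
  }
}
{code}

Such a handler can be installed via the {{ThreadPoolExecutor}} constructor that accepts a {{RejectedExecutionHandler}}, or via {{setRejectedExecutionHandler}}. Gating submissions in {{write}} with a {{Semaphore}} sized to the queue capacity, released as each part upload completes, would achieve the same effect.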



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)