Posted to common-issues@hadoop.apache.org by "Sean Mackrory (JIRA)" <ji...@apache.org> on 2016/12/06 17:47:58 UTC

[jira] [Updated] (HADOOP-13826) S3A Deadlock in multipart copy due to thread pool limits.

     [ https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-13826:
-----------------------------------
    Attachment: HADOOP-13826.003.patch

For the sake of trying stuff out, attaching a patch that gives an unbounded ThreadPoolExecutor to the BlockingThreadPoolExecutorService, and the original unbounded one to everything else. All tests pass, including the new test that was previously able to induce a deadlock.

I like [~Thomas Demoor]'s point about the control tasks not being memory intensive. Keeping control tasks in an unbounded queue, without having to worry about them overwhelming resources too easily, would address the concern about how to make all these individual pools easily configurable. I'm fairly certain my original proposal would work more completely if, rather than having 3 nested executors with only the innermost one separating tasks into isolated pools, the outermost executor immediately separated tasks into their own queues as well. That would still need to be done, and there's also still the concern about relying on internal AWS APIs, which we should probably avoid.
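
To illustrate the idea of keeping lightweight control tasks apart from the resource-heavy subtasks, here is a minimal sketch. It is not the code in the attached patch and does not use the S3A or AWS SDK classes; the class name SplitPoolsSketch, the method runCopies, and its parameters are all hypothetical. Control tasks go to a pool that can always start another thread, while subtasks go to a pool with a fixed number of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SplitPoolsSketch {

    /**
     * Runs `copies` control tasks, each fanning out `partsPerCopy` subtasks,
     * and returns the total number of subtasks that completed.
     * All names here are illustrative assumptions, not S3A code.
     */
    static int runCopies(int copies, int partsPerCopy, int workerThreads) throws Exception {
        // Control tasks mostly wait on subtasks, so they get a pool that
        // never refuses or queues a task behind busy threads.
        ExecutorService controlPool = Executors.newCachedThreadPool();
        // Subtasks stand in for the expensive part transfers; the number of
        // threads running them is capped.
        ExecutorService workerPool = Executors.newFixedThreadPool(workerThreads);
        try {
            List<Future<Integer>> controls = new ArrayList<>();
            for (int c = 0; c < copies; c++) {
                Callable<Integer> control = () -> {
                    List<Future<Integer>> parts = new ArrayList<>();
                    for (int p = 0; p < partsPerCopy; p++) {
                        parts.add(workerPool.submit(() -> 1)); // one finished part
                    }
                    int done = 0;
                    for (Future<Integer> part : parts) {
                        done += part.get(); // control task blocks until its parts finish
                    }
                    return done;
                };
                controls.add(controlPool.submit(control));
            }
            int total = 0;
            for (Future<Integer> control : controls) {
                // Timeout only so that a mistake shows up as an exception, not a hang.
                total += control.get(5, TimeUnit.SECONDS);
            }
            return total;
        } finally {
            controlPool.shutdownNow();
            workerPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Eight control tasks share a single worker thread and still finish.
        System.out.println(runCopies(8, 4, 1)); // prints 32
    }
}
```

Because a waiting control task never occupies a worker thread, the subtasks can always make progress, even with a single worker. The trade-off is that the number of control threads is not capped, which is only acceptable if control tasks really are cheap.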

> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>
>                 Key: HADOOP-13826
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13826
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Critical
>         Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch, HADOOP-13826.003.patch
>
>
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The TransferManager javadocs (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html) explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread pool with a bounded work queue as control tasks may submit subtasks that can't complete until all sub tasks complete. Using an incorrectly configured thread pool may cause a deadlock (I.E. the work queue is filled with control tasks that can't finish until subtasks complete but subtasks can't execute because the queue is filled).{quote}
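
The situation the javadoc describes can be reproduced without S3 at all. The following is a minimal sketch using only java.util.concurrent; the class name ControlTaskDeadlockDemo and the method controlTaskCompletes are hypothetical and are not part of the patch or its test. A control task submits a subtask to the same pool and waits for it. With a single-threaded executor the subtask can never start, so the control task only ends when the timeout expires and the pool is shut down.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ControlTaskDeadlockDemo {

    /**
     * Submits one control task that in turn submits a subtask to the same
     * pool and waits for it. Returns true if the control task finished
     * within the timeout, false if it was still stuck.
     */
    static boolean controlTaskCompletes(ExecutorService pool, long timeoutMillis)
            throws InterruptedException {
        Future<String> control = pool.submit(() -> {
            Future<String> sub = pool.submit(() -> "part copied");
            return sub.get(); // blocks until some thread runs the subtask
        });
        try {
            control.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // control task is waiting on a subtask that cannot run
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts the stuck control task, if any
        }
    }

    public static void main(String[] args) throws Exception {
        // The only thread is held by the control task, so the subtask never starts.
        System.out.println(controlTaskCompletes(Executors.newSingleThreadExecutor(), 500)); // false
        // A pool that can always add a thread lets the subtask run.
        System.out.println(controlTaskCompletes(Executors.newCachedThreadPool(), 500)); // true
    }
}
```

The timeout is there only so the demonstration terminates; in the real deadlock nothing interrupts the waiting control tasks.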


