Posted to common-issues@hadoop.apache.org by "Aaron Fabbri (Jira)" <ji...@apache.org> on 2020/07/08 03:12:00 UTC

[jira] [Commented] (HADOOP-16798) job commit failure in S3A MR magic committer test

    [ https://issues.apache.org/jira/browse/HADOOP-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153199#comment-17153199 ] 

Aaron Fabbri commented on HADOOP-16798:
---------------------------------------

I missed the party on this one, but I just had a thought. Did you consider inserting a failure point that hangs one of the commit threads when it POSTs data, delaying either the POST or the response? That might make these cases easier to reproduce.
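
A rough sketch of what I mean, using only java.util.concurrent. Everything here is hypothetical: HangingPostSketch and hangUntilReleased are names made up for illustration, not anything in the S3A code, and the "POST" is just a Callable standing in for the real upload call. The idea is that the test holds the commit thread inside the POST until it chooses to release it, which leaves a window in which to trigger the abort or the pool shutdown.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper; not part of the S3A codebase. */
public class HangingPostSketch {

  /**
   * Wrap an operation so that it signals when it has started,
   * then blocks until the test counts down the release latch.
   */
  static <T> Callable<T> hangUntilReleased(Callable<T> post,
      CountDownLatch entered, CountDownLatch release) {
    return () -> {
      entered.countDown();   // tell the test the "POST" has begun
      release.await();       // hang here until the test lets go
      return post.call();
    };
  }

  /** Returns true if the wrapped "POST" stayed blocked until released. */
  static boolean demo() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    CountDownLatch entered = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      Callable<String> post = () -> "posted";  // stand-in for the real POST
      Future<String> result =
          pool.submit(hangUntilReleased(post, entered, release));
      entered.await();                         // commit thread is now inside the POST
      boolean blocked = !result.isDone();
      // A real test would abort the task or shut down the pool at this point.
      release.countDown();
      return blocked && "posted".equals(result.get(5, TimeUnit.SECONDS));
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("POST held until released: " + demo());
  }
}
```

Latches rather than a fixed sleep keep the test deterministic: the test knows the thread is inside the POST before it acts, and nothing depends on timing.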

Thanks for the fix.

> job commit failure in S3A MR magic committer test
> -------------------------------------------------
>
>                 Key: HADOOP-16798
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16798
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>             Fix For: 3.3.1
>
>         Attachments: stdout
>
>
> failure in 
> {code}
> ITestS3ACommitterMRJob.test_200_execute:304->Assert.fail:88 Job job_1578669113137_0003 failed in state FAILED with cause Job commit failed: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6e894de2 rejected from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@225eed53[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
> {code}
> The stack implies the thread pool rejected the task, but toString reports "Terminated". Race condition?
> *update 2020-04-22*: this happens when a task is aborted in the AM: the thread pool is disposed of, and while it is shutting down in one thread, task commit is initiated using the same thread pool. When the task committer's destroy operation times out, it kills all the active uploads.
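
The symptom in the description above can be reproduced in isolation with a minimal sketch, assuming a plain java.util.concurrent thread pool as a stand-in for HadoopThreadPoolExecutor and the default rejection policy. The class name TerminatedPoolSketch is hypothetical. It only shows the final step (a submit arriving after the pool has terminated), not the AM abort sequence that leads to it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical reproduction of the rejection; not Hadoop code. */
public class TerminatedPoolSketch {

  /**
   * Shut a pool down, then submit work to it, as happens when one
   * thread disposes of the pool while another starts a commit.
   * Returns the rejection message, or null if the task was accepted.
   */
  static String submitAfterShutdown() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.shutdown();                              // "abort" path disposes of the pool
    pool.awaitTermination(5, TimeUnit.SECONDS);   // no work queued, so it terminates
    Callable<String> commit = () -> "commit";     // stand-in for the commit task
    try {
      pool.submit(commit);                        // "commit" path reuses the same pool
      return null;
    } catch (RejectedExecutionException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(submitAfterShutdown());
  }
}
```

This suggests the "Terminated" in the message is not a contradiction: the default policy builds the message from the pool's toString at rejection time, so a pool that has finished shutting down reports its terminated state alongside the rejected task.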


