Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/06/10 16:48:00 UTC

[jira] [Commented] (HADOOP-16357) TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists

    [ https://issues.apache.org/jira/browse/HADOOP-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860143#comment-16860143 ] 

Steve Loughran commented on HADOOP-16357:
-----------------------------------------

Also, -1 on the patch. Making the problem "go away" by changing TeraSort is not the solution, as it breaks the whole idea that "changing committer must be transparent".

Instead I'm going to change the default value. FWIW, my initial patch does this but also adds a check for the dest path being a file, and fails fast for all committers if that holds. That runs into the fact that the mock tests are so brittle that they all go on to fail at this point, because of the move from fs.exists() to fs.getFileStatus(). That is: fixing the mock tests to catch up with a single API call change is more effort than any other part of the patch.
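
To illustrate the kind of check described above, here is a minimal sketch, not the actual patch. It assumes the check can be modelled on the local filesystem, with java.nio.file.Files.readAttributes() standing in for fs.getFileStatus(); the names DestinationCheck, ConflictMode and checkDestination are hypothetical. The point is that one status lookup answers both "does it exist?" and "is it a file?", which fs.exists() alone cannot.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration only; none of these names come from Hadoop.
final class DestinationCheck {

    // Stand-in for the committer's conflict resolution modes.
    enum ConflictMode { FAIL, APPEND, REPLACE }

    /**
     * Returns false if the destination is absent, true if it is an
     * existing directory that the given mode tolerates.
     * Throws if the destination is a file (regardless of mode), or if it
     * is a directory and the mode is FAIL.
     */
    static boolean checkDestination(Path dest, ConflictMode mode) throws IOException {
        BasicFileAttributes status;
        try {
            // Single status lookup, playing the role of getFileStatus().
            status = Files.readAttributes(dest, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
            // Nothing at the destination: every mode can proceed.
            return false;
        }
        if (!status.isDirectory()) {
            // Fail fast whatever the mode: output cannot go under a file.
            throw new FileAlreadyExistsException(
                dest.toString(), null, "destination is a file");
        }
        if (mode == ConflictMode.FAIL) {
            throw new FileAlreadyExistsException(
                dest.toString(), null,
                "destination path exists and conflict resolution mode is \"fail\"");
        }
        // Existing directory under APPEND or REPLACE: allowed to continue.
        return true;
    }
}
```

Under this sketch, an output directory that already exists (as in the TeraSort case) is rejected only in FAIL mode, while a file at the destination is rejected in every mode.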

> TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-16357
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16357
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Minor
>         Attachments: HADOOP-16357-001.patch, MAPREDUCE-7216-001.patch
>
>
> TeraSort job fails on S3 with the exception below. TeraSort creates the output path and writes the partition file into it, but DirectoryStagingCommitter expects the output path not to exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with state FAILED due to: Job setup failed : org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as Task committer attempt_1559891760159_0011_m_000000_0: Destination path exists and committer conflict resolution mode is "fail"
> 	at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
> 	at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating the partition file in /tmp or some other directory fixes the issue.
