Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/06/10 15:34:00 UTC

[jira] [Commented] (HADOOP-16357) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

    [ https://issues.apache.org/jira/browse/HADOOP-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860088#comment-16860088 ] 

Steve Loughran commented on HADOOP-16357:
-----------------------------------------

I'm moving this to hadoop-common and pointing the blame at the staging committer: it doesn't need to be checking the destination directory.

The normal FileOutputCommitter and the Magic committer both just create the directory and don't worry about the rest.

The staging committer will fail if the dest dir is there, because the default conflict mode is "fail". For the others it is implicitly "append".
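
To make the difference concrete, here is a minimal sketch of the check as described above. It is not the actual StagingCommitter code: the class, enum, exception and method names are all hypothetical, and it only models the "fail" and "append" modes mentioned here. In a real job the mode is read from the S3A committer configuration (the `fs.s3a.committer.staging.conflict-mode` option) rather than passed in directly.

```java
import java.util.Locale;

public class ConflictModeSketch {

    /** Hypothetical stand-in for the two conflict modes discussed here. */
    enum ConflictMode { FAIL, APPEND }

    /** Hypothetical stand-in for Hadoop's PathExistsException. */
    static class DestinationExistsException extends RuntimeException {
        DestinationExistsException(String message) {
            super(message);
        }
    }

    /** Parse a configuration string such as "fail" or "append". */
    static ConflictMode parse(String value) {
        return ConflictMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /**
     * Model of job setup: with FAIL an existing destination aborts the job,
     * with APPEND the existing directory is simply reused.
     */
    static String setupJob(String dest, boolean destExists, ConflictMode mode) {
        if (destExists && mode == ConflictMode.FAIL) {
            throw new DestinationExistsException(
                dest + ": destination path exists and conflict mode is \"fail\"");
        }
        return destExists ? "reuse" : "create";
    }

    public static void main(String[] args) {
        String dest = "s3a://bucket/OUTPUT";
        // TeraSort has already created the output directory, so destExists is true.
        System.out.println(setupJob(dest, true, parse("append")));
        try {
            setupJob(dest, true, parse("fail"));
        } catch (DestinationExistsException e) {
            System.out.println("job setup failed: " + e.getMessage());
        }
    }
}
```

With "append" as the default, the TeraSort case (output directory created before job setup) would go down the reuse branch instead of throwing.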

Proposed

* change the default mode to append, so as to be consistent.
* update the docs

Note: if you are using this in Spark, Spark itself fails fast if the dest dir exists, so to actually have append there you need to tell the spark.write operation to overwrite. Moot for TeraSort.

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> --------------------------------------------------------
>
>                 Key: HADOOP-16357
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16357
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Minor
>         Attachments: MAPREDUCE-7216-001.patch
>
>
> TeraSort Job fails on S3 with the exception below. TeraSort creates the output path and writes the partition file there, but DirectoryStagingCommitter expects the output path not to exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with state FAILED due to: Job setup failed : org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as Task committer attempt_1559891760159_0011_m_000000_0: Destination path exists and committer conflict resolution mode is "fail"
> 	at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
> 	at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating the partition file in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
