
Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/12/10 22:25:00 UTC

[jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct

    [ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285401#comment-16285401 ] 

Steve Loughran commented on HADOOP-15107:
-----------------------------------------

Specifically, the failure mode to worry about is:

# task attempt 1 is instructed to commit its output
# task attempt 1 does so (loads the .pending files, saves a single .pendingset file). As job commit only loads .pendingset files, it only finds the output lists of committed tasks.
# task attempt 1 fails before reporting its success to the job manager
# job manager creates task attempt 2 and instructs it to commit; it, too, generates a .pendingset file
# job commit loads all .pendingset files under the task attempts
# therefore it will load the .pendingset files of both task attempts, and commit them.
# and, as these commits are done in parallel, there's a risk that the final output contains either the output of both attempts or, if they use the same filenames, a mix of both.
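A minimal in-memory sketch of this failure mode, written in Java because that is Hadoop's language. It is not the committer code. The class AttemptScopedPendingSets, the attempt IDs, the "$attempt/task.pendingset" path and the upload ID format are all hypothetical, and "completing" an upload is reduced to adding its ID to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of the failure mode: each committing task attempt
 * writes a .pendingset file under its own attempt path, and job commit
 * loads every .pendingset file it finds.
 */
final class AttemptScopedPendingSets {

    // path of each .pendingset file -> upload IDs it lists
    private final Map<String, List<String>> pendingSetFiles = new TreeMap<>();

    /** Task commit: one .pendingset file per task *attempt* (made-up path). */
    void commitTask(String attemptId, List<String> destKeys) {
        List<String> uploads = new ArrayList<>();
        for (String key : destKeys) {
            uploads.add(attemptId + ":" + key);  // made-up upload ID format
        }
        pendingSetFiles.put(attemptId + "/task.pendingset", uploads);
    }

    /** Job commit: load every .pendingset file found and complete every upload. */
    List<String> commitJob() {
        List<String> completed = new ArrayList<>();
        for (List<String> uploads : pendingSetFiles.values()) {
            completed.addAll(uploads);
        }
        return completed;
    }

    public static void main(String[] args) {
        AttemptScopedPendingSets job = new AttemptScopedPendingSets();
        job.commitTask("attempt_0", List.of("part-0000"));  // commits, then fails
        job.commitTask("attempt_1", List.of("part-0000"));  // the retry also commits
        // both files are loaded, so two uploads to the same destination complete
        System.out.println(job.commitJob());  // [attempt_0:part-0000, attempt_1:part-0000]
    }
}
```

Because the file path is scoped by task attempt, nothing stops two attempts of the same task from both being listed at job commit.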

Proposed solution:
# task commit saves the .pendingset file in a destination directory of the job attempt, with the filename $task.pendingset.
# if a second task attempt is executed, it will save to the same file, overwriting the first attempt's list.
# so the first attempt's uploads will not be committed (and will need to be found by listing, then aborted).
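The proposed layout can be modelled the same way. This is a sketch under stated assumptions, not the committer implementation. The class TaskScopedPendingSets, the upload ID format and the listAndAbort() helper are hypothetical. The key point it shows is that the .pendingset file name depends only on the task, so the last attempt to commit wins, and anything left outstanding after job commit is what the list+abort step has to clean up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical model of the fix: one $task.pendingset file per task. */
final class TaskScopedPendingSets {

    // uploads started by each task attempt (stands in for its .pending files)
    private final Map<String, List<String>> pendingByAttempt = new TreeMap<>();
    // the job attempt directory: "$task.pendingset" -> upload IDs to complete
    private final Map<String, List<String>> jobAttemptDir = new TreeMap<>();
    // uploads started but neither completed nor aborted yet
    private final Set<String> outstanding = new LinkedHashSet<>();

    /** A task attempt starts an upload; the upload ID format is made up. */
    void startUpload(String attemptId, String destKey) {
        String uploadId = attemptId + ":" + destKey;
        pendingByAttempt.computeIfAbsent(attemptId, a -> new ArrayList<>()).add(uploadId);
        outstanding.add(uploadId);
    }

    /** Task commit: the file name depends on the task, not the attempt. */
    void commitTask(String taskId, String attemptId) {
        List<String> uploads = pendingByAttempt.getOrDefault(attemptId, List.of());
        jobAttemptDir.put(taskId + ".pendingset", new ArrayList<>(uploads));  // overwrites
    }

    /** Job commit: complete every upload listed in the .pendingset files. */
    List<String> commitJob() {
        List<String> completed = new ArrayList<>();
        for (List<String> uploads : jobAttemptDir.values()) {
            for (String id : uploads) {
                outstanding.remove(id);
                completed.add(id);
            }
        }
        return completed;
    }

    /** The list+abort step: anything still outstanding was never committed. */
    List<String> listAndAbort() {
        List<String> aborted = new ArrayList<>(outstanding);
        outstanding.clear();
        return aborted;
    }

    public static void main(String[] args) {
        TaskScopedPendingSets job = new TaskScopedPendingSets();
        job.startUpload("attempt_0", "part-0000");
        job.commitTask("task_000000", "attempt_0");  // attempt 1 commits, then fails
        job.startUpload("attempt_1", "part-0000");
        job.commitTask("task_000000", "attempt_1");  // attempt 2 overwrites the file
        System.out.println("completed: " + job.commitJob());     // [attempt_1:part-0000]
        System.out.println("aborted:   " + job.listAndAbort());  // [attempt_0:part-0000]
    }
}
```

The model deliberately ignores timing. It assumes attempt 2's task commit is the last write to $task.pendingset.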



> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing the paper on the committers, which, being a proper paper, requires me to show that the committers work.
> # define the requirements of a "Correct" committed job (this applies to the FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is implicit in that it uses the V1 FileOutputCommitter to marshal .pendingset lists from committed tasks to the final destination, where they are read and committed).
> # show that the magic committer also works.
> I'm now not sure that the magic committer works.
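One way the first requirement could be made checkable is offered here as an assumption, not as the paper's definition. Under this assumption, the destination must hold exactly the files written by the single attempt of each task that the job manager recorded as committed, and nothing from any other attempt. CommitCorrectness.isCorrect and its map-based inputs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class CommitCorrectness {

    /**
     * Candidate correctness check (an assumption, not the paper's definition).
     *
     * @param output           destination file -> attempt that wrote it
     * @param committedAttempt task -> attempt the job manager recorded as committed
     * @param filesByAttempt   attempt -> files that attempt produced
     * @return true iff the output is exactly the committed attempts' files
     */
    static boolean isCorrect(Map<String, String> output,
                             Map<String, String> committedAttempt,
                             Map<String, Set<String>> filesByAttempt) {
        Map<String, String> expected = new HashMap<>();
        for (String attempt : committedAttempt.values()) {
            for (String file : filesByAttempt.getOrDefault(attempt, Set.of())) {
                expected.put(file, attempt);
            }
        }
        // no missing files, no extra files, no file written by the wrong attempt
        return expected.equals(output);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> files = Map.of(
                "attempt_0", Set.of("part-0000", "part-0001"),
                "attempt_1", Set.of("part-0000", "part-0001"));
        Map<String, String> committed = Map.of("task_0", "attempt_1");
        Map<String, String> good = Map.of("part-0000", "attempt_1", "part-0001", "attempt_1");
        Map<String, String> mixed = Map.of("part-0000", "attempt_0", "part-0001", "attempt_1");
        System.out.println(isCorrect(good, committed, files));   // true
        System.out.println(isCorrect(mixed, committed, files));  // false
    }
}
```

The "mix of both" outcome from the failure mode above fails this check, because part-0000 was written by an attempt that the job manager never recorded as committed.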



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
