Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2007/03/16 00:12:09 UTC

[jira] Created: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Speculative Execution and output of Reduce tasks
------------------------------------------------

Key: HADOOP-1127
URL: https://issues.apache.org/jira/browse/HADOOP-1127
Project: Hadoop
Issue Type: Improvement
Components: mapred
Affects Versions: 0.12.0
Reporter: Arun C Murthy
Assigned To: Arun C Murthy
Fix For: 0.13.0

We've recently seen instances where jobs run with 'speculative execution' tend to be quite unstable and fail with *AlreadyBeingCreatedException* noticed at the NameNode. Also, we could potentially have hairy situations where a failed Reduce task's output could clash with a successful task's (same tip) output.

As it exists, speculative execution relies on the PhasedFileSystem, which creates a temp output file; on task completion that file is 'moved' to its final position via a call to PhasedFileSystem.commit from ReduceTask.run(). This has led to issues such as the above.

Proposal:

Basically the idea is to do this uniformly for all Reduce tasks, i.e. all reducers create temp files and then have a serialized 'commit' done by the JobTracker, which moves the temp file to its final position.

We create the temp file in the job's output directory itself:
<output_dir>/_<taskid> (emphasis on the leading '_')

On task completion we'll add that temp file's path to the TaskStatus and then the JobTracker moves that file to its final position.
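
The temp-file-then-commit idea can be sketched roughly as follows. This is only an illustration using java.nio.file on a local filesystem, not the Hadoop FileSystem API or the code in any attached patch. The class name TempOutputCommit, the method names, the final file name passed in, and the choice to discard a losing attempt's temp file are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: every attempt writes to <outputDir>/_<taskId>, and a
// single serialized commit step promotes at most one attempt per final name.
final class TempOutputCommit {

    // Temp location for one task attempt; note the leading '_'.
    static Path tempPath(Path outputDir, String taskId) {
        return outputDir.resolve("_" + taskId);
    }

    // Serialized commit (synchronized stands in for "done by one coordinator").
    // Returns true if this attempt's output became the final output.
    static synchronized boolean commit(Path outputDir, String taskId, String finalName)
            throws IOException {
        Path temp = tempPath(outputDir, taskId);
        Path dest = outputDir.resolve(finalName);
        if (Files.exists(dest)) {
            // Assumption: another attempt of the same task already won,
            // so this attempt's temp output is simply discarded.
            Files.deleteIfExists(temp);
            return false;
        }
        Files.move(temp, dest);
        return true;
    }
}
```

Because only the commit step ever touches the final name, two speculative attempts never create the same output file concurrently; they only compete for the move.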

Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070420_8.patch

Updated to reflect another set of changes to trunk...


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070423_9.patch

Made a minor change to SortValidator to ensure it doesn't pick up temporary directories of speculative tasks which are cleaned up late. This sometimes caused sort-validation, which relies on the number of files to deduce the partitions, to fail.
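
As a rough illustration of that kind of filtering (not the actual SortValidator change), a listing can skip any entry whose name starts with '_' before counting output files. The sketch below uses java.nio.file rather than the Hadoop FileSystem API; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list only committed outputs, ignoring '_'-prefixed
// temporary files or directories left behind by speculative attempts.
final class OutputListing {

    static List<Path> committedOutputs(Path outputDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(
                outputDir, p -> !p.getFileName().toString().startsWith("_"))) {
            for (Path p : entries) {
                result.add(p);
            }
        }
        return result;
    }
}
```

With such a filter the number of listed entries matches the number of committed partitions even if a temp directory is cleaned up late.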


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070328_1.patch

Early patch for review while I continue testing...


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_200700424_10.patch

Moved the helper api to recursively delete a directory to FileUtil on Owen's suggestion...
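
For readers unfamiliar with the pattern, a recursive directory delete can be sketched as below. This is not the FileUtil helper from the patch and does not reflect its signature; it is a java.nio.file illustration with a hypothetical class and method name.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: delete a directory tree bottom-up, files first,
// then each directory once it is empty.
final class RecursiveDelete {

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // propagate a failure seen while listing dir
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```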


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment:     (was: HADOOP-1127_20070420_8.patch)


[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490159 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12355857/HADOOP-1127_20070420_8.patch applied and successfully tested against trunk revision r530153.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/62/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/62/console


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)

Nigel complains that tests hang with this patch... rechecking.


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)

There is a subtle bug in the handling of failed moves of the files to their final resting place... I'll resubmit another patch after testing it.


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070420_8.patch

Updated patch to reflect recent changes to trunk...

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Tom White (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488017 ] 

Tom White commented on HADOOP-1127:
-----------------------------------

Nigel, are your tests OK now? Thanks.

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070424_10.patch

Updated to reflect changes to trunk...

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486822 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354945/HADOOP-1127_20070405_5.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525596. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070403_4.patch

Fixed some minor bugs which caused test cases to fail...

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070331_2.patch

Tested and updated to reflect trunk...

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490334 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12355914/HADOOP-1127_20070420_8.patch applied and successfully tested against trunk revision r530556.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/66/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/66/console

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488162 ] 

Doug Cutting commented on HADOOP-1127:
--------------------------------------

Are there other clients of PhasedFileSystem, or should we remove (or at least deprecate) that class as a part of this issue?

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070405_5.patch

Updated patch to reflect recent changes to trunk... 

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070409_6.patch

Fixed the patch so that it works better with arbitrary user-specified output-formats for the output of reduces.

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Tom White (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1127:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Arun!

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment:     (was: HADOOP-1127_20070402_3.patch)

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491402 ] 

Owen O'Malley commented on HADOOP-1127:
---------------------------------------

+1

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_200700424_10.patch, HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

Resubmitting the fixed patch for the patch-system...

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment:     (was: HADOOP-1127_200700424_10.patch)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1127:
----------------------------------

    Comment: was deleted

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_200700424_10.patch, HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491466 ] 

Owen O'Malley commented on HADOOP-1127:
---------------------------------------

Since PhasedFileSystem was a public class, the standard practice is to leave it in for one more release, so it would be removed before 0.14 is released.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch, HADOOP-1127_20070424_10.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486426 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354863/HADOOP-1127_20070403_4.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524929. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487579 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12355179/HADOOP-1127_20070409_6.patch applied and successfully tested against trunk revision r526669.

Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070402_3.patch

Updated patch for review.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Gautam Kowshik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491382 ] 

Gautam Kowshik commented on HADOOP-1127:
----------------------------------------

http://hadoopqa.yst.corp.yahoo.com:8080/hudson/view/Ambient%20Orb/job/Hadoop-FullTest/274/

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_200700424_10.patch, HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Tom White (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491451 ] 

Tom White commented on HADOOP-1127:
-----------------------------------

Just noticed that PhasedFileSystem was deprecated, not removed.  I would suggest creating another issue to remove it.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch, HADOOP-1127_20070424_10.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488803 ] 

Owen O'Malley commented on HADOOP-1127:
---------------------------------------

1. Task.readFields should always set taskOutputPath, either to a new value or to null.
2. I'd prefer a protected getTaskOutputPath accessor, with the data field itself private.
3. saveTaskOutput should work on maps too, to support maps that write output.
4. The local runner should call saveTaskOutput on maps too.
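
A minimal Java sketch of points 1 and 2, for illustration only. TaskSketch is a hypothetical stand-in, not the actual Task class from the patch. The boolean-flag wire format, the roundTrip helper, and the example paths are assumptions; only the names taskOutputPath, getTaskOutputPath and readFields come from the review comments. The point it tries to show is that a reused object must not keep a stale path after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for Task; the real class has many more fields.
class TaskSketch {
    // Point 2: the data field is private, exposed only via a protected getter.
    private String taskOutputPath;

    TaskSketch() {}

    TaskSketch(String taskOutputPath) {
        this.taskOutputPath = taskOutputPath;
    }

    protected String getTaskOutputPath() {
        return taskOutputPath;
    }

    // Assumed wire format: a presence flag, then the path if present.
    void write(DataOutput out) throws IOException {
        out.writeBoolean(taskOutputPath != null);
        if (taskOutputPath != null) {
            out.writeUTF(taskOutputPath);
        }
    }

    // Point 1: always assign the field, to the new value or to null,
    // so a value left over from an earlier use cannot leak through.
    void readFields(DataInput in) throws IOException {
        taskOutputPath = in.readBoolean() ? in.readUTF() : null;
    }

    // Illustration helper: serialize src and read it back into target.
    static TaskSketch roundTrip(TaskSketch src, TaskSketch target) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        src.write(new DataOutputStream(buf));
        target.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return target;
    }
}
```

If readFields only assigned the field when a path was present, a reused instance that previously held a path would still report it after reading a task that has none.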

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch

Re: [jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by Nigel Daley <nd...@yahoo-inc.com>.

I won't be able to test this for a few days, depending on how 0.12.3 testing goes.

On Apr 4, 2007, at 1:23 PM, Tom White (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486762 ]
>
> Tom White commented on HADOOP-1127:
> -----------------------------------
>
> Thanks Arun.
>
> Nigel, can you confirm your tests don't hang now? Thanks.
>
>> Speculative Execution and output of Reduce tasks
>> ------------------------------------------------
>>
>>                 Key: HADOOP-1127
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>>             Project: Hadoop
>>          Issue Type: Improvement
>>          Components: mapred
>>    Affects Versions: 0.12.0
>>            Reporter: Arun C Murthy
>>         Assigned To: Arun C Murthy
>>             Fix For: 0.13.0
>>
>>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Tom White (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486762 ] 

Tom White commented on HADOOP-1127:
-----------------------------------

Thanks Arun.

Nigel, can you confirm your tests don't hang now? Thanks.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491424 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356177/HADOOP-1127_20070424_10.patch applied and successfully tested against trunk revision r532046.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/73/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/73/console

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch, HADOOP-1127_20070424_10.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Gautam Kowshik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gautam Kowshik updated HADOOP-1127:
-----------------------------------

    Comment: was deleted

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_200700424_10.patch, HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Gautam Kowshik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491381 ] 

Gautam Kowshik commented on HADOOP-1127:
----------------------------------------

+1. Sort500 ran successfully with speculative execution on.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_200700424_10.patch, HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch, HADOOP-1127_20070420_8.patch, HADOOP-1127_20070423_9.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Tom White (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1127:
------------------------------

    Status: Open  (was: Patch Available)

Arun, the latest patch isn't applying cleanly to trunk - could you resubmit it please? Thanks.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488709 ] 

Doug Cutting commented on HADOOP-1127:
--------------------------------------

> Should we keep PhasedFileSystem, deprecate it or remove it for now and rewrite it later?

If we think there may be users outside of the core, then we should deprecate it; otherwise we should remove it.  Does anyone know of other users?  If no one speaks up, let's just remove it.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485973 ] 

Hadoop QA commented on HADOOP-1127:
-----------------------------------

-1, because 3 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12354730/HADOOP-1127_20070402_3.patch against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524659. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Open  (was: Patch Available)

Needs a minor change...

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch

[jira] Commented: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488571 ] 

Arun C Murthy commented on HADOOP-1127:
---------------------------------------

As of now the PhasedFileSystem is unused. 

It is also broken in the sense that we occasionally see it fail with 'AlreadyBeingCreatedException', since there is no synchronization construct available for the PhasedFileSystem to use before moving the temporary files to their permanent location. Ideally we would lock the destination directory, move the files, and then unlock it.

This patch gets around it by letting the JT act as the arbiter, which IMO is a hack, albeit the only way to go for now.

In future, once we have a locking mechanism in dfs, we could go back to the PhasedFileSystem...

So, I'm not sure - what do others think? Should we keep PhasedFileSystem, deprecate it or remove it for now and rewrite it later?
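
For readers following along, the "JT as arbiter" idea can be sketched roughly as below. This is only an illustration of serializing the commit decision in one place, not code from the patch: the class, its methods and the tip/attempt ids are hypothetical, and a plain synchronized map stands in for whatever bookkeeping the JobTracker actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a single arbiter that decides which attempt of a
// tip (task-in-progress) gets to commit its output.
public class CommitArbiterSketch {

    // tip id -> id of the attempt whose output was committed
    private final Map<String, String> committed = new HashMap<>();

    // Serialized via 'synchronized': for each tip, only the first attempt
    // to ask is granted the commit; later attempts of that tip are refused.
    public synchronized boolean tryCommit(String tipId, String attemptId) {
        if (committed.containsKey(tipId)) {
            return false;
        }
        committed.put(tipId, attemptId);
        return true;
    }

    // Returns the attempt that won the commit for this tip, or null if none.
    public synchronized String winner(String tipId) {
        return committed.get(tipId);
    }
}
```

Since every decision passes through one lock in one process, two speculative attempts of the same tip cannot both be told to move their temp file, which is the property the PhasedFileSystem could not get without locking support in dfs.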



> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070419_7.patch

Thanks for the review, Owen. Here is a patch incorporating your feedback...

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch, HADOOP-1127_20070405_5.patch, HADOOP-1127_20070409_6.patch, HADOOP-1127_20070419_7.patch

[jira] Updated: (HADOOP-1127) Speculative Execution and output of Reduce tasks

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1127:
----------------------------------

    Attachment: HADOOP-1127_20070402_3.patch

Fixed a typo in one of the comments.

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch, HADOOP-1127_20070402_3.patch
