Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (Created) (JIRA)" <ji...@apache.org> on 2011/11/27 23:16:40 UTC

[jira] [Created] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Task failures shouldn't result in Job failures 
-----------------------------------------------

                 Key: MAPREDUCE-3473
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3473
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.23.0, 0.20.205.0
            Reporter: Eli Collins


Currently some task failures may result in job failures. E.g. a local TT disk failure seen in TaskLauncher#run, TaskRunner#run, or MapTask#run is visible to, and can hang, the JobClient, causing the job to fail. Job execution should always be able to survive a task failure if there are sufficient resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158228#comment-13158228 ] 

Ravi Gummadi commented on MAPREDUCE-3473:
-----------------------------------------

It would be good to distinguish between failures by failure type. The AM could then decide whether to re-execute/relaunch a task based on the type of failure and the number of times the task has already failed. No?
                

[jira] [Commented] (MAPREDUCE-3473) A single task tracker failure shouldn't result in Job failure

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174704#comment-13174704 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

MAPREDUCE-2960 contains details for a specific example. I think what's going on is that the ability to tolerate disk failures now means you can get a set of task attempt failures on a single TT that would previously have been just one (because the TT used to stop itself when it saw a disk failure).
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Subroto Sanyal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159834#comment-13159834 ] 

Subroto Sanyal commented on MAPREDUCE-3473:
-------------------------------------------

*mapreduce.map.failures.maxpercent* and *mapreduce.reduce.failures.maxpercent* hold the percentage of task failures a job will tolerate.

Say a map fails but falls within the tolerance limit: the output of that mapper is then lost (it will not be considered for further computation). The same applies to reducers.

I suggest letting the user decide this failure percentage and be prepared for such data loss; otherwise a non-zero value will come as a surprise to the user.

Further, I feel there is no correct non-zero default for these configurations. The right values depend on user scenarios/use cases.
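The semantics described above can be sketched as a small simulation. This is hypothetical code for illustration, not Hadoop's actual JobTracker/AM logic; the function and parameter names are mine:

```python
def job_should_fail(failed_tasks, total_tasks, failures_maxpercent):
    """Sketch of the *.failures.maxpercent check described above.

    The job is declared failed once the percentage of failed tasks
    exceeds the configured tolerance. With the default of 0, any
    single task failure fails the job; with a non-zero tolerance,
    the failed tasks' output is simply lost.
    """
    if total_tasks == 0:
        return False
    failed_percent = failed_tasks * 100.0 / total_tasks
    return failed_percent > failures_maxpercent
```

With the default of 0, one failed map out of 100 (1%) already fails the job; with a tolerance of 5, up to 5% of maps may fail, at the cost of losing their output.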
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Subroto Sanyal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173948#comment-13173948 ] 

Subroto Sanyal commented on MAPREDUCE-3473:
-------------------------------------------

@Eli
Is this issue still valid?
Any other suggestions/opinions?
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159107#comment-13159107 ] 

Sharad Agarwal commented on MAPREDUCE-3473:
-------------------------------------------

The knobs that control job failure due to task failures are mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent. The default value is 0. Do you mean these don't work?
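For reference, opting in to partial failure would look something like the following job-configuration fragment. The property names are the ones cited above; the values are illustrative only, not recommendations:

```xml
<!-- Tolerate up to 5% failed map tasks and 1% failed reduce tasks
     before declaring the job failed. Defaults are 0 (no tolerance).
     These values are illustrative, not recommendations. -->
<property>
  <name>mapreduce.map.failures.maxpercent</name>
  <value>5</value>
</property>
<property>
  <name>mapreduce.reduce.failures.maxpercent</name>
  <value>1</value>
</property>
```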
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159387#comment-13159387 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

Cool, so we should repurpose this jira to default *failures.maxpercent to a non-zero value? What values do people use in practice?
                

[jira] [Commented] (MAPREDUCE-3473) A single task tracker failure shouldn't result in Job failure

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174636#comment-13174636 ] 

Sharad Agarwal commented on MAPREDUCE-3473:
-------------------------------------------

A single machine failure doesn't result in the job failing; that's the whole point of Hadoop. :)

We are missing the difference between *Task* and *TaskAttempt*. A task gets 4 chances (task attempts) by default to run before the job is declared failed.

I think this issue can be resolved as Invalid.
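The Task vs. TaskAttempt distinction can be sketched as a small simulation. This is hypothetical code, not Hadoop's scheduler; the default of 4 corresponds to the per-task attempt limit mentioned above:

```python
def task_completes(attempt_results, max_attempts=4):
    """Sketch of the Task vs. TaskAttempt distinction described above.

    A *task* is retried as new *task attempts* until one succeeds or
    max_attempts (default 4) attempts have failed; only then is the
    task itself, and with the default tolerance the whole job,
    declared failed.
    """
    for succeeded in attempt_results[:max_attempts]:
        if succeeded:  # one successful attempt completes the task
            return True
    return False  # all allowed attempts failed -> task failure
```

So a task that fails three times and succeeds on the fourth attempt does not fail the job; only four failed attempts of the same task do.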
                

[jira] [Updated] (MAPREDUCE-3473) A single task tracker failure shouldn't result in Job failure

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated MAPREDUCE-3473:
-----------------------------------

    Summary: A single task tracker failure shouldn't result in Job failure   (was: Task failures shouldn't result in Job failures )

@Subroto. Thanks for pinging. Users expect their jobs to complete even if a TT running one of their tasks fails, right? So we're clear on terminology: what I identified is that a single TT failure can cause the job to fail. A job should be able to survive a single machine failure, right?
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159468#comment-13159468 ] 

Mahadev konar commented on MAPREDUCE-3473:
------------------------------------------

I think we do need the defaults to be 0. Changing it to anything else will be a regression. As Sharad said, losing data should not come as a surprise to users.
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159114#comment-13159114 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

Ah, I didn't realize these defaulted to zero, thanks for pointing this out. Anyone know the rationale behind having jobs not tolerate a single task failure by default? From reading HADOOP-1144 it seems like this was chosen because it was the initial behavior before the code could handle task failure.
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158272#comment-13158272 ] 

Ravi Gummadi commented on MAPREDUCE-3473:
-----------------------------------------

Currently in trunk (and even in 0.20), a task gets re-launched multiple times after failures, until it has failed on 4 different nodes. Right? The possibility of a task failing on 4 different nodes because of these special types of task failure (like disk failures) is very, very low. Right?
                

[jira] [Commented] (MAPREDUCE-3473) A single task tracker failure shouldn't result in Job failure

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174685#comment-13174685 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

bq. a single machine failure doesn't result in job to fail; thats the whole point of hadoop. smile.

It can, that's the point of this bug!
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159158#comment-13159158 ] 

Sharad Agarwal commented on MAPREDUCE-3473:
-------------------------------------------

Note this is *task* failure, NOT task-attempt failure. A task failure means losing the processing of the corresponding InputSplit, and not all applications would be OK with that.
Explicitly setting a non-zero value makes sense, so that losing data doesn't come as a surprise to applications.
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158611#comment-13158611 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

The issue is that the single failure can be visible to the client, e.g. when the JobClient tries to get the task log from a particular TT that has failed. The reliability mechanism should be invisible to the client. See also MAPREDUCE-2960 (a single TT disk failure can cause the job to fail). To see what I'm talking about, run a job in a loop and then fail one of the disks on one of the TTs; some percentage of the time, this single failure on a single TT causes the entire job to fail.
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159602#comment-13159602 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

Why would a non-zero value result in a job completing successfully but with data loss?
                

[jira] [Commented] (MAPREDUCE-3473) Task failures shouldn't result in Job failures

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158239#comment-13158239 ] 

Eli Collins commented on MAPREDUCE-3473:
----------------------------------------

That makes sense. Either way a single task failure should never cause a job to fail right?
                