Posted to common-dev@hadoop.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/03/26 21:51:32 UTC

[jira] Commented: (HADOOP-1144) Hadoop should allow a configurable percentage of failed map tasks before declaring a job failed.

    [ https://issues.apache.org/jira/browse/HADOOP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484207 ] 

Andrzej Bialecki  commented on HADOOP-1144:
-------------------------------------------

Nutch could use this feature too - it's quite common that one of the map tasks, e.g. one that is parsing difficult content like PDF or msdoc, crashes or gets stuck. This should not be fatal to the whole job.

As for configuring the number of failed tasks - I think it would be good to be able to choose between an absolute number and a percentage.
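
A minimal sketch of how either limit might be expressed on the JobConf, assuming hypothetical property names (mapred.max.map.failures.percent and mapred.max.map.failures.absolute) that are not settled anywhere in this issue:

    // Hypothetical sketch -- the property names are illustrative only, not decided in HADOOP-1144.
    import org.apache.hadoop.mapred.JobConf;

    public class FailureToleranceExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Option A: tolerate up to 5% of map tasks failing before the job is declared failed.
        conf.setInt("mapred.max.map.failures.percent", 5);
        // Option B: tolerate an absolute number of permanently failed map tasks instead.
        conf.setInt("mapred.max.map.failures.absolute", 10);
      }
    }

Whichever form is chosen, a default of zero tolerated failures would keep today's behaviour for existing jobs.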

> Hadoop should allow a configurable percentage of failed map tasks before declaring a job failed.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Christian Kunz
>             Fix For: 0.13.0
>
>
> In our environment some map tasks can fail repeatedly because of corrupt input data, which is sometimes non-critical as long as the amount is limited. In this case it is annoying that the whole Hadoop job fails and cannot be restarted until the corrupt data are identified and eliminated from the input. It would be extremely helpful if the job configuration allowed indicating how many map tasks are allowed to fail.
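
As a rough illustration of the percentage semantics being asked for (not code taken from the Hadoop JobTracker), the job-failure check could look like the following; the class and parameter names are assumptions made for the sketch:

    // Illustrative only -- not the actual JobTracker logic; all names are hypothetical.
    public class FailureThreshold {
      /** Returns true if the job should be declared failed. */
      static boolean shouldFailJob(int failedMaps, int totalMaps, int maxFailurePercent) {
        // With maxFailurePercent = 0 this reduces to the current behaviour:
        // any map task that exhausts its retries fails the whole job.
        return failedMaps * 100 > maxFailurePercent * totalMaps;
      }
    }

For example, with 1000 map tasks and a 5% threshold, up to 50 permanently failed maps would be tolerated; the 51st would fail the job.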

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.