Posted to mapreduce-issues@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2017/07/03 18:11:00 UTC

[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072787#comment-16072787 ] 

Haibo Chen commented on MAPREDUCE-6870:
---------------------------------------

Thanks [~pbacsko] for the patch! Doesn't the fact that all reducers have completed always indicate that the job is ready to finish? If so, I don't think we need to add another configuration to handle such cases.

Following that, we could move the body of preemptMappersIfNecessary() into checkJobAfterTaskCompletion() (after the check for job failure and before job.checkReadyForCommit()), because it is not really preempting running mappers; it is just part of checking whether a job is ready to finish.

Also, instead of killing task attempts directly, we could just send kill events to the map tasks, which will in turn kill their individual attempts. That way, we do not need setNoMoreAttempts() any more.
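
To make the suggestion concrete, here is a rough, self-contained sketch of the check I have in mind. It is a simplified model and not the actual JobImpl code: the ReducersDoneSketch class, its Task/TaskType/TaskState types, and the killUnfinishedMappersIfReducersDone() helper are all hypothetical names made up for illustration, and flipping a task's state to KILLED merely stands in for delivering a task-level kill event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the proposed check; not Hadoop's real classes.
final class ReducersDoneSketch {
    enum TaskType { MAP, REDUCE }
    enum TaskState { RUNNING, SUCCEEDED, KILLED }

    static final class Task {
        final String id;
        final TaskType type;
        TaskState state;

        Task(String id, TaskType type, TaskState state) {
            this.id = id;
            this.type = type;
            this.state = state;
        }
    }

    // Intended to run as part of the job-completion check, after the
    // job-failure check and before the ready-for-commit check.
    // Returns the ids of the map tasks that were sent a kill.
    static List<String> killUnfinishedMappersIfReducersDone(List<Task> tasks) {
        List<String> killed = new ArrayList<>();
        boolean hasReducers = false;
        for (Task t : tasks) {
            if (t.type == TaskType.REDUCE) {
                hasReducers = true;
                if (t.state != TaskState.SUCCEEDED) {
                    return killed; // a reducer is unfinished: nothing to do yet
                }
            }
        }
        if (!hasReducers) {
            return killed; // map-only job: mappers must run to completion
        }
        for (Task t : tasks) {
            if (t.type == TaskType.MAP && t.state == TaskState.RUNNING) {
                // Stand-in for a task-level kill event; in this model the task,
                // not the caller, would then kill its own attempts.
                t.state = TaskState.KILLED;
                killed.add(t.id);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("m1", TaskType.MAP, TaskState.SUCCEEDED));
        tasks.add(new Task("m2", TaskType.MAP, TaskState.RUNNING));
        tasks.add(new Task("r1", TaskType.REDUCE, TaskState.SUCCEEDED));
        System.out.println(killUnfinishedMappersIfReducersDone(tasks));
    }
}
```

Because the kill is addressed to the task rather than to each attempt, the caller in this model never has to track attempts or prevent new ones from being launched, which is the reason setNoMoreAttempts() would become unnecessary.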

> Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-6870-001.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have a config option to finish the job once all reducers are complete.


