Posted to mapreduce-issues@hadoop.apache.org by "Jothi Padmanabhan (JIRA)" <ji...@apache.org> on 2009/10/06 06:37:31 UTC

[jira] Created: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

JT should kill running maps when all the reducers have completed
----------------------------------------------------------------

                 Key: MAPREDUCE-1060
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1060
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Jothi Padmanabhan


We have seen some situations where maps are still running when all the reducers have completed. This could happen because of lost TTs, the interplay of speculative tasks with bad TTs, etc. If the maps take a long time to run, they unnecessarily delay job completion, as their output is not required anyway. The JT should possibly kill running maps when all the reducers have completed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762500#action_12762500 ] 

Jothi Padmanabhan commented on MAPREDUCE-1060:
----------------------------------------------

Here is one such scenario.
Towards the end of the reduce phase, speculative tasks were launched for some reducers. When these speculative reducers tried to fetch map outputs, the TT was unable to fetch the map outputs, presumably because the disk had some issues by then. So, these maps were relaunched on some other nodes. In the meantime, all the original reducers completed and the speculative reducers were killed. So, we have a situation where all the reducers are complete but some maps are still running.



[jira] Commented: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762503#action_12762503 ] 

Arun C Murthy commented on MAPREDUCE-1060:
------------------------------------------

I agree with Devaraj - this definitely smells like a regression...



[jira] Commented: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

Posted by "Adam Kramer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898468#action_12898468 ] 

Adam Kramer commented on MAPREDUCE-1060:
----------------------------------------

I argue that this is a bug, not an improvement. If the mapper completes successfully on the first try but then fails on the unnecessary 2nd..5th try, the whole job will fail unnecessarily.

Also, this is still occurring. This has been happening a lot lately. It is especially frequent for jobs whose mappers take a long time--because the map node may lose the task tracker for the jobs that finish quickly before the later-ending jobs have finished.

Is there any case in which the side-effects of a second-run mapper would be such that the whole job SHOULD fail even though all the reducers have finished?



[jira] Commented: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762502#action_12762502 ] 

Devaraj Das commented on MAPREDUCE-1060:
----------------------------------------

When jobs finish, all running tasks should be killed via KillJobAction. I will be surprised if this is not happening.



[jira] Commented: (MAPREDUCE-1060) JT should kill running maps when all the reducers have completed

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762509#action_12762509 ] 

Jothi Padmanabhan commented on MAPREDUCE-1060:
----------------------------------------------

> When jobs finish, all running tasks should be killed via KillJobAction

Here the job did not finish. It looks like a job is marked complete only when all the maps and reducers have finished. In this case, some maps were still running, and the job was marked complete only when the last map finished.
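
To make the difference concrete, here is a minimal sketch of the two conditions, assuming a simplified task model. The class and method names (JobCompletionSketch, Task, isCompleteCurrent, mapsToKill) are hypothetical and are not the JobTracker's actual code. It only contrasts the completion check described above with the check this issue proposes.

```java
import java.util.ArrayList;
import java.util.List;

public class JobCompletionSketch {
    enum State { RUNNING, SUCCEEDED }

    // Hypothetical stand-in for a task; not Hadoop's own task classes.
    static final class Task {
        final String id;
        final boolean isMap;
        final State state;

        Task(String id, boolean isMap, State state) {
            this.id = id;
            this.isMap = isMap;
            this.state = state;
        }
    }

    // Behaviour described in this thread: the job is complete only
    // when every map and every reducer has finished.
    static boolean isCompleteCurrent(List<Task> tasks) {
        for (Task t : tasks) {
            if (t.state != State.SUCCEEDED) {
                return false;
            }
        }
        return true;
    }

    // True only if the job has at least one reducer and all reducers succeeded.
    // Map-only jobs return false, since their map output is the job output.
    static boolean allReducersDone(List<Task> tasks) {
        boolean sawReducer = false;
        for (Task t : tasks) {
            if (!t.isMap) {
                sawReducer = true;
                if (t.state != State.SUCCEEDED) {
                    return false;
                }
            }
        }
        return sawReducer;
    }

    // Proposed behaviour: once all reducers are done, any map that is
    // still running is a candidate for being killed.
    static List<String> mapsToKill(List<Task> tasks) {
        List<String> ids = new ArrayList<>();
        if (!allReducersDone(tasks)) {
            return ids;
        }
        for (Task t : tasks) {
            if (t.isMap && t.state == State.RUNNING) {
                ids.add(t.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("m_0", true, State.SUCCEEDED));
        tasks.add(new Task("m_1", true, State.RUNNING)); // relaunched map
        tasks.add(new Task("r_0", false, State.SUCCEEDED));

        System.out.println("complete under current check: " + isCompleteCurrent(tasks));
        System.out.println("maps that could be killed: " + mapsToKill(tasks));
    }
}
```

In the scenario above, the relaunched map keeps the job from completing under the first check, while the second check identifies it as safe to kill because no reducer still needs its output.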
