Posted to mapreduce-issues@hadoop.apache.org by "nandan (JIRA)" <ji...@apache.org> on 2011/07/07 17:55:16 UTC

[jira] [Created] (MAPREDUCE-2653) dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles

dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2653
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2653
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker, tasktracker
            Reporter: nandan


I have introduced a process monitoring system inside the tasktracker that analyses the CPU and memory utilization of each map task and lets me increase or decrease the maximum number of map slots dynamically on each node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2653) dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles

Posted by "nandan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090015#comment-13090015 ] 

nandan commented on MAPREDUCE-2653:
-----------------------------------

In response to Allen Wittenauer's question:
How does this method work when the tasks are IO intensive?

The monitoring system on every TaskTracker (TT) categorizes and stores each task it runs in one of two lists: CPU-intensive and CPU-non-intensive (the latter includes memory-intensive as well as IO-intensive tasks). It then generates a job request by selecting jobs from these two lists alternately, one at a time, taking into account the current CPU idle time and the CPU utilization of the task. The request consists of a list of jobs whose map tasks the TT can run as extra tasks. The request is submitted to the JobTracker (JT) through the heartbeat, and the JT processes the jobs in the request one by one.

So currently I treat IO-intensive and memory-intensive processes the same way.
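
The selection step described above could be sketched roughly as follows. This is a minimal Python illustration, not the code from the patch: the 0.5 threshold, the names TaskSample, categorize and build_job_request, and the rule of subtracting each task's utilization from the idle CPU fraction are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import zip_longest

# Hypothetical threshold: the thread does not say where the line between
# CPU-intensive and CPU-non-intensive tasks is drawn.
CPU_INTENSIVE_THRESHOLD = 0.5


@dataclass(frozen=True)
class TaskSample:
    job_id: str
    cpu_util: float  # observed CPU utilization of one map task, 0.0 to 1.0


def categorize(samples):
    """Split observed tasks into CPU-intensive and CPU-non-intensive lists."""
    intensive, non_intensive = [], []
    for s in samples:
        if s.cpu_util >= CPU_INTENSIVE_THRESHOLD:
            intensive.append(s)
        else:
            non_intensive.append(s)
    return intensive, non_intensive


def build_job_request(samples, cpu_idle):
    """Pick jobs alternately from the two lists while idle CPU remains.

    Returns the ids of jobs whose map tasks this node could run as extra
    tasks. The budget rule (subtract each chosen task's utilization from
    the idle fraction) is an assumption for illustration.
    """
    intensive, non_intensive = categorize(samples)
    request, remaining = [], cpu_idle
    # zip_longest alternates between the lists and pads the shorter with None.
    for pair in zip_longest(intensive, non_intensive):
        for s in pair:
            if s is None or s.job_id in request:
                continue
            if s.cpu_util <= remaining:
                request.append(s.job_id)
                remaining -= s.cpu_util
    return request
```

In this sketch the returned list plays the role of the request that the TT would attach to its heartbeat. Because IO-intensive and memory-intensive tasks both show low CPU utilization, they land in the same list, which matches the behaviour described above.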

> dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2653
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2653
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>    Affects Versions: 0.20.203.0
>         Environment: linux
>            Reporter: nandan
>              Labels: map, scheduler, tasks
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I have introduced a process monitoring system inside the tasktracker that analyses the CPU and memory utilization of each map task and lets me increase or decrease the maximum number of map slots dynamically on each node. With this I can combine CPU-intensive jobs with memory-intensive jobs, thereby reducing CPU idle time.


        

[jira] [Updated] (MAPREDUCE-2653) dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles

Posted by "nandan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nandan updated MAPREDUCE-2653:
------------------------------

           Description: I have introduced process monitoring system inside tasktracker, which analyses the cpu and memory utilization of each map task and allows me to increase/decrease maximum number of map slots dynamically on each node. With this I can combine cpu intensive jobs along with memory intensive jobs, thereby reducing the cpu idle time.  (was: I have introduced process monitoring system in hadoop inside tasktracker, which analyses the cpu and memory utilization of each map task and allows me to increase/decrease maximum number of map slots dynamically on each node)
           Environment: linux
     Affects Version/s: 0.20.203.0
                  Tags: dynamic map slots
                Labels: map scheduler tasks  (was: )
    Remaining Estimate: 672h
     Original Estimate: 672h



        

[jira] [Commented] (MAPREDUCE-2653) dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles

Posted by "nandan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063731#comment-13063731 ] 

nandan commented on MAPREDUCE-2653:
-----------------------------------

To develop a proof of concept, I have concentrated only on CPU utilization.

Currently, I am running multiple jobs simultaneously.
Based on the CPU utilization of a task and the current CPU idle time, I decide whether I can run an extra task of that job (by dynamically increasing the map slots), thereby coupling CPU-intensive jobs with jobs that are not CPU-intensive.
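
As a rough illustration of that decision, the slot limit for a node could be computed as below. This is a minimal Python sketch under stated assumptions, not the patch itself: the function name max_map_slots, the extra_cap parameter, and the rule "one extra slot per task-sized share of idle CPU" are hypothetical.

```python
def max_map_slots(predefined_slots, cpu_idle, task_cpu_util, extra_cap=2):
    """Return the map-slot limit for a node.

    predefined_slots: the statically configured number of map slots.
    cpu_idle: current idle CPU on the node, as a fraction from 0.0 to 1.0.
    task_cpu_util: observed CPU utilization of one map task of the
        candidate job, on the same scale.
    extra_cap: hypothetical upper bound on dynamically added slots.
    """
    if task_cpu_util <= 0:
        # No usable measurement, so leave the slot count unchanged.
        return predefined_slots
    # How many more tasks of this size fit into the idle CPU (assumed rule).
    extra = int(cpu_idle // task_cpu_util)
    return predefined_slots + max(0, min(extra, extra_cap))
```

With 4 predefined slots, half of the CPU idle and a task that uses a quarter of the CPU, the sketch allows 6 slots. When idle CPU falls below one task's utilization, the limit returns to the predefined 4, which corresponds to decreasing the slots again.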



        

[jira] [Commented] (MAPREDUCE-2653) dynamic map slots (in addition to predefined) on each node which allows to execute cpu intensive jobs along with memory intensive jobs thereby reducing wastage of cpu cycles

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062123#comment-13062123 ] 

Allen Wittenauer commented on MAPREDUCE-2653:
---------------------------------------------

How does this method work when the tasks are IO intensive? What happens if the task forks subprocesses?

