Posted to mapreduce-issues@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/09/27 23:46:16 UTC

[jira] Created: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Ability to automatically move machines from one MR compute cluster to another
-----------------------------------------------------------------------------

                 Key: MAPREDUCE-1044
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1044
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: dhruba borthakur
            Assignee: Dmytro Molkov


We have multiple map-reduce clusters that provide different service and support levels for their users. We have seen that utilization of hardware resources is not optimized if we have a static partition of existing hardware resources into these separate MR clusters. It would be nice to have an automatic way to move nodes from one MR cluster to another based on load characteristics and configured policies. This JIRA will discuss some of the ideas and possible implementations of those ideas.
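
To make "based on load characteristics and configured policies" concrete, here is a minimal sketch of one possible policy. It is not taken from any proposal on this issue: the `Cluster` record, the `demand` metric, and the `high`/`low` thresholds are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cluster:
    name: str
    nodes: int        # machines currently assigned to this cluster
    min_nodes: int    # configured floor; the cluster never shrinks below it
    demand: float     # hypothetical load metric, in node-equivalents

    @property
    def utilization(self) -> float:
        if self.nodes == 0:
            return float("inf") if self.demand > 0 else 0.0
        return self.demand / self.nodes


def pick_move(clusters: List[Cluster],
              high: float = 0.9,
              low: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return (donor, receiver) for a single node move, or None."""
    # Receivers are overloaded; donors are underloaded and above their floor.
    receivers = [c for c in clusters if c.utilization > high]
    donors = [c for c in clusters
              if c.utilization < low and c.nodes > c.min_nodes]
    if not receivers or not donors:
        return None
    receiver = max(receivers, key=lambda c: c.utilization)
    donor = min(donors, key=lambda c: c.utilization)
    return donor.name, receiver.name
```

Moving one node per decision and keeping a gap between `low` and `high` is a deliberate choice in this sketch: it avoids a node bouncing back and forth between two clusters whose load is close to a single threshold.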

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760346#action_12760346 ] 

dhruba borthakur commented on MAPREDUCE-1044:
---------------------------------------------

Thanks, Allen. We will try Solaris ZFS once the disk quotas become a reality. In the meantime, the only way for us to get real job isolation is to move machines from one cluster to another :-)



[jira] Updated: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "Dmytro Molkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Molkov updated MAPREDUCE-1044:
-------------------------------------

    Attachment: DCD.pdf

I am attaching an initial proposal for how to do this.

I would love to hear your thoughts on this document. I am currently experimenting with the design it describes.



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760340#action_12760340 ] 

Allen Wittenauer commented on MAPREDUCE-1044:
---------------------------------------------

Change your OS. :)



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780882#action_12780882 ] 

Arun C Murthy commented on MAPREDUCE-1044:
------------------------------------------

This proposal seems like we are about to re-invent Torque by adding yet another daemon to Hadoop Map-Reduce. You can also use HoD, along with features in Torque, to have a few large clusters and move tasktrackers around (via Torque). I do not have all the details, but I believe Torque can have custom monitoring and can be used to do smarter (i.e. Map-Reduce aware) scheduling.

If isolation really is the end goal, one can use full VMs right now. The capacity-scheduler, in conjunction with the TaskController infrastructure in the TaskTracker, has some of the features you want: it monitors the memory consumed by each task's process tree and ensures that it does not go over a limit. Yes, it's harder to do cpu/io monitoring, but it is something everyone is looking to do. Your efforts in this space will be very useful to the community at large, as indicated by our collaboration on MAPREDUCE-220 and other related jiras.

-1 for the direction proposed in this jira.
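
For readers unfamiliar with the process-tree memory check mentioned above, the idea can be sketched roughly as follows. This is an illustration only, not the TaskTracker's actual code: the `procs` table (pid mapped to parent pid and memory in bytes) is a hypothetical stand-in for whatever the OS reports.

```python
from collections import defaultdict
from typing import Dict, Tuple


def tree_memory(procs: Dict[int, Tuple[int, int]], root: int) -> int:
    """Sum memory over `root` and all of its descendants.

    `procs` maps pid -> (parent pid, memory in bytes).
    """
    children = defaultdict(list)
    for pid, (ppid, _) in procs.items():
        children[ppid].append(pid)

    total, stack, seen = 0, [root], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def over_limit(procs: Dict[int, Tuple[int, int]], root: int,
               limit_bytes: int) -> bool:
    """True if the whole process tree of a task exceeds its memory limit."""
    return tree_memory(procs, root) > limit_bytes
```

Summing over the whole tree rather than the root process matters because a task may fork children (for example, streaming or pipes jobs) that hold most of the memory.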



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780992#action_12780992 ] 

dhruba borthakur commented on MAPREDUCE-1044:
---------------------------------------------

The end goal is *really* to get isolation of jobs having different SLAs. One way to do this is to use VMs, and Solaris has relatively good ones from what I understand (not so in Linux).

Torque can be used, but it is the policies of what-to-move, when-to-move, etc. that will be important, rather than which daemon will move it. The daemon could be something that we build (inside or outside the Hadoop framework) or it could be an extension to Torque. This requires that the JT expose certain APIs that this external piece of software can use to build its policies. Maybe we can use this JIRA to discuss what these policies could be and when they get triggered?
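
As a starting point for that discussion, the two policy questions could be sketched as below. Everything here is an assumption for illustration: `TrackerStatus`, the backlog history, and both function names are hypothetical, and no such JT API exists in this form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackerStatus:
    host: str
    running_tasks: int   # tasks that would be lost if the node moved now


def should_trigger(pending_history: List[int], threshold: int,
                   window: int) -> bool:
    """When to move: the backlog of pending tasks exceeded `threshold`
    in each of the last `window` samples (window must be >= 1)."""
    recent = pending_history[-window:]
    return len(recent) == window and all(p > threshold for p in recent)


def select_nodes(trackers: List[TrackerStatus], count: int) -> List[str]:
    """What to move: the least busy trackers first, so that as little
    running work as possible is thrown away."""
    ranked = sorted(trackers, key=lambda t: (t.running_tasks, t.host))
    return [t.host for t in ranked[:count]]
```

Requiring the threshold to hold over several samples keeps a short spike from triggering a move; the inputs are exactly the kind of data the JT would need to expose to an external daemon.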



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760159#action_12760159 ] 

dhruba borthakur commented on MAPREDUCE-1044:
---------------------------------------------

The problem that we are trying to solve is really "job isolation". In the current scheme of things, if a task suddenly starts eating lots of memory on a node, the Linux OS is liable to start swapping and then crawl or die a slow death. Similarly, if a task starts spewing out lots of data on the network, it affects the other tasks running on the same node/rack. These observations have led us to believe that job isolation is never truly possible (unless you use a virtual machine such as VMware) if you let tasks of different jobs run on the same node. The only feasible solution to achieve complete isolation of one job from another is to run them on separate clusters (i.e. separate nodes). Thoughts?



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760343#action_12760343 ] 

Allen Wittenauer commented on MAPREDUCE-1044:
---------------------------------------------

Using Solaris projects, it is theoretically possible to limit mem, cpu and network resources (Solaris IPQoS settings can take a projid). It looks like Solaris will get real disk quotas for ZFS in 10u8. We haven't tried doing this in practice, but it sure looks feasible on paper.



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760342#action_12760342 ] 

dhruba borthakur commented on MAPREDUCE-1044:
---------------------------------------------

Changing the OS does not help with sharing other resources such as network and disk. One task on a machine can completely starve other tasks on the same machine of network/disk resources, can't it?



[jira] Commented: (MAPREDUCE-1044) Ability to automatically move machines from one MR compute cluster to another

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760149#action_12760149 ] 

Jeff Hammerbacher commented on MAPREDUCE-1044:
----------------------------------------------

Hey Dhruba,

Rather than moving nodes from one MR cluster to another, couldn't you just grow/shrink the minimum allocation of a pool in the Fair Share scheduler? Inserting another daemon outside of the scope of the JT to perform this dynamic resource allocation while the JTs themselves are trying to perform resource allocation seems to be complicating things a bit.

Thanks,
Jeff
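
The grow/shrink idea above can be illustrated with a small sketch that recomputes per-pool minimums from observed demand. It is a hypothetical helper, not part of the Fair Share scheduler: the function name and the `demand` input are assumptions, and writing the results into the scheduler's pool configuration is left out.

```python
from typing import Dict


def rebalance_min_shares(demand: Dict[str, int],
                         total_slots: int) -> Dict[str, int]:
    """Split `total_slots` into per-pool minimums proportional to demand.

    Uses largest-remainder rounding so the minimums always add up to
    `total_slots`. Demands are assumed to be non-negative integers.
    """
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {pool: 0 for pool in demand}

    scaled = {p: total_slots * d for p, d in demand.items()}
    shares = {p: s // total_demand for p, s in scaled.items()}

    # Hand out the slots lost to rounding, largest remainder first
    # (ties broken by pool name so the result is deterministic).
    leftover = total_slots - sum(shares.values())
    by_remainder = sorted(demand,
                          key=lambda p: (-(scaled[p] % total_demand), p))
    for p in by_remainder[:leftover]:
        shares[p] += 1
    return shares
```

For example, with demands of 300 and 100 tasks and 100 slots in total, the two pools would get minimums of 75 and 25 slots.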
