Posted to common-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2012/06/04 06:09:22 UTC

[jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Junping Du created HADOOP-8468:
----------------------------------

             Summary: Umbrella of enhancements to support different failure and locality topologies
                 Key: HADOOP-8468
                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha, io
    Affects Versions: 2.0.0-alpha, 1.0.0
            Reporter: Junping Du
            Assignee: Junping Du
            Priority: Critical


The current Hadoop network topology (described in earlier issues such as HADOOP-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or infrastructure changes that affect network bandwidth efficiency, such as virtualization. 
A virtualized platform has the following characteristics, which the Hadoop topology should not ignore when scheduling tasks, placing replicas, balancing, or fetching blocks for reading: 
1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.
Thus, we propose to make the Hadoop network topology extensible and to introduce a new level in the hierarchical topology, a node group level, which maps well onto infrastructure based on a virtualized environment.
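
As a rough illustration of the proposed hierarchy (a minimal sketch, not the attached patch and not Hadoop's NetworkTopology API), the Python below assumes hypothetical /rack/nodegroup/node paths, where the node group stands for the physical host running the VMs. Distance is taken to be the number of hops up to the closest common ancestor and back down. The path layout, function names, and sample hosts are all assumptions made for this example.

```python
# Illustrative sketch only; not Hadoop's NetworkTopology API.
# Assumption: each VM is addressed as /rack/nodegroup/node, where the
# node group stands for the physical host that runs the VM.

def parse(path):
    """Split a topology path such as '/rack1/host1/vm1' into components."""
    return [part for part in path.split("/") if part]


def distance(a, b):
    """Hops from a up to the closest common ancestor and back down to b."""
    pa, pb = parse(a), parse(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def node_group(path):
    """Return the (rack, nodegroup) prefix of a node path."""
    return tuple(parse(path)[:-1])


def replicas_share_node_group(replicas):
    """True if any two replicas sit in the same node group (same host)."""
    groups = [node_group(r) for r in replicas]
    return len(set(groups)) < len(groups)


vm1 = "/rack1/host1/vm1"
vm2 = "/rack1/host1/vm2"   # same physical host as vm1
vm3 = "/rack1/host2/vm3"   # same rack, different host
vm4 = "/rack2/host3/vm4"   # different rack

print(distance(vm1, vm2))  # 2
print(distance(vm1, vm3))  # 4
print(distance(vm1, vm4))  # 6
print(replicas_share_node_group([vm1, vm2]))       # True
print(replicas_share_node_group([vm1, vm3, vm4]))  # False
```

Under these assumptions, two VMs on one host are closer (distance 2) than two VMs on different hosts in the same rack (distance 4), which reflects point 2. The replica check reflects point 1: a placement that puts two replicas in the same node group would be rejected.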

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: HADOOP-8468-total-v3.patch
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288791#comment-13288791 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Finished! The code separation following JIRA will happen tomorrow.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: Proposal for enchanced failure and locality topologies.pdf
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398269#comment-13398269 ] 

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

> 3rd on local node of 2nd

How so?

Junping, try to rewrite the policy I stated earlier using your terms for a 4-level topology with node groups as the third level, and you will see many words change. If you put it in terms where virtual nodes are added as the fourth level, then you don't need to change a word of the old policy. I thought it was a good thing to keep old policies consistent with new use cases. It confirms (1) that it's a good policy, and (2) that it's a good design.

> Agree. That's what I try to do previously also.

What changed your mind? Sounds like the right direction to me.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395911#comment-13395911 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #1113 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1113/])
    HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology.  Contributed by Junping Du (Revision 1351163)

     Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288596#comment-13288596 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

The patch is divided into 7 patches, each attached to its own sub-task. There are some dependencies between the patches; only three are independent: P1, P3 and P6. 
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489608#comment-13489608 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Great! Thanks, Nicholas!
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf

Re: Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.

Posted by Mi...@emc.com.

That's great Junping.

Hoping to see this in trunk / hadoop 2.0 and hadoop 1.1 soon.

- milind

On Jun 4, 2012, at 8:48 AM, Jun Ping Du wrote:

> Hello Folks,
>      I filed an umbrella JIRA today to address the current NetworkTopology limitation of being bound strictly to a three-tier network. The motivation is to make Hadoop more flexible in the deployment topologies it supports (especially for the cloud/virtualization case) and more configurable in data-locality-related policies such as replica placement, task scheduling, choosing blocks for DFSClient reads, and balancing. 
>      We have submitted a draft proposal in this umbrella issue along with the implementation code. As the code base is large (~260K), the code is separated into 7 sub-JIRA issues, which should be more convenient for reviewing. However, we split the code by functionality, which causes some dependencies between patches, and we are not sure this is the best way. Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to enhance Hadoop for these new situations.
>      Hope this is a good start.
> 
> Cheers,
> 
> Junping
> 

Can someone review MAPREDUCE-4309 and MAPREDUCE-4310?

Posted by Jun Ping Du <jd...@vmware.com>.

These two patches are the YARN part of the Hadoop network topology extension for virtualized environments.

Thanks,

Junping

----- Original Message -----
From: "Jun Ping Du" <jd...@vmware.com>
To: common-dev@hadoop.apache.org, hdfs-dev@hadoop.apache.org, mapreduce-dev@hadoop.apache.org
Cc: "Mark Pollack" <mp...@vmware.com>, "Jurgen Leschner" <jl...@vmware.com>, "Richard McDougall" <rm...@vmware.com>
Sent: Monday, June 4, 2012 11:48:35 PM
Subject: Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.

Hello Folks,
      I filed an umbrella JIRA today to address the current NetworkTopology limitation of being bound strictly to a three-tier network. The motivation is to make Hadoop more flexible in the deployment topologies it supports (especially for the cloud/virtualization case) and more configurable in data-locality-related policies such as replica placement, task scheduling, choosing blocks for DFSClient reads, and balancing. 
      We have submitted a draft proposal in this umbrella issue along with the implementation code. As the code base is large (~260K), the code is separated into 7 sub-JIRA issues, which should be more convenient for reviewing. However, we split the code by functionality, which causes some dependencies between patches, and we are not sure this is the best way. Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to enhance Hadoop for these new situations.
      Hope this is a good start.

Cheers,

Junping


[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396783#comment-13396783 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #1114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1114/])
    Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

     Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398316#comment-13398316 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

> What changed your mind? Sounds like the right direction to me.
From the above comments, you can see that way-1 inherits the original policy almost as much as way-2 does. But way-1 is simpler to implement, for reasons such as: DatanodeDescriptor doesn't have to be remapped to an additional *virtual node* layer, the NetworkTopology structure is easier to extend at the InnerNode than at the leaf node, etc. Thoughts?
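
To make the "extend at the InnerNode" idea concrete, here is a minimal Python sketch (not the actual NetworkTopology code). It assumes locations are written as slash-separated paths and that the node group is just one more inner level between rack and leaf; the path layout and the function names split_path and distance are hypothetical and chosen only for illustration.

```python
def split_path(location):
    """Split a location such as '/rack1/nodegroup1/vm1' into its components."""
    return [part for part in location.strip("/").split("/") if part]


def distance(a, b):
    """Count the tree edges between two leaves via their closest common ancestor."""
    pa, pb = split_path(a), split_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


# Assumed four-level layout: the node group is an extra inner node.
same_group = distance("/rack1/ng1/vm1", "/rack1/ng1/vm2")
same_rack = distance("/rack1/ng1/vm1", "/rack1/ng2/vm3")
off_rack = distance("/rack1/ng1/vm1", "/rack2/ng3/vm4")

# Classic three-level layout: the same function works unchanged.
classic_same_rack = distance("/rack1/host1", "/rack1/host2")
```

In this sketch the leaves stay leaves and only the depth of the tree changes, so the same distance rule covers both the classic and the node-group layout.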
                

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289164#comment-13289164 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Luke,
   Thanks for the good comments. I will address them soon.

Best,

Junping
                

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291584#comment-13291584 ] 

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

Junping, I went over the design document. It is pretty comprehensive. A few comments on the design.

# Conceptually you are extending the current network topology by introducing a new layer of leaf nodes. The current topology assumes that physical nodes are the leaves of the hierarchy, and you add virtual nodes that can reside on physical nodes. I think this is a more logical way to look at the new topology than saying that you introduce a second layer (node groups) over the nodes, as the document does.
# The document should clarify how local storage is used by VMs on a physical box. I think the assumption is that VMs never share storage resources. Otherwise there could be a reporting problem: if two VMs share a drive and send two DF reports to the NameNode, the drive will be counted twice. I'd recommend updating the pictures and adding a section on how DNs' resources are reported to the NN, so that this issue is explicitly covered in the design.
# For block replication there are 3 policies to consider:
#* block placement policy, when a new block is created
#* block replication policy, when under-replicated blocks are recovered
#* replica removal policy, when replicas are removed for over-replicated blocks
You covered the first two, and probably need to look into the third as well.
For the first two it'd be good to write down the entire modified policy rather than just listing the differences.
_And make sure they converge to the existing policies if the virtual node layer is not defined._
# For YARN I am not convinced you will need to run multiple VMs per node, if not for the sake of generality. It seems YARN should rely on the NodeManager to report resources and manage the Containers of a node as a whole. Not sure how multiple VMs on a node can help here.
For MRv1, on the contrary, running multiple VMs per node can be useful for modeling variable slots. In this case again the VMs should not share memory, otherwise reporting will go wrong.
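
The convergence requirement above can be illustrated with a minimal Python sketch. It is not the HDFS block placement policy and it ignores rack awareness. It only shows one way a "no two replicas in the same failure domain" rule could fall back to the classic "no two replicas on the same node" rule when no node group layer is defined. The Node tuple and the names failure_domain and pick_targets are hypothetical.

```python
from collections import namedtuple

# Hypothetical description of a candidate DataNode; node_group is None
# when the topology has no node-group (virtual) layer.
Node = namedtuple("Node", ["rack", "node_group", "name"])


def failure_domain(node):
    """Return the unit that should hold at most one replica of a block."""
    if node.node_group is not None:
        return (node.rack, node.node_group)
    # Assumed fallback: without a node-group layer, each node is its own domain.
    return (node.rack, node.name)


def pick_targets(candidates, replicas):
    """Pick up to `replicas` nodes in order, one per failure domain."""
    chosen, used = [], set()
    for node in candidates:
        if len(chosen) >= replicas:
            break
        domain = failure_domain(node)
        if domain in used:
            continue  # would share a physical host with an earlier replica
        chosen.append(node)
        used.add(domain)
    return chosen


virtual = [
    Node("r1", "ng1", "vm1"),
    Node("r1", "ng1", "vm2"),  # same physical host as vm1, so it is skipped
    Node("r1", "ng2", "vm3"),
    Node("r2", "ng3", "vm4"),
]
physical = [Node("r1", None, "h1"), Node("r1", None, "h2"), Node("r2", None, "h3")]

virtual_targets = [n.name for n in pick_targets(virtual, 3)]
physical_targets = [n.name for n in pick_targets(physical, 3)]
```

With the node-group layer present, vm2 is skipped because it shares a host with vm1. Without the layer, the rule reduces to one replica per node, which is the behaviour the existing policy already has.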
                

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289747#comment-13289747 ] 

Luke Lu commented on HADOOP-8468:
---------------------------------

Actually, the two approaches are orthogonal. Avoiding placing more than one data node of the same logical cluster on the same physical host will increase reliability even if the new topology algorithm is in place. 

VM placement is only NP-hard if instance configurations are arbitrary and you require absolutely optimal placement. It's easier if the number of instance types is limited, a la AWS. I suspect that greedy algorithms exist to approximate the optimal placement. We don't need millisecond response time for such a placement algorithm either, since it only runs once at logical cluster deploy time and when there are physical host failures.

It's definitely easier to do such placement when the number of nodes of a logical cluster is much smaller than the number of physical hosts, which is the case for AWS and SmartCloud.
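
As a rough illustration of the kind of greedy placement meant here, the following Python sketch assigns the VMs of one logical cluster to distinct physical hosts, preferring hosts with the most free slots. It is only one possible heuristic under simplifying assumptions (a single instance type, capacity expressed as free VM slots). The function name place_cluster and the free_slots mapping are hypothetical.

```python
def place_cluster(vm_count, free_slots):
    """Greedily choose `vm_count` distinct hosts, most free slots first.

    free_slots maps host name -> number of free VM slots.
    Returns the chosen host names, or None if fewer than `vm_count`
    hosts have a free slot.
    """
    usable = [host for host, slots in free_slots.items() if slots > 0]
    if len(usable) < vm_count:
        return None
    # Sort by free slots (descending), then by name for a stable result.
    usable.sort(key=lambda host: (-free_slots[host], host))
    return usable[:vm_count]


hosts = {"h1": 1, "h2": 3, "h3": 0, "h4": 2}
small_cluster = place_cluster(2, hosts)
too_large = place_cluster(4, hosts)
```

Because each VM of the cluster lands on a different host, a single host failure takes out at most one of its data nodes. The placement succeeds easily when the cluster is much smaller than the number of hosts, and fails (returns None here) when it is not.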
                

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288687#comment-13288687 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

It looks like this move action will actually move the sub-JIRAs out of the parent (umbrella) JIRA. Do we need three parent JIRAs, one each in Common, HDFS and MapReduce?
As to your question on running Hadoop inside VMs, I don't have a concrete number for now. But we know that some enterprise customers would like to run Hadoop clusters in their virtualized datacenter/private cloud.
                

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487059#comment-13487059 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Also, a white paper on reliability and performance evaluation for HVE: http://serengeti.cloudfoundry.com/pdf/Hadoop%20Virtualization%20Extensions%20WP.pdf .
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456891#comment-13456891 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Thanks for the great comments, Konstantin. The revised-1.0 doc already addresses the full policy definition.
Hi guys, I am backporting the patches to branch-1. I hope I can get your support and help with reviewing. :)
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289335#comment-13289335 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

I will update the proposal to list the first approach. It is a workaround that requires no Hadoop code change. However, this "1-1 mapping" of data nodes to physical hosts cannot be applied in the following cases:
1. The number of nodes is larger than the number of physical hosts.
2. The number of nodes is smaller than the number of physical hosts, but some hosts are fully occupied by other logical Hadoop clusters or other applications.
3. The cloud or datacenter consists of heterogeneous hosts, some of which are not suitable for deploying Hadoop nodes, e.g. hosts attached to shared storage only.
In general, VM placement in a cloud is a complex bin-packing problem, which is NP-hard, and it should be optimised for a balance of resource utilization and reliability. Applying an absolute rule like the first approach is not the best way. In addition, the Hadoop network topology should reflect the underlying physical (or virtual) topology, but it should not impose strict requirements or restrictions on the deployment topology.
Thoughts?
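
To make the three restrictions concrete, here is a minimal sketch of a feasibility check for the 1-1 mapping. The `Host` class and its two flags are hypothetical and exist only for illustration; nothing here is Hadoop or cloud-platform API.

```java
import java.util.List;

public class OneToOnePlacementCheck {
    /** Hypothetical description of a physical host; not a Hadoop class. */
    static final class Host {
        final String name;
        final boolean hasFreeCapacity;  // false if occupied by other clusters or applications
        final boolean hasLocalStorage;  // false if attached to shared storage only

        Host(String name, boolean hasFreeCapacity, boolean hasLocalStorage) {
            this.name = name;
            this.hasFreeCapacity = hasFreeCapacity;
            this.hasLocalStorage = hasLocalStorage;
        }
    }

    /** Counts the hosts that could take one data node VM. */
    static long usableHosts(List<Host> hosts) {
        return hosts.stream()
                .filter(h -> h.hasFreeCapacity && h.hasLocalStorage)
                .count();
    }

    /** The 1-1 mapping works only if every data node can get its own usable host. */
    static boolean oneToOneFeasible(int dataNodes, List<Host> hosts) {
        return dataNodes <= usableHosts(hosts);
    }

    public static void main(String[] args) {
        List<Host> hosts = List.of(
                new Host("host1", true, true),
                new Host("host2", false, true),  // restriction 2: fully occupied
                new Host("host3", true, false),  // restriction 3: shared storage only
                new Host("host4", true, true));
        System.out.println(oneToOneFeasible(2, hosts));
        System.out.println(oneToOneFeasible(3, hosts));
    }
}
```

With four hosts of which only two are usable, a two-node cluster fits the 1-1 mapping but a three-node cluster does not, even though it has fewer nodes than physical hosts.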
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288655#comment-13288655 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Robert,
   Thanks for your reply. So you are suggesting that I re-create the sub-tasks in the proper projects (Common, HDFS, MAPREDUCE), right?
   For a patch that spans projects (like the 1st sub-JIRA, which mixes COMMON and HDFS), should we create an issue in both the Common and HDFS projects for it?

Best,

Junping

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289903#comment-13289903 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Luke,
   Yes, I agree that when the number of nodes in a logical cluster is much smaller than the number of (available) physical hosts, such placement is good for reliability if the infrastructure allows it (although it may trade this off against somewhat more network traffic across rack/core switches, right?). Would noting this approach in the proposal and describing its usage scenario be good enough for the proposal to go ahead?

Thanks,

Junping
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393612#comment-13393612 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2435 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2435/])
    HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology.  Contributed by Junping Du (Revision 1351163)

     Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396122#comment-13396122 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #2367 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2367/])
    Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

     Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503100#comment-13503100 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Konstantin, thanks for your question and for carefully reading the results. Yes, TestDFSIO has no locality awareness in task scheduling, as you said. However, after tasks are scheduled, the work in this umbrella (let's call it HVE for short) can raise the chance that the client chooses a data block on the local physical host, for two reasons:
1. HVE makes sure the replicas span 3 physical hosts (out of 6 hosts in total), so for any HDFS read there is a 50% chance that a replica lives on the same physical host (previously, the chance was between 1/3 and 1/2).
2. With HVE, the HDFS client can correctly sort the replicas so that a nodegroup-local replica is chosen in preference to a rack-local replica.
The first reason is specific to this case, but the second applies more generally.
Does that make sense?
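
A rough sketch of both reasons, for illustration only. The `/rack/nodegroup/node` path layout and the `distance` helper are simplified stand-ins written for this example, not the actual NetworkTopology code. The 1/3 lower bound is reproduced under the assumption that, without HVE, the three replicas may end up on only 2 of the 6 hosts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NodeGroupLocality {
    /** Hops between two leaf paths, e.g. "/rack1/host1/vm1" and "/rack1/host2/vm3". */
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        // Count the leading path components the two locations share.
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;
        }
        // Each remaining component on either side is one hop up or down the tree.
        return (pa.length - common) + (pb.length - common);
    }

    /** Returns the replicas ordered from closest to farthest from the reader. */
    static List<String> sortByDistance(String reader, List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(r -> distance(reader, r)));
        return sorted;
    }

    /** Chance that a reader on a uniformly random host finds a replica on its own host. */
    static double localHostChance(int hostsHoldingReplica, int totalHosts) {
        return (double) hostsHoldingReplica / totalHosts;
    }

    public static void main(String[] args) {
        String reader = "/rack1/host1/vm1";
        List<String> replicas = List.of(
                "/rack2/host5/vm9",   // off-rack
                "/rack1/host2/vm3",   // rack-local
                "/rack1/host1/vm2");  // nodegroup-local (same physical host)
        System.out.println(sortByDistance(reader, replicas));
        System.out.println(localHostChance(3, 6)); // replicas on 3 of 6 hosts
        System.out.println(localHostChance(2, 6)); // replicas on only 2 of 6 hosts
    }
}
```

With the node group layer in the path, the replica on the same physical host sorts ahead of the rack-local one. With only two layers (`/rack/node`), both would be at the same distance and the client could not tell them apart.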
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288666#comment-13288666 ] 

Robert Joseph Evans commented on HADOOP-8468:
---------------------------------------------

You can move the JIRAs.  More Actions -> Move.  If it is possible to split them up, it is nice to keep them separate, but it is not totally necessary.  If they do span multiple projects and are hard to split up you can leave them under HADOOP.  The main reason for this is that some people only watch the HDFS lists, while others only look at the MAPREDUCE lists, and may miss changes that are not filed under the appropriate group.

I am interested to see where this goes, and it seems very logical to me to be able to express to Hadoop what your topology really does look like.  I am not sure how many groups are running Hadoop inside VMs except perhaps on EC2, but I have a very limited view into that right now.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Target Version/s: 2.2.0-alpha
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288604#comment-13288604 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

I have marked P1, P3 and P6 as patch available. P2, P4, P5 and P7 depend on P1, P3 and P6, so they cannot pass the build on the current trunk.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: HVE User Guide on branch-1(draft ).pdf
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288906#comment-13288906 ] 

Luke Lu commented on HADOOP-8468:
---------------------------------

A comment on the proposal: IMO, it is missing a viable option. There are essentially two approaches to address the problem.

# Enhance VM placement to ensure a 1-1 mapping of data nodes to physical hosts within a logical Hadoop cluster. This approach doesn't require any modification to Hadoop to achieve the same data reliability/redundancy. This can be a viable option for Hadoop clusters with fewer nodes than physical hosts, e.g., in large public or company-wide clouds.
# For Hadoop clusters with more data nodes than physical hosts, the analysis in the proposal is spot on and the extra layer is required to achieve optimum data reliability.
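
The distinction between the two cases can be checked mechanically. The following is a minimal Python sketch, not part of Hadoop or of the patches on this issue; the `vm_to_host` mapping and both function names are hypothetical and assume the VM-to-host placement is known from the virtualization layer.

```python
from collections import defaultdict


def hosts_with_multiple_datanodes(vm_to_host):
    """Return {host: [vms]} for every physical host running more than one data node VM."""
    by_host = defaultdict(list)
    for vm, host in sorted(vm_to_host.items()):
        by_host[host].append(vm)
    return {host: vms for host, vms in by_host.items() if len(vms) > 1}


def needs_nodegroup_layer(vm_to_host):
    """True when the 1-1 mapping of data nodes to hosts does not hold (the second case)."""
    return bool(hosts_with_multiple_datanodes(vm_to_host))
```

If `needs_nodegroup_layer` returns False, the first approach applies and the existing topology already gives the expected reliability; otherwise some hosts carry several data nodes and the extra layer becomes relevant.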



                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396139#comment-13396139 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2438 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2438/])
    Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

     Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: HVE_Hadoop World Meetup 2012.pptx

I just attached the slides from my talk at yesterday's Hadoop World meetup. Thanks.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Suresh Srinivas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-8468:
------------------------------------

      Priority: Major  (was: Critical)
    Issue Type: Improvement  (was: Bug)
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393613#comment-13393613 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #2363 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2363/])
    HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology.  Contributed by Junping Du (Revision 1351163)

     Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503097#comment-13503097 ] 

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

Junping,
I checked your article with the performance results and have a question about it.
How do you explain the performance gain with DFSIO?
MapReduce-wise, DFSIO is completely unaware of the locality of the data it reads, because the input to each mapper is just a small file containing the name of the file that the mapper should read. So the input file holding the file name is local to the task, but the file that the task then reads need not be.
I'm not saying there is anything wrong with your results; I just think they need more explanation.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288643#comment-13288643 ] 

Robert Joseph Evans commented on HADOOP-8468:
---------------------------------------------

Junping Du,

I have been looking at some of your patches, but there is a lot here to go through and it is likely to take some time. 

Could you please move your JIRAs to the appropriate projects? HDFS JIRAs should be moved out of HADOOP and into HDFS, MapReduce JIRAs should go to MAPREDUCE, and the only ones that stay in HADOOP should be for code that goes under the hadoop-common-project directory.

Thanks

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398294#comment-13398294 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Konstantin,
   Thanks for your comments. Please see my reply:

> If you put it in terms when virtual nodes are added as the fourth level, then you don't need to change a word in the old policy.
A slight change is still needed, because the first replica should be placed on the local virtual node rather than on the local physical node. Let me show two different ways of translating the original rules you listed above (in rule 2, I omit "on two different nodes" because it duplicates rule 0).
Original:
0. No more than one replica is placed at any one node
1. First replica on the local node
2. Second and third replicas are in the same rack
3. Other replicas on random nodes with restriction that no more than two replicas are placed in the same rack, if there are enough racks.

Two ways: 1) node, rack -> node, *nodegroup*, rack; 2) node, rack -> *virtual node*, node, rack. The bold words mark the additional layer.
Way 1:
0. No more than one replica is placed at any one *nodegroup*
1. First replica on the local node
2. Second and third replicas are in the same rack
3. Other replicas on random nodes with restriction that no more than two replicas are placed in the same rack, if there are enough racks
Way 2:
0. No more than one replica is placed at any one node
1. First replica on the local *virtual node*
2. Second and third replicas are in the same rack
3. Other replicas on random nodes with restriction that no more than two replicas are placed in the same rack, if there are enough racks

So you can see that the two are equivalent in wording.
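
To make rules 0 and 3 of "way 1" concrete, here is a minimal Python sketch that checks a set of replica locations against them. It is an illustration only, not the code in the attached patches. It assumes each location is written as a hypothetical three-part path `/rack/nodegroup/node`; the function names are made up for this example.

```python
from collections import Counter


def split_location(path):
    """Split '/rack/nodegroup/node' into (rack, nodegroup, node)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /rack/nodegroup/node, got {path!r}")
    return tuple(parts)


def placement_violations(replicas, max_per_rack=2):
    """List violations of rule 0 (one replica per nodegroup) and rule 3 (at most two per rack)."""
    locations = [split_location(p) for p in replicas]
    per_group = Counter((rack, group) for rack, group, _ in locations)
    per_rack = Counter(rack for rack, _, _ in locations)

    violations = []
    for (rack, group), count in sorted(per_group.items()):
        if count > 1:
            violations.append(f"nodegroup /{rack}/{group} holds {count} replicas")
    for rack, count in sorted(per_rack.items()):
        if count > max_per_rack:
            violations.append(f"rack /{rack} holds {count} replicas (limit {max_per_rack})")
    return violations
```

For example, two replicas on `/r1/g1/vm1` and `/r1/g1/vm2` sit on different virtual nodes but on the same nodegroup, so the check reports a rule 0 violation. The rack limit is only meaningful when there are enough racks, as rule 3 states; this sketch does not model that exception.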
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290043#comment-13290043 ] 

Luke Lu commented on HADOOP-8468:
---------------------------------

Yes, noting the new approach and its impact on overall reliability would make the proposal more complete.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296139#comment-13296139 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi Konstantin,
   Those are good suggestions. The updated proposal should address most of them. A few comments below:
> So my motivation with virtual node extension is that it formally inherits the existing policy, but semantically adds a new level of topology.
Agreed. That is what I tried to do previously as well. :) The current approach maps node -> (virtual) node and adds a "nodegroup" level, so the policy is almost exactly the same: 1st replica on the local (virtual) node, 2nd off-rack, 3rd on the same rack as the 2nd. The only difference is to make sure the 2nd and 3rd are not in the same nodegroup (and if the 1st cannot be on the local (virtual) node, it can go on a nodegroup-local node).
> But from the failure scenarios viewpoint they are bound to the same node, meaning that node failure takes all of them down
Yes. So adding a nodegroup level should address the failure relationship between (virtual) nodes perfectly. I think the key points for mapping the current node to the VM level include:
The virtual node (VM) acts as the leaf node. Some failures still happen only within a VM, such as daemon failure, OS failure, and some physical failures (e.g. disk failure, since in most cases when running Hadoop a VM should mount separate physical disks rather than share disks with other VMs). So VMs still show some independence even in failure-group semantics.
The virtual node is where the JVM runs and where Java network calls happen. In the current code base, the IP (hostname) of a node (reader, datanode) is used to keep data locality. Only the VM-level IP is easy to obtain from the JVM and RPC calls, so it makes sense for it to represent the node IP.
Thoughts?
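
The placement order described in this comment can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the block placement code in the patches: locations use a hypothetical `/rack/nodegroup/node` path, the function names are invented, and the sketch raises `IndexError` when the cluster is too small instead of falling back as a real policy would.

```python
import random


def parse(path):
    """Split '/rack/nodegroup/node' into (rack, nodegroup, node)."""
    rack, group, node = path.strip("/").split("/")
    return rack, group, node


def choose_targets(writer, datanodes, rng=random):
    """Pick three replica targets: local VM, then off-rack, then same rack as the 2nd."""
    w_rack, w_group, _ = parse(writer)

    # 1st replica: the writer's own VM, else another VM on the writer's nodegroup.
    if writer in datanodes:
        first = writer
    else:
        same_group = [d for d in datanodes if parse(d)[:2] == (w_rack, w_group)]
        first = rng.choice(same_group) if same_group else rng.choice(datanodes)

    # 2nd replica: any data node on a different rack from the 1st.
    f_rack = parse(first)[0]
    second = rng.choice([d for d in datanodes if parse(d)[0] != f_rack])

    # 3rd replica: same rack as the 2nd, but on a different nodegroup.
    s_rack, s_group, _ = parse(second)
    third = rng.choice(
        [d for d in datanodes if parse(d)[0] == s_rack and parse(d)[1] != s_group]
    )
    return [first, second, third]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when experimenting with small example clusters.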


                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf
>
>

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291638#comment-13291638 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hey Konstantin, thanks for the many good suggestions here.
For 1. Conceptually, there are two ways to look at the change we proposed. One way, as you said, is to add a VM-level extension below the physical host (so the physical host becomes an inner node and is no longer a leaf). The other way is to treat the VM (virtual node) as the equivalent of the former physical node, that is, as the container of processes, and to add an inner-node layer between node and rack. We prefer the second way for the following reasons:
  1) Each VM on the same physical machine generally operates independently, although the VMs are related in reliability and have lower communication overhead with one another. Each VM has its own hostname and IP address, and it is where the Hadoop daemons run.
  2) VMs living on the same physical machine can belong to different logical Hadoop clusters; a physical host is no longer dedicated to one logical Hadoop cluster but can be shared. Also, Hadoop should not need to be aware of the physical host's IP and host info (the hypervisor's IP and info).
  3) In some data-locality policies, the VM maps well onto the former physical node as the first choice for placing the first replica, scheduling tasks, etc.
For 2. It's right that VMs on the same host do not share storage directly, but they could do so (by getting virtual disks) through a hypervisor file system layer (like VMFS in VMware vSphere). Another way (which we would recommend for the Hadoop case) is an RDM (Raw Device Mapping) configuration in the hypervisor, so that each VM gets some dedicated physical disks. In both cases, the virtual disk drives (and their capacity) for each VM are independent and can be reported by the DN without any overlap.
For 3. Yes, it looks like we are missing the replica removal policy in the proposal. I will revise it as you suggest. Thanks!
For 4. YARN does a good job of resolving the fixed task slot issue that exists in MRv1. Beyond that issue, there are still scenarios for running multiple VMs per physical node, such as: tenant task isolation at the VM level, separating data nodes from compute nodes so that a Hadoop MapReduce (YARN) cluster can scale in and out automatically, supporting standard-customized nodes (a cloud requirement) in a heterogeneous hardware environment, etc.
Thoughts?   
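
The second view above (the VM stays the leaf, and a new inner layer sits between node and rack) can be illustrated with plain location paths. This is only a sketch: the class TopologyPathSketch, its methods, and the path names such as /d1/r1/ng1/vm1 are hypothetical and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or the patch.
final class TopologyPathSketch {
    // Splits "/d1/r1/ng1/vm1" into [d1, r1, ng1, vm1].
    static List<String> levels(String path) {
        return Arrays.asList(path.substring(1).split("/"));
    }

    // The leaf is the last component: the VM where the daemons run.
    static String leaf(String path) {
        List<String> parts = levels(path);
        return parts.get(parts.size() - 1);
    }

    // Parent of the leaf: the rack in a 3-level layout,
    // the node group (physical host) in the 4-level layout.
    static String parentOfLeaf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Two leaves with the same parent share that parent's failure domain.
    static boolean sameParent(String a, String b) {
        return parentOfLeaf(a).equals(parentOfLeaf(b));
    }

    public static void main(String[] args) {
        String vm1 = "/d1/r1/ng1/vm1";
        String vm2 = "/d1/r1/ng1/vm2"; // same assumed physical host as vm1
        String vm3 = "/d1/r1/ng2/vm3"; // same rack, different host

        System.out.println(leaf(vm1));            // vm1
        System.out.println(sameParent(vm1, vm2)); // true
        System.out.println(sameParent(vm1, vm3)); // false
    }
}
```

In this layout the physical host never appears as a named Hadoop node; it only shows up as the grouping level above the VMs.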

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
>
>
> The current Hadoop network topology (described in earlier issues such as HADOOP-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or infrastructure changes, such as virtualization, that can affect network bandwidth efficiency.
> A virtualized platform has the following characteristics that should not be ignored by the Hadoop topology when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
> 1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
> 2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.
> Thus, we propose to make the Hadoop network topology extensible and introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure that is based on a virtualized environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Make Hadoop NetworkTopology and data locality more pluggable for other deploying topology like: virtualization.

Posted by Jun Ping Du <jd...@vmware.com>.

Hello Folks,
      I just filed an umbrella JIRA today to address the current NetworkTopology limitation of being bound strictly to a three-tier network. The motivation is to make Hadoop more flexible in its deployment topology (especially for the cloud/virtualization case) and more configurable in data-locality policies such as replica placement, task scheduling, choosing a block for DFSClient reads, and balancing.
      We have submitted a draft proposal in this umbrella along with the implementation code. As the code base is large (~260K), the code is separated into 7 sub-JIRA issues, which should be more convenient for review. However, we split the code by functionality, which causes some dependencies between patches, and we are not sure this is the best way. Comments and suggestions on the doc and code are welcome, and we look forward to working with all of you to improve Hadoop for these new situations.
      Hope this is a good start.

Cheers,

Junping

----- Original Message -----
From: "Junping Du (JIRA)" <ji...@apache.org>
To: common-issues@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Junping Du created HADOOP-8468:
----------------------------------

             Summary: Umbrella of enhancements to support different failure and locality topologies
                 Key: HADOOP-8468
                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha, io
    Affects Versions: 2.0.0-alpha, 1.0.0
            Reporter: Junping Du
            Assignee: Junping Du
            Priority: Critical


The current Hadoop network topology (described in earlier issues such as HADOOP-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or infrastructure changes, such as virtualization, that can affect network bandwidth efficiency.
A virtualized platform has the following characteristics that should not be ignored by the Hadoop topology when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.
Thus, we propose to make the Hadoop network topology extensible and introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure that is based on a virtualized environment.


[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399536#comment-13399536 ] 

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

It's good that you formulated the policies. Now I can see the differences. In way-2 you actually don't need to say "virtual node". It is an implementation detail. You only care that the first replica is on the local physical node. So way-2 is the same as the original.
In way-1 I agree only one change is needed. Rather surprising.

I briefly checked the patch, and I see now that your abstractions are driven by the implementation. Whether you define it as way-1 or way-2, implementation-wise you still introduce a new inner level in the topology.
I do not think you need the new class InnerNodeWithNodeGroup. It doesn't have new members or constructors. It overrides isRack(), but only because the old implementation assumed racks are on the second level. I'd rather add a nodeType member than check children of children.

So, I think I understand your motivation with the design. Thanks for clarifying your thoughts to me. I still think that the terminology is better when talking about extending the topology with new leaves, but your way is also valid and does not change the policy much. You choose. Either way, please add the full policy definition in the document.
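
The nodeType suggestion above can be contrasted with a structural check in a small sketch. Everything here is an assumption made for illustration: the NodeType enum, the TopoNodeSketch class, and both isRack variants are hypothetical and do not reproduce the actual InnerNode or isRack() code in Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node kinds; the names are illustrative, not Hadoop's.
enum NodeType { RACK, NODE_GROUP, LEAF }

// Hypothetical stand-in for a topology node; not Hadoop's InnerNode.
final class TopoNodeSketch {
    final String name;
    final NodeType type;
    final List<TopoNodeSketch> children = new ArrayList<>();

    TopoNodeSketch(String name, NodeType type) {
        this.name = name;
        this.type = type;
    }

    TopoNodeSketch add(TopoNodeSketch child) {
        children.add(child);
        return this;
    }

    // Structural test: "my children have no children, so I am a rack".
    // Correct only while racks sit directly above the leaves.
    boolean isRackByStructure() {
        return !children.isEmpty() && children.get(0).children.isEmpty();
    }

    // Type-based test: independent of how many levels the tree has.
    boolean isRackByType() {
        return type == NodeType.RACK;
    }

    public static void main(String[] args) {
        TopoNodeSketch group = new TopoNodeSketch("ng1", NodeType.NODE_GROUP)
            .add(new TopoNodeSketch("vm1", NodeType.LEAF))
            .add(new TopoNodeSketch("vm2", NodeType.LEAF));
        TopoNodeSketch rack = new TopoNodeSketch("r1", NodeType.RACK).add(group);

        System.out.println(rack.isRackByStructure());  // false: the rack is missed
        System.out.println(group.isRackByStructure()); // true: the node group is mistaken for a rack
        System.out.println(rack.isRackByType());       // true
        System.out.println(group.isRackByType());      // false
    }
}
```

With an explicit type on each node, adding a level changes the data in the tree rather than the logic of the check, so no subclass is needed just to override it.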
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: Proposal for enchanced failure and locality topologies (revised-1.0).pdf
    
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295990#comment-13295990 ] 

Konstantin Shvachko commented on HADOOP-8468:
---------------------------------------------

Sorry, got distracted with the Hadoop event of the week.

Here is the current replication policy.
0. No more than one replica is placed on any one node
1. First replica on the local node
2. Second and third replicas on two different nodes in a different rack
3. Other replicas on random nodes, with the restriction that no more than two replicas are placed in the same rack, if there are enough racks.


In my view, where the virtual node level is added, the policy remains unchanged, with a single optional clarification:
(1) First replica on the virtual node then on the local node

With your approach of adding the hypervisor layer, the policy needs to be revised by replacing "node" with "node group".

So my motivation with virtual node extension is that _it formally inherits the existing policy, but semantically adds a new level of topology_.
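
As a rough sketch of how rules 0 and 3 read once "node" is replaced with "node group", the check below counts replicas per topology level. It is deliberately partial: rules 1 and 2 and the "if there are enough racks" condition are not modeled, and the class PlacementCheckSketch, its methods, and the /datacenter/rack/nodegroup/vm path layout are hypothetical names chosen for illustration, not the patch's placement code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validity check for illustration; not Hadoop's block placement policy.
final class PlacementCheckSketch {
    // First `depth` components of a path: prefix("/d1/r1/ng1/vm1", 2) is "/d1/r1".
    static String prefix(String path, int depth) {
        String[] parts = path.substring(1).split("/");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth && i < parts.length; i++) {
            sb.append('/').append(parts[i]);
        }
        return sb.toString();
    }

    // Largest number of replicas that share the same prefix at the given depth.
    static int maxPerGroup(List<String> replicas, int depth) {
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String replica : replicas) {
            int count = counts.merge(prefix(replica, depth), 1, Integer::sum);
            max = Math.max(max, count);
        }
        return max;
    }

    // Assumes every path has the form /datacenter/rack/nodegroup/vm.
    static boolean satisfiesPolicy(List<String> replicas) {
        boolean onePerVm = maxPerGroup(replicas, 4) <= 1;        // rule 0 as written
        boolean onePerNodeGroup = maxPerGroup(replicas, 3) <= 1; // rule 0 with "node group"
        boolean atMostTwoPerRack = maxPerGroup(replicas, 2) <= 2; // rule 3
        return onePerVm && onePerNodeGroup && atMostTwoPerRack;
    }

    public static void main(String[] args) {
        List<String> spread = Arrays.asList(
            "/d1/r1/ng1/vm1", "/d1/r2/ng3/vm5", "/d1/r2/ng4/vm7");
        List<String> sameHost = Arrays.asList(
            "/d1/r1/ng1/vm1", "/d1/r1/ng1/vm2", "/d1/r2/ng3/vm5");

        System.out.println(satisfiesPolicy(spread));   // true
        System.out.println(satisfiesPolicy(sameHost)); // false: two replicas on one host
    }
}
```

The second placement passes the per-VM rule but fails the per-node-group rule, which is the case the failure-scenario argument is about.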

> Each VM on the same physical machine plays independently

As you correctly mention in the design doc, topology is about failure scenarios rather than independence of VMs. VMs are independent as the entities reporting to the NameNode. But from the failure-scenario viewpoint they are bound to the same node, meaning that a node failure takes all of them down.
So the policy should not change, only the implementation of it should.

> VMs lives on the same physical machine can belong to different logical Hadoop clusters

Well, you can run two DNs or TTs on the same node belonging to different clusters even now, but nobody does that, because operationally it's just too much hassle. I am not sure virtualization will change that.
I have heard of attempts to run multiple clusters on the same physical nodes for isolation purposes, but I have not heard that they were successful.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489601#comment-13489601 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-8468:
------------------------------------------------

Hi Junping, I would be happy to check your branch-1 patch.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489593#comment-13489593 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Hi, can anyone take a look at the recent patch of the topology extension for branch-1? I uploaded a new version in HADOOP-8817.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396199#comment-13396199 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #2387 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2387/])
    Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

     Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395831#comment-13395831 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #1080 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1080/])
    HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology.  Contributed by Junping Du (Revision 1351163)

     Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487053#comment-13487053 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

As a follow-up to the meetup, I have quickly summarized how to configure and use HVE in a draft user guide, which is attached. Please help review and comment.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch, HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf, Proposal for enchanced failure and locality topologies.pdf, Proposal for enchanced failure and locality topologies (revised-1.0).pdf

[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296131#comment-13296131 ] 

Junping Du commented on HADOOP-8468:
------------------------------------

Updated the proposal to address Luke's and Konstantin's comments:
+ Replica removal policy changes
+ Note on the VM placement workaround
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf


[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396698#comment-13396698 ] 

Hudson commented on HADOOP-8468:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #1081 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1081/])
    Revert r1351163 to fix the JIRA number; it should be HADOOP-8470, not HADOOP-8468. (Revision 1351444)

     Result = FAILURE
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java

                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies (revised-1.0).pdf, Proposal for enchanced failure and locality topologies.pdf


[jira] [Commented] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288761#comment-13288761 ] 

Robert Joseph Evans commented on HADOOP-8468:
---------------------------------------------

Linking the other issues the way you have done is usually good enough.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total-v3.patch, HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf


Make Hadoop NetworkTopology and data locality more pluggable for other deployment topologies, such as virtualization

Posted by Jun Ping Du <jd...@vmware.com>.

Hello Folks,
      I filed an umbrella JIRA today to address a limitation of the current NetworkTopology: it is bound strictly to a three-tier network. The motivation is to make Hadoop more flexible with respect to deployment topology (especially for cloud/virtualization cases) and more configurable in data-locality policies such as replica placement, task scheduling, block selection for DFSClient reads, and balancing.
      We have submitted a draft proposal to this umbrella issue along with the implementation code. Because the code base is large (~260K), it is split into 7 sub-JIRA issues, which should be more convenient for reviewing. However, we split the code by functionality, which creates some dependencies between the patches, and we are not sure this is the best approach. Comments and suggestions on the document and the code are welcome, and we look forward to working with all of you to improve Hadoop for these new situations.
      Hope this is a good start.

Cheers,

Junping
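
For readers who want a concrete picture of the node group idea discussed in this thread, here is a minimal sketch in plain Java. It is not taken from the attached patches: the class name NodeGroupSketch, the /rack/nodeGroup/host path layout, and the chooseTargets helper are all assumptions made for illustration. It only shows the one constraint the proposal motivates, namely that two replicas should not land on VMs in the same node group (the same physical host); it does not model rack-level placement or any other existing Hadoop policy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NodeGroupSketch {

    /** A node at /rack/nodeGroup/host. The path layout is an assumption for this sketch. */
    static final class Node {
        final String rack;
        final String nodeGroup; // stands for the physical host that runs the VM
        final String host;      // the VM itself

        Node(String path) {
            String[] parts = path.split("/");
            // A leading "/" yields an empty first element, so a valid path has 4 parts.
            if (parts.length != 4 || !parts[0].isEmpty()) {
                throw new IllegalArgumentException("expected /rack/nodeGroup/host: " + path);
            }
            rack = parts[1];
            nodeGroup = parts[2];
            host = parts[3];
        }

        /** True when both nodes sit in the same node group of the same rack. */
        boolean sameNodeGroup(Node other) {
            return rack.equals(other.rack) && nodeGroup.equals(other.nodeGroup);
        }
    }

    /** Picks up to `replicas` targets in candidate order, never two from one node group. */
    static List<Node> chooseTargets(List<Node> candidates, int replicas) {
        List<Node> chosen = new ArrayList<>();
        Set<String> usedGroups = new HashSet<>();
        for (Node n : candidates) {
            if (chosen.size() == replicas) {
                break;
            }
            // Set.add returns false if this node group already holds a replica.
            if (usedGroups.add(n.rack + "/" + n.nodeGroup)) {
                chosen.add(n);
            }
        }
        return chosen;
    }
}
```

With candidates /r1/ng1/vm1, /r1/ng1/vm2, /r1/ng2/vm3 and /r2/ng3/vm4 and three replicas requested, the sketch skips vm2 because it shares a node group with vm1.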

----- Original Message -----
From: "Junping Du (JIRA)" <ji...@apache.org>
To: common-issues@hadoop.apache.org
Sent: Monday, June 4, 2012 12:09:22 PM
Subject: [jira] [Created] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Junping Du created HADOOP-8468:
----------------------------------

             Summary: Umbrella of enhancements to support different failure and locality topologies
                 Key: HADOOP-8468
                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha, io
    Affects Versions: 2.0.0-alpha, 1.0.0
            Reporter: Junping Du
            Assignee: Junping Du
            Priority: Critical




[jira] [Updated] (HADOOP-8468) Umbrella of enhancements to support different failure and locality topologies

Posted by "Junping Du (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HADOOP-8468:
-------------------------------

    Attachment: HADOOP-8468-total.patch

This patch contains all of the code changes. We will divide it into 7 sub-patches for easier review and check-in.
                
> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: HADOOP-8468-total.patch, Proposal for enchanced failure and locality topologies.pdf
