Posted to common-issues@hadoop.apache.org by "Eli Collins (Created) (JIRA)" <ji...@apache.org> on 2012/03/22 06:58:22 UTC

[jira] [Created] (HADOOP-8198) Support multiple network interfaces

Support multiple network interfaces
-----------------------------------

                 Key: HADOOP-8198
                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
             Project: Hadoop Common
          Issue Type: New Feature
          Components: io, performance
            Reporter: Eli Collins
            Assignee: Eli Collins


Hadoop does not currently utilize multiple network interfaces; support for them is a common user request and is important in enterprise environments. This jira covers a proposal for enhancements to Hadoop so it better utilizes multiple network interfaces. The primary motivations are improved performance, performance isolation, resource utilization, and fault tolerance. The attached design doc covers the high-level use cases, requirements, a proposal for trunk/0.23, discussion on related features, and a proposal for Hadoop 1.x that covers a subset of the functionality of the trunk/0.23 proposal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8198:
--------------------------------

    Attachment:     (was: MultipleNifsv1.pdf)
    

        

[jira] [Updated] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8198:
--------------------------------

    Attachment: MultipleNifsv1.pdf
    

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251709#comment-13251709 ] 

Eli Collins commented on HADOOP-8198:
-------------------------------------

@Nathan, thanks for chiming in. Answers follow.

- Wiring up multiple interfaces does mean you need 2x the port count, more cable management issues, and potentially additional switch configuration. That's true today for people who use host-level bonding.
- For use case #1, supporting multiple interfaces is like supporting multiples of any host resource (eg disks). You get improved performance and the ability to tolerate more failures at the cost of additional code complexity. We already have to tolerate client <-> worker connection failures; we can leave the current behavior as is, or attempt to better tolerate them, eg by working around them (see HDFS-3149). Like tolerating disk failures, this means some hosts may have more resources than others (if by default only one interface is reported then this only affects the multi-interface case). I'm also considering the impact on MR, where you'd want the shuffle to be able to take advantage of this as well. More importantly, if it didn't, you could potentially have more imbalanced network traffic.
- For use case #2, supporting multiple interfaces is simpler because clients don't necessarily get multiple interfaces; different clients just end up getting different interfaces. This is the same way the NN can bind to the wildcard today, causing it to be available on multiple interfaces, and clients can access it via any of them. Note that the two use cases are independent: you can support #2 w/o #1 and vice versa.
- Wrt host-level bonding and 10gige, see my comment above to Sanjay, these both help use case #1, they don't address use case #2, the primary motivation.
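To make the wildcard point above concrete, here is a minimal Java sketch. The class name WildcardBindDemo is made up for illustration and is not Hadoop code. It relies only on the standard behavior of java.net.ServerSocket: a socket created without a bind address listens on the wildcard address, so it is reachable through every local interface, which is the behavior described for the NN.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Illustration only: WildcardBindDemo is a hypothetical name, not part of Hadoop.
class WildcardBindDemo {

    /** Opens a listener on an ephemeral port and reports whether it is bound to the wildcard address. */
    static boolean bindsToWildcard() {
        // Port 0 lets the OS pick a free port; no bind address means "all local interfaces".
        try (ServerSocket server = new ServerSocket(0)) {
            return server.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound to wildcard: " + bindsToWildcard());
    }
}
```

Binding to the wildcard makes a service reachable on every interface, but it does not by itself decide which address gets advertised to a given client. That is the part use case #2 is concerned with.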
                

        

[jira] [Updated] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8198:
--------------------------------

    Attachment: MultipleNifsv3.pdf

Updated design doc attached (v3).

- Based on feedback from Todd, Phil Zeyliger and Eric Sammer, I decided to skip the client-based interface filtering altogether and start with server-side interface filter since we'll need it eventually. Will update HDFS-3147.
- Updated discussion of exposing cluster-private interfaces to off-cluster clients (eg for the pipeline)
- Based on feedback from Sanjay, better clarified the relationship between this work and using host-level bonding

I'm going to create a branch for the HDFS-3140 work.
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242847#comment-13242847 ] 

Sanjay Radia commented on HADOOP-8198:
--------------------------------------

Host level bonding would allow a DN or Tasks on a Node to use the aggregate bandwidth of multiple nics. 
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Daryn Sharp (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251922#comment-13251922 ] 

Daryn Sharp commented on HADOOP-8198:
-------------------------------------

Are we going to confine yarn/MR services to using only one NIC?
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244278#comment-13244278 ] 

Eli Collins commented on HADOOP-8198:
-------------------------------------

This proposal enables host-level bonding (as part of use case #1). Today host-level bonding doesn't work because the NN sets the DatanodeID IP field to be the IP where the IPC came from. So if you configure the IP of a bond in the dfs.datanode.address it is ignored (ie this config is only used to set the transfer port). HDFS-3146 will allow you to just report the IP of the bond, thus enabling host-level bonding. Or if you don't want to do host-level bonding you can just report the IPs of both interfaces. The latter is useful because some users find host-level bonding a pita to configure and would prefer we use multiple interfaces out of the box.

Note that host-level bonding is insufficient for use case #2. Suppose a host has 2 interfaces: one is cluster-private (not routable by clients outside the cluster) and the other is usable by clients outside the cluster (eg an adjacent cluster or system). You can't bond these two interfaces, and the NN only advertises one DN IP, so it can only hand out one, which means only one kind of client will work. You can try to work around this by port-forwarding from the public interface to the private interface, but that defeats the purpose. Alternatively, if the DN were advertised by hostname, you could get this to work by having on-cluster clients resolve the hostname to one IP (eg using host files or a local DNS server) and off-cluster clients resolve it to another (eg they use what's in DNS). This is actually the approach I posted for v1, but it has some drawbacks (eg lots of extra DNS lookups) and more complex configuration, so I don't think we want to do this for trunk. It's much simpler to be able to report multiple IPs, and configure which to use.
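
As a rough illustration of "report multiple IPs, and configure which to use", the Java sketch below picks, from the addresses a DN has reported, the one that shares a subnet with the requesting client. Everything here is an assumption made for the example: the class ReportedAddressChooser, its methods, the fixed prefix length and the sample addresses are hypothetical, and this is not the mechanism specified in the design doc or in HDFS-3146.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

// Illustration only: ReportedAddressChooser is hypothetical, not Hadoop code.
class ReportedAddressChooser {

    /** Parses an IP literal (no DNS lookup is done for literals). */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True if both addresses agree on their first prefixLen bits. */
    static boolean sameSubnet(InetAddress x, InetAddress y, int prefixLen) {
        byte[] a = x.getAddress();
        byte[] b = y.getAddress();
        if (a.length != b.length) {
            return false; // IPv4 vs IPv6 never match
        }
        if (prefixLen < 0 || prefixLen > a.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLen);
        }
        int fullBytes = prefixLen / 8;
        int remBits = prefixLen % 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        if (remBits == 0) {
            return true;
        }
        int mask = (0xFF << (8 - remBits)) & 0xFF;
        return (a[fullBytes] & mask) == (b[fullBytes] & mask);
    }

    /**
     * Returns the first reported address on the client's subnet, or the
     * given fallback when no reported address shares a subnet with the client.
     */
    static InetAddress choose(List<InetAddress> reported, InetAddress client,
                              int prefixLen, InetAddress fallback) {
        for (InetAddress candidate : reported) {
            if (sameSubnet(client, candidate, prefixLen)) {
                return candidate;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        InetAddress clusterPrivate = ip("10.0.0.5");  // assumed cluster-private interface
        InetAddress external = ip("192.168.1.5");     // assumed interface reachable off-cluster
        List<InetAddress> reported = Arrays.asList(clusterPrivate, external);
        System.out.println(choose(reported, ip("10.0.0.77"), 24, external));
        System.out.println(choose(reported, ip("172.16.9.9"), 24, external));
    }
}
```

The point of the sketch is only that the choice is made per client from a list of reported addresses, with no DNS lookups involved. A real implementation would likely make the selection policy configurable rather than hard-coding a subnet match.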
                

       

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252063#comment-13252063 ] 

Sanjay Radia commented on HADOOP-8198:
--------------------------------------

Eli, below is some feedback to improve the document/proposal and some questions. At some point, the answers should move into the document. 
* Intro
  Not true that "Hadoop does not currently utilize multiple network interfaces". - Hadoop relies on host-level bonding for multiple nics and this does not allow use case 1.2 but addresses use case 1.1. Worth clarifying this in the intro.
* Use case 1.1 Layer 7 bonding
** Some folks get thrown off by the "Layer 7 bonding" thinking  that the proposal is to configure switches, nic, etc to get  layer 7 bonding. Add a sentence to clarify.
** bandwidth and failover is the main motivation for this use case.
*** Q What are the limitations of  host level bonding for workers and masters for this use case (as opposed to use case 1.2)?
** Add text that worker is DN or TT and Master is NN or JT.
* Use case 1.2 - I understand the text but have a hard time buying this use case as a high priority. Can you please motivate this better? Can we separate this one for phase 2 (or were you planning that anyway)?
* Use case 1.3 - you have text and filed jiras for a client config to use a specific nic - is this a separate use case?
* Insert a section "Scope"
** Your description is general (Masters and Workers) but you limit the detailed sections to HDFS and MR. How will things work if we don't address this problem at all layers? (see my comment below on issues/risks)
* Insert a section called "Issues/Risks"
** What if other services up the stack have not been modified and depend only on host-level-bonding?
** Minimize changes to DFSClient so it can remain as thin as possible for future client-side porting to other languages. (There are solutions that do not require changes to the DFSClient that we should consider in the short term.)
* 2 Requirements
** some of the requirements seem to prescribe the solution. Separate out the solutions and insert in a section between 2 and 3 describing the solution.
** 2.1 - you are not stating a requirement here -- reword - nics can be bonded or multi-homed. Current wildcard IP config should be used for multi-homing. 
        State that the current multi-homing config does not help with use case 1.1 for worker nodes.
** 2.2 Multiple interfaces on master
*** I assume this is motivated completely by use case 1.2 and not use case 1.1 since we don't have bandwidth issues for masters - please add this clarification.
*** What will this do to tokens if tokens are obtained from one of the several interfaces on a Master.
** 2.5 Enable clients to use multiple local interfaces - what is the motivation for this? Is this for tasks running inside Hadoop or for clients outside the Hadoop cluster? (Or is this yet another use case? If so, add it to the use cases section.)
** Add requirements:
*** old configs should run unchanged
*** host level bonding should continue to work
*** security should work for the proposed enhancements
*** solution should work for all protocols - hadoop native, http rpc, etc. (if you are excluding any please clarify).
*** should work with HA
* 3.1 Example Config ...
** This section could be significantly improved. You switch between "current config"  and "how the current config's semantics are extended" and "new configuration"; this is sometimes confusing. 
* 4.3 Security 
** looks like the paragraph is not complete - last sentence is cut off. 
** Also add some details on what changes are needed for this to work with security. Part of the paragraph reads like requirements statements and should move to the requirements section.
* 4.3 HA
** Add to requirements as noted above.
** Are there new issues here? Will have to failover multiple IPs  of the NN etc.
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251292#comment-13251292 ] 

Eli Collins commented on HADOOP-8198:
-------------------------------------

Use case 2 is the primary motivation; as you point out, use case #1 can be addressed purely with host-level bonding. The two scenarios I care most about, that will be used widely, are:
# Systems where the on-cluster traffic is on a cluster-private, high-speed (eg infiniband) network, and off-cluster access from clients goes over a separate network.
# Systems where the cluster needs a dedicated, high-performance link with an adjacent system (another Hadoop cluster, an EDW, etc)

Agree wrt HADOOP-7510, I've been discussing moving tokens off IPs and to hostname or a host-level identifier with Daryn. I think we need to tackle this for HA as well, he's got some good ideas here.
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251926#comment-13251926 ] 

Todd Lipcon commented on HADOOP-8198:
-------------------------------------

bq. Are we going to confine yarn/MR services to using only one NIC?

If I recall correctly, the shuffle services use job tokens and not service tokens as well, right? I think it's OK to confine the RPC interfaces to using one NIC (for now) as they're generally not throughput-intensive. Adding multi-NIC support for them would be nice in the future for fault tolerance but I think it should be a separate task, since as you've brought up, it's much harder.
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251892#comment-13251892 ] 

Todd Lipcon commented on HADOOP-8198:
-------------------------------------

I agree with the above comments that tokens are starting to fall apart. But, I don't think this current proposal has any relation to the token issue -- Eli is only proposing to add multi-NIC support for datanodes, and datanodes don't have service tokens. They only validate block tokens, which have no associated host/IP/etc.

If we wanted multi-NIC on the NN RPC, the token issue would be a blocker, but I don't think that's the current proposal.
                

        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Daryn Sharp (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251617#comment-13251617 ] 

Daryn Sharp commented on HADOOP-8198:
-------------------------------------

As Eli alluded to, I have been trying to formulate a more sustainable model for mapping tokens to their host.  The current token service model is rapidly breaking with the advent of features like HA and multi-interface hosts.  The use of ip service (ip:port) provides no portability of tokens.  Using host-based tokens (host:port) provides portability when a host's ip changes.

HA and multi-interface hosts create a larger challenge that host-based tokens do not address.  HA currently rewrites the service of a fetched token to an abstract identifier.  The token is later duplicated and the service is rewritten to each of the HA NN's services.  This approach only works if all clients are identically configured.  Cross-colo access requires all grids to share configurations which becomes a difficult maintenance issue.

Multi-interface hosts would require the same concept as HA - an abstract identifier - but the client cannot choose the identifier w/o further complicated config settings.  What I'd like to propose is that the client no longer manages the service of a token.  The server will set the service to an abstract id for itself.  The id will be presented in the SASL challenge.  The client will choose the token based on the id.  This approach would neatly solve the dichotomy between ips/hosts/HA/multi-interface.

Whether or not multi-interface support is added, I think the token service model needs to evolve because it's becoming fragile and relying too much on the client to manage the service correctly.  I'll create another jira and write up a design proposal as soon as I get a chance.
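
The client side of the proposal above could look roughly like the following Java sketch. It is a guess at the shape of the idea, not the design: ServiceIdTokenCache, its methods, and the sample ids are hypothetical, and none of this is Hadoop's existing token or SASL API. The only point it shows is that tokens are keyed by the id the server assigned to itself, so the lookup does not depend on which ip, host, or interface the client happened to connect to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustration only: ServiceIdTokenCache is hypothetical and is not Hadoop's token API.
class ServiceIdTokenCache {
    // Tokens are keyed by the abstract id the server assigned, never by ip:port or host:port.
    private final Map<String, String> tokenByServiceId = new HashMap<>();

    /** Stores a token under the id the issuing server set for itself. */
    void store(String serviceId, String opaqueToken) {
        tokenByServiceId.put(serviceId, opaqueToken);
    }

    /** Looks up the token for the id the server presented in its challenge. */
    Optional<String> select(String advertisedServiceId) {
        return Optional.ofNullable(tokenByServiceId.get(advertisedServiceId));
    }

    public static void main(String[] args) {
        ServiceIdTokenCache cache = new ServiceIdTokenCache();
        cache.store("nn-cluster-a", "opaque-token-1"); // made-up id and token
        // Whichever interface or HA node the client reached, the advertised id is the same.
        System.out.println(cache.select("nn-cluster-a").isPresent());
        System.out.println(cache.select("nn-cluster-b").isPresent());
    }
}
```

Because the server names itself, the client needs no configuration that maps addresses to services, which is the maintenance problem described for cross-colo access.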

                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf, MultipleNifsv3.pdf


        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249963#comment-13249963 ] 

Sanjay Radia commented on HADOOP-8198:
--------------------------------------

> .. host-level bonding works today as long as the IP of bond is where IPC comes from as well ...
Hence if Hadoop is correctly configured then multiple NICs work for use case 1.
Eli, how important is use case 2 for Hadoop users? 
One of my concerns is that this code change is going to have some tricky corner cases to get right. This is based on experience with the recent ipAddress-dnsName change patch (I forget the jira number), which resulted in some subtle bugs down the road with tokens etc.


                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf, MultipleNifsv3.pdf


        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246487#comment-13246487 ] 

Eli Collins commented on HADOOP-8198:
-------------------------------------

I should clarify that host-level bonding works today as long as the IP of the bond is where IPC comes from as well, e.g. if a DN is configured to listen only on the IP of the bond. It breaks if the DN listens on all interfaces and does IPC to the NN on an interface that's not the bond. See HDFS-3146 for specifics.
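
The condition above (listen only on the bond's IP, and originate IPC from that same IP) can be illustrated with plain sockets. This is only a sketch, not DN or NN code: BOND_IP uses the loopback address as a stand-in for the bond's address, and start_listener and connect_from are hypothetical helper names.

```python
import socket

# Stand-in for the bond's address; loopback is used so the sketch runs anywhere.
BOND_IP = "127.0.0.1"


def start_listener(bind_ip):
    """Listen on one specific address rather than the wildcard (0.0.0.0)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_ip, 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


def connect_from(source_ip, dest):
    """Open an outbound connection whose source address is pinned."""
    return socket.create_connection(dest, timeout=5,
                                    source_address=(source_ip, 0))


listener = start_listener(BOND_IP)
bound_ip = listener.getsockname()[0]

# Outbound traffic is sourced from the same address the listener is bound to.
client = connect_from(BOND_IP, listener.getsockname())
conn, peer = listener.accept()

print("listener bound to:", bound_ip)
print("peer address seen by the server:", peer[0])

conn.close()
client.close()
listener.close()
```

Binding to the wildcard address and leaving the source address unpinned is the configuration in which the address a peer observes can differ from the address the service is known by.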
                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf


        

[jira] [Updated] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8198:
--------------------------------

    Attachment: MultipleNifsv2.pdf

Updated design doc, with a dozen clarifications based on offline feedback. 
                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf


        

[jira] [Updated] (HADOOP-8198) Support multiple network interfaces

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8198:
--------------------------------

    Attachment: MultipleNifsv1.pdf

v1 design doc attached. Thanks to Todd Lipcon, Philip Zeyliger, Tom White, Dave Wang and Jon Hsieh for feedback on an earlier draft.
                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf


        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Daryn Sharp (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252465#comment-13252465 ] 

Daryn Sharp commented on HADOOP-8198:
-------------------------------------

@Todd: If the scope is only DNs, then you may be correct that it's not a blocking issue.  The basic issue is a token cannot be acquired over 1 interface, and then subsequently used via another.  This may or may not be an issue today, but it's worth noting that it places strong limitations on clients and network topologies in a multi-NIC environment.

We have to ensure that all services issuing tokens to an external client cannot be run on a multi-NIC host.  I believe this currently applies to the NN, RM, AM(??), JHS, and maybe others.  Since the AM runs on a DN, it's the one to be most concerned about.  I don't understand the YARN token passing enough to know if it is a problem today.

Repurposing of hosts will be impacted.  If a service is moved from a failed host to another host, and the new host is multi-NIC, then the grid-internal interfaces must be shut down.  If the host is repurposed to be a DN again, then the interfaces will need to be re-enabled.
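
The basic issue described above, a token acquired over one interface being unusable over another, follows from keying tokens by the address used to fetch them. A minimal sketch, assuming an ip:port service key; the service_key helper, the addresses and the token value are hypothetical and this is not the actual Hadoop credentials code:

```python
def service_key(ip, port):
    """Build an address-based service key of the form ip:port."""
    return f"{ip}:{port}"


credentials = {}

# The client fetches a token over the host's external interface and files it
# under that interface's address.
credentials[service_key("203.0.113.10", 8020)] = "token-for-nn"

# Later lookups succeed only when the same address is used again.
same_interface = credentials.get(service_key("203.0.113.10", 8020))
other_interface = credentials.get(service_key("10.0.0.10", 8020))

print("same interface:", same_interface)
print("other interface:", other_interface)
```

The second lookup finds nothing even though both addresses belong to the same service, which is why a multi-NIC host constrains any service that issues tokens to external clients.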


                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf, MultipleNifsv3.pdf


        

[jira] [Commented] (HADOOP-8198) Support multiple network interfaces

Posted by "Nathan Roberts (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251573#comment-13251573 ] 

Nathan Roberts commented on HADOOP-8198:
----------------------------------------

Hi Eli, I'd really like to understand the additional complexity that this feature will add to the overall system. Not only in the hadoop software itself, but also in the management of the cluster. Some examples that come to mind: 
* More configuration to get right/wrong, not only at the hadoop level but also at the host and network
* More physical wires to get crossed
* More components that can now fail - nodes can have 1 of 2 NICs fail - what will that mean? How much more software will have to be written to deal with cases like this?
* Considerably more cluster configurations that will need to be tested. 
* Token handling gets even trickier, even beyond what will be needed for HA, I think??

Given that 10G Ethernet is right around the corner (very price competitive this year), what does that do to the equation? 

Assuming host-level bonding was in place, what are the real benefits we would see on clusters made up of commodity boxes (no separate high-speed fabric)?

Anything you can do to help clarify/quantify/address the additional complexity will help. 



                
> Support multiple network interfaces
> -----------------------------------
>
>                 Key: HADOOP-8198
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8198
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io, performance
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: MultipleNifsv1.pdf, MultipleNifsv2.pdf, MultipleNifsv3.pdf
