Posted to common-issues@hadoop.apache.org by "Colin Patrick McCabe (JIRA)" <ji...@apache.org> on 2015/12/17 01:46:46 UTC

[jira] [Commented] (HADOOP-12653) Client.java can get "Address already in use" when using kerberos and attempting to bind to any port on the local IP address

    [ https://issues.apache.org/jira/browse/HADOOP-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061231#comment-15061231 ] 

Colin Patrick McCabe commented on HADOOP-12653:
-----------------------------------------------

The code that's having the problem is here:
{code}
/*
 * Bind the socket to the host specified in the principal name of the
 * client, to ensure Server matching address of the client connection
 * to host name in principal passed.
 */
UserGroupInformation ticket = remoteId.getTicket();
if (ticket != null && ticket.hasKerberosCredentials()) {
  KerberosInfo krbInfo = 
    remoteId.getProtocol().getAnnotation(KerberosInfo.class);
  if (krbInfo != null && krbInfo.clientPrincipal() != null) {
    String host = 
      SecurityUtil.getHostFromPrincipal(remoteId.getTicket().getUserName());
    
    // If host name is a valid local address then bind socket to it
    InetAddress localAddr = NetUtils.getLocalInetAddress(host);
    if (localAddr != null) {
      this.socket.bind(new InetSocketAddress(localAddr, 0));  // <=== HERE
    }
  }
}
{code}
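You can see that this is binding to port 0, which asks the kernel to pick any free ephemeral port, so the usual explanation for getting "address already in use" (another socket already holding the specific port requested) is not relevant here.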

There is a discussion here: https://idea.popcount.org/2014-04-03-bind-before-connect/

It's kind of a confusing issue, but it boils down to:
* Every TCP connection is identified by a unique 4-tuple of (src ip, src port, dst ip, dst port).
* Calling {{bind}} and then {{connect}} imposes restrictions on what src port can be that simply calling {{connect}} does not.  Specifically, {{bind}} has to choose a port without knowing what dst ip and dst port will be, meaning that it has to be more conservative to ensure global uniqueness.
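
To make the difference between the two call sequences concrete, here is a minimal sketch using only {{java.net}}. The class and method names ({{BindBeforeConnect}}, {{sourcePortAfterBind}}, {{sourcePortAroundConnect}}) are made up for illustration and are not part of Hadoop. It only shows when the source port gets fixed; it does not reproduce the port exhaustion itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration class, not part of Hadoop.
final class BindBeforeConnect {

  /**
   * bind-then-connect: the source port is already fixed when bind()
   * returns, before any destination is known.
   */
  static int sourcePortAfterBind(InetAddress local) {
    try (Socket s = new Socket()) {
      s.bind(new InetSocketAddress(local, 0)); // port 0 = "any free port"
      return s.getLocalPort();                 // chosen without a destination
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * connect only: the socket stays unbound (getLocalPort() == -1) until
   * connect() runs, so the source port is chosen with the destination known.
   * Returns {port before connect, port after connect}.
   */
  static int[] sourcePortAroundConnect(InetAddress local) {
    try (ServerSocket server = new ServerSocket(0, 1, local);
         Socket s = new Socket()) {
      int before = s.getLocalPort();             // -1 while unbound
      s.connect(server.getLocalSocketAddress()); // completes via listen backlog
      int after = s.getLocalPort();
      return new int[] {before, after};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In the first method the port is reserved at {{bind}} time with no knowledge of the peer; in the second it is only assigned inside {{connect}}, when the full 4-tuple is known.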

I think using {{SO_REUSEADDR}} can help here.  It's a bit confusing since that also opens us up to getting {{EADDRNOTAVAIL}}.  If I'm reading this right, though, that error code would only happen in the rare case where two threads happened to get into the critical section between bind and connect at the same time AND choose the same source port.  We could either retry in that case or ignore it and rely on higher-level retry mechanisms to kick in.
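
A rough sketch of the "set {{SO_REUSEADDR}} and retry" option follows. This is not the actual patch: {{ReuseAddrConnect}}, {{bindAndConnect}}, and the {{maxAttempts}} parameter are hypothetical names. How {{EADDRNOTAVAIL}} from {{connect}} surfaces in Java is also an assumption here; depending on the JDK it may show up as {{BindException}} or {{NoRouteToHostException}}, so the sketch retries on both.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketAddress;

// Hypothetical helper, not part of Hadoop.
final class ReuseAddrConnect {

  /**
   * Binds to an ephemeral port on localAddr with SO_REUSEADDR set, then
   * connects. Retries with a fresh socket if the bind/connect pair fails
   * with an address-related error.
   */
  static Socket bindAndConnect(InetAddress localAddr, SocketAddress remote,
                               int timeoutMs, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Socket s = new Socket();
      try {
        s.setReuseAddress(true); // must be set before bind()
        s.bind(new InetSocketAddress(localAddr, 0));
        s.connect(remote, timeoutMs);
        return s;
      } catch (BindException | NoRouteToHostException e) {
        // Assumed mapping of EADDRINUSE / EADDRNOTAVAIL; varies by JDK.
        last = e;
        s.close();
      } catch (IOException e) {
        // Anything else (refused, timeout, ...) is not retried here.
        s.close();
        throw e;
      }
    }
    throw last;
  }
}
```

The alternative mentioned above is to drop the loop entirely and let the higher-level retry mechanisms handle the rare collision.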

> Client.java can get "Address already in use" when using kerberos and attempting to bind to any port on the local IP address
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12653
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12653
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: net
>    Affects Versions: 2.4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Client.java can get "Address already in use" when using kerberos and attempting to bind to any port on the local IP address.  It appears to be caused by the host running out of ports in the ephemeral range.


