Posted to dev@zookeeper.apache.org by "Marshall McMullen (JIRA)" <ji...@apache.org> on 2012/06/01 15:35:23 UTC

[jira] [Commented] (ZOOKEEPER-1476) ipv6 reverse dns related timeouts on OSX connecting to localhost

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287397#comment-13287397 ] 

Marshall McMullen commented on ZOOKEEPER-1476:
----------------------------------------------

We've experienced an identical problem, where reverse name lookups prevent ZooKeeper leader election from ever completing successfully. In our case it failed on Linux with IPv4, not IPv6. As it turns out, a lot of code in the ZooKeeper server calls getHostName, which does a reverse DNS lookup. I've patched the code in question to use getHostString instead, which does not do a reverse name lookup. A lookup is still performed eventually, but it uses getByName to do a normal (forward) DNS lookup, and only if necessary (that is, if the host is not already an IP address).

I'm happy to upload the patch we use, but I can only vouch for it compiling properly on OpenJDK 7. The function I had to use (getHostString) was wrongly private in OpenJDK 6 and was made public in OpenJDK 7. I don't know whether that function is public or private in Sun, IBM, or any other flavor of Java.
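
The difference between the two accessors can be sketched as follows. This is only an illustration of the idea, not the patch itself: the class name, the helper hostWithoutReverseLookup, the port numbers, and the host name zk1.example.invalid are all made up for the example, and it assumes Java 7 or later, where InetSocketAddress.getHostString is public.

```java
import java.net.InetSocketAddress;

public class HostStringDemo {
    // Hypothetical helper: returns the host name if one is already known,
    // otherwise the textual IP address. Unlike getHostName(), this never
    // triggers a reverse DNS lookup.
    static String hostWithoutReverseLookup(InetSocketAddress addr) {
        return addr.getHostString();
    }

    public static void main(String[] args) {
        // Built from an IP literal: no name is stored, so the literal comes back.
        InetSocketAddress literal = new InetSocketAddress("127.0.0.1", 2181);
        System.out.println(hostWithoutReverseLookup(literal));

        // Built unresolved: the name is kept as given and no lookup happens here.
        InetSocketAddress unresolved =
                InetSocketAddress.createUnresolved("zk1.example.invalid", 2888);
        System.out.println(hostWithoutReverseLookup(unresolved));

        // By contrast, literal.getHostName() may block on a reverse lookup
        // when the address was created from an IP literal.
    }
}
```

Whether this swap is safe everywhere getHostName is called depends on what each caller does with the result, so treat it as a starting point.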
                
> ipv6 reverse dns related timeouts on OSX connecting to localhost
> ----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1476
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1476
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Jilles van Gurp
>            Priority: Minor
>
> We observed a weird, random issue when trying to create ZooKeeper client connections on OSX. Sometimes it would work and sometimes it would fail. It is also randomly very slow. It turns out both issues have the same cause.
> My hosts file on osx (which is an unmodified default one), lists three entries for localhost:
> 127.0.0.1	localhost
> ::1             localhost 
> fe80::1%lo0	localhost
> We saw zookeeper trying to connect to fe80:0:0:0:0:0:0:1%1 sometimes, which is not listed (actually one in four times, it seems to round robin over the addresses). 
> Whenever that happens, it sometimes works and sometimes fails. In both cases it's very slow. Reason: the reverse lookup for fe80:0:0:0:0:0:0:1%1 can't be resolved using the hosts file, so it falls back to actually using DNS. Sometimes that works, but other times it fails or times out after about 5 seconds. Platform-specific DNS settings probably hide this problem on Linux.
> As a workaround, we preresolve localhost now: Inet4Address.getByName("localhost"). This always resolves to 127.0.0.1 on my machine and works fast.
> This fixes the issue for us. We're not sure where the fe80:0:0:0:0:0:0:1%1 address comes from though. I don't recall having this issue with other server side software so this might be a mix of platform setup, osx specific defaults, and zookeeper behavior.
> I've seen one ticket that relates to ipv6 in zookeeper that might be related: ZOOKEEPER-667. Perhaps the workaround for that ticket introduced this problem? 
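
The pre-resolution workaround from the quoted description could look roughly like the sketch below. It is an assumption-laden illustration rather than the reporter's actual code: the class and helper names are hypothetical, port 2181 is just the usual ZooKeeper client port, and the explicit Inet4Address filter is an addition here, since getByName called through Inet4Address is the inherited InetAddress.getByName and is not guaranteed to return an IPv4 address.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PreResolveLocalhost {
    // Hypothetical helper: return the first IPv4 address among the candidates,
    // or null if there is none.
    static InetAddress firstIpv4(InetAddress[] candidates) {
        for (InetAddress a : candidates) {
            if (a instanceof Inet4Address) {
                return a;
            }
        }
        return null;
    }

    // Hypothetical helper: resolve the host once, up front, and build an
    // "ip:port" string so later code never needs to look the name up again.
    static String connectString(String host, int port) {
        try {
            InetAddress v4 = firstIpv4(InetAddress.getAllByName(host));
            if (v4 == null) {
                throw new IllegalArgumentException("no IPv4 address for " + host);
            }
            return v4.getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        // The result depends on the local hosts file and resolver configuration.
        System.out.println(connectString("localhost", 2181));
    }
}
```

Passing the resulting string to the client instead of "localhost" avoids the rotation over the IPv6 entries, at the cost of pinning the connection to IPv4.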
