Posted to dev@zookeeper.apache.org by Enis Söztutar <en...@apache.org> on 2012/03/16 20:36:34 UTC

timeouts due to ipv6 loopback

Hi,

While testing with a zk+hbase minicluster, I noticed non-deterministic
timeouts from the zookeeper client. The issue seems to be that the zk
server binds to the localhost address, but the client sometimes gets the
ipv4 loopback address and sometimes the ipv6 one. The default hosts file
on my Mac OS box is:

127.0.0.1      localhost
::1            localhost
fe80::1%lo0    localhost
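
To check which addresses the resolver hands out for a name, something like
the following sketch can help. The class and method names (LoopbackCheck,
describe) are made up for illustration and are not part of ZooKeeper; only
standard java.net calls are used. Note that fe80::1 is a link-local address,
not a loopback address, so it is reported differently from 127.0.0.1 and ::1.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class LoopbackCheck {

    // Returns one "address kind" line per address the resolver gives for host.
    static List<String> describe(String host) {
        List<String> out = new ArrayList<String>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                String kind = a.isLoopbackAddress() ? "loopback"
                            : a.isLinkLocalAddress() ? "link-local"
                            : "other";
                out.add(a.getHostAddress() + " " + kind);
            }
        } catch (UnknownHostException e) {
            // Rethrow unchecked so callers of this sketch stay simple.
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // With a hosts file like the one above, this may list several entries.
        for (String line : describe("localhost")) {
            System.out.println(line);
        }
    }
}
```

What it prints for "localhost" depends on the machine's hosts file and
resolver settings, so the output will differ from box to box.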

The problem is that when the client tries to connect to the fe80::1%lo0
address, hostname resolution stalls long enough to cause a timeout (notice
the five-second gap between 16:16:05 and 16:16:10):

12/03/09 16:16:05 INFO zookeeper.ClientCnxn: Opening socket connection to
server /fe80:0:0:0:0:0:0:1%1:21818
12/03/09 16:16:10 WARN client.ZooKeeperSaslClient: SecurityException:
java.lang.SecurityException: Unable to locate a login configuration
occurred when trying to find JAAS configuration.
12/03/09 16:16:10 INFO client.ZooKeeperSaslClient: Client will not
SASL-authenticate because the default JAAS configuration section 'Client'
could not be found. If you are not using SASL, you may ignore this. On the
other hand, if you expected SASL to work, please fix your JAAS
configuration.
12/03/09 16:16:10 INFO server.NIOServerCnxnFactory: Accepted socket
connection from /fe80:0:0:0:0:0:0:1%1:54176
12/03/09 16:16:10 INFO zookeeper.ClientCnxn: Client session timed out, have
not heard from server in 5008ms for sessionid 0x0, closing socket
connection and attempting reconnect
12/03/09 16:16:10 WARN server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x0, likely client has

I have tracked down this issue, and it seems the timeout is caused by:

ClientCnxn.java: 935

            setName(getName().replaceAll("\\(.*\\)",
                    "(" + addr.getHostName() + ":" + addr.getPort() + ")"));
            try {
                zooKeeperSaslClient = new ZooKeeperSaslClient(
                        "zookeeper/" + addr.getHostName());

The addr.getHostName() call obviously tries to do a reverse lookup, which
takes more than 5 seconds. At that point we have not yet negotiated a
timeout with the server, so increasing the client timeout has no effect
either. I have solved my problem by removing the link-local entry
fe80::1%lo0 from my hosts file. But I was wondering whether anyone has
experienced a similar problem, and whether this is really a bug that we
have to address or just a misconfiguration.
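
For anyone who wants to reproduce the delay outside of ZooKeeper, here is a
minimal sketch. It is not a proposed patch: the class and method names
(HostNameLookup, labelNoLookup, timeGetHostNameMillis) are hypothetical, and
it assumes Java 7 or later, where InetSocketAddress.getHostString() is
available and documented as not attempting a reverse lookup. It times
getHostName() and, for comparison, builds the same kind of "(host:port)"
label from getHostString().

```java
import java.net.InetSocketAddress;

// Hypothetical helper, for illustration only.
public class HostNameLookup {

    // Builds a host:port label without triggering a reverse lookup (Java 7+).
    static String labelNoLookup(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    // Measures how long getHostName() takes; it may do a reverse lookup
    // when the address was created from a literal IP.
    static long timeGetHostNameMillis(InetSocketAddress addr) {
        long start = System.nanoTime();
        addr.getHostName();
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Literal address and port chosen to match the log above; pass
        // another literal (for example a link-local one) as the first argument.
        String literal = args.length > 0 ? args[0] : "127.0.0.1";
        InetSocketAddress addr = new InetSocketAddress(literal, 21818);

        // Call getHostString() first: getHostName() caches its result.
        System.out.println("label without lookup: " + labelNoLookup(addr));
        System.out.println("getHostName() took "
                + timeGetHostNameMillis(addr) + " ms");
    }
}
```

How long getHostName() takes depends entirely on the local resolver setup,
so the number printed on another machine may look nothing like my 5 seconds.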

Thanks,
Enis

Ref:
http://superuser.com/questions/241642/what-is-the-relevance-of-fe801lo0-localhost-in-etc-hosts