Posted to dev@zookeeper.apache.org by "Hugh Warrington (JIRA)" <ji...@apache.org> on 2011/01/28 15:11:43 UTC

[jira] Commented: (ZOOKEEPER-979) UnknownHostException in QuorumCnxManager

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988088#action_12988088 ] 

Hugh Warrington commented on ZOOKEEPER-979:
-------------------------------------------

Ok, I think I've got to the bottom of this. We were programmatically building a java.util.Properties object, using

for (InetAddress host : hosts) {
    properties.put(String.format("server.%d", i++), String.format("%s:2888:3888", host.toString()));
}

This was building properties of the form

/10.0.0.1:2888:3888

Notice the leading slash: InetAddress.toString() returns "hostname/literal-address", and the hostname part is empty when no hostname is known. We then passed the Properties object into QuorumPeerConfig.parseProperties(), which duly constructs an InetSocketAddress with hostname '/10.0.0.1' and port 3888. Because of the bogus leading character, the attempt to resolve that hostname fails, so electionAddr.isUnresolved() is true for the resulting address.
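To make the leading slash concrete, here is a minimal sketch comparing InetAddress.toString() with InetAddress.getHostAddress(). The class and method names (LeadingSlashDemo, viaToString, viaHostAddress) are made up for illustration and are not part of ZooKeeper or of our code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class, not taken from ZooKeeper.
final class LeadingSlashDemo {

    // Formats the value the way the loop above did, via toString().
    static String viaToString(InetAddress host) {
        return String.format("%s:2888:3888", host.toString());
    }

    // Formats the value from the literal IP address only.
    static String viaHostAddress(InetAddress host) {
        return String.format("%s:2888:3888", host.getHostAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress() performs no name lookup, so the address has no hostname.
        InetAddress host = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        System.out.println(viaToString(host));    // prints /10.0.0.1:2888:3888
        System.out.println(viaHostAddress(host)); // prints 10.0.0.1:2888:3888
    }
}
```

Using getHostAddress() (or getHostName(), if a name is wanted) avoids the slash entirely.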

Everything then continues until the first attempt is made to do Socket.connect() with that InetSocketAddress. At this point, some undocumented behaviour in the Socket class comes into play. In sun.nio.ch.SocketAdaptor.connect() (line 140 in the openjdk 1.6.0_17 that I'm using) it calls Net.translateException(), which takes the UnresolvedAddressException and throws an UnknownHostException instead. The rationale seems to be that UnresolvedAddressException is an unchecked exception, and they want to throw an IOException ("Throw UnknownHostException from here since it cannot be thrown as a SocketException"). So instead they just obscure the true source of the problem, and the developer is none the wiser. It doesn't seem to be stated anywhere, but apparently you may only call Socket.connect() with a resolved InetSocketAddress.
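The translation can be reproduced without any network access. The sketch below is an illustration only: the class and method names are hypothetical, and InetSocketAddress.createUnresolved() stands in for the address that parseProperties() ended up with. It connects once through the channel and once through the channel's socket adaptor, and reports which exception each path throws.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical demo class, not taken from ZooKeeper.
final class UnresolvedConnectDemo {

    // Connecting through the channel itself surfaces the unchecked exception.
    static String channelConnectFailure(InetSocketAddress addr) throws IOException {
        try (SocketChannel ch = SocketChannel.open()) {
            ch.connect(addr);
            return "connected";
        } catch (UnresolvedAddressException e) {
            return e.getClass().getName();
        }
    }

    // Connecting through ch.socket() goes via the socket adaptor, which
    // translates the failure into a checked UnknownHostException.
    static String adaptorConnectFailure(InetSocketAddress addr) throws IOException {
        try (SocketChannel ch = SocketChannel.open()) {
            ch.socket().connect(addr);
            return "connected";
        } catch (UnknownHostException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the address built from "/10.0.0.1:2888:3888".
        InetSocketAddress addr = InetSocketAddress.createUnresolved("/10.0.0.1", 3888);
        System.out.println(channelConnectFailure(addr));
        System.out.println(adaptorConnectFailure(addr));
    }
}
```

Neither path attempts a connection: the unresolved address is rejected before any packet is sent, which is why the failure looks like a name-resolution problem at connect time.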

Anyway, it seems to me the thing to do here would be to try to resolve the provided server addresses much earlier. Perhaps even in QuorumPeerConfig, via InetAddress.getByName().
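As a rough sketch of what such an early check might look like (this is not ZooKeeper's actual code; EarlyResolutionCheck and requireResolved are hypothetical names), the parsed address could be rejected at configuration time with a message that names the offending host:

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of QuorumPeerConfig.
final class EarlyResolutionCheck {

    // Returns the address unchanged if it resolved; otherwise fails fast
    // with a message that includes the host and port from the config.
    static InetSocketAddress requireResolved(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                "Cannot resolve server address: "
                    + addr.getHostString() + ":" + addr.getPort());
        }
        return addr;
    }

    public static void main(String[] args) {
        // A literal IP address resolves without any name lookup.
        requireResolved(new InetSocketAddress("127.0.0.1", 3888));

        // createUnresolved() stands in for what the InetSocketAddress
        // constructor yields when the lookup of "/10.0.0.1" fails.
        try {
            requireResolved(InetSocketAddress.createUnresolved("/10.0.0.1", 3888));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at this point puts the bad hostname in the error itself, rather than leaving it to be inferred from a WARN line logged at connect time.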

> UnknownHostException in QuorumCnxManager
> ----------------------------------------
>
>                 Key: ZOOKEEPER-979
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-979
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2
>            Reporter: Hugh Warrington
>            Priority: Minor
>
> I'm using zk 3.3.2 and I'm seeing this in my logs around startup:
> 2011-01-27 10:16:21,513 [WorkerSender Thread] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 0 at election address xxx.yyy.com/10.2.131.19:3888
> java.net.UnknownHostException
> 	at sun.nio.ch.Net.translateException(Net.java:100)
> 	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:140)
> 	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
> 	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
> 	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
> 	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
> 	at java.lang.Thread.run(Thread.java:636)
> And all subsequent zk ops give {{ConnectionLossException}}.
> I've just explained this to breed_zk on IRC, and he asked me to file a ticket, mentioning that UnknownHostException may sometimes be thrown for reasons other than host resolution. While I'm reasonably certain that the hostname is correct and should be contactable, I need to put some more time into checking our network setup to be absolutely sure. However, two observations arose while looking into this:
> * At the top of QuorumCnxManager.connectOne(), we set electionAddr (or fail and return). But then a few lines later we don't actually use this local variable in the call to connect(). This seems like a minor programming mistake (although AFAICT it doesn't change the behaviour).
> * In the subsequent catch block, the UnknownHostException that's thrown doesn't contain the address that we were trying to connect to (though if you capture WARN log messages, you can see what it was).
