Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2022/01/25 14:47:25 UTC

[GitHub] [zookeeper] andrekramer1 commented on pull request #1798: ZOOKEEPER-3988 rg.apache.zookeeper.server.NettyServerCnxn.receiveMessage throws NullPointerException

andrekramer1 commented on pull request #1798:
URL: https://github.com/apache/zookeeper/pull/1798#issuecomment-1021258557


   On this PR: in NettyServerCnxnFactory.java at line 462, zkServer.serverStats().incrementAuthFailedCount(); could probably also use an if (zkServer != null) guard. There are also uses of zks in NettyServerCnxn that I was protecting against being null.
   It would be better if the ServerCnxn flavours (NIO and Netty) behaved more similarly; as you pointed out, the NIO one closes the connection early on exceptions. Then it might not need the SSL handshake suppression, for example.
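A minimal sketch of the two suggestions above: guarding the incrementAuthFailedCount() call with a null check, and closing the connection early instead of carrying on after an exception. ServerStats, ZooKeeperServer and Cnxn here are simplified, hypothetical stand-ins rather than ZooKeeper's real classes, and process() is a placeholder for actual request handling.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-ins; not ZooKeeper's real classes.
public class NullSafeCnxnSketch {

    static class ServerStats {
        private final AtomicLong authFailed = new AtomicLong();

        void incrementAuthFailedCount() { authFailed.incrementAndGet(); }

        long getAuthFailedCount() { return authFailed.get(); }
    }

    static class ZooKeeperServer {
        private final ServerStats stats = new ServerStats();

        ServerStats serverStats() { return stats; }
    }

    static class Cnxn {
        private final ZooKeeperServer zks; // may be null while the peer is still electing a leader
        private boolean closed;
        private int processed;

        Cnxn(ZooKeeperServer zks) { this.zks = zks; }

        boolean isClosed() { return closed; }

        int processedCount() { return processed; }

        void close() { closed = true; }

        // Count the failure only if a server is attached, then close.
        void onAuthFailed() {
            if (zks != null) {
                zks.serverStats().incrementAuthFailedCount();
            }
            close();
        }

        // Close early instead of dereferencing a missing server or
        // continuing after an exception.
        void receiveMessage(byte[] payload) {
            if (zks == null) {
                close();
                return;
            }
            try {
                process(payload);
            } catch (RuntimeException e) {
                close();
            }
        }

        // Hypothetical placeholder for real request handling; rejects empty packets.
        private void process(byte[] payload) {
            if (payload.length == 0) {
                throw new IllegalArgumentException("empty packet");
            }
            processed++;
        }
    }

    public static void main(String[] args) {
        Cnxn noServer = new Cnxn(null);
        noServer.onAuthFailed(); // no NullPointerException
        System.out.println(noServer.isClosed()); // true

        ZooKeeperServer zks = new ZooKeeperServer();
        Cnxn withServer = new Cnxn(zks);
        withServer.receiveMessage(new byte[] {1});
        withServer.receiveMessage(new byte[0]); // throws inside, so the connection closes
        System.out.println(withServer.processedCount() + " " + withServer.isClosed()); // 1 true
    }
}
```

The guard keeps the auth-failure path safe while the server is not yet attached, and the catch-and-close mirrors the "close early on exceptions" behaviour rather than trying to recover a half-initialized connection.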
   
   A minimal fix is probably best, but when testing just this branch/PR, it gets stuck launching ZooKeeper 0:
   
   14:34:25.541 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO  org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 25600 ms
   14:34:39.674 [epollEventLoopGroup-4-1] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:34:47.441 [epollEventLoopGroup-4-2] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:34:51.147 [QuorumConnectionThread-[myid=1]-1] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 2 at election address pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
   java.net.UnknownHostException: pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
   	at java.net.Socket.connect(Socket.java:609) ~[?:?]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   14:34:51.158 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO  org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 51200 ms
   14:34:51.160 [QuorumConnectionThread-[myid=1]-1] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
   java.net.UnknownHostException: pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
   	at java.net.Socket.connect(Socket.java:609) ~[?:?]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   14:35:09.729 [epollEventLoopGroup-4-1] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:35:17.480 [epollEventLoopGroup-4-2] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:35:39.785 [epollEventLoopGroup-4-1] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:35:42.371 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO  org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 60000 ms
   14:35:42.371 [QuorumConnectionThread-[myid=1]-5] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
   java.net.UnknownHostException: pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
   	at java.net.Socket.connect(Socket.java:609) ~[?:?]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   14:35:42.374 [QuorumConnectionThread-[myid=1]-4] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 2 at election address pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
   java.net.UnknownHostException: pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
   	at java.net.Socket.connect(Socket.java:609) ~[?:?]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   14:35:47.425 [epollEventLoopGroup-4-2] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:36:09.738 [epollEventLoopGroup-4-1] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
   14:36:17.448 [epollEventLoopGroup-4-2] WARN  org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
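The "Notification time out" values in the log (25600 ms, then 51200 ms, then 60000 ms) look like a doubling back-off with a cap. A minimal sketch of that arithmetic, assuming a 200 ms starting value (25600 = 200 × 2^7) and a 60000 ms cap (the last logged value); the class and constant names are hypothetical.

```java
// Sketch of the back-off visible in the "Notification time out" log lines.
// Assumptions: 200 ms starting timeout, 60000 ms cap.
public class NotificationBackoff {

    static final int INITIAL_TIMEOUT_MS = 200;
    static final int MAX_NOTIFICATION_INTERVAL_MS = 60_000;

    // Double the previous timeout, but never exceed the cap.
    static int next(int currentMs) {
        return Math.min(currentMs * 2, MAX_NOTIFICATION_INTERVAL_MS);
    }

    public static void main(String[] args) {
        int timeout = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < 10; i++) {
            timeout = next(timeout);
            System.out.println("Notification time out: " + timeout + " ms");
        }
    }
}
```

Under these assumptions, once the cap is reached ZooKeeper 0 simply retries the election about once a minute for as long as its peers stay unreachable.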
   
   I suspect ZooKeeper 0 is not responding with "I'm ok/up", so Kubernetes never starts the other two instances, which would also fit the UnknownHostException errors for pulsar-zookeeper-1 and pulsar-zookeeper-2 above. I saw this before, and I think I sidestepped it by always allowing things to progress and replying "I'm ok/up". I'm not exactly sure why the NIO server does not suffer from this problem, but just closing the connection may not be enough to fix it for Netty/Pulsar/Kubernetes.
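If the Kubernetes readiness check is a ZooKeeper "ruok" four-letter-word probe that expects "imok" (an assumption suggested by "I'm ok/up", not confirmed here), then closing the connection before replying would look exactly like "not ready". A minimal Java sketch of such a probe; RuokProbe, isReady and fakeServer are hypothetical names, and fakeServer only simulates a server that either answers or just closes the socket.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RuokProbe {

    // Sends "ruok" and reports ready only if the server answers "imok".
    // A connection closed without a reply counts as not ready.
    static boolean isReady(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            byte[] reply = socket.getInputStream().readNBytes(4); // fewer bytes on EOF
            return "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false;
        }
    }

    // Hypothetical stand-in server: answers "imok", or just closes the socket.
    static ServerSocket fakeServer(boolean answer) throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread t = new Thread(() -> {
            try (Socket client = server.accept()) {
                client.getInputStream().readNBytes(4); // consume "ruok"
                if (answer) {
                    client.getOutputStream().write("imok".getBytes(StandardCharsets.US_ASCII));
                }
            } catch (IOException ignored) {
                // the simulated server simply gives up
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket up = fakeServer(true); ServerSocket closing = fakeServer(false)) {
            System.out.println(isReady("127.0.0.1", up.getLocalPort(), 2000));      // true
            System.out.println(isReady("127.0.0.1", closing.getLocalPort(), 2000)); // false
        }
    }
}
```

With a StatefulSet that waits for each pod to become ready before creating the next one, a probe that keeps returning false for pod 0 would keep pods 1 and 2 from ever being created.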
   
   So I'm not really sure how to progress this now, unless we ignore SSL for ZooKeeper/Pulsar/Kubernetes so that we can keep the throttling hack, and otherwise try my original approach if the throttling can't be avoided.
   
   Hopefully I haven't messed up the testing or something else when building your pull request. But we probably need some more testing with Pulsar/Kubernetes at the very least?

