Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2022/01/25 14:47:25 UTC
[GitHub] [zookeeper] andrekramer1 commented on pull request #1798: ZOOKEEPER-3988 org.apache.zookeeper.server.NettyServerCnxn.receiveMessage throws NullPointerException
andrekramer1 commented on pull request #1798:
URL: https://github.com/apache/zookeeper/pull/1798#issuecomment-1021258557
On this PR: in NettyServerCnxnFactory.java, line 462, `zkServer.serverStats().incrementAuthFailedCount();` could probably also use an `if (zkServer != null)` guard. There are also uses of `zks` in NettyServerCnxn that I was protecting against being null.
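A minimal sketch of the suggested guard. `StatsStub`, `ZkServerStub` and `FactorySketch` are hypothetical stand-ins for `ServerStats`, `ZooKeeperServer` and the factory; only the `serverStats().incrementAuthFailedCount()` call chain comes from the comment above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for ServerStats; only the counter mentioned above is modelled.
class StatsStub {
    private final AtomicLong authFailed = new AtomicLong();

    void incrementAuthFailedCount() {
        authFailed.incrementAndGet();
    }

    long getAuthFailedCount() {
        return authFailed.get();
    }
}

// Hypothetical stand-in for ZooKeeperServer.
class ZkServerStub {
    private final StatsStub stats = new StatsStub();

    StatsStub serverStats() {
        return stats;
    }
}

// Hypothetical factory whose zkServer field may still be null
// (e.g. while the quorum peer has no running server attached).
class FactorySketch {
    volatile ZkServerStub zkServer; // null until a server is attached

    /** Records an auth failure if a server is attached; returns whether it was recorded. */
    boolean recordAuthFailure() {
        // Read the field once so the null check and the call use the same reference.
        ZkServerStub zks = zkServer;
        if (zks != null) {
            zks.serverStats().incrementAuthFailedCount();
            return true;
        }
        return false; // no server yet: skip the metric instead of throwing an NPE
    }
}
```

Copying the volatile field into a local before the check avoids a second read that another thread could have set back to null in between.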
It would be better if the ServerCnxn flavours (NIO and Netty) were more alike; as you pointed out, the NIO one closes early on exceptions. Then, for example, the Netty one might not need the SSL handshake suppression.
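One way to make the flavours behave alike is to route every exception through a shared base class that always closes early. This is a sketch under that assumption; `CnxnSketch` and `NettyLikeCnxn` are hypothetical and are not ZooKeeper's actual `ServerCnxn` hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: all flavours share one early-close policy on exceptions.
abstract class CnxnSketch {
    private boolean closed;
    final List<String> log = new ArrayList<>();

    /** Flavour-specific message handling; may throw on bad input or missing server state. */
    protected abstract void doReceive(byte[] msg) throws Exception;

    /** Flavour-specific teardown (e.g. closing a socket channel or a Netty channel). */
    protected abstract void doClose();

    final void receive(byte[] msg) {
        if (closed) {
            return; // ignore traffic on a connection that is already closed
        }
        try {
            doReceive(msg);
        } catch (Exception e) {
            log.add("closing after " + e.getClass().getSimpleName());
            close(); // same policy for every flavour
        }
    }

    final void close() {
        if (!closed) {
            closed = true;
            doClose();
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical Netty-like flavour that fails while no server is attached.
class NettyLikeCnxn extends CnxnSketch {
    Object zks; // stays null while the peer has no running server
    int closeCalls;

    @Override
    protected void doReceive(byte[] msg) {
        if (zks == null) {
            throw new IllegalStateException("server not running");
        }
    }

    @Override
    protected void doClose() {
        closeCalls++;
    }
}
```

With failure handling in one place, a flavour-specific workaround would only be needed where that flavour genuinely differs.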
A minimal fix is probably best, but when testing just this branch/PR, startup gets stuck at launching ZooKeeper 0:
14:34:25.541 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 25600 ms
14:34:39.674 [epollEventLoopGroup-4-1] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:34:47.441 [epollEventLoopGroup-4-2] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:34:51.147 [QuorumConnectionThread-[myid=1]-1] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 2 at election address pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
java.net.UnknownHostException: pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:609) ~[?:?]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
14:34:51.158 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 51200 ms
14:34:51.160 [QuorumConnectionThread-[myid=1]-1] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
java.net.UnknownHostException: pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:609) ~[?:?]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
14:35:09.729 [epollEventLoopGroup-4-1] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:35:17.480 [epollEventLoopGroup-4-2] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:35:39.785 [epollEventLoopGroup-4-1] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:35:42.371 [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled)] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification time out: 60000 ms
14:35:42.371 [QuorumConnectionThread-[myid=1]-5] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 3 at election address pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
java.net.UnknownHostException: pulsar-zookeeper-2.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:609) ~[?:?]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
14:35:42.374 [QuorumConnectionThread-[myid=1]-4] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 2 at election address pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local:3888
java.net.UnknownHostException: pulsar-zookeeper-1.pulsar-zookeeper.c8y-messaging-service.svc.cluster.local
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220) ~[?:?]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
at java.net.Socket.connect(Socket.java:609) ~[?:?]
at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458) [org.apache.zookeeper-zookeeper-3.8.0-SNAPSHOT.jar:3.8.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
14:35:47.425 [epollEventLoopGroup-4-2] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:36:09.738 [epollEventLoopGroup-4-1] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
14:36:17.448 [epollEventLoopGroup-4-2] WARN org.apache.zookeeper.server.NettyServerCnxnFactory - Zookeeper server is not running, close the connection before starting the TLS handshake
I suspect ZooKeeper 0 is not responding with "I'm ok/up", so Kubernetes never starts the other two instances. I saw this before, and I think I sidestepped it by always letting things progress and replying "I'm ok/up". I'm not exactly sure why the NIO server context does not suffer from this problem, but just closing the connection may not be enough to fix it for Netty/Pulsar/Kubernetes.
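If the readiness check here is the `ruok` four-letter command (an assumption; the comment only says "I'm ok/up"), the probe amounts to the sketch below: send `ruok` and expect `imok`. The ZooKeeper Administrator's Guide describes `ruok` as answered with `imok` whenever the server process is active and bound to its client port, even if it has not joined a quorum. A server that closes the connection without replying therefore fails the probe. `RuokProbe.isImOk` is a hypothetical helper. On ZooKeeper 3.5+, `ruok` may also need to be listed in `4lw.commands.whitelist`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical readiness probe for the "ruok" four-letter word.
final class RuokProbe {
    private RuokProbe() {}

    /**
     * Sends "ruok" and returns true only if the server replies "imok".
     * A refused, reset, timed-out or silently closed connection counts as not ready.
     */
    static boolean isImOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int n = 0;
            // Read until we have 4 bytes or the peer closes the connection.
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // closed without a full reply: not ready
                }
                n += r;
            }
            return n == 4 && "imok".equals(new String(buf, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false; // includes SocketTimeoutException
        }
    }
}
```

Under this assumption, a Netty path that closes every client connection while the server is not running would keep the pod not-ready, even though the peer is only waiting on leader election.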
So I'm not really sure how to progress this now, unless we ignore SSL for ZooKeeper/Pulsar/Kubernetes in order to keep the throttling hack, and otherwise try my original approach if the throttling can't be avoided.
Hopefully I haven't messed up the testing or something else when building your pull request. But we probably need some more testing with Pulsar/Kubernetes at the very least?
--
This is an automated message from the Apache Git Service.