You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2021/04/12 07:40:23 UTC

[GitHub] [zookeeper] symat opened a new pull request #1681: ZOOKEEPER-4247: NPE while processing messages during quorum member restart

symat opened a new pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681


   When a ZooKeeper server realizes that an other quorum peer was shut down (e.g. during a rolling upgrade or
   rolling restart), the ServerCnxn.zkServer variable is set to null by QuorumPear.close(). This is why in the code
   we usually check the zkServer variable before using it. But this check was missing in one place thus causing 
   NPE in NettyServerCnx.receiveMessage:
   
   ```
   2021-02-08T12:42:08.229+0000 [myid:] - ERROR 
   [nioEventLoopGroup-4-1:NettyServerCnxnFactory$CnxnChannelHandler@329]- Unexpected exception in receive
    java.lang.NullPointerException: null ~[zookeeper-3.6.2.jar:3.6.2]
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:518) 
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:368) 
    at org.apache.zookeeper.server.NettyServerCnxnFactory
           $CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:326)
   ...
   ```
   
   In this commit we add the necessary check and (after throwing an IOException) we will basically ignore the 
   processing of the received message when the remote ZooKeeper server is already down.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] asfgit closed pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] arshadmohammad commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
arshadmohammad commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611499234



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,8 +520,12 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
-                        // checkRequestSize will throw IOException if request is rejected
-                        zkServer.checkRequestSizeWhenReceivingMessage(len);
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        } else {

Review comment:
       can we move `zkServer.checkRequestSizeWhenReceivingMessage(len);` out of else block  and remove the else block.
   otherwise logically there are more statements which should be in else block
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] symat commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
symat commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611652247



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,6 +520,9 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        }

Review comment:
       actually I would prefer to use the following approach, copied from an other part of this function: https://github.com/apache/zookeeper/blob/7fdadf7273f34dd0552db25a3771cf55b65e9208/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java#L475-L478
   
   This way we kind of "save" the reference of `zkServer` to an other variable and if `zkServer` becomes null right after we did the check, we are still in a safe spot. Although the final solution would be to use locks and not allowing to use/modify `zkServer` paralelly from multiple threads... But that would be a more serious refactoring.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] arshadmohammad commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
arshadmohammad commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611659277



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,6 +520,9 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        }

Review comment:
       Great!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] symat commented on pull request #1681: ZOOKEEPER-4247: NPE while processing messages during quorum member restart

Posted by GitBox <gi...@apache.org>.
symat commented on pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#issuecomment-817568630


   I think we should cherry-pick this small bugfix to all active branches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] symat commented on pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
symat commented on pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#issuecomment-817865344


   > I will merge this PR after 3.6.3 release.
   
   Thank you @arshadmohammad !
   
   Also thanks for the comments and quick review for both you and @eolivelli !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] symat commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
symat commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611653116



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,6 +520,9 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        }

Review comment:
       I just pushed a new commit, what do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] arshadmohammad commented on pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
arshadmohammad commented on pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#issuecomment-817841943


   I will merge this PR after 3.6.3 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] arshadmohammad commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
arshadmohammad commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611642449



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,6 +520,9 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        }

Review comment:
       sorry I missed in previous review, I think it is better to handle the same way as it is handled in `NIOServerCnxn` to be consistent with all types of connections
   ```
   if (!isZKServerRunning()) {
         throw new IOException("ZooKeeperServer not running");
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] symat commented on a change in pull request #1681: ZOOKEEPER-4247: NPE while processing message from restarted quorum member

Posted by GitBox <gi...@apache.org>.
symat commented on a change in pull request #1681:
URL: https://github.com/apache/zookeeper/pull/1681#discussion_r611618896



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
##########
@@ -520,8 +520,12 @@ private void receiveMessage(ByteBuf message) {
                         if (len < 0 || len > BinaryInputArchive.maxBuffer) {
                             throw new IOException("Len error " + len);
                         }
-                        // checkRequestSize will throw IOException if request is rejected
-                        zkServer.checkRequestSizeWhenReceivingMessage(len);
+                        if (zkServer == null || !zkServer.isRunning()) {
+                            throw new IOException("ZK down");
+                        } else {

Review comment:
       yeah, it is cleaner without the else block, I agree. And functionally equivalent. Thanks! I pushed a new commit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org