Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/06 10:41:17 UTC

[GitHub] [ozone] adoroszlai opened a new pull request, #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

adoroszlai opened a new pull request, #3387:
URL: https://github.com/apache/ozone/pull/3387

   ## What changes were proposed in this pull request?
   
   HDDS-6685 changed the install snapshot logic in OM so that it restarts various OM components during the process.  However, it does not wait for the OM RPC server to stop, which leads to an intermittent `BindException` shortly afterwards, when the RPC server is started again.
   
   ```
   2022-05-05 13:41:05,653 [pool-2368-thread-1] ERROR om.OzoneManager (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start RPC Server.
   java.net.BindException: Problem binding to [localhost:39581] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
   	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809)
   	at org.apache.hadoop.ipc.Server.bind(Server.java:640)
   	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1225)
   	at org.apache.hadoop.ipc.Server.<init>(Server.java:3117)
   	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:434)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:361)
   	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
   	at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:1084)
   	at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:1054)
   	at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3366)
   	at org.apache.hadoop.ozone.om.OzoneManager.installCheckpoint(OzoneManager.java:3254)
   	at org.apache.hadoop.ozone.om.OzoneManager.installSnapshotFromLeader(OzoneManager.java:3231)
   ```
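   
   For readers unfamiliar with this failure mode, the following is a minimal, self-contained sketch of how "Address already in use" arises in general. It uses only plain `java.net.ServerSocket`, not the Hadoop RPC server; the class and method names (`PortRebindSketch`, `secondBindFails`, `rebindAfterCloseSucceeds`) are hypothetical and exist only for illustration. It assumes that the OS rejects a second listening socket on a port that is still held, and that no other process takes the port between the two binds.
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.net.BindException;
   import java.net.ServerSocket;
   
   // Hypothetical demo class; not part of Ozone or Hadoop.
   class PortRebindSketch {
   
     // True if a second bind to a port that is still held fails with BindException.
     static boolean secondBindFails() {
       try (ServerSocket first = new ServerSocket(0)) {   // port 0: let the OS pick a free port
         int port = first.getLocalPort();
         try (ServerSocket second = new ServerSocket(port)) {
           return false;                                  // unexpected: both sockets are bound
         } catch (BindException e) {
           return true;                                   // the old socket still owns the port
         }
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     // True if the same port can be bound again once the first socket is closed.
     static boolean rebindAfterCloseSucceeds() {
       try {
         ServerSocket first = new ServerSocket(0);
         int port = first.getLocalPort();
         first.close();                                   // releases the port
         try (ServerSocket again = new ServerSocket(port)) {
           return again.getLocalPort() == port;
         }
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     public static void main(String[] args) {
       System.out.println("second bind fails while port is held: " + secondBindFails());
       System.out.println("rebind after close succeeds: " + rebindAfterCloseSucceeds());
     }
   }
   ```
   
   The sketch only shows the general mechanism: a restart can bind safely only after the previous listener has actually released the port. It does not claim to reproduce the OM code path in the stack trace.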
   
   Note that `TestOzoneManagerPrepare` may still fail at a later step, but not due to `BindException`.
   
   https://issues.apache.org/jira/browse/HDDS-6703
   
   ## How was this patch tested?
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/6308878370


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai closed pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

Posted by GitBox <gi...@apache.org>.
adoroszlai closed pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop
URL: https://github.com/apache/ozone/pull/3387




[GitHub] [ozone] kerneltime commented on a diff in pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3387:
URL: https://github.com/apache/ozone/pull/3387#discussion_r867268365


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java:
##########
@@ -3297,6 +3297,7 @@ TermIndex installCheckpoint(String leaderId, Path checkpointLocation,
     if (canProceed) {
       // Stop RPC server before stop metadataManager
       omRpcServer.stop();
+      omRpcServer.join();

Review Comment:
   Is this needed? 
   `Server.java`
   ```
     public synchronized void stop() {
       LOG.info("Stopping server on " + this.port);
       this.running = false;
       if (this.handlers != null) {
         for(int i = 0; i < this.handlerCount; ++i) {
           if (this.handlers[i] != null) {
             this.handlers[i].interrupt();
           }
         }
       }
   
       this.listener.interrupt();
       this.listener.doStop();
       this.responder.interrupt();
       this.notifyAll();
       this.rpcMetrics.shutdown();
       this.rpcDetailedMetrics.shutdown();
     }
   ```
   and
   ```
     public synchronized void join() throws InterruptedException {
       while(this.running) {
         this.wait();
       }
     }
   ```
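   
   To make the question concrete, here is a small sketch that copies only the `running` flag handling from the two snippets quoted above. `FlagOnlyServer` and `StopJoinSketch` are hypothetical names, and the class is a stand-in, not Hadoop's `Server`: it has no listener, handlers or sockets. Under that assumption, calling `join()` right after `stop()` on the same thread returns at once, because the flag is already `false` when the loop condition is checked.
   
   ```java
   // Hypothetical stand-in that mirrors only the `running` flag logic
   // of the quoted stop()/join(); it is not Hadoop's Server class.
   class FlagOnlyServer {
     private boolean running = true;
   
     synchronized void stop() {
       running = false;   // the flag flips before stop() returns
       notifyAll();       // wakes any other thread blocked in join()
     }
   
     synchronized void join() throws InterruptedException {
       while (running) {  // already false when called after stop()
         wait();
       }
     }
   
     synchronized boolean isRunning() {
       return running;
     }
   }
   
   class StopJoinSketch {
     // Calls stop() and then join() on the same thread; true if join() returned
     // and the flag is cleared.
     static boolean joinAfterStopReturns() {
       FlagOnlyServer server = new FlagOnlyServer();
       server.stop();
       try {
         server.join();   // does not block: the loop body is never entered
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new IllegalStateException(e);
       }
       return !server.isRunning();
     }
   
     public static void main(String[] args) {
       System.out.println("join() returned after stop(): " + joinAfterStopReturns());
     }
   }
   ```
   
   In this simplified model, `join()` is only useful to a different thread that wants to block until someone else calls `stop()`; it adds no extra waiting for the thread that already called `stop()`.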





[GitHub] [ozone] ChenSammi commented on pull request #3387: HDDS-6703. Disable failing testPrepareDownedOM until Ratis upgrade

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on PR #3387:
URL: https://github.com/apache/ozone/pull/3387#issuecomment-1120881358

   +1.  Yes, we need RATIS-1481 to have a successful checkpoint installation.  
   
   Thanks @adoroszlai 




[GitHub] [ozone] ChenSammi merged pull request #3387: HDDS-6703. Disable failing testPrepareDownedOM until Ratis upgrade

Posted by GitBox <gi...@apache.org>.
ChenSammi merged PR #3387:
URL: https://github.com/apache/ozone/pull/3387




[GitHub] [ozone] adoroszlai closed pull request #3387: HDDS-6703. Disable failing testPrepareDownedOM until Ratis upgrade

Posted by GitBox <gi...@apache.org>.
adoroszlai closed pull request #3387: HDDS-6703. Disable failing testPrepareDownedOM until Ratis upgrade
URL: https://github.com/apache/ozone/pull/3387




[GitHub] [ozone] adoroszlai commented on a diff in pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3387:
URL: https://github.com/apache/ozone/pull/3387#discussion_r867308926


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java:
##########
@@ -3297,6 +3297,7 @@ TermIndex installCheckpoint(String leaderId, Path checkpointLocation,
     if (canProceed) {
       // Stop RPC server before stop metadataManager
       omRpcServer.stop();
+      omRpcServer.join();

Review Comment:
   You are right, this is for other threads.  I'll need to debug further.





[GitHub] [ozone] adoroszlai commented on pull request #3387: HDDS-6703. Disable failing testPrepareDownedOM until Ratis upgrade

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3387:
URL: https://github.com/apache/ozone/pull/3387#issuecomment-1120915203

   Thanks @kaijchen, @kerneltime for the review, @ChenSammi for reviewing and merging this.




[GitHub] [ozone] kerneltime commented on a diff in pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3387:
URL: https://github.com/apache/ozone/pull/3387#discussion_r867268394


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerPrepare.java:
##########
@@ -162,6 +163,7 @@ public void testPrepareDownedOM() throws Exception {
     // it missed once it receives the prepare transaction.
     cluster.restartOzoneManager(downedOM, true);
     runningOms.add(shutdownOMIndex, downedOM);
+    ExitUtils.assertNotTerminated();

Review Comment:
   +1





[GitHub] [ozone] adoroszlai commented on a diff in pull request #3387: HDDS-6703. Install snapshot should wait for RPC server to stop

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3387:
URL: https://github.com/apache/ozone/pull/3387#discussion_r867308926


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java:
##########
@@ -3297,6 +3297,7 @@ TermIndex installCheckpoint(String leaderId, Path checkpointLocation,
     if (canProceed) {
       // Stop RPC server before stop metadataManager
       omRpcServer.stop();
+      omRpcServer.join();

Review Comment:
   You are right, this is for other threads.


