Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/02 10:55:28 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2299: HDDS-5290. Handle SIGTERM to handle clean shutdown of SCM.

bharatviswa504 opened a new pull request #2299:
URL: https://github.com/apache/ozone/pull/2299


   ## What changes were proposed in this pull request?
   
   Handle SIGTERM to perform a clean shutdown in SCM.
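The JVM runs registered shutdown hooks when the process receives SIGTERM (for example from `kill <pid>`) or exits normally, but not on SIGKILL. The patch itself registers its hook through Hadoop's `ShutdownHookManager`. The following is only a minimal sketch of the underlying idea using the plain JDK `Runtime.addShutdownHook` API; `DemoService` is a hypothetical stand-in for a server such as SCM, not Ozone code.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Minimal sketch using only the JDK (not the Ozone/Hadoop code): a service
 * whose stop() is wired to a JVM shutdown hook, so SIGTERM triggers it.
 */
class CleanShutdownSketch {

  /** Hypothetical stand-in for a long-running server such as SCM. */
  static final class DemoService {
    private final CountDownLatch stopped = new CountDownLatch(1);
    private volatile boolean running;

    void start() {
      running = true;
    }

    /** Releases resources; safe to call more than once. */
    void stop() {
      running = false;
      stopped.countDown();
    }

    boolean isRunning() {
      return running;
    }

    /** Blocks the caller until stop() has run, similar to a server join(). */
    void join() throws InterruptedException {
      stopped.await();
    }
  }

  /**
   * Registers a hook that calls service.stop() when the JVM shuts down
   * (normal exit or SIGTERM; hooks do not run on SIGKILL).
   */
  static Thread registerStopHook(DemoService service) {
    Thread hook = new Thread(service::stop, "demo-shutdown-hook");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }
}
```

In a `main` method one would call `service.start()`, then `registerStopHook(service)`, then `service.join()`; sending `kill <pid>` then makes the JVM run `stop()` before it exits.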
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5290
   
   ## How was this patch tested?
   
   Tested it on a cluster.
   
   ```
   2021-06-02 10:49:03,620 ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: RECEIVED SIGNAL 15: SIGTERM
   2021-06-02 10:49:03,624 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping Replication Manager Service.
   2021-06-02 10:49:03,624 INFO org.apache.hadoop.hdds.scm.container.ReplicationManager: Stopping Replication Monitor Thread.
   2021-06-02 10:49:03,624 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping the Datanode Admin Monitor.
   2021-06-02 10:49:03,625 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping Lease Manager of the command watchers
   2021-06-02 10:49:03,625 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping datanode service RPC server
   2021-06-02 10:49:03,625 INFO org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer: Stopping the RPC server for DataNodes
   2021-06-02 10:49:03,625 INFO org.apache.hadoop.ipc.Server: Stopping server on 9861
   2021-06-02 10:49:03,630 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 9861
   2021-06-02 10:49:03,632 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
   2021-06-02 10:49:05,606 WARN org.apache.hadoop.hdds.scm.node.NodeStateManager: Current Thread is interrupted, shutting down HB processing thread for Node Manager.
   2021-06-02 10:49:05,607 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping block service RPC server
   2021-06-02 10:49:05,607 INFO org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer: Stopping the RPC server for Block Protocol
   2021-06-02 10:49:05,607 INFO org.apache.hadoop.ipc.Server: Stopping server on 9863
   2021-06-02 10:49:05,610 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 9863
   2021-06-02 10:49:05,611 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping the StorageContainerLocationProtocol RPC server
   2021-06-02 10:49:05,611 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
   2021-06-02 10:49:05,611 INFO org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer: Stopping the RPC server for Client Protocol
   2021-06-02 10:49:05,611 INFO org.apache.hadoop.ipc.Server: Stopping server on 9860
   2021-06-02 10:49:05,616 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 9860
   2021-06-02 10:49:05,617 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
   2021-06-02 10:49:05,617 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping Storage Container Manager HTTP server.
   2021-06-02 10:49:05,639 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.w.WebAppContext@73fb1d7f{scm,/,null,STOPPED}{jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.14269015/jars/hadoop-hdds-server-scm-1.1.0.7.1.7.0-414.jar!/webapps/scm}
   2021-06-02 10:49:05,644 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector@332820f4{HTTP/1.1, (http/1.1)}{0.0.0.0:9876}
   2021-06-02 10:49:05,644 INFO org.eclipse.jetty.server.session: node0 Stopped scavenging
   2021-06-02 10:49:05,645 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@12968227{static,/static,jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.14269015/jars/hadoop-hdds-server-scm-1.1.0.7.1.7.0-414.jar!/webapps/static,STOPPED}
   2021-06-02 10:49:05,646 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@58496c97{logs,/logs,file:///var/log/hadoop-ozone/,STOPPED}
   2021-06-02 10:49:05,647 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping Block Manager Service.
   2021-06-02 10:49:05,647 INFO org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service SCMBlockDeletingService
   2021-06-02 10:49:05,647 INFO org.apache.hadoop.hdds.utils.BackgroundService: Shutting down service SCMBlockDeletingService
   2021-06-02 10:49:05,648 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping SCM Event Queue.
   2021-06-02 10:49:05,652 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping SCM HA services.
   2021-06-02 10:49:05,652 INFO org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl: stopping ratis server 0.0.0.0:9894
   2021-06-02 10:49:05,653 INFO org.apache.ratis.server.RaftServer: 3466735b-3f79-43d7-b9d0-de3452b5dacc: close
   2021-06-02 10:49:05,654 INFO org.apache.ratis.server.RaftServer$Division: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790: shutdown
   2021-06-02 10:49:05,654 INFO org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-9FF618EF3790,id=3466735b-3f79-43d7-b9d0-de3452b5dacc
   2021-06-02 10:49:05,655 INFO org.apache.ratis.server.impl.RoleInfo: 3466735b-3f79-43d7-b9d0-de3452b5dacc: shutdown 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-FollowerState
   2021-06-02 10:49:05,655 INFO org.apache.ratis.server.impl.StateMachineUpdater: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-StateMachineUpdater: set stopIndex = 1108
   2021-06-02 10:49:05,656 INFO org.apache.ratis.server.impl.FollowerState: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-FollowerState was interrupted: {}
   java.lang.InterruptedException: sleep interrupted
   	at java.lang.Thread.sleep(Native Method)
   	at java.lang.Thread.sleep(Thread.java:340)
   	at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
   	at org.apache.ratis.util.TimeDuration.sleep(TimeDuration.java:324)
   	at org.apache.ratis.util.TimeDuration.sleep(TimeDuration.java:309)
   	at org.apache.ratis.server.impl.FollowerState.run(FollowerState.java:118)
   2021-06-02 10:49:05,656 INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Current Snapshot Index 1108, takeSnapshot took 1 ms
   2021-06-02 10:49:05,659 INFO org.apache.ratis.server.impl.StateMachineUpdater: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-StateMachineUpdater: Took a snapshot at index 1108
   2021-06-02 10:49:05,659 INFO org.apache.ratis.server.impl.StateMachineUpdater: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-StateMachineUpdater: snapshotIndex: updateIncreasingly 1105 -> 1108
   2021-06-02 10:49:05,664 INFO org.apache.ratis.metrics.RatisMetrics: Unregistering Metrics Registry : ratis.state_machine.3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790
   2021-06-02 10:49:05,665 INFO org.apache.ratis.server.RaftServer$Division: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790: closes. applyIndex: 1108
   2021-06-02 10:49:05,666 INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-SegmentedRaftLogWorker was interrupted, exiting. There are 0 tasks remaining in the queue.
   2021-06-02 10:49:05,666 INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790-SegmentedRaftLogWorker close()
   2021-06-02 10:49:05,667 INFO org.apache.ratis.metrics.RatisMetrics: Unregistering Metrics Registry : ratis.log_worker.3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790
   2021-06-02 10:49:05,667 INFO org.apache.ratis.metrics.RatisMetrics: Unregistering Metrics Registry : ratis.leader_election.3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790
   2021-06-02 10:49:05,667 INFO org.apache.ratis.metrics.RatisMetrics: Unregistering Metrics Registry : ratis.server.3466735b-3f79-43d7-b9d0-de3452b5dacc@group-9FF618EF3790
   2021-06-02 10:49:05,667 INFO org.apache.ratis.grpc.server.GrpcService: 3466735b-3f79-43d7-b9d0-de3452b5dacc: shutdown server with port 9894 now
   2021-06-02 10:49:05,682 INFO org.apache.ratis.grpc.server.GrpcService: 3466735b-3f79-43d7-b9d0-de3452b5dacc: shutdown server with port 9894 successfully
   2021-06-02 10:49:05,682 INFO org.apache.ratis.util.JvmPauseMonitor: JvmPauseMonitor-3466735b-3f79-43d7-b9d0-de3452b5dacc: Stopped
   2021-06-02 10:49:05,684 WARN org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: RatisPipelineUtilsThread is not running, just ignore.
   2021-06-02 10:49:05,685 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManager: Stopping SCM MetadataStore.
   2021-06-02 10:49:05,686 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService: 3466735b-3f79-43d7-b9d0-de3452b5dacc: installSnapshot onError, lastRequest: 23c8fce1-de0c-4630-b788-fa606b2d1ff1->3466735b-3f79-43d7-b9d0-de3452b5dacc#81826-t8,previous=(t:8, i:1107),leaderCommit=1107,initializing? true,entries: size=1, first=(t:8, i:1108), METADATAENTRY(c:1107): org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: CANCELLED: client cancelled
   2021-06-02 10:49:05,689 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping StorageContainerManager metrics system...
   2021-06-02 10:49:05,690 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: prometheus thread interrupted.
   2021-06-02 10:49:05,691 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: StorageContainerManager metrics system stopped.
   2021-06-02 10:49:05,692 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
   /************************************************************
   SHUTDOWN_MSG: Shutting down StorageContainerManager at xxx
   ```
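The log above shows SCM stopping its subsystems one after another: Replication Manager, Datanode Admin Monitor, the RPC servers, the HTTP server, Block Manager, the event queue, the HA (Ratis) services, the metadata store and finally the metrics system. The sketch below is a generic, hypothetical illustration of that ordered-stop pattern. The component names come from the log; the policy of continuing after a component fails to stop is an assumption of the sketch, not something the log shows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: stop named components in registration order, continuing past a
 * failing component so later ones still get stopped. Not the SCM code.
 */
class OrderedStopSketch {
  // LinkedHashMap preserves insertion order, which defines the stop order.
  private final Map<String, Runnable> components = new LinkedHashMap<>();

  void register(String name, Runnable stopAction) {
    components.put(name, stopAction);
  }

  /**
   * Stops every component in order, appending a line per component to
   * stopLog. Returns the names of components whose stop action threw.
   */
  List<String> stopAll(List<String> stopLog) {
    List<String> failed = new ArrayList<>();
    for (Map.Entry<String, Runnable> entry : components.entrySet()) {
      stopLog.add("Stopping " + entry.getKey());
      try {
        entry.getValue().run();
      } catch (RuntimeException e) {
        failed.add(entry.getKey()); // record the failure and keep going
      }
    }
    return failed;
  }
}
```

Registering components in the order they should stop keeps the shutdown sequence explicit, and isolating each stop call means one misbehaving component does not leave the metadata store or metrics system running.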
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 closed pull request #2299: HDDS-5290. Handle SIGTERM to ensure clean shutdown of SCM.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 closed pull request #2299:
URL: https://github.com/apache/ozone/pull/2299


   




[GitHub] [ozone] bshashikant commented on a change in pull request #2299: HDDS-5290. Handle SIGTERM to ensure clean shutdown of SCM.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2299:
URL: https://github.com/apache/ozone/pull/2299#discussion_r644449196



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManagerStarter.java
##########
@@ -164,7 +166,15 @@ private void commonInit() {
     public void start(OzoneConfiguration conf) throws Exception {
       StorageContainerManager stm = StorageContainerManager.createSCM(conf);
       stm.start();
-      stm.join();
+
+      ShutdownHookManager.get().addShutdownHook(() -> {

Review comment:
       Yes, 1s is very short. I think it should be in minutes rather than seconds.






[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2299: HDDS-5290. Handle SIGTERM to ensure clean shutdown of SCM.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2299:
URL: https://github.com/apache/ozone/pull/2299#discussion_r644474492



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManagerStarter.java
##########
@@ -164,7 +166,15 @@ private void commonInit() {
     public void start(OzoneConfiguration conf) throws Exception {
       StorageContainerManager stm = StorageContainerManager.createSCM(conf);
       stm.start();
-      stm.join();
+
+      ShutdownHookManager.get().addShutdownHook(() -> {

Review comment:
       I see the default is 30 seconds.
   
   ```
     /** Default shutdown hook timeout: {@value} seconds. */
     public static final long SERVICE_SHUTDOWN_TIMEOUT_DEFAULT = 30;
   
     static long getShutdownTimeout(Configuration conf) {
       long duration = conf.getTimeDuration(
           SERVICE_SHUTDOWN_TIMEOUT,
           SERVICE_SHUTDOWN_TIMEOUT_DEFAULT,
           TIME_UNIT_DEFAULT);
       if (duration < TIMEOUT_MINIMUM) {
         duration = TIMEOUT_MINIMUM;
       }
       return duration;
     }
   ```
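The quoted snippet appears to come from Hadoop's `ShutdownHookManager`: it reads a configured duration, falls back to a 30-second default, and clamps the result to a minimum. A self-contained sketch of that "default, then clamp" logic follows, using a plain `Map` instead of Hadoop's `Configuration`. The key name, the 1-second floor and the small suffix parser are assumptions for illustration only.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Sketch of the default-then-clamp timeout lookup. Not the Hadoop code. */
class ShutdownTimeoutSketch {
  // Hypothetical key; the real Hadoop/Ozone property names are not shown here.
  static final String SHUTDOWN_TIMEOUT_KEY = "example.shutdown.timeout";
  static final long SHUTDOWN_TIMEOUT_DEFAULT_SECONDS = 30;
  // Assumed floor, mirroring the TIMEOUT_MINIMUM clamp in the quoted code.
  static final long TIMEOUT_MINIMUM_SECONDS = 1;

  /** Returns the configured timeout in seconds, never below the minimum. */
  static long getShutdownTimeoutSeconds(Map<String, String> conf) {
    String raw = conf.get(SHUTDOWN_TIMEOUT_KEY);
    long duration = (raw == null)
        ? SHUTDOWN_TIMEOUT_DEFAULT_SECONDS
        : parseSeconds(raw);
    return Math.max(duration, TIMEOUT_MINIMUM_SECONDS);
  }

  /**
   * Parses a bare number (seconds) or a value with an "ms", "s" or "m"
   * suffix into whole seconds. Other suffixes are not supported.
   */
  static long parseSeconds(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    if (v.endsWith("ms")) { // check "ms" before "s", since "ms" ends with "s"
      return TimeUnit.MILLISECONDS.toSeconds(
          Long.parseLong(v.substring(0, v.length() - 2)));
    }
    if (v.endsWith("s")) {
      return Long.parseLong(v.substring(0, v.length() - 1));
    }
    if (v.endsWith("m")) {
      return TimeUnit.MINUTES.toSeconds(
          Long.parseLong(v.substring(0, v.length() - 1)));
    }
    return Long.parseLong(v);
  }
}
```

With this shape, an unset key yields 30 seconds, a value such as `2m` yields 120 seconds (the minutes-scale timeout suggested in review), and anything below the floor, such as `500ms`, is raised to 1 second.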

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManagerStarter.java
##########
@@ -164,7 +166,15 @@ private void commonInit() {
     public void start(OzoneConfiguration conf) throws Exception {
       StorageContainerManager stm = StorageContainerManager.createSCM(conf);
       stm.start();
-      stm.join();
+
+      ShutdownHookManager.get().addShutdownHook(() -> {

Review comment:
       Introduced a config and handled this as part of #2301 
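The config introduced in #2301 is not shown in this thread. As a hedged sketch of what a configurable shutdown timeout bounds, the following runs a stop action on a separate thread and waits at most the configured time, cancelling the action if it overruns. This is similar in spirit to a shutdown hook manager enforcing a per-hook timeout; the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: bound how long a shutdown action may run. Not the Ozone code. */
class BoundedShutdownSketch {

  /**
   * Runs stopAction on a separate daemon thread and waits at most the given
   * timeout. Returns true only if the action completed normally in time;
   * on timeout the action is cancelled (interrupted) and false is returned.
   */
  static boolean runWithTimeout(Runnable stopAction, long timeout, TimeUnit unit) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "bounded-shutdown");
      t.setDaemon(true); // a hung action must not keep the JVM alive
      return t;
    });
    Future<?> future = executor.submit(stopAction);
    try {
      future.get(timeout, unit);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the action that overran its budget
      return false;
    } catch (ExecutionException e) {
      return false; // the action threw before finishing
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt
      future.cancel(true);
      return false;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Bounding the stop work this way keeps a stuck component from delaying process exit indefinitely. The trade-off, which is what the reviewers discuss, is that too short a timeout (such as 1s) can cut off a shutdown step, for example a Ratis snapshot, that would otherwise have completed.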






[GitHub] [ozone] xiaoyuyao commented on a change in pull request #2299: HDDS-5290. Handle SIGTERM to ensure clean shutdown of SCM.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #2299:
URL: https://github.com/apache/ozone/pull/2299#discussion_r644333913



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManagerStarter.java
##########
@@ -164,7 +166,15 @@ private void commonInit() {
     public void start(OzoneConfiguration conf) throws Exception {
       StorageContainerManager stm = StorageContainerManager.createSCM(conf);
       stm.start();
-      stm.join();
+
+      ShutdownHookManager.get().addShutdownHook(() -> {

Review comment:
       Should we add a longer timeout here? The default of 1s may be too short.





