Posted to commits@celeborn.apache.org by ch...@apache.org on 2023/09/28 11:20:06 UTC

[incubator-celeborn] 06/07: [CELEBORN-1013] Shutdown master if initialized failed

This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch branch-0.3
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git

commit fd5ecc1e31474adef9d4e18a01bf42d1f1899536
Author: sychen <sy...@ctrip.com>
AuthorDate: Thu Sep 28 19:02:59 2023 +0800

    [CELEBORN-1013] Shutdown master if initialized failed
    
    ### What changes were proposed in this pull request?
    ```java
    23/09/28 14:48:12,512 ERROR [main] Master: Initialize master failed.
    java.net.BindException: Address already in use
            at sun.nio.ch.Net.bind0(Native Method)
            at sun.nio.ch.Net.bind(Net.java:461)
            at sun.nio.ch.Net.bind(Net.java:453)
            at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
            at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
            at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
            at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
    ```
    
    ### Why are the changes needed?
    For example, if the port for the HTTP service (`celeborn.metrics.master.prometheus.port`) is already in use, the bind fails and master startup fails. However, because the thread started by Raft is not a daemon thread, the master process keeps running instead of exiting.
    
    https://github.com/apache/ratis/blob/d461a01a53e7e130f0ec4143e75b316012137b62/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L283-L290
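    
    The failure mode described above can be sketched in isolation. This is a minimal illustration, not the Celeborn or Ratis code: `ExitOnInitFailure`, `startBackgroundThread`, and `exitCodeFor` are hypothetical names, and the sleeping thread only stands in for whatever non-daemon thread a real component might start before a later initialization step throws.
    
    ```scala
    object ExitOnInitFailure {
      // Hypothetical stand-in for a component that starts a non-daemon
      // thread before a later initialization step fails.
      def startBackgroundThread(): Thread = {
        val t = new Thread(new Runnable {
          override def run(): Unit =
            try Thread.sleep(60000L)
            catch { case _: InterruptedException => () }
        })
        t.setDaemon(false) // non-daemon: the JVM waits for it before exiting
        t.start()
        t
      }
    
      // Runs `init` and returns the exit code the caller should use:
      // 0 on success, -1 if anything was thrown.
      def exitCodeFor(init: () => Unit): Int =
        try {
          init()
          0
        } catch {
          case e: Throwable =>
            System.err.println(s"Initialize failed: ${e.getMessage}")
            -1
        }
    
      def main(args: Array[String]): Unit = {
        val code = exitCodeFor { () =>
          startBackgroundThread()
          // Simulated failure, mirroring the BindException in the log above.
          throw new java.net.BindException("Address already in use")
        }
        // Without an explicit exit, the JVM would stay up until the
        // background thread finishes, even though initialization failed.
        if (code != 0) sys.exit(code)
      }
    }
    ```
    
    In this sketch, returning normally from `main` after the exception would leave the process alive for as long as the background thread runs. An explicit exit ends the process regardless of which non-daemon threads are still running.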
    
    ### Does this PR introduce _any_ user-facing change?
    
    ### How was this patch tested?
    
    Closes #1945 from cxzl25/CELEBORN-1013.
    
    Authored-by: sychen <sy...@ctrip.com>
    Signed-off-by: Cheng Pan <ch...@apache.org>
    (cherry picked from commit 16bf2aeeaa1767db5f6b676818ce4c9062ed2608)
    Signed-off-by: Cheng Pan <ch...@apache.org>
---
 .../org/apache/celeborn/service/deploy/master/Master.scala     | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
index 4344461c5..737a8a9b5 100644
--- a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
+++ b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
@@ -973,7 +973,13 @@ private[deploy] object Master extends Logging {
   def main(args: Array[String]): Unit = {
     val conf = new CelebornConf()
     val masterArgs = new MasterArguments(args, conf)
-    val master = new Master(conf, masterArgs)
-    master.initialize()
+    try {
+      val master = new Master(conf, masterArgs)
+      master.initialize()
+    } catch {
+      case e: Throwable =>
+        logError("Initialize master failed.", e)
+        System.exit(-1)
+    }
   }
 }