Posted to commits@celeborn.apache.org by ch...@apache.org on 2023/09/28 11:20:06 UTC
[incubator-celeborn] 06/07: [CELEBORN-1013] Shutdown master if initialized failed
This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch branch-0.3
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
commit fd5ecc1e31474adef9d4e18a01bf42d1f1899536
Author: sychen <sy...@ctrip.com>
AuthorDate: Thu Sep 28 19:02:59 2023 +0800
[CELEBORN-1013] Shutdown master if initialized failed
### What changes were proposed in this pull request?
Wrap master construction and `master.initialize()` in a `try`/`catch`. If initialization throws, log the error and terminate the process with `System.exit(-1)`. The failure is logged as follows:
```java
23/09/28 14:48:12,512 ERROR [main] Master: Initialize master failed.
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:461)
    at sun.nio.ch.Net.bind(Net.java:453)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
```
### Why are the changes needed?
For example, if the HTTP service port (`celeborn.metrics.master.prometheus.port`) is already occupied, the bind fails and master startup fails. However, because the thread started by Raft is not a daemon thread, the master process keeps running instead of exiting.
https://github.com/apache/ratis/blob/d461a01a53e7e130f0ec4143e75b316012137b62/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L283-L290
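The JVM behavior described above can be reproduced in isolation. The following is a minimal sketch, not Celeborn or Ratis code: the class `NonDaemonExitDemo`, the method names, and the latch-based worker are all hypothetical stand-ins. It shows that a non-daemon thread started before a failed initialization is still alive afterwards, which is what keeps the process from exiting unless `System.exit` is called.

```java
import java.net.BindException;
import java.util.concurrent.CountDownLatch;

public class NonDaemonExitDemo {

    // Hypothetical stand-in for a background thread such as the one Raft starts.
    // It blocks until the latch is released.
    static Thread startBackgroundThread(CountDownLatch stop) {
        Thread t = new Thread(() -> {
            try {
                stop.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "background-worker");
        t.setDaemon(false); // non-daemon: the JVM waits for this thread before exiting
        t.start();
        return t;
    }

    // Hypothetical stand-in for an initialization step that fails to bind a port.
    static void initialize() throws BindException {
        throw new BindException("Address already in use");
    }

    // Returns true if the worker is still alive and non-daemon after
    // initialization has failed, i.e. it would keep the JVM running.
    static boolean workerOutlivesFailedInit() {
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker = startBackgroundThread(stop);
        try {
            initialize();
        } catch (Exception e) {
            // Initialization failed; nothing has stopped the worker.
        }
        boolean keepsJvmAlive = worker.isAlive() && !worker.isDaemon();
        stop.countDown(); // release the worker so this demo itself can finish
        return keepsJvmAlive;
    }

    public static void main(String[] args) {
        System.out.println("worker outlives failed init: " + workerOutlivesFailedInit());
    }
}
```

In a real entry point nothing releases such a thread after a failed startup, so the process lingers. Calling `System.exit(-1)` in the `catch` block, as this patch does, terminates the JVM regardless of which non-daemon threads are still running.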
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1945 from cxzl25/CELEBORN-1013.
Authored-by: sychen <sy...@ctrip.com>
Signed-off-by: Cheng Pan <ch...@apache.org>
(cherry picked from commit 16bf2aeeaa1767db5f6b676818ce4c9062ed2608)
Signed-off-by: Cheng Pan <ch...@apache.org>
---
.../org/apache/celeborn/service/deploy/master/Master.scala | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
index 4344461c5..737a8a9b5 100644
--- a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
+++ b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
@@ -973,7 +973,13 @@ private[deploy] object Master extends Logging {
   def main(args: Array[String]): Unit = {
     val conf = new CelebornConf()
     val masterArgs = new MasterArguments(args, conf)
-    val master = new Master(conf, masterArgs)
-    master.initialize()
+    try {
+      val master = new Master(conf, masterArgs)
+      master.initialize()
+    } catch {
+      case e: Throwable =>
+        logError("Initialize master failed.", e)
+        System.exit(-1)
+    }
   }
 }