Posted to commits@celeborn.apache.org by zh...@apache.org on 2023/03/16 03:33:37 UTC

[incubator-celeborn] branch main updated: [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)

This is an automated email from the ASF dual-hosted git repository.

zhouky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new 599bdbeb7 [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)
599bdbeb7 is described below

commit 599bdbeb72a945297ce0b5ab5f86bbc8ed6c094d
Author: Ethan Feng <et...@apache.org>
AuthorDate: Thu Mar 16 11:33:32 2023 +0800

    [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)
---
 README.md           | 20 ++++++++++++--------
 conf/hosts.template | 24 ++++++++++++++++++++++++
 sbin/start-all.sh   |  2 ++
 3 files changed, 38 insertions(+), 8 deletions(-)
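
Note on the `conf/hosts` format: the template added in this commit uses INI-style section headers (`[master]`, `[worker]`) with one host per line. As a minimal sketch, and not necessarily how `sbin/start-all.sh` itself reads the file, the hosts of one section could be extracted as shown here. The function name `hosts_in_section` is hypothetical and does not exist in the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of Celeborn): print the hosts listed under
# one section of an INI-style hosts file read from standard input.
# Usage: hosts_in_section <section-name> < hosts-file
hosts_in_section() {
  awk -v want="[$1]" '
    { sub(/#.*/, "") }                 # strip comments
    { gsub(/^[ \t]+|[ \t]+$/, "") }    # trim surrounding whitespace
    $0 == "" { next }                  # skip blank lines
    /^\[.*\]$/ { current = $0; next }  # remember the current section header
    current == want { print }          # emit hosts of the requested section
  '
}
```

Assuming the file has been copied from the template to `$CELEBORN_HOME/conf/hosts`, `hosts_in_section worker < "$CELEBORN_HOME/conf/hosts"` would print one worker host per line, which a caller could then loop over.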

diff --git a/README.md b/README.md
index bce2a57f3..8c1451ba3 100644
--- a/README.md
+++ b/README.md
@@ -128,13 +128,16 @@ celeborn.worker.storage.dirs /mnt/disk1/,/mnt/disk2
 celeborn.worker.monitor.disk.enabled false
 ```
 4. Copy Celeborn and configurations to all nodes
-5. Start Celeborn master
-   `$CELEBORN_HOME/sbin/start-master.sh`
-6. Start Celeborn worker
-   For single master cluster : `$CELEBORN_HOME/sbin/start-worker.sh rss://<master-host>:<master-port>`
-   For HA cluster :`$CELEBORN_HOME/sbin/start-worker.sh`
-7. If Celeborn start success, the output of Master's log should be like this:
-```angular2html
+5. Start all services. If the Celeborn distribution is installed in the same path on every node and
+   your cluster nodes allow SSH login, fill in `$CELEBORN_HOME/conf/hosts` and
+   run `$CELEBORN_HOME/sbin/start-all.sh` to start all
+   services. If the installation paths are not identical, you will need to start the services manually.  
+   Start Celeborn master  
+   `$CELEBORN_HOME/sbin/start-master.sh`  
+   Start Celeborn worker  
+   `$CELEBORN_HOME/sbin/start-worker.sh`
+6. If Celeborn starts successfully, the Master's log output should look like this:
+```
 22/10/08 19:29:11,805 INFO [main] Dispatcher: Dispatcher numThreads: 64
 22/10/08 19:29:11,875 INFO [main] TransportClientFactory: mode NIO threads 64
 22/10/08 19:29:12,057 INFO [main] Utils: Successfully started service 'MasterSys' on port 9097.
@@ -172,7 +175,8 @@ spark.shuffle.service.enabled false
 # Sort shuffle writer use less memory than hash shuffle writer, if your shuffle partition count is large, try to use sort hash writer.  
 spark.celeborn.shuffle.writer hash
 
-# we recommend set spark.celeborn.push.replicate.enabled to true to enable server-side data replication 
+# We recommend setting spark.celeborn.push.replicate.enabled to true to enable server-side data replication.
+# If you have only one worker, this setting must be false.
 spark.celeborn.push.replicate.enabled true
 
 # Support for Spark AQE only tested under Spark 3
diff --git a/conf/hosts.template b/conf/hosts.template
new file mode 100644
index 000000000..b5f8f57d4
--- /dev/null
+++ b/conf/hosts.template
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[master]
+node1
+
+[worker]
+node2
+node3
+node4
\ No newline at end of file
diff --git a/sbin/start-all.sh b/sbin/start-all.sh
index d06170688..359b5ea02 100755
--- a/sbin/start-all.sh
+++ b/sbin/start-all.sh
@@ -47,6 +47,8 @@ do
     sleep $CELEBORN_SLEEP
   fi
 done
+# Pause 5 seconds to give the master time to become ready before starting workers.
+sleep 5
 
 # start workers
 for host in `echo "$HOST_LIST" | sed  "s/#.*$//;/^$/d" | grep '\[worker\]' | awk '{print $NF}'`