Posted to commits@celeborn.apache.org by et...@apache.org on 2023/03/16 03:35:34 UTC
[incubator-celeborn] branch branch-0.2 updated: [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)
This is an automated email from the ASF dual-hosted git repository.
ethanfeng pushed a commit to branch branch-0.2
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/branch-0.2 by this push:
new 33b2471da [CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)
33b2471da is described below
commit 33b2471daf5215b1e82281405fd92851b995103b
Author: Ethan Feng <et...@apache.org>
AuthorDate: Thu Mar 16 11:33:32 2023 +0800
[CELEBORN-420] Add hosts template and docs about start-all scripts. (#1354)
(cherry picked from commit 599bdbeb72a945297ce0b5ab5f86bbc8ed6c094d)
---
README.md | 20 ++++++++++++--------
conf/hosts.template | 24 ++++++++++++++++++++++++
sbin/start-all.sh | 2 ++
3 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index e75a0e9cb..9727a2f0c 100644
--- a/README.md
+++ b/README.md
@@ -127,13 +127,16 @@ celeborn.worker.storage.dirs /mnt/disk1/,/mnt/disk2
celeborn.worker.monitor.disk.enabled false
```
4. Copy Celeborn and configurations to all nodes
-5. Start Celeborn master
- `$CELEBORN_HOME/sbin/start-master.sh`
-6. Start Celeborn worker
- For single master cluster : `$CELEBORN_HOME/sbin/start-worker.sh rss://<master-host>:<master-port>`
- For HA cluster :`$CELEBORN_HOME/sbin/start-worker.sh`
-7. If Celeborn start success, the output of Master's log should be like this:
-```angular2html
+5. Start all services. If Celeborn is installed in the same path on every node and the
+ nodes accept SSH logins, you can fill in `$CELEBORN_HOME/conf/hosts` and
+ run `$CELEBORN_HOME/sbin/start-all.sh` to start all
+ services. If the installation paths are not identical, you will need to start the services manually.
+ Start Celeborn master
+ `$CELEBORN_HOME/sbin/start-master.sh`
+ Start Celeborn worker
+ `$CELEBORN_HOME/sbin/start-worker.sh`
+6. If Celeborn starts successfully, the Master's log output should look like this:
+```
22/10/08 19:29:11,805 INFO [main] Dispatcher: Dispatcher numThreads: 64
22/10/08 19:29:11,875 INFO [main] TransportClientFactory: mode NIO threads 64
22/10/08 19:29:12,057 INFO [main] Utils: Successfully started service 'MasterSys' on port 9097.
@@ -171,7 +174,8 @@ spark.shuffle.service.enabled false
# Sort shuffle writer use less memory than hash shuffle writer, if your shuffle partition count is large, try to use sort hash writer.
spark.celeborn.shuffle.writer hash
-# we recommend set spark.celeborn.push.replicate.enabled to true to enable server-side data replication
+# We recommend setting spark.celeborn.push.replicate.enabled to true to enable server-side data replication.
+# If you have only one worker, this setting must be false.
spark.celeborn.push.replicate.enabled true
# Support for Spark AQE only tested under Spark 3
diff --git a/conf/hosts.template b/conf/hosts.template
new file mode 100644
index 000000000..b5f8f57d4
--- /dev/null
+++ b/conf/hosts.template
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[master]
+node1
+
+[worker]
+node2
+node3
+node4
\ No newline at end of file
diff --git a/sbin/start-all.sh b/sbin/start-all.sh
index a07471d33..75b8a06bc 100755
--- a/sbin/start-all.sh
+++ b/sbin/start-all.sh
@@ -47,6 +47,8 @@ do
sleep $CELEBORN_SLEEP
fi
done
+# pause 5 seconds to make sure that master is ready.
+sleep 5s
# start workers
for host in `echo "$HOST_LIST" | sed "s/#.*$//;/^$/d" | grep '\[worker\]' | awk '{print $NF}'`
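
The worker loop above selects lines that contain `[worker]` and takes the last whitespace-separated field as the host name. The step that turns `conf/hosts` into `HOST_LIST` is not part of this diff, so the sketch below assumes a hypothetical flattening step that prefixes each host with its section header (for example `[worker] node2`). The temporary file, the `section` variable, and the `masters`/`workers` names are illustrative only and are not taken from `start-all.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: shows how a hosts file laid out like conf/hosts.template
# could feed the sed | grep | awk pipeline used in the worker loop.

# Sample file in the layout of conf/hosts.template.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
# comment lines are skipped
[master]
node1

[worker]
node2
node3
node4
EOF

# ASSUMPTION (not shown in the diff): flatten the sections so that every
# host line carries its section header, e.g. "[worker] node2".
HOST_LIST=$(awk '/^#/{next} /^\[/{section=$0; next} NF{print section, $0}' "$hosts_file")

# Same pipeline as the diff: strip trailing comments, drop empty lines,
# keep one section, and print the last field (the host name).
masters=$(echo "$HOST_LIST" | sed "s/#.*$//;/^$/d" | grep '\[master\]' | awk '{print $NF}')
workers=$(echo "$HOST_LIST" | sed "s/#.*$//;/^$/d" | grep '\[worker\]' | awk '{print $NF}')

# Unquoted expansion joins the host names onto one line for display.
echo "masters: $(echo $masters)"
echo "workers: $(echo $workers)"

rm -f "$hosts_file"
```

With the sample file, the script prints `masters: node1` and `workers: node2 node3 node4`. Running it against a real `conf/hosts` is a quick way to check which hosts each section resolves to before invoking `start-all.sh`.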