Posted to commits@celeborn.apache.org by an...@apache.org on 2023/03/17 04:03:23 UTC
[incubator-celeborn-website] 01/01: [CELEBORN-427] Add doc for http request REST API
This is an automated email from the ASF dual-hosted git repository.
angerszhuuuu pushed a commit to branch MONITORING-RESTFUL-API
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn-website.git
commit 7cceeb267e8e7480066e5a1cfbdd1c43d356ceba
Author: Angerszhuuuu <an...@gmail.com>
AuthorDate: Fri Mar 17 12:03:16 2023 +0800
[CELEBORN-427] Add doc for http request REST API
---
docs/user_guide/monitoring.md | 60 +++++++++++++++++++++++++++++++++++++++++++
mkdocs.yml | 1 +
2 files changed, 61 insertions(+)
diff --git a/docs/user_guide/monitoring.md b/docs/user_guide/monitoring.md
new file mode 100644
index 0000000..9937c51
--- /dev/null
+++ b/docs/user_guide/monitoring.md
@@ -0,0 +1,60 @@
+---
+license: |
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+---
+
+
+Monitoring
+===
+
+There are two ways to monitor a Celeborn cluster: Prometheus metrics and the REST API.
+
+# Metrics
+
+# REST API
+
+In addition to exposing metrics, Celeborn also supports a REST API. This gives developers
+an easy way to create new visualizations and monitoring tools for Celeborn, and
+gives users an easy way to check the running status of the service. The REST API is available for
+both the master and the worker. The endpoints are mounted at `host:port`. For example,
+for the master, they would typically be accessible
+at `http://<master-prometheus-host>:<master-prometheus-port><endpoint>`, and
+for the worker, at `http://<worker-prometheus-host>:<worker-prometheus-port><endpoint>`.
+
+`<master-prometheus-host>`, `<master-prometheus-port>`, `<worker-prometheus-host>`, and `<worker-prometheus-port>` are configured as follows:
+
+| Key | Default | Description | Since |
+|-----------------------------------------|---------|----------------------------| ----- |
+| celeborn.master.metrics.prometheus.host | 0.0.0.0 | Master's Prometheus host. | 0.2.0 |
+| celeborn.master.metrics.prometheus.port | 9098 | Master's Prometheus port. | 0.2.0 |
+| celeborn.worker.metrics.prometheus.host | 0.0.0.0 | Worker's Prometheus host. | 0.2.0 |
+| celeborn.worker.metrics.prometheus.port | 9096 | Worker's Prometheus port. | 0.2.0 |
+
+The available endpoints are listed below:
+
+| Endpoint | Service | Meaning |
+|----------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| /conf | master, worker | List the configuration settings of the service. |
+| /workerInfo | master, worker | List worker information of the service. For the master, it lists information about all registered workers. |
+| /lostWorkers | master | List all lost workers of the master. |
+| /blacklistedWorkers | master | List all blacklisted workers of the master. |
+| /threadDump | master, worker | List the current thread dump of the service. |
+| /hostnames | master | List the hostnames of the LifecycleManagers of all running applications in the cluster. |
+| /applications | master | List the IDs of all running applications in the cluster. |
+| /shuffles | master, worker | List all running shuffle keys of the service. The master returns the keys of all running shuffles in the cluster; a worker returns only the keys of shuffles running on that worker. |
+| /listTopDiskUsedApps | master, worker | List the application IDs with the highest disk usage. The master returns the top application IDs for the cluster; a worker returns only the IDs of applications running on that worker. |
+| /listPartitionLocationInfo | worker | List all living PartitionLocation information on that worker. |
+| /unavailablePeers | worker | List the unavailable peers of the worker. This always means the worker failed to connect to the peer. |
+| /isShutdown | worker | Show whether the worker is in the process of shutting down. |
+| /isRegistered | worker | Show whether the worker has successfully registered with the master. |
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 89a4911..4898795 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -80,6 +80,7 @@ nav:
- Deploy: user_guide/deploy.md
- Upgrade: user_guide/upgrade.md
- Ratis Shell: user_guide/celeborn_ratis_shell.md
+ - Monitoring: user_guide/monitoring.md
- Configuration:
- configuration/index.md
- Community:
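
As a rough illustration of the REST API documented in `monitoring.md` above, the following Python sketch builds an endpoint URL of the form `http://<host>:<port><endpoint>` and fetches it. The default ports (9098 for the master, 9096 for the worker) come from the configuration table in the doc; the helper names `endpoint_url` and `fetch` and the host `localhost` are assumptions made for this example and are not part of Celeborn. The doc does not specify the response format, so the body is simply returned as text. Note that the default host `0.0.0.0` is a bind address, so a client would need to use a hostname or IP that actually reaches the service.

```python
from urllib.request import urlopen

# Default ports from the configuration table in monitoring.md.
DEFAULT_PORTS = {"master": 9098, "worker": 9096}


def endpoint_url(host, endpoint, service="master", port=None):
    """Build http://<host>:<port><endpoint> for a master or worker.

    `host` is whatever address reaches the service (hypothetical here);
    `port` falls back to the documented default for the given service.
    """
    if service not in DEFAULT_PORTS:
        raise ValueError(f"unknown service: {service!r}")
    if not endpoint.startswith("/"):
        endpoint = "/" + endpoint
    if port is None:
        port = DEFAULT_PORTS[service]
    return f"http://{host}:{port}{endpoint}"


def fetch(host, endpoint, service="master", port=None, timeout=5.0):
    """Fetch an endpoint and return the response body as text.

    The response format is not described in the doc, so no parsing
    is attempted here.
    """
    url = endpoint_url(host, endpoint, service=service, port=port)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumes a master is reachable on localhost with the default port.
    print(fetch("localhost", "/workerInfo"))
```

Keeping URL construction separate from the network call makes it easy to point the same helper at a worker, for example `fetch("localhost", "/isRegistered", service="worker")`, or at a non-default port via the `port` argument.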