Posted to commits@celeborn.apache.org by an...@apache.org on 2023/03/17 04:03:23 UTC

[incubator-celeborn-website] 01/01: [CELEBORN-427] Add doc for http request REST API

This is an automated email from the ASF dual-hosted git repository.

angerszhuuuu pushed a commit to branch MONITORING-RESTFUL-API
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn-website.git

commit 7cceeb267e8e7480066e5a1cfbdd1c43d356ceba
Author: Angerszhuuuu <an...@gmail.com>
AuthorDate: Fri Mar 17 12:03:16 2023 +0800

    [CELEBORN-427] Add doc for http request REST API
---
 docs/user_guide/monitoring.md | 60 +++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml                    |  1 +
 2 files changed, 61 insertions(+)

diff --git a/docs/user_guide/monitoring.md b/docs/user_guide/monitoring.md
new file mode 100644
index 0000000..9937c51
--- /dev/null
+++ b/docs/user_guide/monitoring.md
@@ -0,0 +1,60 @@
+---
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+
+Monitoring
+===
+
+There are two ways to monitor a Celeborn cluster: Prometheus metrics and the REST API.
+
+# Metrics
+
+# REST API
+
+In addition to exposing metrics, Celeborn also supports a REST API. This gives developers
+an easy way to create new visualizations and monitoring tools for Celeborn, and
+gives users an easy way to check the running status of the service. The REST API is available for
+both the master and the worker. The endpoints are mounted at `host:port`. For example,
+for the master, they would typically be accessible
+at `http://<master-prometheus-host>:<master-prometheus-port><endpoint>`, and
+for the worker, at `http://<worker-prometheus-host>:<worker-prometheus-port><endpoint>`.
+
+`<master-prometheus-host>`, `<master-prometheus-port>`, `<worker-prometheus-host>`, and `<worker-prometheus-port>` are configured as follows:
+
+| Key                                     | Default | Description                | Since |
+|-----------------------------------------|---------|----------------------------| ----- |
+| celeborn.master.metrics.prometheus.host | 0.0.0.0 | Master's Prometheus host.  | 0.2.0 |
+| celeborn.master.metrics.prometheus.port | 9098    | Master's Prometheus port.  | 0.2.0 |
+| celeborn.worker.metrics.prometheus.host | 0.0.0.0 | Worker's Prometheus host.  | 0.2.0 |
+| celeborn.worker.metrics.prometheus.port | 9096    | Worker's Prometheus port.  | 0.2.0 |
+
+The available endpoints are listed below:
+
+| Endpoint                   | Service        | Meaning                                                                                                                                                                              |
+|----------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| /conf                      | master, worker | List the configuration settings of the service.                                                                                                                                      |
+| /workerInfo                | master, worker | List worker information of the service. For the master, it lists the information of all registered workers.                                                                          |
+| /lostWorkers               | master         | List all lost workers of the master.                                                                                                                                                 |
+| /blacklistedWorkers        | master         | List all blacklisted workers of the master.                                                                                                                                          |
+| /threadDump                | master, worker | List the current thread dump of the service.                                                                                                                                         |
+| /hostnames                 | master         | List the hostnames of the LifecycleManagers of all running applications in the cluster.                                                                                              |
+| /applications              | master         | List the ids of all running applications in the cluster.                                                                                                                             |
+| /shuffles                  | master, worker | List all running shuffle keys of the service. The master returns the keys of all running shuffles in the cluster; a worker returns only the keys of shuffles running on that worker. |
+| /listTopDiskUsedApps       | master, worker | List the application ids with the highest disk usage. The master returns them for the whole cluster; a worker returns only application ids running on that worker.                   |
+| /listPartitionLocationInfo | worker         | List all living PartitionLocation information in that worker.                                                                                                                        |
+| /unavailablePeers          | worker         | List the unavailable peers of the worker; this always means the worker failed to connect to the peer.                                                                                |
+| /isShutdown                | worker         | Show whether the worker is in the process of shutting down.                                                                                                                          |
+| /isRegistered              | worker         | Show whether the worker has successfully registered with the master.                                                                                                                 |
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 89a4911..4898795 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -80,6 +80,7 @@ nav:
       - Deploy: user_guide/deploy.md
       - Upgrade: user_guide/upgrade.md
       - Ratis Shell: user_guide/celeborn_ratis_shell.md
+      - Monitoring: user_guide/monitoring.md
   - Configuration:
       - configuration/index.md
   - Community:
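
For readers who want to try the endpoints documented in `monitoring.md` above, here is a minimal Python sketch of how a URL of the form `http://<host>:<port><endpoint>` could be built and queried. The default ports and the endpoint-to-service mapping are copied from the two tables in the diff. The helper names `build_url` and `fetch`, the `localhost` default, and the treatment of the response body as plain text are assumptions made for illustration only; the commit does not specify a client or a response format.

```python
from urllib.request import urlopen

# Default ports from the configuration table in monitoring.md.
DEFAULT_PORTS = {"master": 9098, "worker": 9096}

# Endpoint -> services that expose it, copied from the API table.
ENDPOINTS = {
    "/conf": {"master", "worker"},
    "/workerInfo": {"master", "worker"},
    "/lostWorkers": {"master"},
    "/blacklistedWorkers": {"master"},
    "/threadDump": {"master", "worker"},
    "/hostnames": {"master"},
    "/applications": {"master"},
    "/shuffles": {"master", "worker"},
    "/listTopDiskUsedApps": {"master", "worker"},
    "/listPartitionLocationInfo": {"worker"},
    "/unavailablePeers": {"worker"},
    "/isShutdown": {"worker"},
    "/isRegistered": {"worker"},
}


def build_url(service, endpoint, host="localhost", port=None):
    """Return the URL for an endpoint (hypothetical helper, not part of Celeborn).

    `host` defaults to localhost as an assumption; the documented default
    0.0.0.0 is a bind address, so a client needs a reachable host name.
    """
    if service not in DEFAULT_PORTS:
        raise ValueError(f"unknown service: {service!r}")
    if service not in ENDPOINTS.get(endpoint, set()):
        raise ValueError(f"{endpoint!r} is not listed for the {service}")
    if port is None:
        port = DEFAULT_PORTS[service]
    return f"http://{host}:{port}{endpoint}"


def fetch(service, endpoint, host="localhost", port=None, timeout=5.0):
    """Issue a GET request and return the body decoded as text (assumed format)."""
    url = build_url(service, endpoint, host=host, port=port)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a master running locally on the default port, `fetch("master", "/workerInfo")` would request `http://localhost:9098/workerInfo`. Rejecting endpoint and service combinations that the table does not list catches mistakes such as asking the master for `/isShutdown` before any request is sent.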