Posted to commits@celeborn.apache.org by an...@apache.org on 2023/03/17 04:03:22 UTC

[incubator-celeborn-website] branch MONITORING-RESTFUL-API created (now 7cceeb2)

This is an automated email from the ASF dual-hosted git repository.

angerszhuuuu pushed a change to branch MONITORING-RESTFUL-API
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn-website.git


      at 7cceeb2  [CELEBORN-427] Add doc for http request REST API

This branch includes the following new commits:

     new 7cceeb2  [CELEBORN-427] Add doc for http request REST API

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[incubator-celeborn-website] 01/01: [CELEBORN-427] Add doc for http request REST API

Posted by an...@apache.org.

angerszhuuuu pushed a commit to branch MONITORING-RESTFUL-API
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn-website.git

commit 7cceeb267e8e7480066e5a1cfbdd1c43d356ceba
Author: Angerszhuuuu <an...@gmail.com>
AuthorDate: Fri Mar 17 12:03:16 2023 +0800

    [CELEBORN-427] Add doc for http request REST API
---
 docs/user_guide/monitoring.md | 60 +++++++++++++++++++++++++++++++++++++++++++
 mkdocs.yml                    |  1 +
 2 files changed, 61 insertions(+)

diff --git a/docs/user_guide/monitoring.md b/docs/user_guide/monitoring.md
new file mode 100644
index 0000000..9937c51
--- /dev/null
+++ b/docs/user_guide/monitoring.md
@@ -0,0 +1,60 @@
+---
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+
+Monitoring
+===
+
+There are two ways to monitor a Celeborn cluster: Prometheus metrics and the REST API.
+
+# Metrics
+
+# REST API
+
+In addition to exposing metrics, Celeborn also supports a REST API. This gives developers
+an easy way to build new visualizations and monitoring tools for Celeborn, and
+gives users an easy way to check the running status of the service. The REST API is available on
+both the master and the worker. The endpoints are mounted at `host:port`. For example,
+for the master, they would typically be accessible
+at `http://<master-prometheus-host>:<master-prometheus-port><endpoint>`, and
+for the worker, at `http://<worker-prometheus-host>:<worker-prometheus-port><endpoint>`.
+
+`<master-prometheus-host>`, `<master-prometheus-port>`, `<worker-prometheus-host>`, and `<worker-prometheus-port>` are configured as follows:
+
+| Key                                     | Default | Description                | Since |
+|-----------------------------------------|---------|----------------------------| ----- |
+| celeborn.master.metrics.prometheus.host | 0.0.0.0 | Master's Prometheus host.  | 0.2.0 |
+| celeborn.master.metrics.prometheus.port | 9098    | Master's Prometheus port.  | 0.2.0 |
+| celeborn.worker.metrics.prometheus.host | 0.0.0.0 | Worker's Prometheus host.  | 0.2.0 |
+| celeborn.worker.metrics.prometheus.port | 9096    | Worker's Prometheus port.  | 0.2.0 |
+
+The available endpoints are listed below:
+
+| Endpoint                   | Service        | Meaning                                                                                                                                                                              |
+|----------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| /conf                      | master, worker | List the configuration settings of the service.                                                                                                                                      |
+| /workerInfo                | master, worker | List worker information of the service. For the master, it lists information for all registered workers.                                                                             |
+| /lostWorkers               | master         | List all lost workers of the master.                                                                                                                                                 |
+| /blacklistedWorkers        | master         | List all blacklisted workers of the master.                                                                                                                                          |
+| /threadDump                | master, worker | List the current thread dump of the service.                                                                                                                                         |
+| /hostnames                 | master         | List the LifecycleManager hostnames of all running applications in the cluster.                                                                                                      |
+| /applications              | master         | List the IDs of all running applications in the cluster.                                                                                                                             |
+| /shuffles                  | master, worker | List all running shuffle keys of the service. The master returns the keys of all running shuffles in the cluster; a worker returns only the keys of shuffles running on that worker. |
+| /listTopDiskUsedApps       | master, worker | List the application IDs with the highest disk usage. The master returns them for the whole cluster; a worker returns them only for applications running on that worker.             |
+| /listPartitionLocationInfo | worker         | List information about all live PartitionLocations on the worker.                                                                                                                    |
+| /unavailablePeers          | worker         | List the unavailable peers of the worker, i.e. the peers that the worker failed to connect to.                                                                                       |
+| /isShutdown                | worker         | Show whether the worker is in the process of shutting down.                                                                                                                          |
+| /isRegistered              | worker         | Show whether the worker has successfully registered with the master.                                                                                                                 |
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 89a4911..4898795 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -80,6 +80,7 @@ nav:
       - Deploy: user_guide/deploy.md
       - Upgrade: user_guide/upgrade.md
       - Ratis Shell: user_guide/celeborn_ratis_shell.md
+      - Monitoring: user_guide/monitoring.md
   - Configuration:
       - configuration/index.md
   - Community:
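
The endpoints documented in `docs/user_guide/monitoring.md` follow the pattern `http://<host>:<port><endpoint>`, so they can be queried with any HTTP client. Below is a minimal sketch in Python using only the standard library. It assumes the endpoints answer plain HTTP GET requests and return text, which the doc does not state explicitly. The helper names `build_url` and `fetch_endpoint` are hypothetical and not part of Celeborn. The default ports come from the configuration table in the doc.

```python
"""Sketch: query Celeborn REST endpoints over HTTP (standard library only)."""
from urllib.request import urlopen

# Default ports taken from the configuration table in monitoring.md.
MASTER_DEFAULT_PORT = 9098
WORKER_DEFAULT_PORT = 9096


def build_url(host: str, port: int, endpoint: str) -> str:
    """Return http://<host>:<port><endpoint>, adding a leading slash if missing."""
    if not endpoint.startswith("/"):
        endpoint = "/" + endpoint
    return f"http://{host}:{port}{endpoint}"


def fetch_endpoint(host: str, port: int, endpoint: str, timeout: float = 5.0) -> str:
    """GET the endpoint and return the response body as text.

    Assumption: the service answers GET requests. The response format is
    not specified in the doc, so the body is returned as-is for the caller
    to interpret.
    """
    with urlopen(build_url(host, port, endpoint), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a master reachable at a placeholder address such as `localhost`, `fetch_endpoint("localhost", MASTER_DEFAULT_PORT, "/workerInfo")` would return the body of the `/workerInfo` endpoint. The default host `0.0.0.0` is a bind address, so a client needs to use a hostname or IP at which the service is actually reachable.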