Posted to commits@hugegraph.apache.org by zh...@apache.org on 2022/11/27 06:43:43 UTC

[incubator-hugegraph-doc] 01/01: Add HugeGraph-Computer Doc

This is an automated email from the ASF dual-hosted git repository.

zhaocong pushed a commit to branch hugegraph-computer-doc
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-doc.git

commit 5bb76c06f02c1414f6032520b87c0c517f81ee04
Author: coderzc <zh...@apache.org>
AuthorDate: Sun Nov 27 14:43:07 2022 +0800

    Add HugeGraph-Computer Doc
---
 content/en/docs/quickstart/hugegraph-computer.md | 209 +++++++++++++++++++++++
 content/en/docs/quickstart/hugegraph-hubble.md   |   1 -
 content/en/docs/quickstart/hugegraph-spark.md    |   2 +-
 content/en/docs/quickstart/hugegraph-studio.md   |   2 +-
 content/en/docs/quickstart/hugegraph-tools.md    |   6 +-
 5 files changed, 214 insertions(+), 6 deletions(-)

diff --git a/content/en/docs/quickstart/hugegraph-computer.md b/content/en/docs/quickstart/hugegraph-computer.md
new file mode 100644
index 00000000..f6807d5e
--- /dev/null
+++ b/content/en/docs/quickstart/hugegraph-computer.md
@@ -0,0 +1,209 @@
+---
+title: "HugeGraph-Computer Quick Start"
+linkTitle: "Analysis with HugeGraph-Computer"
+weight: 7
+---
+
+## 1 HugeGraph-Computer Overview
+
+HugeGraph-Computer is a distributed graph processing system for HugeGraph. It is an implementation of [Pregel](https://kowshik.github.io/JPregel/pregel_paper.pdf) and runs on Kubernetes or YARN.
+
+### Features
+
+- Supports distributed MPP graph computing and integrates with HugeGraph as graph input/output storage.
+- Based on the BSP (Bulk Synchronous Parallel) model: an algorithm computes through multiple parallel iterations, and each iteration is a superstep.
+- Automatic memory management. The framework is designed to avoid OOM (Out of Memory) errors: when memory cannot hold all the data, it spills part of the data to disk.
+- Part of the edges or messages of a super node can be kept in memory, so they are never lost.
+- Results can be output to HDFS, HugeGraph, or any other system.
+- Easy to develop a new algorithm: you only need to focus on vertex-centric processing, as on a single server, without worrying about message transfer or memory/storage management.
+
+## 2 Get Started
+
+### 2.1 Run PageRank algorithm locally
+
+> To run an algorithm with HugeGraph-Computer, you need a 64-bit JRE/JDK 11 or later.
+>
+> You also need to deploy HugeGraph-Server and [Etcd](https://etcd.io/docs/v3.5/quickstart/).
+
+#### 2.1.1 Download the compiled archive
+
+Download the latest version of the HugeGraph-Computer release package:
+
+```bash
+wget https://github.com/apache/hugegraph-computer/releases/download/v${version}/hugegraph-computer-${version}.tar.gz
+tar zxvf hugegraph-computer-${version}.tar.gz
+```
+
+#### 2.1.2 Clone source code to compile and install
+
+Clone the latest version of the HugeGraph-Computer source code:
+
+```bash
+git clone https://github.com/apache/hugegraph-computer.git
+```
+
+Compile and generate the tar package:
+
+```bash
+cd hugegraph-computer
+mvn clean package -DskipTests
+```
+
+#### 2.1.3 Start master node
+
+```bash
+cd hugegraph-computer-${version}
+bin/start-computer.sh -d local -r master
+```
+
+#### 2.1.4 Start worker node
+
+```bash
+bin/start-computer.sh -d local -r worker
+```
+
+#### 2.1.5 Query algorithm results
+
+2.1.5.1 Enable `OLAP` index query for server
+
+If the OLAP index is not enabled, enable it first. For details, see [modify-graphs-read-mode](/docs/clients/restful-api/graphs/#634-modify-graphs-read-mode-this-operation-requires-administrator-privileges)
+
+```http
+PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
+
+"ALL"
+```
+
+2.1.5.2 Query the `page_rank` property value:
+
+```bash
+curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
+```
+
+### 2.2 Run PageRank algorithm in Kubernetes
+
+#### 2.2.1 Install hugegraph-computer CRD
+
+```bash
+# Kubernetes version >= v1.16
+kubectl apply -f https://raw.githubusercontent.com/hugegraph/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
+
+# Kubernetes version < v1.16
+kubectl apply -f https://raw.githubusercontent.com/hugegraph/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
+```
+
+#### 2.2.2 Show CRD
+
+```bash
+kubectl get crd
+
+NAME                                        CREATED AT
+hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z
+```
+
+#### 2.2.3 Install hugegraph-computer-operator & etcd-server
+
+```bash
+kubectl apply -f https://raw.githubusercontent.com/hugegraph/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
+```
+
+#### 2.2.4 Wait for the hugegraph-computer-operator & etcd-server deployment to complete
+
+```bash
+kubectl get pod -n hugegraph-computer-operator-system
+
+NAME                                                              READY   STATUS    RESTARTS   AGE
+hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
+hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h
+```
+
+#### 2.2.5 Submit job
+
+```bash
+cat <<EOF | kubectl apply --filename -
+apiVersion: hugegraph.apache.org/v1
+kind: HugeGraphComputerJob
+metadata:
+  namespace: hugegraph-computer-system
+  name: &jobName pagerank-sample
+spec:
+  jobId: *jobName
+  algorithmName: page_rank
+  image: hugegraph/hugegraph-computer:latest # algorithm image url
+  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar # algorithm jar path
+  pullPolicy: Always
+  workerCpu: "4"
+  workerMemory: "4Gi"
+  workerInstances: 5
+  computerConf:
+    job.partitions_count: "20"
+    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
+    hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port} # hugegraph server url
+    hugegraph.name: hugegraph
+EOF
+```
+
+#### 2.2.6 Show job
+
+```bash
+kubectl get hcjob/pagerank-sample -n hugegraph-computer-system
+
+NAME               JOBID              JOBSTATUS
+pagerank-sample    pagerank-sample    RUNNING
+```
+
+#### 2.2.7 Show node logs
+
+```bash
+# Show the master log
+kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-system
+
+# Show the worker log
+kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-system
+
+# Show diagnostic log of a job
+# NOTE: the diagnostic log exists only when the job fails, and it is kept for only one hour.
+kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-system
+```
+
+#### 2.2.8 Show success event of a job
+
+> NOTE: the event is kept for only one hour.
+
+```bash
+kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-system
+```
+
+#### 2.2.9 Query algorithm results
+
+If the results are output to `HugeGraph-Server`, query them the same way as when running locally; if they are output to `HDFS`, check the result files under the `/hugegraph-computer/results/{jobId}` directory.
+
+## 3 Built-in algorithms
+
+### 3.1 Currently supported algorithms
+
+#### Centrality algorithms
+
+* PageRank
+* BetweennessCentrality
+* ClosenessCentrality
+* DegreeCentrality
+
+#### Community algorithms
+
+* ClusteringCoefficient
+* Kcore
+* Lpa
+* TriangleCount
+* Wcc
+
+#### Path algorithms
+
+* RingsDetection
+* RingsDetectionWithFilter
+
+For more algorithms, see: https://github.com/apache/hugegraph-computer/tree/master/computer-algorithm/src/main/java/com/baidu/hugegraph/computer/algorithm
+
+## 4 Algorithm development guide
+
+TODO
\ No newline at end of file
diff --git a/content/en/docs/quickstart/hugegraph-hubble.md b/content/en/docs/quickstart/hugegraph-hubble.md
index d5c46231..fcf96d29 100644
--- a/content/en/docs/quickstart/hugegraph-hubble.md
+++ b/content/en/docs/quickstart/hugegraph-hubble.md
@@ -432,4 +432,3 @@ There is no visual OLAP algorithm execution on Hubble. You can call the RESTful
 <center>
   <img src="/docs/images/images-hubble/355任务详情.png" alt="image">
 </center>
-
diff --git a/content/en/docs/quickstart/hugegraph-spark.md b/content/en/docs/quickstart/hugegraph-spark.md
index 4ca1d1af..6f3f8a3f 100644
--- a/content/en/docs/quickstart/hugegraph-spark.md
+++ b/content/en/docs/quickstart/hugegraph-spark.md
@@ -2,7 +2,7 @@
 title: "HugeGraph-Spark Quick Start"
 linkTitle: "Analysis with HugeGraph-Spark"
 draft: true
-weight: 7
+weight: 8
 ---
 
 ### 1 HugeGraph-Spark Overview (Deprecated)
diff --git a/content/en/docs/quickstart/hugegraph-studio.md b/content/en/docs/quickstart/hugegraph-studio.md
index 7d1359e5..c399bad1 100644
--- a/content/en/docs/quickstart/hugegraph-studio.md
+++ b/content/en/docs/quickstart/hugegraph-studio.md
@@ -2,7 +2,7 @@
 title: "HugeGraph-Studio Quick Start"
 linkTitle: "Display with HugeGraph-Studio"
 draft: true
-weight: 5
+weight: 9
 ---
 
 ### 1 HugeGraph-Studio Overview (Deprecated)
diff --git a/content/en/docs/quickstart/hugegraph-tools.md b/content/en/docs/quickstart/hugegraph-tools.md
index 5f860d31..8c7f1f2d 100644
--- a/content/en/docs/quickstart/hugegraph-tools.md
+++ b/content/en/docs/quickstart/hugegraph-tools.md
@@ -183,9 +183,9 @@ Usage: hugegraph [options] [command] [command options]
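     - --backup-num: optional; the number of most recent backups to keep, default 3
     - --interval: optional; the backup schedule, in Linux crontab format
 - dump: export all vertices and edges of the whole graph, stored by default in JSON format as `vertex vertex-edge1 vertex-edge2...`.
-Users can also define a custom storage format: in the `hugegraph-tools/src/main/java/com/baidu/hugegraph/formatter`
-directory, implement a class that extends `Formatter`, e.g. `CustomFormatter`, then specify that class as the formatter, e.g.
-`bin/hugegraph dump -f CustomFormatter`
+  Users can also define a custom storage format: in the `hugegraph-tools/src/main/java/com/baidu/hugegraph/formatter`
+  directory, implement a class that extends `Formatter`, e.g. `CustomFormatter`, then specify that class as the formatter, e.g.
+  `bin/hugegraph dump -f CustomFormatter`
     - --formatter or -f: the formatter to use, default JsonFormatter
     - --directory or -d: the directory for storing schema or data, default is the current directory
     - --log or -l: the log directory, default is the current directory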