Posted to commits@hugegraph.apache.org by ji...@apache.org on 2023/05/15 03:31:07 UTC

[incubator-hugegraph-doc] branch master updated: Update hugegraph-benchmark-0.5.6.md (#226)

This is an automated email from the ASF dual-hosted git repository.

jin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hugegraph-doc.git


The following commit(s) were added to refs/heads/master by this push:
     new 2e5bf8c6 Update hugegraph-benchmark-0.5.6.md (#226)
2e5bf8c6 is described below

commit 2e5bf8c640d8f602c03795953c71df3164cbf4ef
Author: John Whelan <Wh...@users.noreply.github.com>
AuthorDate: Sun May 14 22:31:02 2023 -0500

    Update hugegraph-benchmark-0.5.6.md (#226)
    
    Completed conversion to English.
---
 .../docs/performance/hugegraph-benchmark-0.5.6.md  | 190 ++++++++++-----------
 1 file changed, 93 insertions(+), 97 deletions(-)

diff --git a/content/en/docs/performance/hugegraph-benchmark-0.5.6.md b/content/en/docs/performance/hugegraph-benchmark-0.5.6.md
index bb3db47c..4df9a9e7 100644
--- a/content/en/docs/performance/hugegraph-benchmark-0.5.6.md
+++ b/content/en/docs/performance/hugegraph-benchmark-0.5.6.md
@@ -4,72 +4,67 @@ linkTitle: "HugeGraph BenchMark Performance"
 weight: 1
 ---
 
-### 1 测试环境
+### 1 Test environment
 
-#### 1.1 硬件信息
+#### 1.1 Hardware information
 
 | CPU                                          | Memory | NIC       | Disk      |
 |----------------------------------------------|--------|-----------|-----------|
 | 48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 128G   | 10000Mbps | 750GB SSD |
 
-#### 1.2 软件信息
+#### 1.2 Software information
 
-##### 1.2.1 测试用例
+##### 1.2.1 Test cases
 
-测试使用[graphdb-benchmark](https://github.com/socialsensor/graphdb-benchmarks),一个图数据库测试集。该测试集主要包含4类测试:
+Testing is done using the [graphdb-benchmark](https://github.com/socialsensor/graphdb-benchmarks), a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:
 
-- Massive Insertion,批量插入顶点和边,一定数量的顶点或边一次性提交
-- Single Insertion,单条插入,每个顶点或者每条边立即提交
-- Query,主要是图数据库的基本查询操作:
+- Massive Insertion, which involves batch insertion of vertices and edges, with a certain number of vertices or edges being submitted at once.
+- Single Insertion, which involves the immediate insertion of each vertex or edge, one at a time.
+- Query, which mainly includes the basic query operations of the graph database:
+  - Find Neighbors, which queries the neighbors of all vertices.
+  - Find Adjacent Nodes, which queries the adjacent vertices of all edges.
+  - Find Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
+- Clustering, which is a community detection algorithm based on the Louvain Method.
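
The two insertion workloads differ mainly in how often a commit happens. The sketch below illustrates this against a hypothetical in-memory `CountingStore` that only counts commits. It is not HugeGraph's or graphdb-benchmark's API, and `batch_size` is an assumed parameter because the benchmark's actual batch size is not stated here.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]  # (source vertex id, target vertex id)


class CountingStore:
    """Hypothetical stand-in for a graph backend: buffers writes and counts commits."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.pending: List[Edge] = []
        self.commits = 0

    def add(self, edge: Edge) -> None:
        self.pending.append(edge)

    def commit(self) -> None:
        # Persist all buffered writes as one transaction.
        self.edges.extend(self.pending)
        self.pending.clear()
        self.commits += 1


def massive_insert(store: CountingStore, edges: Iterable[Edge], batch_size: int) -> None:
    """Massive Insertion: commit once per batch_size edges (batch_size is an assumption)."""
    for i, edge in enumerate(edges, start=1):
        store.add(edge)
        if i % batch_size == 0:
            store.commit()
    if store.pending:  # flush the final partial batch
        store.commit()


def single_insert(store: CountingStore, edges: Iterable[Edge]) -> None:
    """Single Insertion: commit immediately after every edge."""
    for edge in edges:
        store.add(edge)
        store.commit()
```

With 10 edges and `batch_size=4`, `massive_insert` commits 3 times while `single_insert` commits 10 times. In a real backend, commit overhead is a large part of what separates the two workloads.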
 
-  - Find Neighbors,查询所有顶点的邻居
-  - Find Adjacent Nodes,查询所有边的邻接顶点
-  - Find Shortest Path,查询第一个顶点到100个随机顶点的最短路径
+##### 1.2.2 Test datasets
 
-- Clustering,基于Louvain Method的社区发现算法
+Tests are conducted using both synthetic and real data.
 
-##### 1.2.2 测试数据集
-
-测试使用人造数据和真实数据
-
-- MIW、SIW和QW使用SNAP数据集
+- MIW, SIW, and QW (the Massive Insertion, Single Insertion, and Query workloads) use SNAP datasets:
 
   - [Enron Dataset](http://snap.stanford.edu/data/email-Enron.html)
-
   - [Amazon dataset](http://snap.stanford.edu/data/amazon0601.html)
-
   - [Youtube dataset](http://snap.stanford.edu/data/com-Youtube.html)
-
   - [LiveJournal dataset](http://snap.stanford.edu/data/com-LiveJournal.html)
 
-- CW使用[LFR-Benchmark generator](https://sites.google.com/site/andrealancichinetti/files)生成的人造数据
+- CW (the Clustering workload) uses synthetic data generated by the [LFR-Benchmark generator](https://sites.google.com/site/andrealancichinetti/files).
 
-###### 本测试用到的数据集规模
+###### Size of the datasets used in this test
 
-| 名称                      | vertex数目  | edge数目    | 文件大小   |
+| Name | Number of Vertices | Number of Edges | File Size |
 |-------------------------|-----------|-----------|--------|
 | email-enron.txt         | 36,691    | 367,661   | 4MB    |
 | com-youtube.ungraph.txt | 1,157,806 | 2,987,624 | 38.7MB |
 | amazon0601.txt          | 403,393   | 3,387,388 | 47.9MB |
 | com-lj.ungraph.txt      | 3,997,961 | 34,681,189 | 479MB  |
 
-#### 1.3 服务配置
+#### 1.3 Service configuration
 
-- HugeGraph版本:0.5.6,RestServer和Gremlin Server和backends都在同一台服务器上
+- HugeGraph version: 0.5.6; RestServer, Gremlin Server, and the backends all run on the same server
 
-  - RocksDB版本:rocksdbjni-5.8.6
+  - RocksDB version: rocksdbjni-5.8.6
 
-- Titan版本:0.5.4, 使用thrift+Cassandra模式
+- Titan version: 0.5.4, in thrift+Cassandra mode
 
-  - Cassandra版本:cassandra-3.10,commit-log 和 data 共用SSD
+  - Cassandra version: cassandra-3.10; the commit log and data share the same SSD
 
-- Neo4j版本:2.0.1
+- Neo4j version: 2.0.1
 
-> graphdb-benchmark适配的Titan版本为0.5.4
+> The Titan version supported by graphdb-benchmark is 0.5.4.
 
-### 2 测试结果
+### 2 Test results
 
-#### 2.1 Batch插入性能
+#### 2.1 Batch insertion performance
 
 | Backend   | email-enron(30w) | amazon0601(300w) | com-youtube.ungraph(300w) | com-lj.ungraph(3000w) |
 |-----------|------------------|------------------|---------------------------|-----------------------|
@@ -77,24 +72,24 @@ weight: 1
 | Titan     | 10.15            | 108.569          | 150.266                   | 1217.944              |
 | Neo4j     | 3.884            | 18.938           | 24.890                    | 281.537               |
 
-_说明_
+_Explanation_
 
-- 表头"()"中数据是数据规模,以边为单位
-- 表中数据是批量插入的时间,单位是s
-- 例如,HugeGraph使用RocksDB插入amazon0601数据集的300w条边,花费5.711s
+- The numbers in parentheses in the table header give the data scale, measured in edges ("w" stands for 10,000, so "300w" is 3 million).
+- The values in the table are batch insertion times, in seconds.
+- For example, HugeGraph(RocksDB) took 5.711 seconds to insert the 3 million edges of the amazon0601 dataset.
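
Dividing the edge count by the insertion time turns these results into a rate. The sketch below does this for amazon0601. It uses the exact edge count from the dataset table (3,387,388) rather than the rounded "300w". The HugeGraph time comes from the example above, and the Titan and Neo4j times come from the table rows.

```python
# amazon0601 edge count from the dataset table; times in seconds from the batch insertion results.
AMAZON0601_EDGES = 3_387_388
BATCH_SECONDS = {
    "HugeGraph(RocksDB)": 5.711,
    "Neo4j": 18.938,
    "Titan": 108.569,
}


def edges_per_second(edges: int, seconds: float) -> float:
    """Average insertion throughput over the whole batch load."""
    return edges / seconds


if __name__ == "__main__":
    for backend, seconds in BATCH_SECONDS.items():
        rate = edges_per_second(AMAZON0601_EDGES, seconds)
        print(f"{backend:<20} {rate:>10,.0f} edges/s")
```

This gives roughly 593,000 edges/s for HugeGraph(RocksDB), about 179,000 for Neo4j, and about 31,000 for Titan. These are averages over the whole load, not peak rates.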
 
-##### 结论
+##### Conclusion
 
-- 批量插入性能 HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)
+- Batch insertion performance: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)
 
-#### 2.2 遍历性能
+#### 2.2 Traversal performance
 
-##### 2.2.1 术语说明
+##### 2.2.1 Explanation of terms
 
-- FN(Find Neighbor), 遍历所有vertex, 根据vertex查邻接edge, 通过edge和vertex查other vertex
-- FA(Find Adjacent), 遍历所有edge,根据edge获得source vertex和target vertex
+- FN (Find Neighbor): traverse all vertices; for each vertex, look up its adjacent edges, then use each edge and the vertex to find the vertex at the other end.
+- FA (Find Adjacent): traverse all edges; for each edge, get its source vertex and target vertex.
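
A minimal sketch of the two traversals over an in-memory edge list follows. It assumes that FN treats edges as undirected (the text does not say). In a real backend, each lookup below would be a storage read, and those reads are what the benchmark times. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source vertex id, target vertex id)


def build_incidence(edges: List[Edge]) -> Dict[str, List[Edge]]:
    """Index every edge under both endpoints so a vertex can find its adjacent edges."""
    incidence: Dict[str, List[Edge]] = defaultdict(list)
    for src, dst in edges:
        incidence[src].append((src, dst))
        incidence[dst].append((src, dst))
    return incidence


def find_neighbors(vertices: Dict[str, dict], edges: List[Edge]) -> Dict[str, List[str]]:
    """FN: for each vertex, look up its adjacent edges and resolve the vertex at the other end."""
    incidence = build_incidence(edges)
    return {
        v: [dst if src == v else src for src, dst in incidence.get(v, [])]
        for v in vertices
    }


def find_adjacent(vertices: Dict[str, dict], edges: List[Edge]) -> List[Tuple[dict, dict]]:
    """FA: for each edge, fetch its source and target vertex records."""
    return [(vertices[src], vertices[dst]) for src, dst in edges]
```

FN touches every edge twice (once from each endpoint), while FA touches every edge once. This is one plausible reason the two workloads scale differently.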
 
-##### 2.2.2 FN性能
+##### 2.2.2 FN performance
 
 | Backend   | email-enron(3.6w) | amazon0601(40w) | com-youtube.ungraph(120w) | com-lj.ungraph(400w) |
 |-----------|-------------------|-----------------|---------------------------|----------------------|
@@ -102,11 +97,11 @@ _说明_
 | Titan     | 8.084             | 92.507          | 184.543                   | 1099.371             |
 | Neo4j     | 2.424             | 10.537          | 11.609                    | 106.919              |
 
-_说明_
+_Explanation_
 
-- 表头"()"中数据是数据规模,以顶点为单位
-- 表中数据是遍历顶点花费的时间,单位是s
-- 例如,HugeGraph使用RocksDB后端遍历amazon0601的所有顶点,并查找邻接边和另一顶点,总共耗时45.118s
+- The numbers in parentheses in the table header give the data scale, measured in vertices.
+- The values in the table are the time spent traversing vertices, in seconds.
+- For example, HugeGraph with the RocksDB backend took 45.118 seconds in total to traverse all vertices of amazon0601 and, for each vertex, look up its adjacent edges and the vertex at the other end.
 
 ##### 2.2.3 FA performance
 
@@ -116,25 +111,25 @@ _说明_
 | Titan     | 7.361            | 93.344           | 169.218                   | 1085.235              |
 | Neo4j     | 1.673            | 4.775            | 4.284                     | 40.507                |
 
-_说明_
-
-- 表头"()"中数据是数据规模,以边为单位
-- 表中数据是遍历边花费的时间,单位是s
-- 例如,HugeGraph使用RocksDB后端遍历amazon0601的所有边,并查询每条边的两个顶点,总共耗时10.764s
+_Explanation_
 
-###### 结论
+- The numbers in parentheses in the table header give the data scale, measured in edges.
+- The values in the table are the time spent traversing edges, in seconds.
+- For example, HugeGraph with the RocksDB backend took 10.764 seconds in total to traverse all edges of amazon0601 and query the two vertices of each edge.
+
+###### Conclusion
 
-- 遍历性能 Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)
+- Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)
 
-#### 2.3 HugeGraph-图常用分析方法性能
+#### 2.3 Performance of common graph analysis methods in HugeGraph
 
-##### 术语说明
+##### Explanation of terms
 
-- FS(Find Shortest Path), 寻找最短路径
-- K-neighbor,从起始vertex出发,通过K跳边能够到达的所有顶点, 包括1, 2, 3...(K-1), K跳边可达vertex
-- K-out, 从起始vertex出发,恰好经过K跳out边能够到达的顶点
+- FS (Find Shortest Path): finding the shortest path between two vertices
+- K-neighbor: all vertices reachable from the starting vertex within K hops, i.e. those reachable in 1, 2, 3, ..., (K-1), or K hops
+- K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
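
A level-by-level breadth-first search shows the difference between K-neighbor and K-out. The sketch below assumes that "exactly K hops" means vertices whose shortest out-edge distance from the start is K. Under that reading, a vertex first reached at a smaller depth is not counted again at depth K. HugeGraph's own traverser API may offer other variants, so treat this as illustrative only. For simplicity, both functions follow out-edges only.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # vertex id -> ids reachable over one out-edge


def bfs_levels(graph: Graph, start: str, k: int) -> List[Set[str]]:
    """Return the sets of newly reached vertices at depths 1..k (breadth-first)."""
    visited = {start}
    frontier = {start}
    levels: List[Set[str]] = []
    for _ in range(k):
        next_frontier: Set[str] = set()
        for v in frontier:
            for w in graph.get(v, []):
                if w not in visited:  # first time seen: its shortest distance is this depth
                    visited.add(w)
                    next_frontier.add(w)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


def k_out(graph: Graph, start: str, k: int) -> Set[str]:
    """K-out: vertices at exactly k hops (k >= 1)."""
    return bfs_levels(graph, start, k)[k - 1]


def k_neighbor(graph: Graph, start: str, k: int) -> Set[str]:
    """K-neighbor: every vertex reachable within 1..k hops."""
    return set().union(*bfs_levels(graph, start, k))
```

In this sketch, the `visited` set and the current frontier must be held in memory to expand the next level. Memory therefore grows with the number of vertices reached, not with K itself.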
 
-##### FS性能
+##### FS performance
 
 | Backend   | email-enron(30w) | amazon0601(300w) | com-youtube.ungraph(300w) | com-lj.ungraph(3000w) |
 |-----------|------------------|------------------|---------------------------|-----------------------|
@@ -142,64 +137,65 @@ _说明_
 | Titan     | 11.818           | 0.239            | 377.709                   | 575.678               |
 | Neo4j     | 1.719            | 1.800            | 1.956                     | 8.530                 |
 
-_说明_
+_Explanation_
 
-- 表头"()"中数据是数据规模,以边为单位
-- 表中数据是找到**从第一个顶点出发到达随机选择的100个顶点的最短路径**的时间,单位是s
-- 例如,HugeGraph使用RocksDB后端在图amazon0601中查找第一个顶点到100个随机顶点的最短路径,总共耗时0.103s
+- The numbers in parentheses in the table header give the data scale, measured in edges.
+- The values in the table are the time, in seconds, taken to find the shortest paths **from the first vertex to 100 randomly selected vertices**.
+- For example, HugeGraph with the RocksDB backend took 0.103 seconds in total to find the shortest paths from the first vertex to 100 randomly selected vertices in the amazon0601 graph.
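
The FS workload can be sketched as a breadth-first search, which finds shortest paths in an unweighted graph. This sketch makes several assumptions: "first vertex" means the first vertex in iteration order, targets are drawn uniformly with replacement, and the graph is an adjacency dict. graphdb-benchmark's exact selection logic may differ. The function names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]  # vertex id -> adjacent vertex ids


def shortest_path(graph: Graph, src: int, dst: int) -> Optional[List[int]]:
    """Unweighted shortest path via BFS with parent pointers; None if unreachable."""
    if src == dst:
        return [src]
    parent: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, []):
            if w in parent:
                continue
            parent[w] = v
            if w == dst:
                # Walk parent pointers back to the source, then reverse.
                path: List[int] = []
                node: Optional[int] = w
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            queue.append(w)
    return None


def fs_workload(graph: Graph, n_targets: int = 100, seed: int = 0) -> List[Optional[List[int]]]:
    """Shortest paths from the first vertex to n_targets randomly chosen vertices."""
    vertices = list(graph)
    first = vertices[0]
    rng = random.Random(seed)
    targets = [rng.choice(vertices) for _ in range(n_targets)]
    return [shortest_path(graph, first, t) for t in targets]
```

Because BFS stops as soon as it reaches the target, a query to a nearby vertex touches only a small part of the graph. This may help explain why FS times stay low on some large datasets.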
 
-###### 结论
+###### Conclusion
 
-- 在数据规模小或者顶点关联关系少的场景下,HugeGraph性能优于Neo4j和Titan
-- 随着数据规模增大且顶点的关联度增高,HugeGraph与Neo4j性能趋近,都远高于Titan
+- When the data scale is small or vertices have few connections, HugeGraph outperforms Neo4j and Titan.
+- As the data scale grows and vertices become more highly connected, HugeGraph and Neo4j converge to similar performance, and both far outperform Titan.
 
-##### K-neighbor性能
+##### K-neighbor performance
 
-顶点    | 深度 | 一度     | 二度     | 三度     | 四度     | 五度     | 六度
------ | -- | ------ | ------ | ------ | ------ | ------ | ---
-v1    | 时间 | 0.031s | 0.033s | 0.048s | 0.500s | 11.27s | OOM
-v111  | 时间 | 0.027s | 0.034s | 0.115  | 1.36s  | OOM    | --
-v1111 | 时间 | 0.039s | 0.027s | 0.052s | 0.511s | 10.96s | OOM
+Vertex | Depth | 1 hop    | 2 hops   | 3 hops   | 4 hops   | 5 hops   | 6 hops
+----- | ----- | -------- | -------- | -------- | -------- | -------- | --------
+v1    | Time  | 0.031s   | 0.033s   | 0.048s   | 0.500s   | 11.27s  | OOM
+v111  | Time  | 0.027s   | 0.034s   | 0.115s   | 1.36s    | OOM     | --
+v1111 | Time  | 0.039s   | 0.027s   | 0.052s   | 0.511s   | 10.96s  | OOM
 
-_说明_
+_Explanation_
 
-- HugeGraph-Server的JVM内存设置为32GB,数据量过大时会出现OOM
+- The JVM heap of HugeGraph-Server is set to 32GB; OOM errors occur when the amount of data is too large.
 
-##### K-out性能
+##### K-out performance
 
-顶点    | 深度 | 一度     | 二度     | 三度     | 四度     | 五度        | 六度
+Vertex  | Depth | 1 hop | 2 hops | 3 hops | 4 hops | 5 hops | 6 hops
 ----- | -- | ------ | ------ | ------ | ------ | --------- | ---
-v1    | 时间 | 0.054s | 0.057s | 0.109s | 0.526s | 3.77s     | OOM
-      | 度  | 10     | 133    | 2453   | 50,830 | 1,128,688 |
-v111  | 时间 | 0.032s | 0.042s | 0.136s | 1.25s  | 20.62s    | OOM
-      | 度  | 10     | 211    | 4944   | 113150 | 2,629,970 |
-v1111 | 时间 | 0.039s | 0.045s | 0.053s | 1.10s  | 2.92s     | OOM
-      | 度  | 10     | 140    | 2555   | 50825  | 1,070,230 |
+v1    | Time | 0.054s | 0.057s | 0.109s | 0.526s | 3.77s     | OOM
+      | Degree  | 10     | 133    | 2453   | 50,830 | 1,128,688 |
+v111  | Time | 0.032s | 0.042s | 0.136s | 1.25s  | 20.62s    | OOM
+      | Degree  | 10     | 211    | 4944   | 113150 | 2,629,970 |
+v1111 | Time | 0.039s | 0.045s | 0.053s | 1.10s  | 2.92s     | OOM
+      | Degree  | 10     | 140    | 2555   | 50825  | 1,070,230 |
+
 
-_说明_
+_Explanation_
 
-- HugeGraph-Server的JVM内存设置为32GB,数据量过大时会出现OOM
+- The JVM heap of HugeGraph-Server is set to 32GB; OOM errors occur when the amount of data is too large.
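
The "Degree" row of the K-out table shows how quickly the result grows with depth. Assuming that row counts the vertices reached at each depth, the per-hop growth factor for v1 can be computed directly from the table values:

```python
from typing import List

# "Degree" row for start vertex v1 in the K-out table, depths 1 through 5.
V1_KOUT_COUNTS = [10, 133, 2_453, 50_830, 1_128_688]


def growth_factors(counts: List[int]) -> List[float]:
    """Ratio of each depth's count to the previous depth's count."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    print([round(f, 1) for f in growth_factors(V1_KOUT_COUNTS)])
```

This prints `[13.3, 18.4, 20.7, 22.2]`. Each extra hop multiplies the result by more than an order of magnitude. This is consistent with the jump from seconds at depth 5 to OOM at depth 6 under a 32GB heap.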
 
-###### 结论
+###### Conclusion
 
-- FS场景,HugeGraph性能优于Neo4j和Titan
-- K-neighbor和K-out场景,HugeGraph能够实现在5度范围内秒级返回结果
+- In the FS scenario, HugeGraph outperforms Neo4j and Titan.
+- In the K-neighbor and K-out scenarios, HugeGraph returns results on the order of seconds for depths of up to 5 hops.
 
-#### 2.4 图综合性能测试-CW
+#### 2.4 Comprehensive graph performance test (CW)
 
-| 数据库             | 规模1000 | 规模5000  | 规模10000  | 规模20000  |
+| Database        | Size 1000 | Size 5000 | Size 10000 | Size 20000 |
 |-----------------|--------|---------|----------|----------|
 | HugeGraph(core) | 20.804 | 242.099 | 744.780  | 1700.547 |
 | Titan           | 45.790 | 820.633 | 2652.235 | 9568.623 |
 | Neo4j           | 5.913  | 50.267  | 142.354  | 460.880  |
 
-_说明_
+_Explanation_
 
-- "规模"以顶点为单位
-- 表中数据是社区发现完成需要的时间,单位是s,例如HugeGraph使用RocksDB后端在规模10000的数据集,社区聚合不再变化,需要耗时744.780s
-- CW测试是CRUD的综合评估
-- 该测试中HugeGraph跟Titan一样,没有通过client,直接对core操作
+- "Size" is measured in vertices.
+- The values in the table are the time, in seconds, needed for community detection to complete. For example, with the RocksDB backend on the dataset of size 10,000, HugeGraph took 744.780 seconds until the community assignment stopped changing.
+- The CW test is a comprehensive evaluation of CRUD operations.
+- In this test, HugeGraph, like Titan, bypassed the client and operated directly on the core.
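
The Louvain Method used by the CW test repeatedly moves vertices between communities to increase modularity. It stops once no move improves the score, and that stopping point is what the times above measure. The sketch below is not a Louvain implementation. It only computes the modularity score Q for a given assignment, using the standard per-community form Q = sum over communities c of (L_c / m - (d_c / 2m)^2). Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of c's vertices. The graph is assumed to be undirected and unweighted.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(edges: Iterable[Edge], community: Dict[Hashable, Hashable]) -> float:
    """Modularity Q of an undirected, unweighted graph under a community assignment."""
    m = 0
    internal: Dict[Hashable, int] = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum: Dict[Hashable, int] = defaultdict(int)  # d_c: sum of vertex degrees in c
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Consider two triangles joined by a single edge. Splitting them into their two natural communities gives Q = 5/14 (about 0.357), while putting every vertex into one community gives Q = 0. Louvain would prefer the split.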
 
-##### 结论
+##### Conclusion
 
-- 社区聚类算法性能 Neo4j > HugeGraph > Titan
+- Community detection performance: Neo4j > HugeGraph > Titan