Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/12/05 08:51:00 UTC
[jira] [Commented] (TINKERPOP-2834) CloneVertexProgram optimization on SparkGraphComputer
[ https://issues.apache.org/jira/browse/TINKERPOP-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643168#comment-17643168 ]
ASF GitHub Bot commented on TINKERPOP-2834:
-------------------------------------------
ministat opened a new pull request, #1885:
URL: https://github.com/apache/tinkerpop/pull/1885
The current CloneVertexProgram does nothing in its execute() method, yet SparkGraphComputer still runs it as a general VertexProgram, which requires a shuffle stage. For a clone that shuffle is unnecessary, so this PR implements a shortcut that skips it. When I exported two large graphs, the overall export time dropped by about 33% and 27% respectively. See the following table.
```
--------------------------------------
               | Graph 1  | Graph 2
--------------------------------------
Before fix     | 3.6 h    | 22 min
After fix      | 2.4 h    | 16 min
--------------------------------------
```
Graph 1 has 15 billion vertices and 23 billion edges; Graph 2 has 130 million vertices and 650 million edges.
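To illustrate why a no-op execute() still costs something on the standard path, here is a minimal, hypothetical Java model using plain collections rather than Spark. The standard pass runs execute() on every vertex and then regroups the results by vertex id, standing in for the keyed shuffle a general VertexProgram iteration needs; the clone shortcut returns its input untouched. `CloneShortcutSketch`, `Vertex`, `standardPass`, `clonePass` and the `regroupedRecords` counter are illustrative names, not TinkerPop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical, heavily simplified model; it does not use Spark or TinkerPop classes.
public class CloneShortcutSketch {

    // A vertex as (id, properties); stands in for the vertex records held by the graph computer.
    record Vertex(long id, String properties) {}

    // Counts records routed through the keyed regrouping step (the part that needs a shuffle on Spark).
    static long regroupedRecords = 0;

    // Standard path: run execute() on every vertex, then regroup the results by vertex id
    // before they are written out, as a general vertex program iteration must.
    static List<Vertex> standardPass(List<Vertex> graph, UnaryOperator<Vertex> execute) {
        Map<Long, List<Vertex>> byId = new TreeMap<>();
        for (Vertex v : graph) {
            Vertex updated = execute.apply(v);
            byId.computeIfAbsent(updated.id(), k -> new ArrayList<>()).add(updated);
            regroupedRecords++;
        }
        List<Vertex> out = new ArrayList<>();
        byId.values().forEach(out::addAll);
        return out;
    }

    // Shortcut for a clone-style program whose execute() is a no-op: nothing changes and
    // nothing needs regrouping, so the input is handed straight to the output writer.
    static List<Vertex> clonePass(List<Vertex> graph) {
        return graph;
    }

    public static void main(String[] args) {
        List<Vertex> graph = List.of(new Vertex(1, "a"), new Vertex(2, "b"), new Vertex(3, "c"));
        List<Vertex> viaStandard = standardPass(graph, UnaryOperator.identity());
        List<Vertex> viaShortcut = clonePass(graph);
        System.out.println(viaStandard.equals(viaShortcut)); // true: same vertices, same order
        System.out.println(regroupedRecords);                // 3: every vertex went through the regroup
    }
}
```

Both paths yield the same vertices for a no-op program, but only the standard one regroups every vertex, and that per-vertex regrouping is the cost the shortcut avoids.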
> CloneVertexProgram optimization on SparkGraphComputer
> -----------------------------------------------------
>
> Key: TINKERPOP-2834
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2834
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Reporter: Redriver
> Priority: Major
>
> The CloneVertexProgram does nothing in its execute() method, but SparkGraphComputer still processes it with standard GraphComputer semantics, which involves a lot of unnecessary computation. Registering a dedicated SparkVertexProgramInterceptor whose apply() does no work avoids that computation and can considerably improve overall performance.
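A minimal sketch of the interceptor dispatch the issue describes, assuming interceptors are looked up by program class before the general execution path runs. Every type here (`Program`, `CloneProgram`, `UppercaseProgram`, `Interceptor`, `submit`, `runStandard`) is a hypothetical stand-in: TinkerPop's actual SparkVertexProgramInterceptor operates on Spark RDDs, and its registration mechanism is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of interceptor dispatch; these are illustrative stand-ins,
// not TinkerPop's VertexProgram / SparkVertexProgramInterceptor API.
public class InterceptorDispatchSketch {

    interface Program {}                                      // stand-in for a vertex program
    static final class CloneProgram implements Program {}     // does nothing in execute()
    static final class UppercaseProgram implements Program {} // does real per-vertex work

    // An interceptor replaces the whole iterative run for one program type.
    interface Interceptor {
        List<String> apply(Program program, List<String> graph);
    }

    // Registry keyed by program class (an assumption made for this sketch).
    static final Map<Class<? extends Program>, Interceptor> INTERCEPTORS = new HashMap<>();
    static {
        // The clone interceptor's apply() does no work: the graph is returned as-is.
        INTERCEPTORS.put(CloneProgram.class, (program, graph) -> graph);
    }

    static int standardRuns = 0; // how often the general execution path was taken

    static List<String> submit(Program program, List<String> graph) {
        Interceptor interceptor = INTERCEPTORS.get(program.getClass());
        if (interceptor != null) {
            return interceptor.apply(program, graph); // skip iterations and message passing
        }
        return runStandard(graph);
    }

    // Placeholder for the general path (per-vertex execute, message passing, shuffles);
    // here it just upper-cases each vertex label to stand in for real work.
    static List<String> runStandard(List<String> graph) {
        standardRuns++;
        List<String> out = new ArrayList<>();
        for (String v : graph) {
            out.add(v.toUpperCase());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> graph = List.of("v1", "v2");
        System.out.println(submit(new CloneProgram(), graph));     // [v1, v2]
        System.out.println(submit(new UppercaseProgram(), graph)); // [V1, V2]
        System.out.println(standardRuns);                          // 1
    }
}
```

Keying the registry by class keeps the check to a single map lookup, and programs without an interceptor fall through to the standard path unchanged.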
--
This message was sent by Atlassian Jira
(v8.20.10#820010)