Posted to dev@tinkerpop.apache.org by "Daniel Kuppitz (JIRA)" <ji...@apache.org> on 2015/10/21 14:48:27 UTC

[jira] [Created] (TINKERPOP3-904) BulkLoaderVertexProgram optimizations

Daniel Kuppitz created TINKERPOP3-904:
-----------------------------------------

             Summary: BulkLoaderVertexProgram optimizations
                 Key: TINKERPOP3-904
                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-904
             Project: TinkerPop 3
          Issue Type: Improvement
          Components: process
    Affects Versions: 3.1.0-incubating
            Reporter: Daniel Kuppitz
            Assignee: Daniel Kuppitz
             Fix For: 3.1.0-incubating


This is the continuation of https://issues.apache.org/jira/browse/TINKERPOP3-319. A few suggestions were made by [~mbroecheler] on how to optimize the current BLVP implementation. Since these optimizations require breaking changes, they were not implemented for 3.0.2.

{quote}
The following optimizations should be implemented to improve the performance of BLVP:
* In line 212, BLVP should get the information whether the vertex was created or retrieved. If it was created (i.e. it did not exist before) then we are guaranteed that it cannot have any vertex properties. As such, the BLVP should then just create the vertex properties without checking for their existence first - this will be significantly faster.
* Similarly, when loading edges in the second iteration, it should first compute this boolean variable {{requiresIncremental = sourceVertex.edges(OUT).hasNext() && outV.edges(OUT).hasNext()}} and then only do incremental loading on edges if this variable is true. If it is not true, incremental loading (i.e. checking for edge existence) isn't necessary.

Both improvements together should dramatically improve the performance of BLVP, since it will require a read per edge/vertex property only in those cases where a previous job failed. Under "normal" operational conditions it only requires one read per vertex per iteration. That is, the reads scale in O(|V|) and not O(|E|).

In addition, there should be an option for IncrementalBulkLoader so that it does not attempt to update edges and vertex properties when those already exist. In most cases, the edge will be identical when it has been loaded in a previous job (since edge and property mutations are atomic in most graph databases), so this check is unnecessary. Being able to make it optional can save time.

Note that these are important optimizations for large scale graph databases where bulk loading is necessary to get started.
{quote}
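
To make the two quoted suggestions concrete, here is a minimal sketch of the control flow they describe. It is not the BLVP implementation and does not use the TinkerPop API: the class BulkLoadSketch, its methods, and the two read counters are hypothetical stand-ins backed by in-memory maps. An edge is modelled as a plain string key, and a "read" is any existence check against the target graph.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a target graph; not the TinkerPop API. */
final class BulkLoadSketch {
    // vertex id -> property key/value pairs already stored in the target
    private final Map<String, Map<String, String>> props = new HashMap<>();
    // vertex id -> outgoing edge keys already stored in the target
    private final Map<String, Set<String>> outEdges = new HashMap<>();

    int propertyReads = 0; // per-property existence checks performed
    int edgeReads = 0;     // per-edge existence checks performed

    /** Returns true if the vertex was created, false if it was retrieved. */
    boolean getOrCreateVertex(String id) {
        boolean created = !props.containsKey(id);
        if (created) {
            props.put(id, new HashMap<>());
            outEdges.put(id, new HashSet<>());
        }
        return created;
    }

    /** First suggestion: skip property existence checks for newly created vertices. */
    void loadVertex(String id, Map<String, String> sourceProps) {
        boolean created = getOrCreateVertex(id);
        Map<String, String> target = props.get(id);
        for (Map.Entry<String, String> e : sourceProps.entrySet()) {
            if (!created) {
                propertyReads++; // only paid when the vertex already existed
                if (e.getValue().equals(target.get(e.getKey()))) {
                    continue;    // identical property is already present
                }
            }
            target.put(e.getKey(), e.getValue());
        }
    }

    /** Second suggestion: check edge existence only if both sides have out-edges. */
    void loadOutEdges(String sourceId, List<String> edgeKeys) {
        getOrCreateVertex(sourceId);
        Set<String> existing = outEdges.get(sourceId);
        // One check per vertex, mirroring requiresIncremental from the quote.
        boolean requiresIncremental = !edgeKeys.isEmpty() && !existing.isEmpty();
        for (String key : edgeKeys) {
            if (requiresIncremental) {
                edgeReads++;     // only paid when a previous job left edges behind
                if (existing.contains(key)) {
                    continue;    // edge was loaded by a previous job
                }
            }
            existing.add(key);
        }
    }

    int edgeCount(String id) {
        Set<String> edges = outEdges.get(id);
        return edges == null ? 0 : edges.size();
    }
}
```

In this sketch, a first load into an empty target performs no per-property or per-edge checks at all. Only a second run over the same vertex, as after a failed job, pays one check per property and per edge.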


