Posted to dev@tinkerpop.apache.org by "stephen mallette (JIRA)" <ji...@apache.org> on 2015/09/02 16:11:45 UTC
[jira] [Reopened] (TINKERPOP3-319) BulkLoaderVertexProgram for generalized batch loading across graphs
[ https://issues.apache.org/jira/browse/TINKERPOP3-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette reopened TINKERPOP3-319:
-----------------------------------------
This issue isn't documented yet and is therefore not fully complete.
> BulkLoaderVertexProgram for generalized batch loading across graphs
> -------------------------------------------------------------------
>
> Key: TINKERPOP3-319
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-319
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.0.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Daniel Kuppitz
> Fix For: 3.1.0-incubating, 3.0.2-incubating
>
>
> After working on {{BulkLoaderVertexProgram}} for Titan, it is trivial to add this generally to TinkerPop -- equivalent to BlueprintsOutputFormat (or whatever the Blueprints-specific bulk loader was called). However, given that Titan and TinkerPop have the same data model, Titan having its own {{BulkLoaderVertexProgram}} isn't necessary as there is no longer a data model alignment issue. The difference would be that instead of:
> {code:groovy}
> g.V.compute().program(BulkLoaderVertexProgram.build().titan(propertiesFile).create()).submit()
> {code}
> It would simply be:
> {code:groovy}
> g.V.compute().program(BulkLoaderVertexProgram.build().factory(propertiesFile).create()).submit()
> {code}
> ...and {{BulkLoaderVertexProgram}} would use {{GraphFactory.open()}} to instantiate the connection to the graph. Moreover (and [~spmallette] will need to clear my head here), if the factory opened up a Gremlin Server connection, then we would get parallel writing to embedded graph databases like Neo4j.
> {{BulkLoaderVertexProgram}} is simply a vertex program that parallel loads a graph (with a graph computer) to any other graph that can be accessed via {{GraphFactory}} (which is every TP3 graph).
> [~dalaro] @mbroecheler [~dkuppitz]
> EXTENDED NOTES:
> * {{SchemaInference}} would be a MapReduce job executed prior to {{BulkLoaderVertexProgram}}
> * Titan and Neo4j can each have their own {{SchemaInference}} implementations.
> * Incremental loading .... I forget how this worked.
> * Bulk mutations ... this is possible at the TP3 level with hidden properties and smart add/remove/etc.
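For readers unfamiliar with the {{GraphFactory}} mechanism that the quoted proposal relies on, here is a minimal sketch of how a target graph can be opened from a properties file. It is not the {{BulkLoaderVertexProgram}} implementation. The file name {{target.properties}} and the choice of TinkerGraph as the target are assumptions made purely for illustration; the only requirement {{GraphFactory}} places on the file is a {{gremlin.graph}} entry naming the {{Graph}} implementation class.

{code:groovy}
import org.apache.tinkerpop.gremlin.structure.util.GraphFactory

// Hypothetical file 'target.properties', assumed to contain:
//   gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
// The named implementation must be on the classpath.
graph = GraphFactory.open('target.properties')

// Any writer (a vertex program worker, for instance) can now use the
// plain Graph API without knowing which implementation is behind it.
v = graph.addVertex('name', 'marko')

graph.close()
{code}

Swapping the {{gremlin.graph}} value for another provider's class is what would let the same loader write to a different graph without code changes.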
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)