Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2016/06/30 17:33:42 UTC

[13/50] [abbrv] tinkerpop git commit: Merge remote-tracking branch 'origin/tp31'

Merge remote-tracking branch 'origin/tp31'

Conflicts:
	docs/src/reference/implementations-hadoop.asciidoc


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/6c166325
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/6c166325
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/6c166325

Branch: refs/heads/TINKERPOP-1274
Commit: 6c1663255591748deb1fab7fd5cf9f7ab987ad08
Parents: c2d2a7d e3c5d8e
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Wed Jun 15 11:28:43 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Wed Jun 15 11:28:43 2016 -0400

----------------------------------------------------------------------
 docs/src/dev/provider/index.asciidoc            | 67 +++++++++++++++++++
 .../reference/implementations-hadoop.asciidoc   | 70 +-------------------
 2 files changed, 68 insertions(+), 69 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/6c166325/docs/src/dev/provider/index.asciidoc
----------------------------------------------------------------------
diff --cc docs/src/dev/provider/index.asciidoc
index 00dd770,7b876a1..90ab5c0
--- a/docs/src/dev/provider/index.asciidoc
+++ b/docs/src/dev/provider/index.asciidoc
@@@ -290,6 -288,48 +290,73 @@@ for (final MapReduce mapReduce : mapRed
  <2> If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application
  developer's `MapReduce.addResultToMemory()` implementation.
  
+ Hadoop-Gremlin Usage
+ ^^^^^^^^^^^^^^^^^^^^
+ 
+ Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a third-party graph system provider wishes to
+ leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then it needs to provide, at minimum, a
+ Hadoop2 `InputFormat<NullWritable,VertexWritable>` for its graph system. If the provider wishes to persist computed
+ results back to its graph system (and not just to HDFS via a `FileOutputFormat`), then a graph-system-specific
+ `OutputFormat<NullWritable,VertexWritable>` must be developed as well.
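+ 
+ As a rough sketch, such an `InputFormat` might be skeletoned as follows, where `MyGraphSplits` and
+ `MyGraphRecordReader` are hypothetical placeholders for the provider's data-source wiring:
+ 
+ [source,java]
+ ----
+ import org.apache.hadoop.io.NullWritable;
+ import org.apache.hadoop.mapreduce.InputFormat;
+ import org.apache.hadoop.mapreduce.InputSplit;
+ import org.apache.hadoop.mapreduce.JobContext;
+ import org.apache.hadoop.mapreduce.RecordReader;
+ import org.apache.hadoop.mapreduce.TaskAttemptContext;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable;
+ 
+ import java.util.List;
+ 
+ // hypothetical skeleton: exposes the provider's graph as <NullWritable,VertexWritable> records
+ public class MyGraphInputFormat extends InputFormat<NullWritable, VertexWritable> {
+   @Override
+   public List<InputSplit> getSplits(final JobContext context) {
+     // partition the provider's graph into splits (e.g. one split per storage partition)
+     return MyGraphSplits.compute(context.getConfiguration()); // hypothetical helper
+   }
+ 
+   @Override
+   public RecordReader<NullWritable, VertexWritable> createRecordReader(final InputSplit split,
+                                                                        final TaskAttemptContext context) {
+     // each record is one star-vertex: the vertex itself plus its incident edges
+     return new MyGraphRecordReader(); // hypothetical reader over the provider's data source
+   }
+ }
+ ----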
+ 
+ Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as
+ the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a
+ small object with little overhead. Graph system providers should treat `HadoopGraph` as the gateway to the OLAP
+ features offered by Hadoop-Gremlin. For example, a graph-system-specific `Graph.compute(Class<? extends GraphComputer>
+ graphComputerClass)`-method may look as follows:
+ 
+ [source,java]
+ ----
+ public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException {
+   try {
+     // only GraphComputer implementations built on Hadoop-Gremlin are supported
+     if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass))
+       // reflectively invoke the computer's single-argument HadoopGraph constructor
+       return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this);
+     else
+       throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
+   } catch (final Exception e) {
+     throw new IllegalArgumentException(e.getMessage(), e);
+   }
+ }
+ ----
+ 
+ Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the
+ case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the
+ `compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which
 -determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphInputFormat` and
 -`gremlin.hadoop.graphOutputFormat`.
++determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphReader` and
++`gremlin.hadoop.graphWriter`.
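++
++As an illustration, a dynamically created configuration might be assembled along these lines (the
++`com.example` format classes are hypothetical placeholders):
++
++[source,java]
++----
++import org.apache.commons.configuration.BaseConfiguration;
++import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
++
++// illustrative values only; the com.example classes stand in for the provider's formats
++final BaseConfiguration configuration = new BaseConfiguration();
++configuration.setProperty("gremlin.graph", HadoopGraph.class.getName());
++configuration.setProperty("gremlin.hadoop.graphReader", "com.example.MyGraphInputFormat");
++configuration.setProperty("gremlin.hadoop.graphWriter", "com.example.MyGraphOutputFormat");
++configuration.setProperty("gremlin.hadoop.outputLocation", "output");
++final HadoopGraph graph = HadoopGraph.open(configuration);
++----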
++
++GraphFilterAware Interface
++++++++++++++++++++++++++++
++
++<<graph-filter,Graph filters>> are used by OLAP processors to pull only a subgraph of the full graph from the graph
++data source. For instance, the example below constructs a `GraphFilter` that will only pull the "knows"-graph amongst
++people into the `GraphComputer` for processing.
++
++[source,java]
++----
++graph.compute().vertices(hasLabel("person")).edges(bothE("knows"))
++----
++
++If the provider has a custom `InputRDD`, it can implement `GraphFilterAware` and the graph filter will be provided to
++its `InputRDD` at load time. For providers that use an `InputFormat`, the graph filter can be accessed from the
++configuration as such:
++
++[source,java]
++----
++if (configuration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_FILTER))
++  this.graphFilter = VertexProgramHelper.deserialize(configuration, Constants.GREMLIN_HADOOP_GRAPH_FILTER);
++----
++
++PersistResultGraphAware Interface
+++++++++++++++++++++++++++++++++++
+ 
 -IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
++A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
+ determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided
+ by Hadoop-Gremlin (e.g. <<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>,
+ and <<script-io-format,`ScriptInputOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph
+ data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support
+ `ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum.
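+ 
+ By contrast, a provider whose backing store is randomly accessible could advertise broader support. The
+ following minimal sketch assumes a hypothetical `MyGraphOutputFormat` and is declared abstract so the
+ record-writing plumbing can be omitted:
+ 
+ [source,java]
+ ----
+ import org.apache.hadoop.io.NullWritable;
+ import org.apache.hadoop.mapreduce.OutputFormat;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.io.PersistResultGraphAware;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable;
+ import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
+ 
+ // hypothetical sketch: a random-access store can also mutate the ORIGINAL graph in place
+ public abstract class MyGraphOutputFormat extends OutputFormat<NullWritable, VertexWritable>
+     implements PersistResultGraphAware {
+ 
+   @Override
+   public boolean supportsResultGraphPersistCombination(final GraphComputer.ResultGraph resultGraph,
+                                                        final GraphComputer.Persist persist) {
+     // unlike the file-based formats, every ResultGraph/Persist combination is supported here
+     return true;
+   }
+ }
+ ----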
+ 
  [[io-implementations]]
  IO Implementations
  ^^^^^^^^^^^^^^^^^^

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/6c166325/docs/src/reference/implementations-hadoop.asciidoc
----------------------------------------------------------------------