Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2016/06/30 17:33:42 UTC
[13/50] [abbrv] tinkerpop git commit: Merge remote-tracking branch 'origin/tp31'
Merge remote-tracking branch 'origin/tp31'
Conflicts:
docs/src/reference/implementations-hadoop.asciidoc
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/6c166325
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/6c166325
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/6c166325
Branch: refs/heads/TINKERPOP-1274
Commit: 6c1663255591748deb1fab7fd5cf9f7ab987ad08
Parents: c2d2a7d e3c5d8e
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Wed Jun 15 11:28:43 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Wed Jun 15 11:28:43 2016 -0400
----------------------------------------------------------------------
docs/src/dev/provider/index.asciidoc | 67 +++++++++++++++++++
.../reference/implementations-hadoop.asciidoc | 70 +-------------------
2 files changed, 68 insertions(+), 69 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/6c166325/docs/src/dev/provider/index.asciidoc
----------------------------------------------------------------------
diff --cc docs/src/dev/provider/index.asciidoc
index 00dd770,7b876a1..90ab5c0
--- a/docs/src/dev/provider/index.asciidoc
+++ b/docs/src/dev/provider/index.asciidoc
@@@ -290,6 -288,48 +290,73 @@@ for (final MapReduce mapReduce : mapRed
<2> If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application
developer's `MapReduce.addResultToMemory()` implementation.
+ Hadoop-Gremlin Usage
+ ^^^^^^^^^^^^^^^^^^^^
+
+ Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. A third-party graph system provider wishing to
+ leverage Hadoop-Gremlin (and its respective `GraphComputer` engines) must provide, at minimum, a Hadoop2
+ `InputFormat<NullWritable,VertexWritable>` for its graph system. If the provider wishes to persist computed
+ results back to its graph system (and not just to HDFS via a `FileOutputFormat`), then a graph-system-specific
+ `OutputFormat<NullWritable,VertexWritable>` must be developed as well.
+
+ Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as
+ the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a
+ small object with little overhead. Graph system providers should realize `HadoopGraph` as the gateway to the OLAP
+ features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer>
+ graphComputerClass)`-method may look as follows:
+
+ [source,java]
+ ----
+ public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException {
+ try {
+ if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass))
+ return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this);
+ else
+ throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
+ } catch (final Exception e) {
+ throw new IllegalArgumentException(e.getMessage(),e);
+ }
+ }
+ ----
+
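The reflection-based dispatch in the `compute()` method above can be illustrated with plain stand-in classes. `MyGraph`, `MyComputer`, and `AbstractComputer` below are hypothetical names used only to keep the sketch self-contained; they are not TinkerPop types. The pattern is the same: check assignability, then reflectively invoke the constructor that takes the graph.

```java
// Self-contained sketch of the compute() dispatch pattern.
// All class names here are illustrative, not TinkerPop APIs.
public class ComputeDispatchDemo {

    // Stand-in for AbstractHadoopGraphComputer: holds a reference to its graph.
    public static abstract class AbstractComputer {
        public final MyGraph graph;
        protected AbstractComputer(final MyGraph graph) { this.graph = graph; }
    }

    // A concrete computer with the expected single-argument constructor.
    public static class MyComputer extends AbstractComputer {
        public MyComputer(final MyGraph graph) { super(graph); }
    }

    // Stand-in for HadoopGraph: dispatches to the requested computer class.
    public static class MyGraph {
        public <C extends AbstractComputer> C compute(final Class<C> computerClass) {
            try {
                if (AbstractComputer.class.isAssignableFrom(computerClass))
                    return computerClass.getConstructor(MyGraph.class).newInstance(this);
                else
                    throw new IllegalArgumentException("unsupported computer: " + computerClass);
            } catch (final ReflectiveOperationException e) {
                throw new IllegalArgumentException(e.getMessage(), e);
            }
        }
    }

    public static void main(final String[] args) {
        final MyGraph graph = new MyGraph();
        final MyComputer computer = graph.compute(MyComputer.class);
        System.out.println(computer.graph == graph); // prints "true"
    }
}
```

The constructor lookup via `getConstructor(MyGraph.class)` is why a provider's computer class must expose a public constructor taking the graph type.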
+ Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the
+ case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the
+ `compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which
-determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphInputFormat` and
-`gremlin.hadoop.graphOutputFormat`.
++determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphReader` and
++`gremlin.hadoop.graphWriter`.
++
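For reference, a `HadoopGraph` properties file wiring these keys together might look like the following sketch of a Gryo-based setup; the class names and file locations shown are illustrative, not prescriptive:

```properties
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
gremlin.hadoop.outputLocation=output
```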
++GraphFilterAware Interface
++++++++++++++++++++++++++++
++
++<<graph-filter,Graph filters>> are used by OLAP processors to pull only a subgraph of the full graph from the graph data source. For instance, the
++example below constructs a `GraphFilter` that will only pull the "knows"-graph amongst people into the `GraphComputer`
++for processing.
++
++[source,java]
++----
++graph.compute().vertices(hasLabel("person")).edges(bothE("knows"))
++----
++
++If the provider has a custom `InputRDD`, they can implement `GraphFilterAware` and the graph filter will be provided to their
++`InputRDD` at load time. For providers that use an `InputFormat`, the graph filter can be accessed from the configuration
++as follows:
++
++[source,java]
++----
++if (configuration.containsKey(Constants.GREMLIN_HADOOP_GRAPH_FILTER))
++ this.graphFilter = VertexProgramHelper.deserialize(configuration, Constants.GREMLIN_HADOOP_GRAPH_FILTER);
++----
++
++PersistResultGraphAware Interface
+++++++++++++++++++++++++++++++++++
+
-IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
++A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
+ determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided
+ by Hadoop-Gremlin (e.g. <<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>,
+ and <<script-io-format,`ScriptInputOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph
+ data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support
+ `ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum.
+
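The decision such a file-based `OutputFormat` makes can be sketched with locally declared stand-ins for the `ResultGraph` and `Persist` enums (the real ones live on `GraphComputer`, and the method would be an implementation of `PersistResultGraphAware`; the class name and enum constants here are illustrative):

```java
// Sketch of the persistence-support decision for a file-based OutputFormat.
// ResultGraph and Persist are re-declared locally to keep the example
// self-contained; real providers use the GraphComputer enums instead.
public class FilePersistSupportDemo {

    public enum ResultGraph { ORIGINAL, NEW }
    public enum Persist { NOTHING, VERTEX_PROPERTIES, EDGES }

    // File-based output is not random access and is effectively immutable,
    // so only combinations that write a NEW result graph are supported.
    public static boolean supportsResultGraphPersistCombination(final ResultGraph resultGraph,
                                                                final Persist persist) {
        return resultGraph == ResultGraph.NEW;
    }

    public static void main(final String[] args) {
        System.out.println(supportsResultGraphPersistCombination(ResultGraph.NEW, Persist.EDGES));      // prints "true"
        System.out.println(supportsResultGraphPersistCombination(ResultGraph.ORIGINAL, Persist.EDGES)); // prints "false"
    }
}
```

A provider whose backing store *is* randomly mutable would instead return `true` for `ResultGraph.ORIGINAL` combinations as well.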
[[io-implementations]]
IO Implementations
^^^^^^^^^^^^^^^^^^
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/6c166325/docs/src/reference/implementations-hadoop.asciidoc
----------------------------------------------------------------------