Posted to commits@tinkerpop.apache.org by dk...@apache.org on 2016/02/19 21:40:55 UTC

[01/19] incubator-tinkerpop git commit: Update release docs.

Repository: incubator-tinkerpop
Updated Branches:
  refs/heads/master 12a691724 -> e23c00bdd


Update release docs.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/4aeb4b28
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/4aeb4b28
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/4aeb4b28

Branch: refs/heads/master
Commit: 4aeb4b281f12e78818813577e6e6df743593ec11
Parents: fea3e31
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Feb 16 07:42:13 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 07:45:59 2016 -0500

----------------------------------------------------------------------
 docs/src/dev/developer/release.asciidoc | 36 ++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/4aeb4b28/docs/src/dev/developer/release.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/dev/developer/release.asciidoc b/docs/src/dev/developer/release.asciidoc
index 30749bf..ab63d52 100644
--- a/docs/src/dev/developer/release.asciidoc
+++ b/docs/src/dev/developer/release.asciidoc
@@ -153,9 +153,9 @@ Release & Promote
 . `svn co --depth empty https://dist.apache.org/repos/dist/release/incubator/tinkerpop release; mkdir release/xx.yy.zz`
 . Copy release files from `dev/xx.yy.zz` to `release/xx.yy.zz`.
 . `cd release; svn add xx.yy.zz/; svn ci -m "TinkerPop xx.yy.zz release"`
-. If there are releases present in SVN that represent lines of code that are no longer under development, then remove those releases. In other words, if `3.1.0-incubating` is present and `3.1.1-incubating` is released, then remove `3.1.0-incubating`.  However, if `3.0.2-incubating` is present and that line of code is still under potential development, it may stay.
 . Update homepage with references to latest distribution and to other internal links elsewhere on the page.
 . Wait for Apache Central to sync the jars and src (link:http://repo1.maven.org/maven2/org/apache/tinkerpop/tinkerpop/[http://repo1.maven.org/maven2/org/apache/tinkerpop/tinkerpop/]).
+. If there are releases present in SVN that represent lines of code that are no longer under development, then remove those releases. In other words, if `3.1.0-incubating` is present and `3.1.1-incubating` is released, then remove `3.1.0-incubating`.  However, if `3.0.2-incubating` is present and that line of code is still under potential development, it may stay.
 . Announce release on `dev@`/`gremlin-users@` mailing lists and tweet from `@apachetinkerpop`
 
 Email Templates
@@ -191,10 +191,10 @@ The online docs can be found here:
 	http://tinkerpop.apache.org/javadocs/xx.yy.zz/full/ (full javadoc)
 
 The tag in Apache Git can be found here:
-	https://git-wip-us.apache.org/repos/asf?p=incubator-tinkerpop.git;...
+	https://git-wip-us.apache.org/repos/asf?p=incubator-tinkerpop.git;XXXXXXXXXXXXXXXXXX
 
 The release notes are available here:
-	https://github.com/apache/incubator-tinkerpop/blob/master/CHANGELOG.asciidoc#...
+	https://github.com/apache/incubator-tinkerpop/blob/master/CHANGELOG.asciidoc#XXXXXXXXXXXXXXXXXX
 
 The [VOTE] will be open for the next 72 hours --- closing <DayOfTheWeek> (<Month> <Day> <Year>) at <Time> <TimeZone>.
 
@@ -210,7 +210,7 @@ If the above email template is used for the "Incubator Vote", then it should inc
 ```
 Finally, the dev@tinkerpop [VOTE] thread can be found at this location:
 
-    https://pony-poc.apache.org/thread.html/...
+    https://pony-poc.apache.org/thread.html/XXXXXXXXXXXXXXXXXX
 
 Result summary: +X (Y binding, Z non-binding), 0 (0), -1 (0)
 ```
@@ -239,4 +239,30 @@ NON-BINDING VOTES:
 
 Thank you very much,
 <TinkerPop Committer Name>
-```
\ No newline at end of file
+```
+
+General Release Announcement
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Subject: TinkerPop xx.yy.zz Released: [name of release line]
+
+Hello,
+
+TinkerPop xx.yy.zz has just been released. You can find all the links and goodies on our homepage at http://tinkerpop.apache.org. However, if that is just one click too many, then the respective links are below.
+
+The release artifacts can be found at this location:
+        https://www.apache.org/dyn/closer.lua/incubator/tinkerpop/xx.yy.zz/apache-gremlin-console-xx.yy.zz-bin.zip
+        https://www.apache.org/dyn/closer.lua/incubator/tinkerpop/xx.yy.zz/apache-gremlin-server-xx.yy.zz-bin.zip
+
+The online docs can be found here (note that the URL structure has changed with this release):
+        http://tinkerpop.apache.org/docs/xx.yy.zz/reference/ (user docs)
+        http://tinkerpop.apache.org/docs/xx.yy.zz/upgrade.html#XXXXXXXXXXXXXXXXXX (upgrade docs)
+        http://tinkerpop.apache.org/docs/xx.yy.zz/tutorials/the-gremlin-console/ (gremlin console tutorial) [NEW!]
+        http://tinkerpop.apache.org/javadocs/xx.yy.zz/core/ (core javadoc)
+        http://tinkerpop.apache.org/javadocs/xx.yy.zz/full/ (full javadoc)
+
+The release notes are available here:
+        https://github.com/apache/incubator-tinkerpop/blob/xx.yy.zz/CHANGELOG.asciidoc#XXXXXXXXXXXXXXXXXX
+
+The Central Maven repo has sync'd as well:
+        https://repo1.maven.org/maven2/org/apache/tinkerpop/tinkerpop/xx.yy.zz/


[07/19] incubator-tinkerpop git commit: Added a TIP about returning empty list instead of null in gremlin console tutorial.

Posted by dk...@apache.org.
Added a TIP about returning empty list instead of null in gremlin console tutorial.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/b0dfde36
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/b0dfde36
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/b0dfde36

Branch: refs/heads/master
Commit: b0dfde366cf57211d2596630dea870d87e93de59
Parents: 888ac6a
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Wed Feb 17 07:24:08 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Wed Feb 17 07:24:08 2016 -0500

----------------------------------------------------------------------
 docs/src/tutorials/the-gremlin-console/index.asciidoc | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/b0dfde36/docs/src/tutorials/the-gremlin-console/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/tutorials/the-gremlin-console/index.asciidoc b/docs/src/tutorials/the-gremlin-console/index.asciidoc
index 7b749fa..f477877 100644
--- a/docs/src/tutorials/the-gremlin-console/index.asciidoc
+++ b/docs/src/tutorials/the-gremlin-console/index.asciidoc
@@ -258,6 +258,8 @@ t = g.V(1).outE().group().by(label).by(inV());null
 t.next()
 ----
 
+TIP: In addition to "returning null", you could also return an empty list as in: `t = g.V(1);[]`.
+
 image:gremlin-console-ide.png[float=left,width=300] The first line assigns the `Traversal` to `t`, but the line itself
 is actually two lines of code as denoted by the semi-colon. The line of execution actually returns `null`, which is
 what the console actually auto-iterates. At that point you can work with `t` as you desire.


[11/19] incubator-tinkerpop git commit: splitted implementations.asciidoc into implementations-hadoop.asciidoc and implementations-neo4j.asciidoc

Posted by dk...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/bbf5b3f4/docs/src/reference/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations.asciidoc b/docs/src/reference/implementations.asciidoc
deleted file mode 100644
index 38c758c..0000000
--- a/docs/src/reference/implementations.asciidoc
+++ /dev/null
@@ -1,1835 +0,0 @@
-////
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements.  See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-////
-[[implementations]]
-Implementations
-===============
-
-image::gremlin-racecar.png[width=325]
-
-[[graph-system-provider-requirements]]
-Graph System Provider Requirements
-----------------------------------
-
-image:tinkerpop-enabled.png[width=140,float=left] At the core of TinkerPop3 is a Java8 API. The implementation of this
-core API and its validation via the `gremlin-test` suite is all that is required of a graph system provider wishing to
-provide a TinkerPop3-enabled graph engine. Once a graph system has a valid implementation, then all the applications
-provided by TinkerPop (e.g. Gremlin Console, Gremlin Server, etc.) and 3rd-party developers (e.g. Gremlin-Scala,
-Gremlin-JS, etc.) will integrate properly. Finally, please feel free to use the logo on the left to promote your
-TinkerPop3 implementation.
-
-Implementing Gremlin-Core
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The classes that a graph system provider should focus on implementing are itemized below. It is a good idea to study
-the <<tinkergraph-gremlin,TinkerGraph>> (in-memory OLTP and OLAP in `tinkergraph-gremlin`), <<neo4j-gremlin,Neo4jGraph>>
-(OLTP w/ transactions in `neo4j-gremlin`) and/or <<hadoop-gremlin,HadoopGraph>> (OLAP in `hadoop-gremlin`)
-implementations for ideas and patterns.
-
-. Online Transactional Processing Graph Systems (*OLTP*)
- .. Structure API: `Graph`, `Element`, `Vertex`, `Edge`, `Property` and `Transaction` (if transactions are supported).
- .. Process API: `TraversalStrategy` instances for optimizing Gremlin traversals to the provider's graph system (i.e. `TinkerGraphStepStrategy`).
-. Online Analytics Processing Graph Systems (*OLAP*)
- .. Everything required of OLTP is required of OLAP (but not vice versa).
- .. GraphComputer API: `GraphComputer`, `Messenger`, `Memory`.
-
-Please consider the following implementation notes:
-
-* Be sure your `Graph` implementation is named as `XXXGraph` (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
-* Use `StringHelper` to ensure that the `toString()` representation of classes is consistent with other implementations.
-* Ensure that your implementation's `Features` (Graph, Vertex, etc.) are correct so that test cases handle particulars accordingly.
-* Use the numerous static method helper classes such as `ElementHelper`, `GraphComputerHelper`, `VertexProgramHelper`, etc. (a sketch of typical `ElementHelper` usage follows this list).
-* There are a number of default methods on the provided interfaces that are semantically correct. However, if they are
-not efficient for the implementation, override them.
-* Implement the `structure/` package interfaces first and then, if desired, those in the `process/` package.
-* `ComputerGraph` is a `Wrapper` system that ensures proper semantics during a GraphComputer computation.
-
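-The following is a hedged sketch of how these helpers typically appear in a provider's `addVertex()`
-implementation. `XXXVertex` and `nextId()` are hypothetical stand-ins for provider-specific classes, and the exact
-`ElementHelper` signatures should be verified against the TinkerPop version in use.
-
-[source,java]
-----
-public Vertex addVertex(final Object... keyValues) {
-    // throws IllegalArgumentException if the array is malformed (odd length, bad key types)
-    ElementHelper.legalPropertyKeyValueArray(keyValues);
-    // honor a user-supplied id if present, else generate one (nextId() is hypothetical)
-    final Object id = ElementHelper.getIdValue(keyValues).orElseGet(this::nextId);
-    final String label = ElementHelper.getLabelValue(keyValues).orElse(Vertex.DEFAULT_LABEL);
-    final Vertex vertex = new XXXVertex(id, label, this); // hypothetical provider class
-    ElementHelper.attachProperties(vertex, keyValues);    // copy the key/value pairs onto the vertex
-    return vertex;
-}
-----
-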
-[[oltp-implementations]]
-OLTP Implementations
-^^^^^^^^^^^^^^^^^^^^
-
-image:pipes-character-1.png[width=110,float=right] The most important interfaces to implement are in the `structure/`
-package. These include interfaces like Graph, Vertex, Edge, Property, Transaction, etc. The `StructureStandardSuite`
-will ensure that the semantics of the methods implemented are correct. Moreover, there are numerous `Exceptions`
-classes with static exceptions that should be thrown by the graph system so that all the exceptions and their
-messages are consistent amongst all TinkerPop3 implementations.
-
-[[olap-implementations]]
-OLAP Implementations
-^^^^^^^^^^^^^^^^^^^^
-
-image:furnace-character-1.png[width=110,float=right] Implementing the OLAP interfaces may be a bit more complicated.
-Note that before OLAP interfaces are implemented, it is necessary for the OLTP interfaces to be, at minimal,
-implemented as specified in <<oltp-implementations,OLTP Implementations>>. A summary of each required interface
-implementation is presented below:
-
-. `GraphComputer`: A fluent builder for specifying an isolation level, a VertexProgram, and any number of MapReduce jobs to be submitted.
-. `Memory`: A global blackboard for ANDing, ORing, INCRing, and SETing values for specified keys.
-. `Messenger`: The system that collects and distributes messages being propagated by vertices executing the VertexProgram application.
-. `MapReduce.MapEmitter`: The system that collects key/value pairs being emitted by the MapReduce applications map-phase.
-. `MapReduce.ReduceEmitter`: The system that collects key/value pairs being emitted by the MapReduce applications combine- and reduce-phases.
-
-NOTE: The VertexProgram and MapReduce interfaces in the `process/computer/` package are not required by the graph
-system. Instead, these are interfaces to be implemented by application developers writing VertexPrograms and MapReduce jobs.
-
-IMPORTANT: TinkerPop3 provides three OLAP implementations: <<tinkergraph-gremlin,TinkerGraphComputer>> (TinkerGraph),
-<<giraphgraphcomputer,GiraphGraphComputer>> (HadoopGraph), and <<sparkgraphcomputer,`SparkGraphComputer`>> (Hadoop).
-Given the complexity of the OLAP system, it is good to study and copy many of the patterns used in these reference
-implementations.
-
-Implementing GraphComputer
-++++++++++++++++++++++++++
-
-image:furnace-character-3.png[width=150,float=right] The most complex method in GraphComputer is the `submit()`-method. The method must do the following (a simplified sketch follows the list):
-
-. Ensure that the GraphComputer has not already been executed.
-. Ensure that there is at least one VertexProgram or MapReduce job.
-. If there is a VertexProgram, validate that it can execute on the GraphComputer given the respectively defined features.
-. Create the Memory to be used for the computation.
-. Execute the VertexProgram.setup() method once and only once.
-. Execute the VertexProgram.execute() method for each vertex.
-. Execute the VertexProgram.terminate() method once and, if it returns false, repeat VertexProgram.execute().
-. When VertexProgram.terminate() returns true, move to MapReduce job execution.
-. MapReduce jobs are not required to be executed in any specified order.
-. For each Vertex, execute MapReduce.map(). Then (if defined) execute MapReduce.combine() and MapReduce.reduce().
-. Update Memory with runtime information.
-. Construct a new `ComputerResult` containing the compute Graph and Memory.
-
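-A minimal, self-contained sketch of this control flow is shown below. The interfaces here are simplified stand-ins,
-not TinkerPop's real `VertexProgram` or `Memory`, and are intended only to illustrate the ordering of the steps above.
-
-[source,java]
-----
-import java.util.List;
-import java.util.concurrent.CompletableFuture;
-
-public class SubmitSketch {
-    interface VertexProgramLike {
-        void setup(MemoryLike memory);
-        void execute(Object vertex, MemoryLike memory);
-        boolean terminate(MemoryLike memory); // true means halt
-    }
-
-    interface MemoryLike {
-        void completeRound(); // publish this round's updates
-    }
-
-    static CompletableFuture<MemoryLike> submit(final VertexProgramLike program,
-                                                final List<Object> vertices,
-                                                final MemoryLike memory) {
-        return CompletableFuture.supplyAsync(() -> {
-            program.setup(memory);                                           // once and only once
-            do {
-                for (final Object v : vertices) program.execute(v, memory);  // once per vertex
-                memory.completeRound();
-            } while (!program.terminate(memory));                            // repeat until terminate() is true
-            // ... MapReduce jobs would execute here, then a ComputerResult is constructed
-            return memory;
-        });
-    }
-}
-----
-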
-Implementing Memory
-+++++++++++++++++++
-
-image:gremlin-brain.png[width=175,float=left] The Memory object is initially defined by `VertexProgram.setup()`.
-The memory data is available in the first round of the `VertexProgram.execute()` method. Each Vertex, when executing
-the VertexProgram, can update the Memory in its round. However, the update is not seen by the other vertices until
-the next round. At the end of the first round, all the updates are aggregated and the new memory data is available
-on the second round. This process repeats until the VertexProgram terminates.
-
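-The round semantics can be sketched with two maps: one visible to the current round and one staging updates for the
-next. This is a simplified illustration, not TinkerGraph's actual `Memory` implementation.
-
-[source,java]
-----
-import java.util.HashMap;
-import java.util.Map;
-
-public class RoundMemorySketch {
-    private Map<String, Object> current = new HashMap<>(); // reads see this round's state
-    private Map<String, Object> next = new HashMap<>();    // writes are staged here
-
-    public Object get(final String key) { return current.get(key); }
-
-    public void set(final String key, final Object value) { next.put(key, value); }
-
-    public void completeRound() {
-        current = next;                 // staged updates become visible...
-        next = new HashMap<>(current);  // ...only at the round boundary
-    }
-}
-----
-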
-Implementing Messenger
-++++++++++++++++++++++
-
-The Messenger object is similar to the Memory object in that a vertex can read and write to the Messenger. However,
-the data it reads are the messages sent to the vertex in the previous step and the data it writes are the messages
-that will be readable by the receiving vertices in the subsequent round.
-
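-The same double-buffering idea applies to messages: what a vertex reads this round is what was sent in the previous
-round. A simplified sketch (not the real `Messenger` interface) follows.
-
-[source,java]
-----
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.Map;
-import java.util.Queue;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.ConcurrentLinkedQueue;
-
-public class MessengerSketch<M> {
-    private Map<Object, Queue<M>> incoming = new ConcurrentHashMap<>(); // readable this round
-    private Map<Object, Queue<M>> outgoing = new ConcurrentHashMap<>(); // readable next round
-
-    public Iterator<M> receiveMessages(final Object vertexId) {
-        final Queue<M> messages = incoming.get(vertexId);
-        return null == messages ? Collections.emptyIterator() : messages.iterator();
-    }
-
-    public void sendMessage(final Object toVertexId, final M message) {
-        outgoing.computeIfAbsent(toVertexId, k -> new ConcurrentLinkedQueue<>()).add(message);
-    }
-
-    public void completeRound() {
-        incoming = outgoing;                  // last round's sends become this round's reads
-        outgoing = new ConcurrentHashMap<>();
-    }
-}
-----
-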
-Implementing MapReduce Emitters
-+++++++++++++++++++++++++++++++
-
-image:hadoop-logo-notext.png[width=150,float=left] The MapReduce framework in TinkerPop3 is similar to the model
-popularized by link:http://apache.hadoop.org[Hadoop]. The primary difference is that all Mappers process the vertices
-of the graph, not an arbitrary key/value pair. However, the vertices' edges cannot be accessed -- only their
-properties. This greatly reduces the amount of data needed to be pushed through the MapReduce engine as any edge
-information required can be computed in the VertexProgram.execute() method. Moreover, at this stage, vertices cannot
-be mutated; only their token and property data can be read. A Gremlin OLAP system needs to provide implementations for
-two particular classes: `MapReduce.MapEmitter` and `MapReduce.ReduceEmitter`. TinkerGraph's implementation is provided
-below which demonstrates the simplicity of the algorithm (especially when the data is all within the same JVM).
-
-[source,java]
-----
-public class TinkerMapEmitter<K, V> implements MapReduce.MapEmitter<K, V> {
-
-    public Map<K, Queue<V>> reduceMap;
-    public Queue<KeyValue<K, V>> mapQueue;
-    private final boolean doReduce;
-
-    public TinkerMapEmitter(final boolean doReduce) { <1>
-        this.doReduce = doReduce;
-        if (this.doReduce)
-            this.reduceMap = new ConcurrentHashMap<>();
-        else
-            this.mapQueue = new ConcurrentLinkedQueue<>();
-    }
-
-    @Override
-    public void emit(K key, V value) {
-        if (this.doReduce)
-            this.reduceMap.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(value); <2>
-        else
-            this.mapQueue.add(new KeyValue<>(key, value)); <3>
-    }
-
-    protected void complete(final MapReduce<K, V, ?, ?, ?> mapReduce) {
-        if (!this.doReduce && mapReduce.getMapKeySort().isPresent()) { <4>
-            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
-            final List<KeyValue<K, V>> list = new ArrayList<>(this.mapQueue);
-            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
-            this.mapQueue.clear();
-            this.mapQueue.addAll(list);
-        } else if (mapReduce.getMapKeySort().isPresent()) {
-            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
-            final List<Map.Entry<K, Queue<V>>> list = new ArrayList<>();
-            list.addAll(this.reduceMap.entrySet());
-            Collections.sort(list, Comparator.comparing(Map.Entry::getKey, comparator));
-            this.reduceMap = new LinkedHashMap<>();
-            list.forEach(entry -> this.reduceMap.put(entry.getKey(), entry.getValue()));
-        }
-    }
-}
-----
-
-<1> If the MapReduce job has a reduce, then use one data structure (`reduceMap`), else use another (`mapQueue`). The
-difference is that a reduction requires a grouping by key and therefore the `Map<K,Queue<V>>` definition. If no
-reduction/grouping is required, then a simple `Queue<KeyValue<K,V>>` can be leveraged.
-<2> If reduce is to follow, then add the new value to the key's queue in the map. `MapHelper` is a TinkerPop3 class
-with static methods for adding data to a Map.
-<3> If no reduce is to follow, then simply append a KeyValue to the queue.
-<4> When the map phase is complete, any map-result sorting required can be executed at this point.
-
-[source,java]
-----
-public class TinkerReduceEmitter<OK, OV> implements MapReduce.ReduceEmitter<OK, OV> {
-
-    protected Queue<KeyValue<OK, OV>> reduceQueue = new ConcurrentLinkedQueue<>();
-
-    @Override
-    public void emit(final OK key, final OV value) {
-        this.reduceQueue.add(new KeyValue<>(key, value));
-    }
-
-    protected void complete(final MapReduce<?, ?, OK, OV, ?> mapReduce) {
-        if (mapReduce.getReduceKeySort().isPresent()) {
-            final Comparator<OK> comparator = mapReduce.getReduceKeySort().get();
-            final List<KeyValue<OK, OV>> list = new ArrayList<>(this.reduceQueue);
-            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
-            this.reduceQueue.clear();
-            this.reduceQueue.addAll(list);
-        }
-    }
-}
-----
-
-The method `MapReduce.reduce()` is defined as:
-
-[source,java]
-public void reduce(final OK key, final Iterator<OV> values, final ReduceEmitter<OK, OV> emitter) { ... }
-
-In other words, for the TinkerGraph implementation, iterate through the entrySet of the `reduceMap` and call the
-`reduce()` method on each entry. The `reduce()` method can emit key/value pairs which are simply aggregated into a
-`Queue<KeyValue<OK,OV>>` in an analogous fashion to `TinkerMapEmitter` when no reduce is to follow. These two emitters
-are tied together in `TinkerGraphComputer.submit()`.
-
-[source,java]
-----
-...
-for (final MapReduce mapReduce : mapReducers) {
-    if (mapReduce.doStage(MapReduce.Stage.MAP)) {
-        final TinkerMapEmitter<?, ?> mapEmitter = new TinkerMapEmitter<>(mapReduce.doStage(MapReduce.Stage.REDUCE));
-        final SynchronizedIterator<Vertex> vertices = new SynchronizedIterator<>(this.graph.vertices());
-        workers.setMapReduce(mapReduce);
-        workers.mapReduceWorkerStart(MapReduce.Stage.MAP);
-        workers.executeMapReduce(workerMapReduce -> {
-            while (true) {
-                final Vertex vertex = vertices.next();
-                if (null == vertex) return;
-                workerMapReduce.map(ComputerGraph.mapReduce(vertex), mapEmitter);
-            }
-        });
-        workers.mapReduceWorkerEnd(MapReduce.Stage.MAP);
-
-        // sort results if a map output sort is defined
-        mapEmitter.complete(mapReduce);
-
-        // no need to run combiners as this is single machine
-        if (mapReduce.doStage(MapReduce.Stage.REDUCE)) {
-            final TinkerReduceEmitter<?, ?> reduceEmitter = new TinkerReduceEmitter<>();
-            final SynchronizedIterator<Map.Entry<?, Queue<?>>> keyValues = new SynchronizedIterator((Iterator) mapEmitter.reduceMap.entrySet().iterator());
-            workers.mapReduceWorkerStart(MapReduce.Stage.REDUCE);
-            workers.executeMapReduce(workerMapReduce -> {
-                while (true) {
-                    final Map.Entry<?, Queue<?>> entry = keyValues.next();
-                    if (null == entry) return;
-                    workerMapReduce.reduce(entry.getKey(), entry.getValue().iterator(), reduceEmitter);
-                }
-            });
-            workers.mapReduceWorkerEnd(MapReduce.Stage.REDUCE);
-            reduceEmitter.complete(mapReduce); // sort results if a reduce output sort is defined
-            mapReduce.addResultToMemory(this.memory, reduceEmitter.reduceQueue.iterator()); <1>
-        } else {
-            mapReduce.addResultToMemory(this.memory, mapEmitter.mapQueue.iterator()); <2>
-        }
-    }
-}
-...
-----
-
-<1> Note that the final results of the reducer are provided to the Memory as specified by the application developer's
-`MapReduce.addResultToMemory()` implementation.
-<2> If there is no reduce stage, then the map-stage results are inserted into Memory as specified by the application
-developer's `MapReduce.addResultToMemory()` implementation.
-
-[[io-implementations]]
-IO Implementations
-^^^^^^^^^^^^^^^^^^
-
-If a `Graph` requires custom serializers for IO to work properly, implement the `Graph.io` method.  A typical example
-of where a `Graph` would require such custom serializers is if its identifier system uses non-primitive values,
-such as OrientDB's `Rid` class.  From basic serialization of a single `Vertex` all the way up the stack to Gremlin
-Server, the need to know how to handle these complex identifiers is an important requirement.
-
-The first step to implementing custom serializers is to implement the `IoRegistry` interface and register the
-custom classes and serializers to it. Each `Io` implementation has different requirements for what it expects from the
-`IoRegistry`:
-
-* *GraphML* - No custom serializers expected/allowed.
-* *GraphSON* - Register a Jackson `SimpleModule`.  The `SimpleModule` encapsulates specific classes to be serialized,
-so it does not need to be registered to a specific class in the `IoRegistry` (use `null`).
-* *Gryo* - Expects registration of one of three objects:
-** Register just the custom class with a `null` Kryo `Serializer` implementation - this class will use default "field-level" Kryo serialization.
-** Register the custom class with a specific Kryo `Serializer` implementation.
-** Register the custom class with a `Function<Kryo, Serializer>` for those cases where the Kryo `Serializer` requires the `Kryo` instance to get constructed.
-
-This implementation should provide a zero-arg constructor as the stack may require instantiation via reflection.
-Consider extending `AbstractIoRegistry` for convenience as follows:
-
-[source,java]
-----
-public class MyGraphIoRegistry extends AbstractIoRegistry {
-    public MyGraphIoRegistry() {
-        register(GraphSONIo.class, null, new MyGraphSimpleModule());
-        register(GryoIo.class, MyGraphIdClass.class, new MyGraphIdSerializer());
-    }
-}
-----
-
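-For GraphSON, the `MyGraphSimpleModule` referenced above might look something like the following hedged sketch.
-`MyGraphIdClass` is the hypothetical custom id type; imports are omitted here because TinkerPop packages a shaded
-Jackson, so the exact import paths vary by packaging.
-
-[source,java]
-----
-public class MyGraphSimpleModule extends SimpleModule {
-    public MyGraphSimpleModule() {
-        addSerializer(MyGraphIdClass.class, new StdSerializer<MyGraphIdClass>(MyGraphIdClass.class) {
-            @Override
-            public void serialize(final MyGraphIdClass id, final JsonGenerator jsonGenerator,
-                                  final SerializerProvider serializerProvider) throws IOException {
-                jsonGenerator.writeString(id.toString()); // write the id in a round-trippable form
-            }
-        });
-        // a matching deserializer would be registered here for reading the id back
-    }
-}
-----
-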
-In the `Graph.io` method, provide the `IoRegistry` object to the supplied `Builder` and call the `create` method to
-return that `Io` instance as follows:
-
-[source,java]
-----
-public <I extends Io> I io(final Io.Builder<I> builder) {
-    return (I) builder.graph(this).registry(myGraphIoRegistry).create();
-}
-----
-
-In this way, `Graph` implementations can pre-configure custom serializers for IO interactions and users will not need
-to know about those details. Following this pattern will ensure proper execution of the test suite as well as
-simplified usage for end-users.
-
-IMPORTANT: Proper implementation of IO is critical to successful `Graph` operations in Gremlin Server.  The Test Suite
-does have "serialization" tests that provide some assurance that an implementation is working properly, but those
-tests cannot make assertions against any specifics of a custom serializer.  It is the responsibility of the
-implementer to test the specifics of their custom serializers.
-
-TIP: Consider separating serializer code into its own module, if possible, so that clients that use the `Graph`
-implementation remotely don't need a full dependency on the entire `Graph` - just the IO components and related
-classes being serialized.
-
-[[validating-with-gremlin-test]]
-Validating with Gremlin-Test
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-image:gremlin-edumacated.png[width=225]
-
-[source,xml]
-<dependency>
-  <groupId>org.apache.tinkerpop</groupId>
-  <artifactId>gremlin-test</artifactId>
-  <version>x.y.z</version>
-</dependency>
-<dependency>
-  <groupId>org.apache.tinkerpop</groupId>
-  <artifactId>gremlin-groovy-test</artifactId>
-  <version>x.y.z</version>
-</dependency>
-
-The operational semantics of any OLTP or OLAP implementation are validated by `gremlin-test` and functional
-interoperability with the Groovy environment is ensured by `gremlin-groovy-test`. To implement these tests, provide
-test case implementations as shown below, where `XXX` below denotes the name of the graph implementation (e.g.
-TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
-
-[source,java]
-----
-// Structure API tests
-@RunWith(StructureStandardSuite.class)
-@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
-public class XXXStructureStandardTest {}
-
-// Process API tests
-@RunWith(ProcessComputerSuite.class)
-@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
-public class XXXProcessComputerTest {}
-
-@RunWith(ProcessStandardSuite.class)
-@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
-public class XXXProcessStandardTest {}
-
-@RunWith(GroovyEnvironmentSuite.class)
-@GraphProviderClass(provider = XXXProvider.class, graph = TinkerGraph.class)
-public class XXXGroovyEnvironmentTest {}
-
-@RunWith(GroovyProcessStandardSuite.class)
-@GraphProviderClass(provider = XXXGraphProvider.class, graph = TinkerGraph.class)
-public class XXXGroovyProcessStandardTest {}
-
-@RunWith(GroovyProcessComputerSuite.class)
-@GraphProviderClass(provider = XXXGraphComputerProvider.class, graph = TinkerGraph.class)
-public class XXXGroovyProcessComputerTest {}
-----
-
-The above set of tests represents the minimum test suite to implement.  There are other "integration" and
-"performance" tests that should be considered optional.  Implementing those tests requires the same pattern as shown above.
-
-IMPORTANT: It is as important to look at "ignored" tests as it is to look at ones that fail.  The `gremlin-test`
-suite utilizes the `Feature` implementation exposed by the `Graph` to determine which tests to execute.  If a test
-utilizes features that are not supported by the graph, it will ignore them.  While that may be fine, implementers
-should validate that the ignored tests are appropriately bypassed and that there are no mistakes in their feature
-definitions.  Moreover, implementers should consider filling gaps in their own test suites, especially when
-IO-related tests are being ignored.
-
-The only test-class that requires any code investment is the `GraphProvider` implementation class. This class is
-used by the test suite to construct `Graph` configurations and instances and provides information about the
-implementation itself.  In most cases, it is best to simply extend `AbstractGraphProvider` as it provides many
-default implementations of the `GraphProvider` interface.
-
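-As a hedged sketch, a minimal provider built on `AbstractGraphProvider` might look like the following, where
-`XXXGraph` is the provider's `Graph` class. Imports are omitted and the abstract method signatures should be
-double-checked against the `gremlin-test` version in use.
-
-[source,java]
-----
-public class XXXGraphProvider extends AbstractGraphProvider {
-
-    private static final Set<Class> IMPLEMENTATIONS = new HashSet<Class>() {{
-        add(XXXGraph.class); // also add the provider's Vertex/Edge/Property classes
-    }};
-
-    @Override
-    public Map<String, Object> getBaseConfiguration(final String graphName, final Class<?> test,
-                                                    final String testMethodName,
-                                                    final LoadGraphWith.GraphData loadGraphWith) {
-        return new HashMap<String, Object>() {{
-            put(Graph.GRAPH, XXXGraph.class.getName()); // i.e. the "gremlin.graph" key
-        }};
-    }
-
-    @Override
-    public void clear(final Graph graph, final Configuration configuration) throws Exception {
-        if (null != graph) graph.close(); // release any resources held by the test graph
-    }
-
-    @Override
-    public Set<Class> getImplementations() {
-        return IMPLEMENTATIONS;
-    }
-}
-----
-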
-Finally, specify the test suites that will be supported by the `Graph` implementation using the `@Graph.OptIn`
-annotation.  See the `TinkerGraph` implementation below as an example:
-
-[source,java]
-----
-@Graph.OptIn(Graph.OptIn.SUITE_STRUCTURE_STANDARD)
-@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
-@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
-@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_STANDARD)
-@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_COMPUTER)
-@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_ENVIRONMENT)
-public class TinkerGraph implements Graph {
-----
-
-Only include annotations for the suites the implementation will support.  Note that implementing the suite, but
-not specifying the appropriate annotation will prevent the suite from running (an obvious error message will appear
-in this case when running the mis-configured suite).
-
-There are times when there may be a specific test in the suite that the implementation cannot support (despite the
-features it implements) or should not otherwise be executed.  It is possible for implementers to "opt-out" of a test
-by using the `@Graph.OptOut` annotation.  The following is an example of this annotation usage as taken from
-`HadoopGraph`:
-
-[source, java]
-----
-@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
-@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
-@Graph.OptOut(
-        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
-        method = "g_V_matchXa_hasXname_GarciaX__a_inXwrittenByX_b__a_inXsungByX_bX",
-        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
-@Graph.OptOut(
-        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
-        method = "g_V_matchXa_inXsungByX_b__a_inXsungByX_c__b_outXwrittenByX_d__c_outXwrittenByX_e__d_hasXname_George_HarisonX__e_hasXname_Bob_MarleyXX",
-        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
-@Graph.OptOut(
-        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
-        method = "shouldNotAllowBadMemoryKeys",
-        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
-@Graph.OptOut(
-        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
-        method = "shouldRequireRegisteringMemoryKeys",
-        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
-public class HadoopGraph implements Graph {
-----
-
-The above examples show how to ignore individual tests.  It is also possible to:
-
-* Ignore an entire test case (i.e. all the methods within the test) by setting the `method` to "*".
-* Ignore a "base" test class such that test that extend from those classes will all be ignored.  This style of
-ignoring is useful for Gremlin "process" tests that have bases classes that are extended by various Gremlin flavors (e.g. groovy).
-* Ignore a `GraphComputer` test based on the type of `GraphComputer` being used.  Specify the "computer" attribute on
-the `OptOut` (which is an array specification) which should have a value of the `GraphComputer` implementation class
-that should ignore that test. This attribute should be left empty for "standard" execution. By default, all
-`GraphComputer` implementations will be included in the `OptOut`, so if there are multiple implementations, explicitly
-specify the ones that should be excluded.
-
-Also note that some of the tests in the Gremlin Test Suite are parameterized tests and require an additional level of
-specificity to be properly ignored.  To ignore these types of tests, examine the name template of the parameterized
-tests.  It is defined by a Java annotation that looks like this:
-
-[source, java]
-@Parameterized.Parameters(name = "expect({0})")
-
-The annotation above shows that the name of each parameterized test will be prefixed with "expect" and have
-parentheses wrapped around the first parameter (at index 0) value supplied to each test.  This information can
-only be garnered by studying the test set up itself.  Once the pattern is determined and the specific unique name of
-the parameterized test is identified, add it to the `specific` property on the `OptOut` annotation in addition to
-the other arguments.
-
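-For example, a hypothetical opt-out of a single parameterized case might look like the following; the test and
-method names here are illustrative only.
-
-[source,java]
-----
-@Graph.OptOut(
-        test = "org.apache.tinkerpop.gremlin.some.ParameterizedTest",  // hypothetical
-        method = "shouldExpectValue",                                  // hypothetical
-        specific = "expect(someValue)",  // the rendered name of the parameterized test
-        reason = "The implementation cannot support this particular parameter.")
-----
-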
-These annotations help provide users a level of transparency into test suite compliance (via the
-xref:describe-graph[describeGraph()] utility function). It also allows implementers to have a lot of flexibility in
-terms of how they wish to support TinkerPop.  For example, maybe there is a single test case that prevents an
-implementer from claiming support of a `Feature`.  The implementer could choose to either not support the `Feature`
-or to support it but "opt-out" of the test with a "reason" as to why so that users understand the limitation.
-
-IMPORTANT: Before using `OptOut` be sure that the reason for using it is sound and it is more of a last resort.
-It is possible that a test from the suite doesn't properly represent the expectations of a feature, is too broad or
-narrow for the semantics it is trying to enforce or simply contains a bug.  Please consider raising issues in the
-developer mailing list with such concerns before assuming `OptOut` is the only answer.
-
-IMPORTANT: There are no tests that specifically validate complete compliance with Gremlin Server.  Generally speaking,
-a `Graph` that passes the full Test Suite, should be compliant with Gremlin Server.  The one area where problems can
-occur is in serialization.  Always ensure that IO is properly implemented, that custom serializers are tested fully
-and ultimately integration test the `Graph` with an actual Gremlin Server instance.
-
-CAUTION: Configuring tests to run in parallel might result in errors that are difficult to debug as there is some
-shared state in test execution around graph configuration.  It is therefore recommended that parallelism be turned
-off for the test suite (the Maven SureFire Plugin is configured this way by default).  It may also be important to
-include this setting, `<reuseForks>false</reuseForks>`, in the SureFire configuration if tests are failing in an
-unexplainable way.
-
-Accessibility via GremlinPlugin
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-image:gremlin-plugin.png[width=100,float=left] The applications distributed with TinkerPop3 do not distribute with
-any graph system implementations besides TinkerGraph. If your implementation is stored in a Maven repository (e.g.
-Maven Central Repository), then it is best to provide a `GremlinPlugin` implementation so the respective jars can be
-downloaded as and when required by the user. Neo4j's GremlinPlugin is provided below for reference.
-
-[source,java]
-----
-public class Neo4jGremlinPlugin implements GremlinPlugin {
-
-    private static final String IMPORT = "import ";
-    private static final String DOT_STAR = ".*";
-
-    private static final Set<String> IMPORTS = new HashSet<String>() {{
-        add(IMPORT + Neo4jGraph.class.getPackage().getName() + DOT_STAR);
-    }};
-
-    @Override
-    public String getName() {
-        return "neo4j";
-    }
-
-    @Override
-    public void pluginTo(final PluginAcceptor pluginAcceptor) {
-        pluginAcceptor.addImports(IMPORTS);
-    }
-}
-----
-
-With the above plugin implementation, users can now download the respective binaries for Gremlin Console, Gremlin Server, etc.
-
-[source,groovy]
-gremlin> g = Neo4jGraph.open('/tmp/neo4j')
-No such property: Neo4jGraph for class: groovysh_evaluate
-Display stack trace? [yN]
-gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
-==>loaded: [org.apache.tinkerpop, neo4j-gremlin, …]
-gremlin> :plugin use tinkerpop.neo4j
-==>tinkerpop.neo4j activated
-gremlin> g = Neo4jGraph.open('/tmp/neo4j')
-==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
-
-In-Depth Implementations
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-image:gremlin-painting.png[width=200,float=right] The graph system implementation details presented thus far are
-minimum requirements necessary to yield a valid TinkerPop3 implementation. However, there are other areas that a
-graph system provider can tweak to provide an implementation more optimized for their underlying graph engine. Typical
-areas of focus include:
-
-* Traversal Strategies: A <<traversalstrategy,TraversalStrategy>> can be used to alter a traversal prior to its
-execution. A typical example is converting a pattern of `g.V().has('name','marko')` into a global index lookup for
-all vertices with name "marko". In this way, an `O(|V|)` scan becomes an `O(log(|V|))` lookup. Please review
-`TinkerGraphStepStrategy` for ideas (a sketch follows this list).
-* Step Implementations: Every <<graph-traversal-steps,step>> is ultimately referenced by the `GraphTraversal`
-interface. It is possible to extend `GraphTraversal` to use a graph system specific step implementation.
-
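-As a hedged sketch of the first item, a provider optimization strategy loosely modeled on `TinkerGraphStepStrategy`
-might look like the following. `XXXGraphStep` and its `addHasContainer()` are hypothetical provider classes, and
-imports are omitted because package locations for these classes vary across 3.x versions.
-
-[source,java]
-----
-public final class XXXGraphStepStrategy
-        extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy>
-        implements TraversalStrategy.ProviderOptimizationStrategy {
-
-    @Override
-    public void apply(final Traversal.Admin<?, ?> traversal) {
-        TraversalHelper.getStepsOfClass(GraphStep.class, traversal).forEach(originalStep -> {
-            // swap in a provider-specific step that can answer has() via an index
-            final XXXGraphStep<?, ?> xxxStep = new XXXGraphStep<>(originalStep); // hypothetical
-            TraversalHelper.replaceStep(originalStep, xxxStep, traversal);
-            Step<?, ?> currentStep = xxxStep.getNextStep();
-            while (currentStep instanceof HasStep) { // fold contiguous has() steps into the lookup
-                ((HasStep<?>) currentStep).getHasContainers().forEach(xxxStep::addHasContainer);
-                traversal.removeStep(currentStep);
-                currentStep = xxxStep.getNextStep();
-            }
-        });
-    }
-}
-----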
-
-[[tinkergraph-gremlin]]
-TinkerGraph-Gremlin
--------------------
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>tinkergraph-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-----
-
-image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single machine, in-memory (with optional
-persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with
-TinkerPop3 and serves as the reference implementation for other providers to study in order to understand the
-semantics of the various methods of the TinkerPop3 API. Constructing a simple graph in Java8 is presented below.
-
-[source,java]
-Graph g = TinkerGraph.open();
-Vertex marko = g.addVertex("name","marko","age",29);
-Vertex lop = g.addVertex("name","lop","lang","java");
-marko.addEdge("created",lop,"weight",0.6d);
-
-The above code creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6
-property. Next, the graph can be queried as follows.
-
-[source,java]
-g.V().has("name","marko").out("created").values("name")
-
-The `g.V().has("name","marko")` part of the query can be executed in two ways.
-
- * A linear scan of all vertices filtering out those vertices that don't have the name "marko"
- * A `O(log(|V|))` index lookup for all vertices with the name "marko"
-
-Given the initial graph construction in the first code block, no index was defined and thus, a linear scan is executed.
-However, if the graph was constructed as such, then an index lookup would be used.
-
-[source,java]
-Graph g = TinkerGraph.open();
-g.createIndex("name",Vertex.class)
-
-The execution times for a vertex lookup by property are provided below for both the non-indexed and indexed versions of
-TinkerGraph over the Grateful Dead graph.
-
-[gremlin-groovy]
-----
-graph = TinkerGraph.open()
-g = graph.traversal()
-graph.io(graphml()).readGraph('data/grateful-dead.xml')
-clock(1000) {g.V().has('name','Garcia').iterate()} <1>
-graph = TinkerGraph.open()
-g = graph.traversal()
-graph.createIndex('name',Vertex.class)
-graph.io(graphml()).readGraph('data/grateful-dead.xml')
-clock(1000){g.V().has('name','Garcia').iterate()} <2>
-----
-
-<1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined.
-<2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined.
-
-IMPORTANT: Each graph system will have a different mechanism by which indices and schemas are defined. TinkerPop3
-does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other
-graph systems, property value types, indices, edge labels, etc. may be required to be defined _a priori_ to adding
-data to the graph.
-
-NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration.
-
-Configuration
-~~~~~~~~~~~~~
-
-TinkerGraph has several settings that can be provided on creation via `Configuration` object:
-
-[width="100%",cols="2,10",options="header"]
-|=========================================================
-|Property |Description
-|gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph`
-|gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices.
-|gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges.
-|gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties.
-|gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called.
-|gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a
-value is specified here, then the `gremlin.tinkergraph.graphFormat` should also be specified.  If this value is not
-included (default), then the graph will stay in-memory and not be loaded/persisted to disk.
-|gremlin.tinkergraph.graphFormat |The format to use to serialize the graph which may be one of the following:
-`graphml`, `graphson`, `gryo`, or a fully qualified class name that implements Io.Builder interface (which allows for
-external third party graph reader/writer formats to be used for persistence).
-If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should
-also be specified.  If this value is not included (default), then the graph will stay in-memory and not be
-loaded/persisted to disk.
-|=========================================================
-
-The `IdManager` settings above refer to how TinkerGraph will control identifiers for vertices, edges and vertex
-properties.  There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully
-qualified class name of an `IdManager` implementation on the classpath.  When not specified, the default values
-for all settings is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will
-generate new identifiers from `Long` when the identifier is not user supplied.  TinkerGraph will also expect the
-user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return
-two different vertices.  `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected
-type as well as generate new identifiers with that specified type.
-
-If the TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and
-`gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location with the specified
-format when `Graph.close()` is called.  In addition, if these settings are present, TinkerGraph will attempt to
-load the graph from the specified location.
-
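-For example, a hedged Java snippet combining the id and persistence settings above might look like this; the file
-path is illustrative only, and `BaseConfiguration` comes from Apache Commons Configuration.
-
-[source,java]
-----
-BaseConfiguration conf = new BaseConfiguration();
-conf.setProperty("gremlin.tinkergraph.vertexIdManager", "LONG");   // coerce vertex ids to Long
-conf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/tinkergraph.kryo"); // illustrative path
-conf.setProperty("gremlin.tinkergraph.graphFormat", "gryo");
-Graph graph = TinkerGraph.open(conf); // loads from graphLocation if the file already exists
-// ... mutate the graph ...
-graph.close(); // persists to graphLocation in the configured format
-----
-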
-IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various
-`IdManager` settings as well to ensure that identifiers are properly coerced to the appropriate types as GraphSON
-can lose the identifier's type during serialization (i.e. it will assume `Integer` when the default for TinkerGraph
-is `Long`, which could lead to load errors that result in a message like, "Vertex with id already exists").
-
-It is important to consider the data being imported to TinkerGraph with respect to `defaultVertexPropertyCardinality`
-setting.  For example, if a `.gryo` file is known to contain multi-property data, be sure to set the default
-cardinality to `list` or else the data will import as `single`.  Consider the following:
-
-[gremlin-groovy]
-----
-graph = TinkerGraph.open()
-graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
-g = graph.traversal()
-g.V().properties()
-conf = new BaseConfiguration()
-conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list")
-graph = TinkerGraph.open(conf)
-graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
-g = graph.traversal()
-g.V().properties()
-----
-
-[[neo4j-gremlin]]
-Neo4j-Gremlin
--------------
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>neo4j-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-<!-- neo4j-tinkerpop-api-impl is NOT Apache 2 licensed - more information below -->
-<dependency>
-  <groupId>org.neo4j</groupId>
-  <artifactId>neo4j-tinkerpop-api-impl</artifactId>
-  <version>0.1-2.2</version>
-</dependency>
-----
-
-link:http://neotechnology.com[Neo Technology] are the developers of the OLTP-based link:http://neo4j.org[Neo4j graph database].
-
-CAUTION: Unless under a commercial agreement with Neo Technology, Neo4j is licensed
-link:http://en.wikipedia.org/wiki/Affero_General_Public_License[AGPL]. The `neo4j-gremlin` module is licensed Apache2
-because it only references the Apache2-licensed Neo4j API (not its implementation). Note that neither the
-<<gremlin-console,Gremlin Console>> nor <<gremlin-server,Gremlin Server>> distribute with the Neo4j implementation
-binaries. To access the binaries, use the `:install` command to download binaries from
-link:http://search.maven.org/[Maven Central Repository].
-
-[source,groovy]
-----
-gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
-==>Loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z] - restart the console to use [tinkerpop.neo4j]
-gremlin> :q
-...
-gremlin> :plugin use tinkerpop.neo4j
-==>tinkerpop.neo4j activated
-gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
-==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
-----
-
-NOTE: Neo4j link:http://docs.neo4j.org/chunked/stable/ha.html[High Availability] is currently not supported by
-Neo4j-Gremlin.
-
-TIP: To host Neo4j in <<gremlin-server,Gremlin Server>>, the dependencies must first be "installed" or otherwise
-copied to the Gremlin Server path. The automated method for doing this would be to execute
-`bin/gremlin-server.sh -i org.apache.tinkerpop neo4j-gremlin x.y.z`.
-
-Indices
-~~~~~~~
-
-Neo4j 2.x indices leverage vertex labels to partition the index space. TinkerPop3 does not provide method interfaces
-for defining schemas/indices for the underlying graph system. Thus, in order to create indices, it is important to
-call the Neo4j API directly.
-
-NOTE: `Neo4jGraphStep` will attempt to discern which indices to use when executing a traversal of the form `g.V().has()`.
-
-The Gremlin-Console session below demonstrates Neo4j indices. For more information, please refer to the Neo4j documentation:
-
-* Manipulating indices with link:http://docs.neo4j.org/chunked/stable/query-schema-index.html[Cypher].
-* Manipulating indices with the Neo4j link:http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-new-index.html[Java API].
-
-[gremlin-groovy]
-----
-graph = Neo4jGraph.open('/tmp/neo4j')
-graph.cypher("CREATE INDEX ON :person(name)")
-graph.tx().commit()  <1>
-graph.addVertex(label,'person','name','marko')
-graph.addVertex(label,'dog','name','puppy')
-g = graph.traversal()
-g.V().hasLabel('person').has('name','marko').values('name')
-graph.close()
-----
-
-<1> Schema mutations must happen in a different transaction than graph mutations
-
-The example below demonstrates the runtime benefits of indices and shows that, even if there is no defined index (only vertex
-labels), a linear scan of the vertex-label partition is still faster than a linear scan of all vertices.
-
-[gremlin-groovy]
-----
-graph = Neo4jGraph.open('/tmp/neo4j')
-graph.io(graphml()).readGraph('data/grateful-dead.xml')
-g = graph.traversal()
-g.tx().commit()
-clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()}  <1>
-graph.cypher("CREATE INDEX ON :artist(name)") <2>
-g.tx().commit()
-Thread.sleep(5000) <3>
-clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <4>
-clock(1000) {g.V().has('name','Garcia').iterate()} <5>
-graph.cypher("DROP INDEX ON :artist(name)") <6>
-g.tx().commit()
-graph.close()
-----
-
-<1> Find all artists whose name is Garcia which does a linear scan of the artist vertex-label partition.
-<2> Create an index for all artist vertices on their name property.
-<3> Neo4j indices are eventually consistent so this stalls to give the index time to populate itself.
-<4> Find all artists whose name is Garcia which uses the pre-defined schema index.
-<5> Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
-<6> Drop the created index.
-
-Multi/Meta-Properties
-~~~~~~~~~~~~~~~~~~~~~
-
-`Neo4jGraph` supports both multi- and meta-properties (see <<_vertex_properties,vertex properties>>). These features
-are not native to Neo4j and are implemented using "hidden" Neo4j nodes. For example, when a vertex has multiple
-"name" properties, each property is a new node (multi-properties) which can have properties attached to it
-(meta-properties). As such, the native, underlying representation may become difficult to query directly using
-another graph language such as <<_cypher,Cypher>>. The default setting is to disable multi- and meta-properties.
-However, if this feature is desired, then it can be activated via `gremlin.neo4j.metaProperties` and
-`gremlin.neo4j.multiProperties` configurations being set to `true`. Once the configuration is set, it can not be
-changed for the lifetime of the graph.
-
-[gremlin-groovy]
-----
-conf = new BaseConfiguration()
-conf.setProperty('gremlin.neo4j.directory','/tmp/neo4j')
-conf.setProperty('gremlin.neo4j.multiProperties',true)
-conf.setProperty('gremlin.neo4j.metaProperties',true)
-graph = Neo4jGraph.open(conf)
-g = graph.traversal()
-g.addV('name','michael','name','michael hunger','name','mhunger')
-g.V().properties('name').property('acl', 'public')
-g.V(0).valueMap()
-g.V(0).properties()
-g.V(0).properties().valueMap()
-graph.close()
-----
-
-WARNING: `Neo4jGraph` without multi- and meta-properties is in 1-to-1 correspondence with the native, underlying Neo4j
-representation. It is recommended that if the user does not require multi/meta-properties, then they should not
-enable them. Without multi- and meta-properties enabled, Neo4j can be interacted with using other tools and technologies
-that do not leverage TinkerPop.
-
-IMPORTANT: When using a multi-property enabled `Neo4jGraph`, vertices may represent their properties on "hidden
-nodes" adjacent to the vertex. If a vertex property key/value is required for indexing, then two indices are
-required -- e.g. `CREATE INDEX ON :person(name)` and `CREATE INDEX ON :vertexProperty(name)`
-(see <<_indices,Neo4j indices>>).
-
-Cypher
-~~~~~~
-
-image::gremlin-loves-cypher.png[width=400]
-
-NeoTechnology are the creators of the graph pattern-match query language link:http://www.neo4j.org/learn/cypher[Cypher].
-It is possible to leverage Cypher from within Gremlin by using the `Neo4jGraph.cypher()` graph traversal method.
-
-[gremlin-groovy]
-----
-graph = Neo4jGraph.open('/tmp/neo4j')
-graph.io(gryo()).readGraph('data/tinkerpop-modern.kryo')
-graph.cypher('MATCH (a {name:"marko"}) RETURN a')
-graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name')
-graph.close()
-----
-
-Thus, like <<match-step,`match()`>>-step in Gremlin, it is possible to do a declarative pattern match and then move
-back into imperative Gremlin.
-
-TIP: For those developers using <<gremlin-server,Gremlin Server>> against Neo4j, it is possible to do Cypher queries
-by simply placing the Cypher string in `graph.cypher(...)` before submission to the server.
-
-Multi-Label
-~~~~~~~~~~~
-
-TinkerPop3 requires every `Element` to have a single, immutable string label (i.e. a `Vertex`, `Edge`, and
-`VertexProperty`). In Neo4j, a `Node` (vertex) can have an
-link:http://neo4j.com/docs/stable/graphdb-neo4j-labels.html[arbitrary number of labels] while a `Relationship`
-(edge) can have one and only one. Furthermore, in Neo4j, `Node` labels are mutable while `Relationship` labels are
-not. In order to handle this mismatch, three `Neo4jVertex` specific methods exist in Neo4j-Gremlin.
-
-[source,java]
-public Set<String> labels() // get all the labels of the vertex
-public void addLabel(String label) // add a label to the vertex
-public void removeLabel(String label) // remove a label from the vertex
-
-An example use case is presented below.
-
-[gremlin-groovy]
-----
-graph = Neo4jGraph.open('/tmp/neo4j')
-vertex = (Neo4jVertex) graph.addVertex('human::animal') <1>
-vertex.label() <2>
-vertex.labels() <3>
-vertex.addLabel('organism') <4>
-vertex.label()
-vertex.removeLabel('human') <5>
-vertex.labels()
-vertex.addLabel('organism') <6>
-vertex.labels()
-vertex.removeLabel('human') <7>
-vertex.label()
-g = graph.traversal()
-g.V().has(label,'organism') <8>
-g.V().has(label,of('organism')) <9>
-g.V().has(label,of('organism')).has(label,of('animal'))
-g.V().has(label,of('organism').and(of('animal')))
-graph.close()
-----
-
-<1> Typecasting to a `Neo4jVertex` is only required in Java.
-<2> The standard `Vertex.label()` method returns all the labels in alphabetical order concatenated using `::`.
-<3> `Neo4jVertex.labels()` method returns the individual labels as a set.
-<4> `Neo4jVertex.addLabel()` method adds a single label.
-<5> `Neo4jVertex.removeLabel()` method removes a single label.
-<6> Labels are unique and thus duplicate labels don't exist.
-<7> If a label that does not exist is removed, nothing happens.
-<8> `P.eq()` does a full string match and should only be used if multi-labels are not leveraged.
-<9> `LabelP.of()` is specific to `Neo4jGraph` and used for multi-label matching.
-
-IMPORTANT: `LabelP.of()` is only required if multi-labels are leveraged. `LabelP.of()` is used when
-filtering/looking-up vertices by their label(s) as the standard `P.eq()` does a direct match on the `::`-representation
-of `vertex.label()`.
-
-Loading with BulkLoaderVertexProgram
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load
-large amounts of data to and from Neo4j. The following code demonstrates how to load the modern graph from TinkerGraph
-into Neo4j:
-
-[gremlin-groovy]
-----
-wgConf = 'conf/neo4j-standalone.properties'
-modern = TinkerFactory.createModern()
-blvp = BulkLoaderVertexProgram.build().
-           keepOriginalIds(false).
-           writeGraph(wgConf).create(modern)
-modern.compute().workers(1).program(blvp).submit().get()
-graph = GraphFactory.open(wgConf)
-g = graph.traversal()
-g.V().valueMap()
-graph.close()
-----
-
-[source,properties]
-----
-# neo4j-standalone.properties
-
-gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
-gremlin.neo4j.directory=/tmp/neo4j
-gremlin.neo4j.conf.node_auto_indexing=true
-gremlin.neo4j.conf.relationship_auto_indexing=true
-----
-
-[[hadoop-gremlin]]
-Hadoop-Gremlin
---------------
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>hadoop-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-----
-
-image:hadoop-logo-notext.png[width=100,float=left] link:http://hadoop.apache.org/[Hadoop] is a distributed
-computing framework that is used to process data represented across a multi-machine compute cluster. When the
-data in the Hadoop cluster represents a TinkerPop3 graph, then Hadoop-Gremlin can be used to process the graph
-using both TinkerPop3's OLTP and OLAP graph computing models.
-
-IMPORTANT: This section assumes that the user has a Hadoop 2.x cluster functioning. For more information on getting
-started with Hadoop, please see the
-link:http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html[Single Node Setup]
-tutorial. Moreover, if using `GiraphGraphComputer` or `SparkGraphComputer` it is advisable that the reader also
-familiarize themselves with Giraph (link:http://giraph.apache.org/quick_start.html[Getting Started]) and Spark
-(link:http://spark.apache.org/docs/latest/quick-start.html[Quick Start]).
-
-Installing Hadoop-Gremlin
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The `HADOOP_GREMLIN_LIBS` environment variable references locations that contain jars that should be uploaded to a
-respective distributed cache (link:http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html[YARN] or SparkServer).
-Note that the locations in `HADOOP_GREMLIN_LIBS` can be colon-separated (`:`) and all jars from all locations will
-be loaded into the cluster. Typically, only the jars of the respective `GraphComputer` are required to be loaded (e.g.
-the `GiraphGraphComputer` plugin lib directory).
-
-[source,shell]
-export HADOOP_GREMLIN_LIBS=/usr/local/gremlin-console/ext/giraph-gremlin/lib
-
-If using <<gremlin-console,Gremlin Console>>, it is important to install the Hadoop-Gremlin plugin. Note that
-Hadoop-Gremlin requires a Gremlin Console restart after installing.
-
-[source,text]
-----
-$ bin/gremlin.sh
-
-         \,,,/
-         (o o)
------oOOo-(3)-oOOo-----
-plugin activated: tinkerpop.server
-plugin activated: tinkerpop.utilities
-plugin activated: tinkerpop.tinkergraph
-gremlin> :install org.apache.tinkerpop hadoop-gremlin x.y.z
-==>loaded: [org.apache.tinkerpop, hadoop-gremlin, x.y.z] - restart the console to use [tinkerpop.hadoop]
-gremlin> :q
-$ bin/gremlin.sh
-
-         \,,,/
-         (o o)
------oOOo-(3)-oOOo-----
-plugin activated: tinkerpop.server
-plugin activated: tinkerpop.utilities
-plugin activated: tinkerpop.tinkergraph
-gremlin> :plugin use tinkerpop.hadoop
-==>tinkerpop.hadoop activated
-gremlin>
-----
-
-Properties Files
-~~~~~~~~~~~~~~~~
-
-`HadoopGraph` makes use of properties files which ultimately get turned into Apache Commons configurations and/or
-Hadoop configurations. The example properties file presented below is located at `conf/hadoop/hadoop-gryo.properties`.
-
-[source,text]
-gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
-gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
-gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
-gremlin.hadoop.outputLocation=output
-gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
-gremlin.hadoop.jarsInDistributedCache=true
-####################################
-# Spark Configuration              #
-####################################
-spark.master=local[4]
-spark.executor.memory=1g
-spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
-####################################
-# SparkGraphComputer Configuration #
-####################################
-gremlin.spark.graphInputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.InputRDDFormat
-gremlin.spark.graphOutputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.OutputRDDFormat
-gremlin.spark.persistContext=true
-#####################################
-# GiraphGraphComputer Configuration #
-#####################################
-giraph.minWorkers=2
-giraph.maxWorkers=2
-giraph.useOutOfCoreGraph=true
-giraph.useOutOfCoreMessages=true
-mapreduce.map.java.opts=-Xmx1024m
-mapreduce.reduce.java.opts=-Xmx1024m
-giraph.numInputThreads=2
-giraph.numComputeThreads=2
-
-A review of the Hadoop-Gremlin specific properties is provided in the table below. For the respective OLAP
-engines (<<sparkgraphcomputer,`SparkGraphComputer`>> or <<giraphgraphcomputer,`GiraphGraphComputer`>>) refer
-to their respective documentation for configuration options.
-
-[width="100%",cols="2,10",options="header"]
-|=========================================================
-|Property |Description
-|gremlin.graph |The class of the graph to construct using GraphFactory.
-|gremlin.hadoop.inputLocation |The location of the input file(s) for Hadoop-Gremlin to read the graph from.
-|gremlin.hadoop.graphInputFormat |The format that the graph input file(s) are represented in.
-|gremlin.hadoop.outputLocation |The location to write the computed HadoopGraph to.
-|gremlin.hadoop.graphOutputFormat |The format that the output file(s) should be represented in.
-|gremlin.hadoop.jarsInDistributedCache |Whether to upload the Hadoop-Gremlin jars to a distributed cache (necessary if jars are not on the machines' classpaths).
-|=========================================================
-
-Along with the properties above, the numerous link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[Hadoop specific properties]
-can be added as needed to tune and parameterize the executed Hadoop-Gremlin job on the respective Hadoop cluster.
-
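-As a sketch, the same properties can also be assembled programmatically (rather than read from a file) and handed to
-`GraphFactory`; the property values below simply mirror the `hadoop-gryo.properties` example:
-
-[source,groovy]
-----
-// build the configuration in code instead of a properties file
-conf = new org.apache.commons.configuration.BaseConfiguration()
-conf.setProperty('gremlin.graph', 'org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph')
-conf.setProperty('gremlin.hadoop.graphInputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat')
-conf.setProperty('gremlin.hadoop.graphOutputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat')
-conf.setProperty('gremlin.hadoop.inputLocation', 'tinkerpop-modern.kryo')
-conf.setProperty('gremlin.hadoop.outputLocation', 'output')
-conf.setProperty('gremlin.hadoop.jarsInDistributedCache', true)
-graph = GraphFactory.open(conf)
-----
-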
-IMPORTANT: As the size of the graphs being processed becomes large, it is important to fully understand how the
-underlying OLAP engine (e.g. Spark, Giraph, etc.) works and understand the numerous parameterizations offered by
-these systems. Such knowledge can help alleviate out of memory exceptions, slow load times, slow processing times,
-garbage collection issues, etc.
-
-OLTP Hadoop-Gremlin
-~~~~~~~~~~~~~~~~~~~
-
-image:hadoop-pipes.png[width=180,float=left] It is possible to execute OLTP operations over a `HadoopGraph`.
-However, realize that the underlying HDFS files are not random access and thus, to retrieve a vertex, a linear scan
-is required. OLTP operations are useful for peeking into the graph prior to executing a long running OLAP job -- e.g.
-`g.V().valueMap().limit(10)`.
-
-CAUTION: OLTP operations on `HadoopGraph` are not efficient. They require linear scans to execute and are unreasonable
-for large graphs. In such large graph situations, make use of <<traversalvertexprogram,TraversalVertexProgram>>
-which is the OLAP Gremlin machine.
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
-hdfs.ls()
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-g = graph.traversal()
-g.V().count()
-g.V().out().out().values('name')
-g.V().group().by{it.value('name')[1]}.by('name').next()
-----
-
-OLAP Hadoop-Gremlin
-~~~~~~~~~~~~~~~~~~~
-
-image:hadoop-furnace.png[width=180,float=left] Hadoop-Gremlin was designed to execute OLAP operations via
-`GraphComputer`. The OLTP examples presented previously are reproduced below, but using `TraversalVertexProgram`
-for the execution of the Gremlin traversal.
-
-A `Graph` in TinkerPop3 can support any number of `GraphComputer` implementations. Out of the box, Hadoop-Gremlin
-supports the following three implementations.
-
-* <<mapreducegraphcomputer,`MapReduceGraphComputer`>>: Leverages Hadoop's MapReduce engine to execute TinkerPop3 OLAP
-computations. (*coming soon*)
-** The graph must fit within the total disk space of the Hadoop cluster (supports massive graphs). Message passing is
-coordinated via MapReduce jobs over the on-disk graph (slow traversals).
-* <<sparkgraphcomputer,`SparkGraphComputer`>>: Leverages Apache Spark to execute TinkerPop3 OLAP computations.
-** The graph may fit within the total RAM of the cluster (supports larger graphs). Message passing is coordinated via
-Spark map/reduce/join operations on in-memory and disk-cached data (average speed traversals).
-* <<giraphgraphcomputer,`GiraphGraphComputer`>>: Leverages Apache Giraph to execute TinkerPop3 OLAP computations.
-** The graph should fit within the total RAM of the Hadoop cluster (graph size restriction), though "out-of-core"
-processing is possible. Message passing is coordinated via ZooKeeper for the in-memory graph (speedy traversals).
-
-TIP: image:gremlin-sugar.png[width=50,float=left] For those wanting to use the <<sugar-plugin,SugarPlugin>> with
-their submitted traversal, do `:remote config useSugar true` as well as `:plugin use tinkerpop.sugar` at the start of
-the Gremlin Console session if it is not already activated.
-
-Note that `SparkGraphComputer` and `GiraphGraphComputer` are loaded via their respective plugins. Typically only
-one plugin or the other is loaded depending on the desired `GraphComputer` to use.
-
-[source,text]
-----
-$ bin/gremlin.sh
-
-         \,,,/
-         (o o)
------oOOo-(3)-oOOo-----
-plugin activated: tinkerpop.server
-plugin activated: tinkerpop.utilities
-plugin activated: tinkerpop.tinkergraph
-plugin activated: tinkerpop.hadoop
-gremlin> :install org.apache.tinkerpop giraph-gremlin x.y.z
-==>loaded: [org.apache.tinkerpop, giraph-gremlin, x.y.z] - restart the console to use [tinkerpop.giraph]
-gremlin> :install org.apache.tinkerpop spark-gremlin x.y.z
-==>loaded: [org.apache.tinkerpop, spark-gremlin, x.y.z] - restart the console to use [tinkerpop.spark]
-gremlin> :q
-$ bin/gremlin.sh
-
-         \,,,/
-         (o o)
------oOOo-(3)-oOOo-----
-plugin activated: tinkerpop.server
-plugin activated: tinkerpop.utilities
-plugin activated: tinkerpop.tinkergraph
-plugin activated: tinkerpop.hadoop
-gremlin> :plugin use tinkerpop.giraph
-==>tinkerpop.giraph activated
-gremlin> :plugin use tinkerpop.spark
-==>tinkerpop.spark activated
-----
-
-WARNING: Hadoop, Spark, and Giraph all depend on many of the same libraries (e.g. ZooKeeper, Snappy, Netty, Guava,
-etc.). Unfortunately, these dependencies are typically not on the same versions of the respective libraries. As such,
-it is best to *not* have both Spark and Giraph plugins loaded in the same console session nor in the same Java
-project (though intelligent `<exclusion>`-usage can help alleviate conflicts in a Java project).
-
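-A hedged sketch of such an `<exclusion>` follows; the excluded artifact is purely illustrative, as the actual
-conflicting libraries depend on the versions in play:
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>spark-gremlin</artifactId>
-   <version>x.y.z</version>
-   <exclusions>
-      <!-- illustrative only: drop a transitive library that clashes with another plugin's version -->
-      <exclusion>
-         <groupId>com.google.guava</groupId>
-         <artifactId>guava</artifactId>
-      </exclusion>
-   </exclusions>
-</dependency>
-----
-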
-CAUTION: It is important to note that when doing an OLAP traversal, any resulting vertices, edges, or properties will be
-attached to the source graph. For Hadoop-based graphs, this may lead to linear search times on massive graphs. Thus,
-if vertex, edge, or property objects are to be returned (as a final result), it is best to use `.id()` to get the id
-of the object rather than the actual attached object.
-
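-As a sketch (the `marko` lookup is illustrative):
-
-[source,groovy]
-----
-// returns only ids -- cheap to hand back as a final result
-g.V().has('name','marko').id()
-// returns attached vertices -- later access to an attached element may trigger a linear scan
-g.V().has('name','marko')
-----
-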
-[[mapreducegraphcomputer]]
-MapReduceGraphComputer
-^^^^^^^^^^^^^^^^^^^^^^
-
-*COMING SOON*
-
-[[sparkgraphcomputer]]
-SparkGraphComputer
-^^^^^^^^^^^^^^^^^^
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>spark-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-----
-
-image:spark-logo.png[width=175,float=left] link:http://spark.apache.org[Spark] is an Apache Software Foundation
-project focused on general-purpose OLAP data processing. Spark provides a hybrid in-memory/disk-based distributed
-computing model that is similar to Hadoop's MapReduce model. Spark maintains a fluent function chaining DSL that is
-arguably easier for developers to work with than native Hadoop MapReduce. Spark-Gremlin provides an implementation of
-the bulk-synchronous parallel, distributed message passing algorithm within Spark and thus, any `VertexProgram` can be
-executed over `SparkGraphComputer`.
-
-If `SparkGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
-specified in `HADOOP_GREMLIN_LIBS`.
-
-[source,shell]
-export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/spark-gremlin/lib
-
-Furthermore, the `lib/` directory should be distributed across all machines in the SparkServer cluster. For this purpose
-TinkerPop provides a helper script, which takes the Spark installation directory and the Spark machines as input:
-
-[source,shell]
-bin/init-tp-spark.sh /usr/local/spark spark@10.0.0.1 spark@10.0.0.2 spark@10.0.0.3
-
-Once the `lib/` directory is distributed, `SparkGraphComputer` can be used as follows.
-
-[gremlin-groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-g = graph.traversal(computer(SparkGraphComputer))
-g.V().count()
-g.V().out().out().values('name')
-----
-
-To use lambdas in Gremlin-Groovy, simply provide `:remote connect` a `TraversalSource` that leverages `SparkGraphComputer`.
-
-[gremlin-groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-g = graph.traversal(computer(SparkGraphComputer))
-:remote connect tinkerpop.hadoop graph g
-:> g.V().group().by{it.value('name')[1]}.by('name')
-----
-
-The `SparkGraphComputer` algorithm leverages Spark's caching abilities to reduce the amount of data shuffled across
-the wire on each iteration of the <<vertexprogram,`VertexProgram`>>. When the graph is loaded as a Spark RDD
-(Resilient Distributed Dataset) it is immediately cached as `graphRDD`. The `graphRDD` is a distributed adjacency
-list which encodes the vertex, its properties, and all its incident edges. On the first iteration, each vertex
-(in parallel) is passed through `VertexProgram.execute()`. This yields an output of the vertex's mutated state
-(i.e. updated compute keys -- `propertyX`) and its outgoing messages. This `viewOutgoingRDD` is then reduced to
-`viewIncomingRDD` where the outgoing messages are sent to their respective vertices. If a `MessageCombiner` exists
-for the vertex program, then messages are aggregated locally and globally to ultimately yield one incoming message
-for the vertex. This reduce sequence is the "message pass." If the vertex program does not terminate on this
-iteration, then the `viewIncomingRDD` is joined with the cached `graphRDD` and the process continues. When there
-are no more iterations, there is a final join and the resultant RDD is stripped of its edges and messages. This
-`mapReduceRDD` is cached and is processed by each <<mapreduce,`MapReduce`>> job in the
-<<graphcomputer,`GraphComputer`>> computation.
-
-image::spark-algorithm.png[width=775]
-
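-The `MessageCombiner` mentioned above can be sketched as follows -- a hypothetical combiner that sums `Double`
-messages, as a PageRank-style vertex program would (the class name is illustrative):
-
-[source,groovy]
-----
-import org.apache.tinkerpop.gremlin.process.computer.MessageCombiner
-
-// aggregate messages bound for the same vertex by summing them, locally and globally
-class SumMessageCombiner implements MessageCombiner<Double> {
-    @Override
-    Double combine(Double messageA, Double messageB) {
-        return messageA + messageB
-    }
-}
-----
-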
-[width="100%",cols="2,10",options="header"]
-|========================================================
-|Property |Description
-|gremlin.spark.graphInputRDD |A class for creating RDD's from underlying graph data, defaults to Hadoop `InputFormat`.
-|gremlin.spark.graphOutputRDD |A class for output RDD's, defaults to Hadoop `OutputFormat`.
-|gremlin.spark.graphStorageLevel |What `StorageLevel` to use for the cached graph during job execution (default `MEMORY_ONLY`).
-|gremlin.spark.persistContext |Whether to create a new `SparkContext` for every `SparkGraphComputer` or to reuse an existing one.
-|gremlin.spark.persistStorageLevel |What `StorageLevel` to use when persisted RDDs via `PersistedOutputRDD` (default `MEMORY_ONLY`).
-|========================================================
-
-InputRDD and OutputRDD
-++++++++++++++++++++++
-
-If the provider/user does not want to use Hadoop `InputFormats`, it is possible to leverage Spark's RDD
-constructs directly. There is a `gremlin.spark.graphInputRDD` configuration that references a `Class<? extends
-InputRDD>`. An `InputRDD` provides a read method that takes a `SparkContext` and returns a graphRDD. Likewise, use
-`gremlin.spark.graphOutputRDD` and the respective `OutputRDD`.
-
-If the graph system provider uses an `InputRDD`, the RDD should maintain an associated `org.apache.spark.Partitioner`. By doing so,
-`SparkGraphComputer` will not partition the loaded graph across the cluster as it has already been partitioned by the graph system provider.
-This can save a significant amount of time and space resources.
-If the `InputRDD` does not have a registered partitioner, `SparkGraphComputer` will partition the graph using
-an `org.apache.spark.HashPartitioner` with the number of partitions being either the number of existing partitions
-in the input (e.g. input splits) or the user-specified number of `GraphComputer.workers()`.
-
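-A minimal sketch of a custom `InputRDD` follows; the class, its hard-coded vertices, and the closure coercion are all
-illustrative rather than part of the shipped API surface:
-
-[source,groovy]
-----
-import org.apache.commons.configuration.Configuration
-import org.apache.spark.api.java.JavaPairRDD
-import org.apache.spark.api.java.JavaSparkContext
-import org.apache.spark.api.java.function.PairFunction
-import org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable
-import org.apache.tinkerpop.gremlin.spark.structure.io.InputRDD
-import org.apache.tinkerpop.gremlin.structure.T
-import org.apache.tinkerpop.gremlin.structure.Vertex
-import org.apache.tinkerpop.gremlin.structure.util.star.StarGraph
-import scala.Tuple2
-
-// an InputRDD that fabricates a tiny in-memory graph rather than reading Hadoop input splits
-class ExampleInputRDD implements InputRDD {
-    @Override
-    JavaPairRDD<Object, VertexWritable> readGraphRDD(Configuration configuration, JavaSparkContext sparkContext) {
-        List<Vertex> vertices = []
-        vertices << StarGraph.open().addVertex(T.id, 1L, T.label, 'person', 'name', 'marko')
-        vertices << StarGraph.open().addVertex(T.id, 2L, T.label, 'person', 'name', 'vadas')
-        return sparkContext.parallelize(vertices)
-                           .mapToPair({ vertex -> new Tuple2<>(vertex.id(), new VertexWritable(vertex)) } as PairFunction)
-    }
-}
-----
-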
-Using a Persisted Context
-+++++++++++++++++++++++++
-
-It is possible to persist the graph RDD between jobs within the `SparkContext` (e.g. SparkServer) by leveraging `PersistedOutputRDD`.
-Note that `gremlin.spark.persistContext` should be set to `true` or else the persisted RDD will be destroyed when the `SparkContext` closes.
-The persisted RDD is named by the `gremlin.hadoop.outputLocation` configuration. Similarly, `PersistedInputRDD` is used with the
-respective `gremlin.hadoop.inputLocation` to retrieve the persisted RDD from the `SparkContext`.
-
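-A sketch of a properties file that reads a previously persisted RDD back in (the `output` location mirrors the
-earlier examples):
-
-[source,properties]
-----
-gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
-gremlin.spark.graphInputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD
-gremlin.hadoop.inputLocation=output
-gremlin.spark.persistContext=true
-----
-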
-When using a persistent `SparkContext`, the configuration of the original `SparkContext` will be inherited by all
-threaded references to it. The exceptions to this rule are those properties which have a specific thread-local effect.
-
-.Thread Local Properties
-. spark.jobGroup.id
-. spark.job.description
-. spark.job.interruptOnCancel
-. spark.scheduler.pool
-
-Finally, there is a `spark` object that can be used to manage persisted RDDs (see <<interacting-with-spark, Interacting with Spark>>).
-
-[[bulkdumpervertexprogramusingspark]]
-Exporting with BulkDumperVertexProgram
-++++++++++++++++++++++++++++++++++++++
-
-The <<bulkdumpervertexprogram, BulkDumperVertexProgram>> exports a whole graph in any of the supported Hadoop GraphOutputFormats (`GraphSONOutputFormat`,
-`GryoOutputFormat` or `ScriptOutputFormat`). The example below takes a Hadoop graph as the input (in `GryoInputFormat`) and exports it as a GraphSON file
-(`GraphSONOutputFormat`).
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-graph.configuration().setProperty('gremlin.hadoop.graphOutputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat')
-graph.compute(SparkGraphComputer).program(BulkDumperVertexProgram.build().create()).submit().get()
-hdfs.ls('output')
-hdfs.head('output/~g')
-----
-
-Loading with BulkLoaderVertexProgram
-++++++++++++++++++++++++++++++++++++
-
-The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large
-amounts of data to and from different `Graph` implementations. The following code demonstrates how to load the
-Grateful Dead graph from HadoopGraph into TinkerGraph over Spark:
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
-readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
-writeGraph = 'conf/tinkergraph-gryo.properties'
-blvp = BulkLoaderVertexProgram.build().
-           keepOriginalIds(false).
-           writeGraph(writeGraph).create(readGraph)
-readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
-:set max-iteration 10
-graph = GraphFactory.open(writeGraph)
-g = graph.traversal()
-g.V().valueMap()
-graph.close()
-----
-
-[source,properties]
-----
-# hadoop-grateful-gryo.properties
-
-#
-# Hadoop Graph Configuration
-#
-gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
-gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
-gremlin.hadoop.inputLocation=grateful-dead.kryo
-gremlin.hadoop.outputLocation=output
-gremlin.hadoop.jarsInDistributedCache=true
-
-#
-# SparkGraphComputer Configuration
-#
-spark.master=local[1]
-spark.executor.memory=1g
-spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
-----
-
-[source,properties]
-----
-# tinkergraph-gryo.properties
-
-gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
-gremlin.tinkergraph.graphFormat=gryo
-gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
-----
-
-IMPORTANT: The path to TinkerGraph jars needs to be included in the `HADOOP_GREMLIN_LIBS` for the above example to work.
-
-[[giraphgraphcomputer]]
-GiraphGraphComputer
-^^^^^^^^^^^^^^^^^^^
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>giraph-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-----
-
-image:giraph-logo.png[width=100,float=left] link:http://giraph.apache.org[Giraph] is an Apache Software Foundation
-project focused on OLAP-based graph processing. Giraph makes use of the distributed graph computing paradigm made
-popular by Google's Pregel. In Giraph, developers write "vertex programs" that get executed at each vertex in
-parallel. These programs communicate with one another in a bulk synchronous parallel (BSP) manner. This model aligns
-with TinkerPop3's `GraphComputer` API. TinkerPop3 provides an implementation of `GraphComputer` that works for Giraph
-called `GiraphGraphComputer`. Moreover, with TinkerPop3's <<mapreduce,MapReduce>>-framework, the standard
-Giraph/Pregel model is extended to support an arbitrary number of MapReduce phases to aggregate and yield results
-from the graph. Below are examples using `GiraphGraphComputer` from the <<gremlin-console,Gremlin-Console>>.
-
-WARNING: Giraph uses a large number of Hadoop counters. The default for Hadoop is 120. In `mapred-site.xml` it is
-possible to increase the limit via the `mapreduce.job.counters.max` property. A good value to use is 1000. This
-is a cluster-wide property so be sure to restart the cluster after updating.
-
-WARNING: The maximum number of workers can be no larger than the number of map-slots in the Hadoop cluster minus 1.
-For example, if the Hadoop cluster has 4 map slots, then `giraph.maxWorkers` cannot be larger than 3. One map-slot
-is reserved for the master compute node and all other slots can be allocated as workers to execute the VertexPrograms
-on the vertices of the graph.
-
-If `GiraphGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
-specified in `HADOOP_GREMLIN_LIBS`.
-
-[source,shell]
-export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/giraph-gremlin/lib
-
-Or, the user can specify the directory in the Gremlin Console.
-
-[source,groovy]
-System.setProperty('HADOOP_GREMLIN_LIBS',System.getProperty('HADOOP_GREMLIN_LIBS') + ':' + '/usr/local/gremlin-console/ext/giraph-gremlin/lib')
-
-[gremlin-groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-g = graph.traversal(computer(GiraphGraphComputer))
-g.V().count()
-g.V().out().out().values('name')
-----
-
-IMPORTANT: The examples above do not use lambdas (i.e. closures in Gremlin-Groovy). This makes the traversal
-serializable and thus, able to be distributed to all machines in the Hadoop cluster. If a lambda is required in a
-traversal, then the traversal must be sent as a `String` and compiled locally at each machine in the cluster. The
-following example demonstrates the `:remote` command which allows for submitting Gremlin traversals as a `String`.
-
-[gremlin-groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-g = graph.traversal(computer(GiraphGraphComputer))
-:remote connect tinkerpop.hadoop graph g
-:> g.V().group().by{it.value('name')[1]}.by('name')
-result
-result.memory.runtime
-result.memory.keys()
-result.memory.get('~reducing')
-----
-
-NOTE: If the user explicitly specifies `giraph.maxWorkers` and/or `giraph.numComputeThreads` in the configuration,
-then these values will be used by Giraph. However, if these are not specified and the user never calls
-`GraphComputer.workers()` then `GiraphGraphComputer` will try to compute the number of workers/threads to use based
-on the cluster's profile.
-
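-A sketch of pinning the worker count explicitly (three workers, matching the map-slot arithmetic in the earlier
-warning; `PageRankVertexProgram` is simply an example program):
-
-[source,groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-graph.compute(GiraphGraphComputer).workers(3).program(PageRankVertexProgram.build().create(graph)).submit().get()
-----
-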
-Loading with BulkLoaderVertexProgram
-++++++++++++++++++++++++++++++++++++
-
-The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load
-large amounts of data to and from different `Graph` implementations. The following code demonstrates how to load
-the Grateful Dead graph from HadoopGraph into TinkerGraph over Giraph:
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
-readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
-writeGraph = 'conf/tinkergraph-gryo.properties'
-blvp = BulkLoaderVertexProgram.build().
-           keepOriginalIds(false).
-           writeGraph(writeGraph).create(readGraph)
-readGraph.compute(GiraphGraphComputer).workers(1).program(blvp).submit().get()
-:set max-iteration 10
-graph = GraphFactory.open(writeGraph)
-g = graph.traversal()
-g.V().valueMap()
-graph.close()
-----
-
-[source,properties]
-----
-# hadoop-grateful-gryo.properties
-
-#
-# Hadoop Graph Configuration
-#
-gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
-gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
-gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
-gremlin.hadoop.inputLocation=grateful-dead.kryo
-gremlin.hadoop.outputLocation=output
-gremlin.hadoop.jarsInDistributedCache=true
-
-#
-# GiraphGraphComputer Configuration
-#
-giraph.minWorkers=1
-giraph.maxWorkers=1
-giraph.useOutOfCoreGraph=true
-giraph.useOutOfCoreMessages=true
-mapred.map.child.java.opts=-Xmx1024m
-mapred.reduce.child.java.opts=-Xmx1024m
-giraph.numInputThreads=4
-giraph.numComputeThreads=4
-giraph.maxMessagesInMemory=100000
-----
-
-[source,properties]
-----
-# tinkergraph-gryo.properties
-
-gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
-gremlin.tinkergraph.graphFormat=gryo
-gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
-----
-
-NOTE: The path to the TinkerGraph jars needs to be included in the `HADOOP_GREMLIN_LIBS` for the above example to work.
-
-Input/Output Formats
-~~~~~~~~~~~~~~~~~~~~
-
-image:adjacency-list.png[width=300,float=right] Hadoop-Gremlin provides various I/O formats -- i.e. Hadoop
-`InputFormat` and `OutputFormat`. All of the formats make use of an link:http://en.wikipedia.org/wiki/Adjacency_list[adjacency list]
-representation of the graph where each "row" represents a single vertex, its properties, and its incoming and
-outgoing edges.
-
-{empty} +
-
-[[gryo-io-format]]
-Gryo I/O Format
-^^^^^^^^^^^^^^^
-
-* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat`
-* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat`
-
-<<gryo-reader-writer,Gryo>> is a binary graph format that leverages link:https://github.com/EsotericSoftware/kryo[Kryo]
-to make a compact, binary representation of a vertex. It is recommended that users leverage Gryo given its space/time
-savings over text-based representations.
-
-NOTE: The `GryoInputFormat` is splittable.
-
-[[graphson-io-format]]
-GraphSON I/O Format
-^^^^^^^^^^^^^^^^^^^
-
-* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat`
-* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat`
-
-<<graphson-reader-writer,GraphSON>> is a JSON based graph format. GraphSON is a space-expensive graph format in that
-it is a text-based markup language. However, it is convenient for many developers to work with as its structure is
-simple (easy to create and parse).
-
-The data below represents an adjacency list representation of the classic TinkerGraph toy graph in GraphSON format.
-
-[source,json]
-----
-{"id":1,"label":"person","outE":{"created":[{"id":9,"inV":3,"properties":{"weight":0.4}}],"knows":[{"id":7,"inV":2,"properties":{"weight":0.5}},{"id":8,"inV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":0,"value":"marko"}],"age":[{"id":1,"value":29}]}}
-{"id":2,"label":"person","inE":{"knows":[{"id":7,"outV":1,"properties":{"weight":0.5}}]},"properties":{"name":[{"id":2,"value":"vadas"}],"age":[{"id":3,"value":27}]}}
-{"id":3,"label":"software","inE":{"created":[{"id":9,"outV":1,"properties":{"weight":0.4}},{"id":11,"outV":4,"properties":{"weight":0.4}},{"id":12,"outV":6,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":4,"value":"lop"}],"lang":[{"id":5,"value":"java"}]}}
-{"id":4,"label":"person","inE":{"knows":[{"id":8,"outV":1,"properties":{"weight":1.0}}]},"outE":{"created":[{"id":10,"inV":5,"properties":{"weight":1.0}},{"id":11,"inV":3,"properties":{"weight":0.4}}]},"properties":{"name":[{"id":6,"value":"josh"}],"age":[{"id":7,"value":32}]}}
-{"id":5,"label":"software","inE":{"created":[{"id":10,"outV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":8,"value":"ripple"}],"lang":[{"id":9,"value":"java"}]}}
-{"id":6,"label":"person","outE":{"created":[{"id":12,"inV":3,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":10,"value":"peter"}],"age":[{"id":11,"value":35}]}}
-----
-
-[[script-io-format]]
-Script I/O Format
-^^^^^^^^^^^^^^^^^
-
-* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat`
-* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptOutputFormat`
-
-`ScriptInputFormat` and `ScriptOutputFormat` take an arbitrary script and use that script to read or write
-`Vertex` objects, respectively. This can be considered the most general `InputFormat`/`OutputFormat` possible in that
-Hadoop-Gremlin uses the user provided script for all reading/writing.
-
-ScriptInputFormat
-+++++++++++++++++
-
-The data below represents an adjacency list representation of the classic TinkerGraph toy graph. The first line reads,
-"vertex `1`, labeled `person`, having 2 property values (`marko` and `29`), has 3 outgoing edges; the first edge is
-labeled `knows`, connects the current vertex `1` with vertex `2`, and has a `weight` property value of `0.5`, and so on."
-
-[source]
-1:person:marko:29 knows:2:0.5,knows:4:1.0,created:3:0.4
-2:person:vadas:27
-3:project:lop:java
-4:person:josh:32 created:3:0.4,created:5:1.0
-5:project:ripple:java
-6:person:peter:35 created:3:0.2
-
-There is no corresponding `InputFormat` that can parse this particular file (or some adjacency list variant of it).
-As such, `ScriptInputFormat` can be used. With `ScriptInputFormat` a script is stored in HDFS and leveraged by each
-mapper in the Hadoop job. The script must have the following method defined:
-
-[source,groovy]
-def parse(String line, ScriptElementFactory factory) { ... }
-
-`ScriptElementFactory` is a legacy artifact of previous versions and, although it is still functional, it should no longer be used.
-In order to create vertices and edges, the `parse()` method gets access to a global variable named `graph`, which holds
-the local `StarGraph` for the current line/vertex.
-
-An appropriate `parse()` for the above adjacency list file is:
-
-[source,groovy]
-def parse(line, factory) {
-    def parts = line.split(/ /)
-    def (id, label, name, x) = parts[0].split(/:/).toList()
-    def v1 = graph.addVertex(T.id, id, T.label, label)
-    if (name != null) v1.property('name', name) // first value is always the name
-    if (x != null) {
-        // second value depends on the vertex label; it's either
-        // the age of a person or the language of a project
-        if (label.equals('project')) v1.property('lang', x)
-        else v1.property('age', Integer.valueOf(x))
-    }
-    if (parts.length == 2) {
-        parts[1].split(/,/).grep { !it.isEmpty() }.each {
-            def (eLabel, refId, weight) = it.split(/:/).toList()
-            def v2 = graph.addVertex(T.id, refId)
-            v1.addOutEdge(eLabel, v2, 'weight', Double.valueOf(weight))
-        }
-    }
-    return v1
-}
-
-The returned `Vertex` denotes whether the parsed line yielded a valid vertex. If the line is not valid
-(e.g. a comment line, a line to skip, etc.), simply return `null`, as sketched below.
-
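-For instance, a sketch of a `parse()` that skips blank and comment lines for a hypothetical minimal `id:label`
-format:
-
-[source,groovy]
-----
-def parse(line, factory) {
-    if (line.trim().isEmpty() || line.startsWith('#')) return null // not a vertex line -- skip it
-    def (id, label) = line.split(/:/).toList()
-    return graph.addVertex(T.id, id, T.label, label)
-}
-----
-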
-ScriptOutputFormat Support
-++++++++++++++++++++++++++
-
-The principle above can also be used to convert a vertex to an arbitrary `String` representation that is ultimately
-streamed back to a file in HDFS. This is the role of `ScriptOutputFormat`. `ScriptOutputFormat` requires that the
-provided script maintains a method with the following signature:
-
-[source,groovy]
-def stringify(Vertex vertex) { ... }
-
-An appropriate `stringify()` to produce output in the same format that was shown in the `ScriptInputFormat` sample is:
-
-[source,groovy]
-def stringify(vertex) {
-    def v = vertex.values('name', 'age', 'lang').inject(vertex.id(), vertex.label()).join(':')
-    def outE = vertex.outE().map {
-        def e = it.get()
-        e.values('weight').inject(e.label(), e.inV().next().id()).join(':')
-    }.join(',')
-    return [v, outE].join('\t')
-}
-
-Storage Systems
-~~~~~~~~~~~~~~~
-
-Hadoop-Gremlin provides two implementations of the `Storage` API:
-
-* `FileSystemStorage`: Access HDFS and local file system data.
-* `SparkContextStorage`: Access Spark persisted RDD data.
-
-[[interacting-with-hdfs]]
-Interacting with HDFS
-^^^^^^^^^^^^^^^^^^^^^
-
-The distributed file system of Hadoop is called link:http://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_distributed_file_system[HDFS].
-The results of any OLAP operation are stored in HDFS and are accessible via `hdfs`. For local file system access, there is `local`.
-
-[gremlin-groovy]
-----
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
-hdfs.ls()
-hdfs.ls('output')
-hdfs.head('output', GryoInputFormat)
-hdfs.head('output', 'clusterCount', SequenceFileInputFormat)
-hdfs.rm('output')
-hdfs.ls()
-----
-
-[[interacting-with-spark]]
-Interacting with Spark
-^^^^^^^^^^^^^^^^^^^^^^
-
-If a Spark context is persisted, then Spark RDDs will remain in the Spark cache and be accessible over subsequent jobs.
-RDDs are retrieved from and saved to the `SparkContext` via `PersistedInputRDD` and `PersistedOutputRDD` respectively.
-Persisted RDDs can be accessed using `spark`.
-
-[gremlin-groovy]
-----
-Spark.create('local[4]')
-graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
-graph.configuration().setProperty('gremlin.spark.graphOutputRDD', PersistedOutputRDD.class.getCanonicalName())
-graph.configuration().clearProperty('gremlin.hadoop.graphOutputFormat')
-graph.configuration().setProperty('gremlin.spark.persistContext',true)
-graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
-spark.ls()
-spark.ls('output')
-spark.head('output', PersistedInputRDD)
-spark.head('output', 'clusterCount', PersistedInputRDD)
-spark.rm('output')
-spark.ls()
-----
-
-A Command Line Example
-~~~~~~~~~~~~~~~~~~~~~~
-
-image::pagerank-logo.png[width=300]
-
-The classic link:http://en.wikipedia.org/wiki/PageRank[PageRank] centrality algorithm can be executed over the
-TinkerPop graph from the command line using `GiraphGraphComputer`.
-
-WARNING: Be sure that `HADOOP_GREMLIN_LIBS` references the `lib` directory of the respective `GraphComputer` engine
-being used, or else the requisite dependencies will not be uploaded to the Hadoop cluster.
-
-[source,text]
-----
-$ hdfs dfs -copyFromLocal data/tinkerpop-modern.json tinkerpop-modern.json
-$ hdfs dfs -ls
-Found 2 items
--rw-r--r--   1 marko supergroup       2356 2014-07-28 13:00 /user/marko/tinkerpop-modern.json
-$ hadoop jar target/giraph-gremlin-x.y.z-job.jar org.apache.tinkerpop.gremlin.giraph.process.computer.GiraphGraphComputer ../hadoop-gremlin/conf/hadoop-graphson.properties
-15/09/11 08:02:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-15/09/11 08:02:11 INFO computer.GiraphGraphComputer: HadoopGremlin(Giraph): PageRankVertexProgram[alpha=0.85,iterations=30]
-15/09/11 08:02:12 INFO mapreduce.JobSubmitter: number of splits:3
-15/09/11 08:02:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441915907347_0028
-15/09/11 08:02:12 INFO impl.YarnClientImpl: Submitted application application_1441915907347_0028
-15/09/11 08:02:12 INFO job.GiraphJob: Tracking URL: http://markos-macbook:8088/proxy/application_1441915907347_0028/
-15/09/11 08:02:12 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 3 mappers
-15/09/11 08:03:54 INFO mapreduce.Job: Running job: job_1441915907347_0028
-15/09/11 08:03:55 INFO mapreduce.Job: Job job_1441915907347_0028 running in uber mode : false
-15/09/11 08:03:55 INFO mapreduce.Job:  map 33% reduce 0%
-15/09/11 08:03:57 INFO mapreduce.Job:  map 67% reduce 0%
-15/09/11 08:04:01 INFO mapreduce.Job:  map 100% reduce 0%
-15/09/11 08:06:17 INFO mapreduce.Job: Job job_1441915907347_0028 completed successfully
-15/09/11 08:06:17 INFO mapreduce.Job: Counters: 80
-    File System Counters
-        FILE: Number of bytes read=0
-        FILE: Number of bytes written=483918
-        FILE: Number of read operations=0
-        FILE: Number of large read operations=0
-        FILE: Number of write operations=0
-        HDFS: Number of bytes read=1465
-        HDFS: Number of bytes written=1760
-        HDFS: Number of read operations=39
-        HDFS: Number of large read operations=0
-        HDFS: Number of write operations=20
-    Job Counters
-        Launched map tasks=3
-        Other local map tasks=3
-        Total time spent by all maps in occupied slots (ms)=458105
-        Total time spent by all reduces in occupied slots (ms)=0
-        Total time spent by all map tasks (ms)=458105
-        Total vcore-seconds taken by all map tasks=458105
-        Total megabyte-seconds taken by all map tasks=469099520
-    Map-Reduce Framework
-        Map input records=3
-        Map output records=0
-        Input split bytes=132
-        Spilled Records=0
-        Failed Shuffles=0
-        Merged Map outputs=0
-        GC time elapsed (ms)=1594
-        CPU time spent (ms)=0
-        Physical memory (bytes) snapshot=0
-        Virtual memory (bytes) snapshot=0
-        Total committed heap usage (bytes)=527958016
-    Giraph Stats
-        Aggregate edges=0
-        Aggregate finished vertices=0
-        Aggregate sent message message bytes=13535
-        Aggregate sent messages=186
-        Aggregate vertices=6
-        Current master task partition=0
-        Current workers=2
-        Last checkpointed superstep=0
-        Sent message bytes=438
-        Sent messages=6
-        Superstep=31
-    Giraph Timers
-        Initialize (ms)=2996
-        Input superstep (ms)=5209
-        Setup (ms)=59
-        Shutdown (ms)=9324
-        Superstep 0 GiraphComputation (ms)=3861
-        Superstep 1 GiraphComputation (ms)=4027
-        Superstep 10 GiraphComputation (ms)=4000
-        Superstep 11 GiraphComputation (ms)=4004
-        Superstep 12 GiraphComputation (ms)=3999
-        Superstep 13 GiraphComputation (ms)=4000
-        Superstep 14 GiraphComputation (ms)=4005
-        Superstep 15 GiraphComputation (ms)=4003
-        Superstep 16 GiraphComputation (ms)=4001
-        Superstep 17 GiraphComputation (ms)=4007
-        Superstep 18 GiraphComputation (ms)=3998
-        Superstep 19 GiraphComputation (ms)=4006
-        Superstep 2 GiraphComputation (ms)=4007
-        Superstep 20 GiraphComputation (ms)=3996
-        Superstep 21 GiraphComputation (ms)=4006
-        Superstep 22 GiraphComputation (ms)=4002
-        Superstep 23 GiraphComputation (ms)=3998
-        Superstep 24 GiraphComputation (ms)=4003
-        Superstep 25 GiraphComputation (ms)=4001
-        Superstep 26 GiraphComputation (ms)=4003
-        Superstep 27 GiraphComputation (ms)=4005
-        Superstep 28 GiraphComputation (ms)=4002
-        Superstep 29 GiraphComputation (ms)=4001
-        Superstep 3 GiraphComputation (ms)=3988
-        Superstep 30 GiraphComputation (ms)=4248
-        Superstep 4 GiraphComputation (ms)=4010
-        Superstep 5 GiraphComputation (ms)=3998
-        Superstep 6 GiraphComputation (ms)=3996
-        Superstep 7 GiraphComputation

<TRUNCATED>


[06/19] incubator-tinkerpop git commit: Properly handle session bindings to avoid concurrency errors.

Posted by dk...@apache.org.
Properly handle session bindings to avoid concurrency errors.

Basically, the bindings needed to be constructed within the session's executor to avoid two threads accessing it at the same time.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/b42dcea2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/b42dcea2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/b42dcea2

Branch: refs/heads/master
Commit: b42dcea255cdb5c8df91a35561d6845ac1f41f9e
Parents: a5a00bf
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Feb 16 16:08:43 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 16:08:43 2016 -0500

----------------------------------------------------------------------
 .../tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java     | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/b42dcea2/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
index b8f48eb..cc4b4ad 100644
--- a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
+++ b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
@@ -200,7 +200,9 @@ public abstract class AbstractEvalOpProcessor implements OpProcessor {
                     try {
                         b.putAll(bindingsSupplier.get());
                     } catch (OpProcessorException ope) {
-                        ope.printStackTrace();
+                        // this should bubble up in the GremlinExecutor properly as the RuntimeException will be
+                        // unwrapped and the root cause thrown
+                        throw new RuntimeException(ope);
                     }
                 })
                 .withResult(o -> {


[19/19] incubator-tinkerpop git commit: Merge branch 'tp31'

Posted by dk...@apache.org.
Merge branch 'tp31'


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/e23c00bd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/e23c00bd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/e23c00bd

Branch: refs/heads/master
Commit: e23c00bdd8eb6e1ff8619ebce5dadf13eb82b33a
Parents: 12a6917 26f81b9
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Fri Feb 19 19:48:32 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Fri Feb 19 19:48:32 2016 +0100

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |    1 +
 docs/preprocessor/preprocess-file.sh            |   29 +-
 docs/preprocessor/preprocess.sh                 |    2 +-
 docs/src/dev/developer/contributing.asciidoc    |   22 +-
 docs/src/dev/developer/release.asciidoc         |   39 +-
 .../reference/implementations-hadoop.asciidoc   |  929 +++++++++
 .../reference/implementations-intro.asciidoc    |  545 ++++++
 .../reference/implementations-neo4j.asciidoc    |  261 +++
 .../implementations-tinkergraph.asciidoc        |  144 ++
 docs/src/reference/implementations.asciidoc     | 1835 ------------------
 docs/src/reference/index.asciidoc               |    5 +-
 .../the-gremlin-console/index.asciidoc          |    2 +
 .../tinkerpop/gremlin/driver/Connection.java    |   62 +-
 .../groovy/engine/GremlinExecutorTest.java      |    3 +-
 .../server/op/AbstractEvalOpProcessor.java      |   55 +-
 .../gremlin/server/op/session/Session.java      |    6 +-
 .../server/op/session/SessionOpProcessor.java   |    4 +-
 .../server/GremlinServerIntegrateTest.java      |   27 +
 18 files changed, 2066 insertions(+), 1905 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/e23c00bd/CHANGELOG.asciidoc
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/e23c00bd/docs/src/reference/implementations-hadoop.asciidoc
----------------------------------------------------------------------
diff --cc docs/src/reference/implementations-hadoop.asciidoc
index 0000000,376f377..6999616
mode 000000,100644..100644
--- a/docs/src/reference/implementations-hadoop.asciidoc
+++ b/docs/src/reference/implementations-hadoop.asciidoc
@@@ -1,0 -1,929 +1,929 @@@
+ ////
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+ 
+   http://www.apache.org/licenses/LICENSE-2.0
+ 
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ////
+ [[hadoop-gremlin]]
+ Hadoop-Gremlin
+ --------------
+ 
+ [source,xml]
+ ----
+ <dependency>
+    <groupId>org.apache.tinkerpop</groupId>
+    <artifactId>hadoop-gremlin</artifactId>
+    <version>x.y.z</version>
+ </dependency>
+ ----
+ 
+ image:hadoop-logo-notext.png[width=100,float=left] link:http://hadoop.apache.org/[Hadoop] is a distributed
+ computing framework that is used to process data represented across a multi-machine compute cluster. When the
+ data in the Hadoop cluster represents a TinkerPop3 graph, then Hadoop-Gremlin can be used to process the graph
+ using both TinkerPop3's OLTP and OLAP graph computing models.
+ 
+ IMPORTANT: This section assumes that the user has a Hadoop 2.x cluster functioning. For more information on getting
+ started with Hadoop, please see the
+ link:http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html[Single Node Setup]
+ tutorial. Moreover, if using `GiraphGraphComputer` or `SparkGraphComputer` it is advisable that the reader also
+ familiarize themselves with Giraph (link:http://giraph.apache.org/quick_start.html[Getting Started]) and Spark
+ (link:http://spark.apache.org/docs/latest/quick-start.html[Quick Start]).
+ 
+ Installing Hadoop-Gremlin
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+ 
+ The `HADOOP_GREMLIN_LIBS` environment variable references locations that contain jars that should be uploaded to a
+ respective distributed cache (link:http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html[YARN] or SparkServer).
+ Note that the locations in `HADOOP_GREMLIN_LIBS` can be colon-separated (`:`) and all jars from all locations will
+ be loaded into the cluster. Typically, only the jars of the respective `GraphComputer` are required to be loaded (e.g.
+ the `GiraphGraphComputer` plugin lib directory).
+ 
+ [source,shell]
+ export HADOOP_GREMLIN_LIBS=/usr/local/gremlin-console/ext/giraph-gremlin/lib
+ 
+ If using <<gremlin-console,Gremlin Console>>, it is important to install the Hadoop-Gremlin plugin. Note that
+ Hadoop-Gremlin requires a Gremlin Console restart after installing.
+ 
+ [source,text]
+ ----
+ $ bin/gremlin.sh
+ 
+          \,,,/
+          (o o)
+ -----oOOo-(3)-oOOo-----
+ plugin activated: tinkerpop.server
+ plugin activated: tinkerpop.utilities
+ plugin activated: tinkerpop.tinkergraph
+ gremlin> :install org.apache.tinkerpop hadoop-gremlin x.y.z
+ ==>loaded: [org.apache.tinkerpop, hadoop-gremlin, x.y.z] - restart the console to use [tinkerpop.hadoop]
+ gremlin> :q
+ $ bin/gremlin.sh
+ 
+          \,,,/
+          (o o)
+ -----oOOo-(3)-oOOo-----
+ plugin activated: tinkerpop.server
+ plugin activated: tinkerpop.utilities
+ plugin activated: tinkerpop.tinkergraph
+ gremlin> :plugin use tinkerpop.hadoop
+ ==>tinkerpop.hadoop activated
+ gremlin>
+ ----
+ 
+ Properties Files
+ ~~~~~~~~~~~~~~~~
+ 
+ `HadoopGraph` makes use of properties files which ultimately get turned into Apache Commons configurations and/or
+ Hadoop configurations. The example properties file presented below is located at `conf/hadoop/hadoop-gryo.properties`.
+ 
+ [source,text]
+ gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+ gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
+ gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+ gremlin.hadoop.outputLocation=output
+ gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
+ gremlin.hadoop.jarsInDistributedCache=true
+ ####################################
+ # Spark Configuration              #
+ ####################################
+ spark.master=local[4]
+ spark.executor.memory=1g
+ spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+ ####################################
+ # SparkGraphComputer Configuration #
+ ####################################
+ gremlin.spark.graphInputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.InputRDDFormat
+ gremlin.spark.graphOutputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.OutputRDDFormat
+ gremlin.spark.persistContext=true
+ #####################################
+ # GiraphGraphComputer Configuration #
+ #####################################
+ giraph.minWorkers=2
+ giraph.maxWorkers=2
+ giraph.useOutOfCoreGraph=true
+ giraph.useOutOfCoreMessages=true
+ mapreduce.map.java.opts=-Xmx1024m
+ mapreduce.reduce.java.opts=-Xmx1024m
+ giraph.numInputThreads=2
+ giraph.numComputeThreads=2
+ 
+ A review of the Hadoop-Gremlin specific properties is provided in the table below. For the respective OLAP
+ engines (<<sparkgraphcomputer,`SparkGraphComputer`>> or <<giraphgraphcomputer,`GiraphGraphComputer`>>) refer
+ to their respective documentation for configuration options.
+ 
+ [width="100%",cols="2,10",options="header"]
+ |=========================================================
+ |Property |Description
+ |gremlin.graph |The class of the graph to construct using GraphFactory.
+ |gremlin.hadoop.inputLocation |The location of the input file(s) for Hadoop-Gremlin to read the graph from.
+ |gremlin.hadoop.graphInputFormat |The format that the graph input file(s) are represented in.
+ |gremlin.hadoop.outputLocation |The location to write the computed HadoopGraph to.
+ |gremlin.hadoop.graphOutputFormat |The format that the output file(s) should be represented in.
+ |gremlin.hadoop.jarsInDistributedCache |Whether to upload the Hadoop-Gremlin jars to a distributed cache (necessary if jars are not on the machines' classpaths).
+ |=========================================================
+ 
+ Along with the properties above, the numerous link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[Hadoop specific properties]
+ can be added as needed to tune and parameterize the executed Hadoop-Gremlin job on the respective Hadoop cluster.
+ 
+ IMPORTANT: As the size of the graphs being processed becomes large, it is important to fully understand how the
+ underlying OLAP engine (e.g. Spark, Giraph, etc.) works and understand the numerous parameterizations offered by
+ these systems. Such knowledge can help alleviate out of memory exceptions, slow load times, slow processing times,
+ garbage collection issues, etc.
+ 
+ OLTP Hadoop-Gremlin
+ ~~~~~~~~~~~~~~~~~~~
+ 
+ image:hadoop-pipes.png[width=180,float=left] It is possible to execute OLTP operations over a `HadoopGraph`.
+ However, realize that the underlying HDFS files are not random access and thus, to retrieve a vertex, a linear scan
+ is required. OLTP operations are useful for peeking into the graph prior to executing a long running OLAP job -- e.g.
+ `g.V().valueMap().limit(10)`.
+ 
+ CAUTION: OLTP operations on `HadoopGraph` are not efficient. They require linear scans to execute and are unreasonable
+ for large graphs. In such large graph situations, make use of <<traversalvertexprogram,TraversalVertexProgram>>
+ which is the OLAP Gremlin machine.
+ 
+ [gremlin-groovy]
+ ----
+ hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
+ hdfs.ls()
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ g = graph.traversal()
+ g.V().count()
+ g.V().out().out().values('name')
+ g.V().group().by{it.value('name')[1]}.by('name').next()
+ ----
+ 
+ OLAP Hadoop-Gremlin
+ ~~~~~~~~~~~~~~~~~~~
+ 
+ image:hadoop-furnace.png[width=180,float=left] Hadoop-Gremlin was designed to execute OLAP operations via
+ `GraphComputer`. The OLTP examples presented previously are reproduced below, but using `TraversalVertexProgram`
+ for the execution of the Gremlin traversal.
+ 
+ A `Graph` in TinkerPop3 can support any number of `GraphComputer` implementations. Out of the box, Hadoop-Gremlin
+ supports the following three implementations.
+ 
+ * <<mapreducegraphcomputer,`MapReduceGraphComputer`>>: Leverages Hadoop's MapReduce engine to execute TinkerPop3 OLAP
+ computations. (*coming soon*)
+ ** The graph must fit within the total disk space of the Hadoop cluster (supports massive graphs). Message passing is
+ coordinated via MapReduce jobs over the on-disk graph (slow traversals).
+ * <<sparkgraphcomputer,`SparkGraphComputer`>>: Leverages Apache Spark to execute TinkerPop3 OLAP computations.
+ ** The graph may fit within the total RAM of the cluster (supports larger graphs). Message passing is coordinated via
+ Spark map/reduce/join operations on in-memory and disk-cached data (average speed traversals).
+ * <<giraphgraphcomputer,`GiraphGraphComputer`>>: Leverages Apache Giraph to execute TinkerPop3 OLAP computations.
+ ** The graph should fit within the total RAM of the Hadoop cluster (graph size restriction), though "out-of-core"
+ processing is possible. Message passing is coordinated via ZooKeeper for the in-memory graph (speedy traversals).
+ 
+ TIP: image:gremlin-sugar.png[width=50,float=left] For those wanting to use the <<sugar-plugin,SugarPlugin>> with
+ their submitted traversal, do `:remote config useSugar true` as well as `:plugin use tinkerpop.sugar` at the start of
+ the Gremlin Console session if it is not already activated.
+ 
+ Note that `SparkGraphComputer` and `GiraphGraphComputer` are loaded via their respective plugins. Typically only
+ one plugin or the other is loaded depending on the desired `GraphComputer` to use.
+ 
+ [source,text]
+ ----
+ $ bin/gremlin.sh
+ 
+          \,,,/
+          (o o)
+ -----oOOo-(3)-oOOo-----
+ plugin activated: tinkerpop.server
+ plugin activated: tinkerpop.utilities
+ plugin activated: tinkerpop.tinkergraph
+ plugin activated: tinkerpop.hadoop
+ gremlin> :install org.apache.tinkerpop giraph-gremlin x.y.z
+ ==>loaded: [org.apache.tinkerpop, giraph-gremlin, x.y.z] - restart the console to use [tinkerpop.giraph]
+ gremlin> :install org.apache.tinkerpop spark-gremlin x.y.z
+ ==>loaded: [org.apache.tinkerpop, spark-gremlin, x.y.z] - restart the console to use [tinkerpop.spark]
+ gremlin> :q
+ $ bin/gremlin.sh
+ 
+          \,,,/
+          (o o)
+ -----oOOo-(3)-oOOo-----
+ plugin activated: tinkerpop.server
+ plugin activated: tinkerpop.utilities
+ plugin activated: tinkerpop.tinkergraph
+ plugin activated: tinkerpop.hadoop
+ gremlin> :plugin use tinkerpop.giraph
+ ==>tinkerpop.giraph activated
+ gremlin> :plugin use tinkerpop.spark
+ ==>tinkerpop.spark activated
+ ----
+ 
+ WARNING: Hadoop, Spark, and Giraph all depend on many of the same libraries (e.g. ZooKeeper, Snappy, Netty, Guava,
+ etc.). Unfortunately, these dependencies are typically not on the same versions of the respective libraries. As such,
+ it is best to *not* have both Spark and Giraph plugins loaded in the same console session nor in the same Java
+ project (though intelligent `<exclusion>`-usage can help alleviate conflicts in a Java project).
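+ 
+ For example, a Java project depending on `spark-gremlin` might exclude a conflicting transitive dependency along
+ these lines (the excluded artifact is illustrative only; the actual conflicts depend on the versions in play):
+ 
+ [source,xml]
+ ----
+ <dependency>
+    <groupId>org.apache.tinkerpop</groupId>
+    <artifactId>spark-gremlin</artifactId>
+    <version>x.y.z</version>
+    <exclusions>
+       <!-- illustrative only: exclude a transitive library that clashes with the Giraph-side version -->
+       <exclusion>
+          <groupId>io.netty</groupId>
+          <artifactId>netty-all</artifactId>
+       </exclusion>
+    </exclusions>
+ </dependency>
+ ----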
+ 
+ CAUTION: It is important to note that when doing an OLAP traversal, any resulting vertices, edges, or properties will be
+ attached to the source graph. For Hadoop-based graphs, this may lead to linear search times on massive graphs. Thus,
+ if vertex, edge, or property objects are to be returned (as a final result), it is best to call `.id()` to get the id
+ of the object rather than the actual attached object.
+ 
+ [[mapreducegraphcomputer]]
+ MapReduceGraphComputer
+ ^^^^^^^^^^^^^^^^^^^^^^
+ 
+ *COMING SOON*
+ 
+ [[sparkgraphcomputer]]
+ SparkGraphComputer
+ ^^^^^^^^^^^^^^^^^^
+ 
+ [source,xml]
+ ----
+ <dependency>
+    <groupId>org.apache.tinkerpop</groupId>
+    <artifactId>spark-gremlin</artifactId>
+    <version>x.y.z</version>
+ </dependency>
+ ----
+ 
+ image:spark-logo.png[width=175,float=left] link:http://spark.apache.org[Spark] is an Apache Software Foundation
+ project focused on general-purpose OLAP data processing. Spark provides a hybrid in-memory/disk-based distributed
+ computing model that is similar to Hadoop's MapReduce model. Spark maintains a fluent function chaining DSL that is
+ arguably easier for developers to work with than native Hadoop MapReduce. Spark-Gremlin provides an implementation of
+ the bulk-synchronous parallel, distributed message passing algorithm within Spark and thus, any `VertexProgram` can be
+ executed over `SparkGraphComputer`.
+ 
+ If `SparkGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
+ specified in `HADOOP_GREMLIN_LIBS`.
+ 
+ [source,shell]
+ export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/spark-gremlin/lib
+ 
+ Furthermore, the `lib/` directory should be distributed across all machines in the SparkServer cluster. For this purpose,
+ TinkerPop provides a helper script, which takes the Spark installation directory and the Spark machines as input:
+ 
+ [source,shell]
+ bin/init-tp-spark.sh /usr/local/spark spark@10.0.0.1 spark@10.0.0.2 spark@10.0.0.3
+ 
+ Once the `lib/` directory is distributed, `SparkGraphComputer` can be used as follows.
+ 
+ [gremlin-groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 -g = graph.traversal(computer(SparkGraphComputer))
++g = graph.traversal().withComputer(SparkGraphComputer)
+ g.V().count()
+ g.V().out().out().values('name')
+ ----
+ 
+ To use lambdas in Gremlin-Groovy, simply provide `:remote connect` with a `TraversalSource` which leverages `SparkGraphComputer`.
+ 
+ [gremlin-groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 -g = graph.traversal(computer(SparkGraphComputer))
++g = graph.traversal().withComputer(SparkGraphComputer)
+ :remote connect tinkerpop.hadoop graph g
+ :> g.V().group().by{it.value('name')[1]}.by('name')
+ ----
+ 
+ The `SparkGraphComputer` algorithm leverages Spark's caching abilities to reduce the amount of data shuffled across
+ the wire on each iteration of the <<vertexprogram,`VertexProgram`>>. When the graph is loaded as a Spark RDD
+ (Resilient Distributed Dataset) it is immediately cached as `graphRDD`. The `graphRDD` is a distributed adjacency
+ list which encodes the vertex, its properties, and all its incident edges. On the first iteration, each vertex
+ (in parallel) is passed through `VertexProgram.execute()`. This yields an output of the vertex's mutated state
+ (i.e. updated compute keys -- `propertyX`) and its outgoing messages. This `viewOutgoingRDD` is then reduced to
+ `viewIncomingRDD` where the outgoing messages are sent to their respective vertices. If a `MessageCombiner` exists
+ for the vertex program, then messages are aggregated locally and globally to ultimately yield one incoming message
+ for the vertex. This reduce sequence is the "message pass." If the vertex program does not terminate on this
+ iteration, then the `viewIncomingRDD` is joined with the cached `graphRDD` and the process continues. When there
+ are no more iterations, there is a final join and the resultant RDD is stripped of its edges and messages. This
+ `mapReduceRDD` is cached and is processed by each <<mapreduce,`MapReduce`>> job in the
+ <<graphcomputer,`GraphComputer`>> computation.
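+ 
+ As a sketch of the `MessageCombiner` role described above, a combiner for a vertex program whose messages are
+ summed might look like the following (the class name is illustrative and not part of TinkerPop):
+ 
+ [source,groovy]
+ ----
+ import org.apache.tinkerpop.gremlin.process.computer.MessageCombiner
+ 
+ // combine messages pairwise so they can be aggregated locally (per partition)
+ // and globally before the final join with the graphRDD
+ class SumMessageCombiner implements MessageCombiner<Double> {
+     @Override
+     Double combine(final Double messageA, final Double messageB) {
+         messageA + messageB
+     }
+ }
+ ----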
+ 
+ image::spark-algorithm.png[width=775]
+ 
+ [width="100%",cols="2,10",options="header"]
+ |========================================================
+ |Property |Description
+ |gremlin.spark.graphInputRDD |A class for creating RDDs from the underlying graph data, defaults to Hadoop `InputFormat`.
+ |gremlin.spark.graphOutputRDD |A class for outputting RDDs, defaults to Hadoop `OutputFormat`.
+ |gremlin.spark.graphStorageLevel |What `StorageLevel` to use for the cached graph during job execution (default `MEMORY_ONLY`).
+ |gremlin.spark.persistContext |Whether to create a new `SparkContext` for every `SparkGraphComputer` or to reuse an existing one.
+ |gremlin.spark.persistStorageLevel |What `StorageLevel` to use when persisting RDDs via `PersistedOutputRDD` (default `MEMORY_ONLY`).
+ |========================================================
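+ 
+ For example, these properties can be set on the graph configuration before submitting a job (the chosen values are
+ illustrative):
+ 
+ [source,groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ // cache the graph in memory and spill to disk when it does not fit
+ graph.configuration().setProperty('gremlin.spark.graphStorageLevel', 'MEMORY_AND_DISK')
+ // reuse one SparkContext across jobs rather than creating one per computation
+ graph.configuration().setProperty('gremlin.spark.persistContext', true)
+ ----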
+ 
+ InputRDD and OutputRDD
+ ++++++++++++++++++++++
+ 
+ If the provider/user does not want to use Hadoop `InputFormats`, it is possible to leverage Spark's RDD
+ constructs directly. There is a `gremlin.spark.graphInputRDD` configuration that references a `Class<? extends
+ InputRDD>`. An `InputRDD` provides a read method that takes a `SparkContext` and returns a graphRDD. Likewise, use
+ `gremlin.spark.graphOutputRDD` and the respective `OutputRDD`.
+ 
+ If the graph system provider uses an `InputRDD`, the RDD should maintain an associated `org.apache.spark.Partitioner`. By doing so,
+ `SparkGraphComputer` will not partition the loaded graph across the cluster as it has already been partitioned by the graph system provider.
+ This can save a significant amount of time and space resources.
+ If the `InputRDD` does not have a registered partitioner, `SparkGraphComputer` will partition the graph using
+ an `org.apache.spark.HashPartitioner` with the number of partitions being either the number of existing partitions in the input (e.g. input splits)
+ or the user-specified number of `GraphComputer.workers()`.
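+ 
+ As a sketch, a provider-specific `InputRDD`/`OutputRDD` pair would be wired in through the configuration (the class
+ names below are hypothetical):
+ 
+ [source,groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ // hypothetical provider classes -- any Class<? extends InputRDD>/Class<? extends OutputRDD> works here
+ graph.configuration().setProperty('gremlin.spark.graphInputRDD', 'com.example.MyGraphInputRDD')
+ graph.configuration().setProperty('gremlin.spark.graphOutputRDD', 'com.example.MyGraphOutputRDD')
+ ----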
+ 
+ Using a Persisted Context
+ +++++++++++++++++++++++++
+ 
+ It is possible to persist the graph RDD between jobs within the `SparkContext` (e.g. SparkServer) by leveraging `PersistedOutputRDD`.
+ Note that `gremlin.spark.persistContext` should be set to `true` or else the persisted RDD will be destroyed when the `SparkContext` closes.
+ The persisted RDD is named by the `gremlin.hadoop.outputLocation` configuration. Similarly, `PersistedInputRDD` is used with respective
+ `gremlin.hadoop.inputLocation` to retrieve the persisted RDD from the `SparkContext`.
+ 
+ When using a persistent `SparkContext`, the configuration of the original `SparkContext` will be inherited by all threaded
+ references to that Spark Context. The exceptions to this rule are those properties which have a specific thread-local effect.
+ 
+ .Thread Local Properties
+ . spark.jobGroup.id
+ . spark.job.description
+ . spark.job.interruptOnCancel
+ . spark.scheduler.pool
+ 
+ Finally, there is a `spark` object that can be used to manage persisted RDDs (see <<interacting-with-spark, Interacting with Spark>>).
+ 
+ [[bulkdumpervertexprogramusingspark]]
+ Exporting with BulkDumperVertexProgram
+ ++++++++++++++++++++++++++++++++++++++
+ 
+ The <<bulkdumpervertexprogram, BulkDumperVertexProgram>> exports a whole graph in any of the supported Hadoop GraphOutputFormats (`GraphSONOutputFormat`,
+ `GryoOutputFormat` or `ScriptOutputFormat`). The example below takes a Hadoop graph as the input (in `GryoInputFormat`) and exports it as a GraphSON file
+ (`GraphSONOutputFormat`).
+ 
+ [gremlin-groovy]
+ ----
+ hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ graph.configuration().setProperty('gremlin.hadoop.graphOutputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat')
+ graph.compute(SparkGraphComputer).program(BulkDumperVertexProgram.build().create()).submit().get()
+ hdfs.ls('output')
+ hdfs.head('output/~g')
+ ----
+ 
+ Loading with BulkLoaderVertexProgram
+ ++++++++++++++++++++++++++++++++++++
+ 
+ The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large
+ amounts of data to and from different `Graph` implementations. The following code demonstrates how to load the
+ Grateful Dead graph from HadoopGraph into TinkerGraph over Spark:
+ 
+ [gremlin-groovy]
+ ----
+ hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
+ readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+ writeGraph = 'conf/tinkergraph-gryo.properties'
+ blvp = BulkLoaderVertexProgram.build().
+            keepOriginalIds(false).
+            writeGraph(writeGraph).create(readGraph)
+ readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
+ :set max-iteration 10
+ graph = GraphFactory.open(writeGraph)
+ g = graph.traversal()
+ g.V().valueMap()
+ graph.close()
+ ----
+ 
+ [source,properties]
+ ----
+ # hadoop-grateful-gryo.properties
+ 
+ #
+ # Hadoop Graph Configuration
+ #
+ gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+ gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+ gremlin.hadoop.inputLocation=grateful-dead.kryo
+ gremlin.hadoop.outputLocation=output
+ gremlin.hadoop.jarsInDistributedCache=true
+ 
+ #
+ # SparkGraphComputer Configuration
+ #
+ spark.master=local[1]
+ spark.executor.memory=1g
+ spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+ ----
+ 
+ [source,properties]
+ ----
+ # tinkergraph-gryo.properties
+ 
+ gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+ gremlin.tinkergraph.graphFormat=gryo
+ gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+ ----
+ 
+ IMPORTANT: The path to the TinkerGraph jars needs to be included in `HADOOP_GREMLIN_LIBS` for the above example to work.
+ 
+ [[giraphgraphcomputer]]
+ GiraphGraphComputer
+ ^^^^^^^^^^^^^^^^^^^
+ 
+ [source,xml]
+ ----
+ <dependency>
+    <groupId>org.apache.tinkerpop</groupId>
+    <artifactId>giraph-gremlin</artifactId>
+    <version>x.y.z</version>
+ </dependency>
+ ----
+ 
+ image:giraph-logo.png[width=100,float=left] link:http://giraph.apache.org[Giraph] is an Apache Software Foundation
+ project focused on OLAP-based graph processing. Giraph makes use of the distributed graph computing paradigm made
+ popular by Google's Pregel. In Giraph, developers write "vertex programs" that get executed at each vertex in
+ parallel. These programs communicate with one another in a bulk synchronous parallel (BSP) manner. This model aligns
+ with TinkerPop3's `GraphComputer` API. TinkerPop3 provides an implementation of `GraphComputer` that works for Giraph
+ called `GiraphGraphComputer`. Moreover, with TinkerPop3's <<mapreduce,MapReduce>>-framework, the standard
+ Giraph/Pregel model is extended to support an arbitrary number of MapReduce phases to aggregate and yield results
+ from the graph. Below are examples using `GiraphGraphComputer` from the <<gremlin-console,Gremlin-Console>>.
+ 
+ WARNING: Giraph uses a large number of Hadoop counters. The default limit in Hadoop is 120. In `mapred-site.xml` it is
+ possible to increase the limit via the `mapreduce.job.counters.max` property. A good value to use is 1000. This
+ is a cluster-wide property so be sure to restart the cluster after updating.
+ 
+ WARNING: The maximum number of workers can be no larger than the number of map-slots in the Hadoop cluster minus 1.
+ For example, if the Hadoop cluster has 4 map-slots, then `giraph.maxWorkers` cannot be larger than 3. One map-slot
+ is reserved for the master compute node and all other slots can be allocated as workers to execute the VertexPrograms
+ on the vertices of the graph.
+ 
+ If `GiraphGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
+ specified in `HADOOP_GREMLIN_LIBS`.
+ 
+ [source,shell]
+ export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/giraph-gremlin/lib
+ 
+ Or, the user can specify the directory in the Gremlin Console.
+ 
+ [source,groovy]
+ System.setProperty('HADOOP_GREMLIN_LIBS',System.getProperty('HADOOP_GREMLIN_LIBS') + ':' + '/usr/local/gremlin-console/ext/giraph-gremlin/lib')
+ 
+ [gremlin-groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 -g = graph.traversal(computer(GiraphGraphComputer))
++g = graph.traversal().withComputer(GiraphGraphComputer)
+ g.V().count()
+ g.V().out().out().values('name')
+ ----
+ 
+ IMPORTANT: The examples above do not use lambdas (i.e. closures in Gremlin-Groovy). This makes the traversal
+ serializable and thus, able to be distributed to all machines in the Hadoop cluster. If a lambda is required in a
+ traversal, then the traversal must be sent as a `String` and compiled locally at each machine in the cluster. The
+ following example demonstrates the `:remote` command which allows for submitting Gremlin traversals as a `String`.
+ 
+ [gremlin-groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
 -g = graph.traversal(computer(GiraphGraphComputer))
++g = graph.traversal().withComputer(GiraphGraphComputer)
+ :remote connect tinkerpop.hadoop graph g
+ :> g.V().group().by{it.value('name')[1]}.by('name')
+ result
+ result.memory.runtime
+ result.memory.keys()
+ result.memory.get('~reducing')
+ ----
+ 
+ NOTE: If the user explicitly specifies `giraph.maxWorkers` and/or `giraph.numComputeThreads` in the configuration,
+ then these values will be used by Giraph. However, if these are not specified and the user never calls
+ `GraphComputer.workers()` then `GiraphGraphComputer` will try to compute the number of workers/threads to use based
+ on the cluster's profile.
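+ 
+ Either approach can be expressed as follows (the values are illustrative):
+ 
+ [source,groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ // option 1: let Gremlin derive workers/threads from a single workers() call
+ graph.compute(GiraphGraphComputer).workers(3) // ... then program(...).submit() as usual
+ // option 2: set the Giraph properties explicitly (these take precedence)
+ graph.configuration().setProperty('giraph.maxWorkers', 3)
+ graph.configuration().setProperty('giraph.numComputeThreads', 4)
+ ----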
+ 
+ Loading with BulkLoaderVertexProgram
+ ++++++++++++++++++++++++++++++++++++
+ 
+ The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load
+ large amounts of data to and from different `Graph` implementations. The following code demonstrates how to load
+ the Grateful Dead graph from HadoopGraph into TinkerGraph over Giraph:
+ 
+ [gremlin-groovy]
+ ----
+ hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
+ readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+ writeGraph = 'conf/tinkergraph-gryo.properties'
+ blvp = BulkLoaderVertexProgram.build().
+            keepOriginalIds(false).
+            writeGraph(writeGraph).create(readGraph)
+ readGraph.compute(GiraphGraphComputer).workers(1).program(blvp).submit().get()
+ :set max-iteration 10
+ graph = GraphFactory.open(writeGraph)
+ g = graph.traversal()
+ g.V().valueMap()
+ graph.close()
+ ----
+ 
+ [source,properties]
+ ----
+ # hadoop-grateful-gryo.properties
+ 
+ #
+ # Hadoop Graph Configuration
+ #
+ gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+ gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+ gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
+ gremlin.hadoop.inputLocation=grateful-dead.kryo
+ gremlin.hadoop.outputLocation=output
+ gremlin.hadoop.jarsInDistributedCache=true
+ 
+ #
+ # GiraphGraphComputer Configuration
+ #
+ giraph.minWorkers=1
+ giraph.maxWorkers=1
+ giraph.useOutOfCoreGraph=true
+ giraph.useOutOfCoreMessages=true
+ mapred.map.child.java.opts=-Xmx1024m
+ mapred.reduce.child.java.opts=-Xmx1024m
+ giraph.numInputThreads=4
+ giraph.numComputeThreads=4
+ giraph.maxMessagesInMemory=100000
+ ----
+ 
+ [source,properties]
+ ----
+ # tinkergraph-gryo.properties
+ 
+ gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+ gremlin.tinkergraph.graphFormat=gryo
+ gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+ ----
+ 
+ NOTE: The path to the TinkerGraph jars needs to be included in `HADOOP_GREMLIN_LIBS` for the above example to work.
+ 
+ Input/Output Formats
+ ~~~~~~~~~~~~~~~~~~~~
+ 
+ image:adjacency-list.png[width=300,float=right] Hadoop-Gremlin provides various I/O formats -- i.e. Hadoop
+ `InputFormat` and `OutputFormat`. All of the formats make use of an link:http://en.wikipedia.org/wiki/Adjacency_list[adjacency list]
+ representation of the graph where each "row" represents a single vertex, its properties, and its incoming and
+ outgoing edges.
+ 
+ {empty} +
+ 
+ [[gryo-io-format]]
+ Gryo I/O Format
+ ^^^^^^^^^^^^^^^
+ 
+ * **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat`
+ * **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat`
+ 
+ <<gryo-reader-writer,Gryo>> is a binary graph format that leverages link:https://github.com/EsotericSoftware/kryo[Kryo]
+ to make a compact, binary representation of a vertex. It is recommended that users leverage Gryo given its space/time
+ savings over text-based representations.
+ 
+ NOTE: The `GryoInputFormat` is splittable.
+ 
+ [[graphson-io-format]]
+ GraphSON I/O Format
+ ^^^^^^^^^^^^^^^^^^^
+ 
+ * **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat`
+ * **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat`
+ 
+ <<graphson-reader-writer,GraphSON>> is a JSON based graph format. GraphSON is a space-expensive graph format in that
+ it is a text-based markup language. However, it is convenient for many developers to work with as its structure is
+ simple (easy to create and parse).
+ 
+ The data below represents an adjacency list representation of the classic TinkerGraph toy graph in GraphSON format.
+ 
+ [source,json]
+ ----
+ {"id":1,"label":"person","outE":{"created":[{"id":9,"inV":3,"properties":{"weight":0.4}}],"knows":[{"id":7,"inV":2,"properties":{"weight":0.5}},{"id":8,"inV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":0,"value":"marko"}],"age":[{"id":1,"value":29}]}}
+ {"id":2,"label":"person","inE":{"knows":[{"id":7,"outV":1,"properties":{"weight":0.5}}]},"properties":{"name":[{"id":2,"value":"vadas"}],"age":[{"id":3,"value":27}]}}
+ {"id":3,"label":"software","inE":{"created":[{"id":9,"outV":1,"properties":{"weight":0.4}},{"id":11,"outV":4,"properties":{"weight":0.4}},{"id":12,"outV":6,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":4,"value":"lop"}],"lang":[{"id":5,"value":"java"}]}}
+ {"id":4,"label":"person","inE":{"knows":[{"id":8,"outV":1,"properties":{"weight":1.0}}]},"outE":{"created":[{"id":10,"inV":5,"properties":{"weight":1.0}},{"id":11,"inV":3,"properties":{"weight":0.4}}]},"properties":{"name":[{"id":6,"value":"josh"}],"age":[{"id":7,"value":32}]}}
+ {"id":5,"label":"software","inE":{"created":[{"id":10,"outV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":8,"value":"ripple"}],"lang":[{"id":9,"value":"java"}]}}
+ {"id":6,"label":"person","outE":{"created":[{"id":12,"inV":3,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":10,"value":"peter"}],"age":[{"id":11,"value":35}]}}
+ ----
+ 
+ [[script-io-format]]
+ Script I/O Format
+ ^^^^^^^^^^^^^^^^^
+ 
+ * **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat`
+ * **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptOutputFormat`
+ 
+ `ScriptInputFormat` and `ScriptOutputFormat` take an arbitrary script and use that script to either read or write
+ `Vertex` objects, respectively. This can be considered the most general `InputFormat`/`OutputFormat` possible in that
+ Hadoop-Gremlin uses the user provided script for all reading/writing.
+ 
+ ScriptInputFormat
+ +++++++++++++++++
+ 
+ The data below represents an adjacency list representation of the classic TinkerGraph toy graph. The first line reads,
+ "vertex `1`, labeled `person`, having 2 property values (`marko` and `29`), has 3 outgoing edges; the first edge is
+ labeled `knows`, connects the current vertex `1` with vertex `2`, and has a property value `0.5`, and so on."
+ 
+ [source]
+ 1:person:marko:29 knows:2:0.5,knows:4:1.0,created:3:0.4
+ 2:person:vadas:27
+ 3:project:lop:java
+ 4:person:josh:32 created:3:0.4,created:5:1.0
+ 5:project:ripple:java
+ 6:person:peter:35 created:3:0.2
+ 
+ There is no corresponding `InputFormat` that can parse this particular file (or some adjacency list variant of it).
+ As such, `ScriptInputFormat` can be used. With `ScriptInputFormat` a script is stored in HDFS and leveraged by each
+ mapper in the Hadoop job. The script must have the following method defined:
+ 
+ [source,groovy]
+ def parse(String line, ScriptElementFactory factory) { ... }
+ 
+ `ScriptElementFactory` is a legacy artifact from previous versions and, although it is still functional, it should no longer be used.
+ In order to create vertices and edges, the `parse()` method gets access to a global variable named `graph`, which holds
+ the local `StarGraph` for the current line/vertex.
+ 
+ An appropriate `parse()` for the above adjacency list file is:
+ 
+ [source,groovy]
+ def parse(line, factory) {
+     def parts = line.split(/ /)
+     def (id, label, name, x) = parts[0].split(/:/).toList()
+     def v1 = graph.addVertex(T.id, id, T.label, label)
+     if (name != null) v1.property('name', name) // first value is always the name
+     if (x != null) {
+         // second value depends on the vertex label; it's either
+         // the age of a person or the language of a project
+         if (label.equals('project')) v1.property('lang', x)
+         else v1.property('age', Integer.valueOf(x))
+     }
+     if (parts.length == 2) {
+         parts[1].split(/,/).grep { !it.isEmpty() }.each {
+             def (eLabel, refId, weight) = it.split(/:/).toList()
+             def v2 = graph.addVertex(T.id, refId)
+             v1.addOutEdge(eLabel, v2, 'weight', Double.valueOf(weight))
+         }
+     }
+     return v1
+ }
+ 
+ The returned `Vertex` denotes whether the parsed line yielded a valid vertex. As such, if the line is not valid
+ (e.g. a comment line, a line to skip, etc.), then simply return `null`.
+ 
+ ScriptOutputFormat Support
+ ++++++++++++++++++++++++++
+ 
+ The principle above can also be used to convert a vertex to an arbitrary `String` representation that is ultimately
+ streamed back to a file in HDFS. This is the role of `ScriptOutputFormat`. `ScriptOutputFormat` requires that the
+ provided script maintains a method with the following signature:
+ 
+ [source,groovy]
+ def stringify(Vertex vertex) { ... }
+ 
+ An appropriate `stringify()` to produce output in the same format that was shown in the `ScriptInputFormat` sample is:
+ 
+ [source,groovy]
+ def stringify(vertex) {
+     def v = vertex.values('name', 'age', 'lang').inject(vertex.id(), vertex.label()).join(':')
+     def outE = vertex.outE().map {
+         def e = it.get()
+         e.values('weight').inject(e.label(), e.inV().next().id()).join(':')
+     }.join(',')
+     return [v, outE].join('\t')
+ }
+ 
+ 
+ 
+ Storage Systems
+ ~~~~~~~~~~~~~~~
+ 
+ Hadoop-Gremlin provides two implementations of the `Storage` API:
+ 
+ * `FileSystemStorage`: Access HDFS and local file system data.
+ * `SparkContextStorage`: Access Spark persisted RDD data.
+ 
+ [[interacting-with-hdfs]]
+ Interacting with HDFS
+ ^^^^^^^^^^^^^^^^^^^^^
+ 
+ The distributed file system of Hadoop is called link:http://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_distributed_file_system[HDFS].
+ The results of any OLAP operation are stored in HDFS accessible via `hdfs`. For local file system access, there is `local`.
+ 
+ [gremlin-groovy]
+ ----
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
+ hdfs.ls()
+ hdfs.ls('output')
+ hdfs.head('output', GryoInputFormat)
+ hdfs.head('output', 'clusterCount', SequenceFileInputFormat)
+ hdfs.rm('output')
+ hdfs.ls()
+ ----
+ 
+ [[interacting-with-spark]]
+ Interacting with Spark
+ ^^^^^^^^^^^^^^^^^^^^^^
+ 
+ If a Spark context is persisted, then Spark RDDs will remain in the Spark cache and be accessible over subsequent jobs.
+ RDDs are retrieved and saved to the `SparkContext` via `PersistedInputRDD` and `PersistedOutputRDD` respectively.
+ Persisted RDDs can be accessed using `spark`.
+ 
+ [gremlin-groovy]
+ ----
+ Spark.create('local[4]')
+ graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+ graph.configuration().setProperty('gremlin.spark.graphOutputRDD', PersistedOutputRDD.class.getCanonicalName())
+ graph.configuration().clearProperty('gremlin.hadoop.graphOutputFormat')
+ graph.configuration().setProperty('gremlin.spark.persistContext',true)
+ graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
+ spark.ls()
+ spark.ls('output')
+ spark.head('output', PersistedInputRDD)
+ spark.head('output', 'clusterCount', PersistedInputRDD)
+ spark.rm('output')
+ spark.ls()
+ ----
+ 
+ A Command Line Example
+ ~~~~~~~~~~~~~~~~~~~~~~
+ 
+ image::pagerank-logo.png[width=300]
+ 
+ The classic link:http://en.wikipedia.org/wiki/PageRank[PageRank] centrality algorithm can be executed over the
+ TinkerPop graph from the command line using `GiraphGraphComputer`.
+ 
+ WARNING: Be sure that `HADOOP_GREMLIN_LIBS` references the location of the `lib` directory of the respective
+ `GraphComputer` engine being used or else the requisite dependencies will not be uploaded to the Hadoop cluster.
+ 
+ [source,text]
+ ----
+ $ hdfs dfs -copyFromLocal data/tinkerpop-modern.json tinkerpop-modern.json
+ $ hdfs dfs -ls
+ Found 2 items
+ -rw-r--r--   1 marko supergroup       2356 2014-07-28 13:00 /user/marko/tinkerpop-modern.json
+ $ hadoop jar target/giraph-gremlin-x.y.z-job.jar org.apache.tinkerpop.gremlin.giraph.process.computer.GiraphGraphComputer ../hadoop-gremlin/conf/hadoop-graphson.properties
+ 15/09/11 08:02:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+ 15/09/11 08:02:11 INFO computer.GiraphGraphComputer: HadoopGremlin(Giraph): PageRankVertexProgram[alpha=0.85,iterations=30]
+ 15/09/11 08:02:12 INFO mapreduce.JobSubmitter: number of splits:3
+ 15/09/11 08:02:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441915907347_0028
+ 15/09/11 08:02:12 INFO impl.YarnClientImpl: Submitted application application_1441915907347_0028
+ 15/09/11 08:02:12 INFO job.GiraphJob: Tracking URL: http://markos-macbook:8088/proxy/application_1441915907347_0028/
+ 15/09/11 08:02:12 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 3 mappers
+ 15/09/11 08:03:54 INFO mapreduce.Job: Running job: job_1441915907347_0028
+ 15/09/11 08:03:55 INFO mapreduce.Job: Job job_1441915907347_0028 running in uber mode : false
+ 15/09/11 08:03:55 INFO mapreduce.Job:  map 33% reduce 0%
+ 15/09/11 08:03:57 INFO mapreduce.Job:  map 67% reduce 0%
+ 15/09/11 08:04:01 INFO mapreduce.Job:  map 100% reduce 0%
+ 15/09/11 08:06:17 INFO mapreduce.Job: Job job_1441915907347_0028 completed successfully
+ 15/09/11 08:06:17 INFO mapreduce.Job: Counters: 80
+     File System Counters
+         FILE: Number of bytes read=0
+         FILE: Number of bytes written=483918
+         FILE: Number of read operations=0
+         FILE: Number of large read operations=0
+         FILE: Number of write operations=0
+         HDFS: Number of bytes read=1465
+         HDFS: Number of bytes written=1760
+         HDFS: Number of read operations=39
+         HDFS: Number of large read operations=0
+         HDFS: Number of write operations=20
+     Job Counters
+         Launched map tasks=3
+         Other local map tasks=3
+         Total time spent by all maps in occupied slots (ms)=458105
+         Total time spent by all reduces in occupied slots (ms)=0
+         Total time spent by all map tasks (ms)=458105
+         Total vcore-seconds taken by all map tasks=458105
+         Total megabyte-seconds taken by all map tasks=469099520
+     Map-Reduce Framework
+         Map input records=3
+         Map output records=0
+         Input split bytes=132
+         Spilled Records=0
+         Failed Shuffles=0
+         Merged Map outputs=0
+         GC time elapsed (ms)=1594
+         CPU time spent (ms)=0
+         Physical memory (bytes) snapshot=0
+         Virtual memory (bytes) snapshot=0
+         Total committed heap usage (bytes)=527958016
+     Giraph Stats
+         Aggregate edges=0
+         Aggregate finished vertices=0
+         Aggregate sent message message bytes=13535
+         Aggregate sent messages=186
+         Aggregate vertices=6
+         Current master task partition=0
+         Current workers=2
+         Last checkpointed superstep=0
+         Sent message bytes=438
+         Sent messages=6
+         Superstep=31
+     Giraph Timers
+         Initialize (ms)=2996
+         Input superstep (ms)=5209
+         Setup (ms)=59
+         Shutdown (ms)=9324
+         Superstep 0 GiraphComputation (ms)=3861
+         Superstep 1 GiraphComputation (ms)=4027
+         Superstep 10 GiraphComputation (ms)=4000
+         Superstep 11 GiraphComputation (ms)=4004
+         Superstep 12 GiraphComputation (ms)=3999
+         Superstep 13 GiraphComputation (ms)=4000
+         Superstep 14 GiraphComputation (ms)=4005
+         Superstep 15 GiraphComputation (ms)=4003
+         Superstep 16 GiraphComputation (ms)=4001
+         Superstep 17 GiraphComputation (ms)=4007
+         Superstep 18 GiraphComputation (ms)=3998
+         Superstep 19 GiraphComputation (ms)=4006
+         Superstep 2 GiraphComputation (ms)=4007
+         Superstep 20 GiraphComputation (ms)=3996
+         Superstep 21 GiraphComputation (ms)=4006
+         Superstep 22 GiraphComputation (ms)=4002
+         Superstep 23 GiraphComputation (ms)=3998
+         Superstep 24 GiraphComputation (ms)=4003
+         Superstep 25 GiraphComputation (ms)=4001
+         Superstep 26 GiraphComputation (ms)=4003
+         Superstep 27 GiraphComputation (ms)=4005
+         Superstep 28 GiraphComputation (ms)=4002
+         Superstep 29 GiraphComputation (ms)=4001
+         Superstep 3 GiraphComputation (ms)=3988
+         Superstep 30 GiraphComputation (ms)=4248
+         Superstep 4 GiraphComputation (ms)=4010
+         Superstep 5 GiraphComputation (ms)=3998
+         Superstep 6 GiraphComputation (ms)=3996
+         Superstep 7 GiraphComputation (ms)=4005
+         Superstep 8 GiraphComputation (ms)=4009
+         Superstep 9 GiraphComputation (ms)=3994
+         Total (ms)=138788
+     File Input Format Counters
+         Bytes Read=0
+     File Output Format Counters
+         Bytes Written=0
+ $ hdfs dfs -cat output/~g/*
+ {"id":1,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.15000000000000002}],"name":[{"id":0,"value":"marko"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":3.0}],"age":[{"id":1,"value":29}]}}
+ {"id":5,"label":"software","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.23181250000000003}],"name":[{"id":8,"value":"ripple"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":0.0}],"lang":[{"id":9,"value":"java"}]}}
+ {"id":3,"label":"software","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.4018125}],"name":[{"id":4,"value":"lop"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":0.0}],"lang":[{"id":5,"value":"java"}]}}
+ {"id":4,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.19250000000000003}],"name":[{"id":6,"value":"josh"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":2.0}],"age":[{"id":7,"value":32}]}}
+ {"id":2,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.19250000000000003}],"name":[{"id":2,"value":"vadas"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":0.0}],"age":[{"id":3,"value":27}]}}
+ {"id":6,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.15000000000000002}],"name":[{"id":10,"value":"peter"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":1.0}],"age":[{"id":11,"value":35}]}}
+ ----
+ 
+ Vertex 4 ("josh") is isolated below:
+ 
+ [source,js]
+ ----
+ {
+   "id":4,
+   "label":"person",
+   "properties": {
+     "gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.19250000000000003}],
+     "name":[{"id":6,"value":"josh"}],
+     "gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":2.0}],
+     "age":[{"id":7,"value":32}]}
+   }
+ }
+ ----
+ 
+ Hadoop-Gremlin for Graph System Providers
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ 
+ Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a 3rd-party graph system provider wishes to
+ leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then it needs to provide, at minimum, a
+ Hadoop2 `InputFormat<NullWritable,VertexWritable>` for its graph system. If the provider wishes to persist computed
+ results back to its graph system (and not just to HDFS via a `FileOutputFormat`), then a graph system specific
+ `OutputFormat<NullWritable,VertexWritable>` must be developed as well.
+ 
+ Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as
+ the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a
+ small object with little overhead. Graph system providers should realize `HadoopGraph` as the gateway to the OLAP
+ features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer>
+ graphComputerClass)`-method may look as follows:
+ 
+ [source,java]
+ ----
+ public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException {
+   try {
+     if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass))
+       return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this);
+     else
+       throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
+   } catch (final Exception e) {
+     throw new IllegalArgumentException(e.getMessage(),e);
+   }
+ }
+ ----
+ 
+ Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the
+ case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the
+ `compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which
+ determine how to read and write data to and from Hadoop, for instance, `gremlin.hadoop.graphInputFormat` and
+ `gremlin.hadoop.graphOutputFormat`.
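+ 
+ As a minimal sketch (assuming Apache Commons Configuration, which TinkerPop uses for `Graph` configurations), such
+ a `Configuration` could be built on the fly as follows:
+ 
+ [source,groovy]
+ ----
+ import org.apache.commons.configuration.BaseConfiguration
+ import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+ 
+ // property names are the ones used throughout this section; locations are illustrative
+ conf = new BaseConfiguration()
+ conf.setProperty('gremlin.graph', HadoopGraph.class.getCanonicalName())
+ conf.setProperty('gremlin.hadoop.graphInputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat')
+ conf.setProperty('gremlin.hadoop.graphOutputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat')
+ conf.setProperty('gremlin.hadoop.inputLocation', 'tinkerpop-modern.kryo')
+ conf.setProperty('gremlin.hadoop.outputLocation', 'output')
+ graph = HadoopGraph.open(conf)
+ ----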
+ 
+ IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
+ determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided
+ by Hadoop-Gremlin (e.g. <<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>,
+ and <<script-io-format,`ScriptOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph
+ data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support
+ `ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum.
+ 


[09/19] incubator-tinkerpop git commit: Merge remote-tracking branch 'origin/TINKERPOP-1148' into tp31

Posted by dk...@apache.org.
Merge remote-tracking branch 'origin/TINKERPOP-1148' into tp31


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/f19311b7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/f19311b7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/f19311b7

Branch: refs/heads/master
Commit: f19311b7fbd21ce07f52cfe9fb06e53d507077d2
Parents: 70a3065 b42dcea
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Feb 18 11:28:36 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Feb 18 11:28:36 2016 -0500

----------------------------------------------------------------------
 .../server/op/AbstractEvalOpProcessor.java      | 51 ++++++++++++--------
 .../gremlin/server/op/session/Session.java      |  6 ++-
 .../server/op/session/SessionOpProcessor.java   |  4 +-
 .../server/GremlinServerIntegrateTest.java      | 27 +++++++++++
 4 files changed, 66 insertions(+), 22 deletions(-)
----------------------------------------------------------------------



[17/19] incubator-tinkerpop git commit: minor parser fix

Posted by dk...@apache.org.
minor parser fix


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/606c68bf
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/606c68bf
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/606c68bf

Branch: refs/heads/master
Commit: 606c68bfad95e6cc9407803a071e734a01489ade
Parents: c9149c2
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Fri Feb 19 18:26:17 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Fri Feb 19 18:26:17 2016 +0100

----------------------------------------------------------------------
 docs/preprocessor/preprocess.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/606c68bf/docs/preprocessor/preprocess.sh
----------------------------------------------------------------------
diff --git a/docs/preprocessor/preprocess.sh b/docs/preprocessor/preprocess.sh
index 00d887b..cb7ab22 100755
--- a/docs/preprocessor/preprocess.sh
+++ b/docs/preprocessor/preprocess.sh
@@ -124,7 +124,7 @@ find "${TP_HOME}/docs/src/" -name index.asciidoc | xargs -n1 dirname | while rea
   if [ ${process_subdirs} -eq 1 ]; then
     find "${subdir}" -name "*.asciidoc" |
          xargs -n1 basename |
-         xargs -n1 -I {} echo "echo -ne {}' '; (grep -n {} ${subdir}/index.asciidoc || echo 0) | cut -d ':' -f1" | /bin/bash | sort -nk2 | cut -d ' ' -f1 |
+         xargs -n1 -I {} echo "echo -ne {}' '; (grep -n {} ${subdir}/index.asciidoc || echo 0) | head -n1 | cut -d ':' -f1" | /bin/bash | sort -nk2 | cut -d ' ' -f1 |
          xargs -n1 -I {} echo "${subdir}/{}" |
          xargs -n1 ${TP_HOME}/docs/preprocessor/preprocess-file.sh "${CONSOLE_HOME}"
 


[12/19] incubator-tinkerpop git commit: splitted implementations.asciidoc into implementations-hadoop.asciidoc and implementations-neo4j.asciidoc

Posted by dk...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/bbf5b3f4/docs/src/reference/implementations-neo4j.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations-neo4j.asciidoc b/docs/src/reference/implementations-neo4j.asciidoc
new file mode 100644
index 0000000..5602754
--- /dev/null
+++ b/docs/src/reference/implementations-neo4j.asciidoc
@@ -0,0 +1,921 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[implementations]]
+Implementations
+===============
+
+image::gremlin-racecar.png[width=325]
+
+[[graph-system-provider-requirements]]
+Graph System Provider Requirements
+----------------------------------
+
+image:tinkerpop-enabled.png[width=140,float=left] At the core of TinkerPop3 is a Java8 API. The implementation of this
+core API and its validation via the `gremlin-test` suite is all that is required of a graph system provider wishing to
+provide a TinkerPop3-enabled graph engine. Once a graph system has a valid implementation, then all the applications
+provided by TinkerPop (e.g. Gremlin Console, Gremlin Server, etc.) and 3rd-party developers (e.g. Gremlin-Scala,
+Gremlin-JS, etc.) will integrate properly. Finally, please feel free to use the logo on the left to promote your
+TinkerPop3 implementation.
+
+Implementing Gremlin-Core
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The classes that a graph system provider should focus on implementing are itemized below. It is a good idea to study
+the <<tinkergraph-gremlin,TinkerGraph>> (in-memory OLTP and OLAP in `tinkergraph-gremlin`), <<neo4j-gremlin,Neo4jGraph>>
+(OLTP w/ transactions in `neo4j-gremlin`) and/or <<hadoop-gremlin,HadoopGraph>> (OLAP in `hadoop-gremlin`)
+implementations for ideas and patterns.
+
+. Online Transactional Processing Graph Systems (*OLTP*)
+ .. Structure API: `Graph`, `Element`, `Vertex`, `Edge`, `Property` and `Transaction` (if transactions are supported).
+ .. Process API: `TraversalStrategy` instances for optimizing Gremlin traversals to the provider's graph system (i.e. `TinkerGraphStepStrategy`).
+. Online Analytics Processing Graph Systems (*OLAP*)
+ .. Everything required of OLTP is required of OLAP (but not vice versa).
+ .. GraphComputer API: `GraphComputer`, `Messenger`, `Memory`.
+
+Please consider the following implementation notes:
+
+* Be sure your `Graph` implementation is named as `XXXGraph` (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
+* Use `StringHelper` to ensure that the `toString()` representations of classes are consistent with other implementations.
+* Ensure that your implementation's `Features` (Graph, Vertex, etc.) are correct so that test cases handle particulars accordingly.
+* Use the numerous static method helper classes such as `ElementHelper`, `GraphComputerHelper`, `VertexProgramHelper`, etc.
+* There are a number of default methods on the provided interfaces that are semantically correct. However, if they are
+not efficient for the implementation, override them.
+* Implement the `structure/` package interfaces first and then, if desired, the interfaces in the `process/` package.
+* `ComputerGraph` is a `Wrapper` system that ensures proper semantics during a GraphComputer computation.
+
+[[oltp-implementations]]
+OLTP Implementations
+^^^^^^^^^^^^^^^^^^^^
+
+image:pipes-character-1.png[width=110,float=right] The most important interfaces to implement are in the `structure/`
+package. These include interfaces like Graph, Vertex, Edge, Property, Transaction, etc. The `StructureStandardSuite`
+will ensure that the semantics of the methods implemented are correct. Moreover, there are numerous `Exceptions`
+classes with static exceptions that should be thrown by the graph system so that all the exceptions and their
+messages are consistent amongst all TinkerPop3 implementations.
+
+[[olap-implementations]]
+OLAP Implementations
+^^^^^^^^^^^^^^^^^^^^
+
+image:furnace-character-1.png[width=110,float=right] Implementing the OLAP interfaces may be a bit more complicated.
+Note that before OLAP interfaces are implemented, it is necessary for the OLTP interfaces to be, at a minimum,
+implemented as specified in <<oltp-implementations,OLTP Implementations>>. A summary of each required interface
+implementation is presented below:
+
+. `GraphComputer`: A fluent builder for specifying an isolation level, a VertexProgram, and any number of MapReduce jobs to be submitted.
+. `Memory`: A global blackboard for ANDing, ORing, INCRing, and SETing values for specified keys.
+. `Messenger`: The system that collects and distributes messages being propagated by vertices executing the VertexProgram application.
+. `MapReduce.MapEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's map-phase.
+. `MapReduce.ReduceEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's combine- and reduce-phases.
+
+NOTE: The VertexProgram and MapReduce interfaces in the `process/computer/` package are not required by the graph
+system. Instead, these are interfaces to be implemented by application developers writing VertexPrograms and MapReduce jobs.
+
+IMPORTANT: TinkerPop3 provides three OLAP implementations: <<tinkergraph-gremlin,TinkerGraphComputer>> (TinkerGraph),
+<<giraphgraphcomputer,GiraphGraphComputer>> (HadoopGraph), and <<sparkgraphcomputer,SparkGraphComputer>> (HadoopGraph).
+Given the complexity of the OLAP system, it is good to study and copy many of the patterns used in these reference
+implementations.
+
+Implementing GraphComputer
+++++++++++++++++++++++++++
+
+image:furnace-character-3.png[width=150,float=right] The most complex method in GraphComputer is the `submit()`-method. The method must do the following (a simplified sketch follows the list):
+
+. Ensure that the GraphComputer has not already been executed.
+. Ensure that there is at least a VertexProgram or one MapReduce job.
+. If there is a VertexProgram, validate that it can execute on the GraphComputer given the respectively defined features.
+. Create the Memory to be used for the computation.
+. Execute the VertexProgram.setup() method once and only once.
+. Execute the VertexProgram.execute() method for each vertex.
+. Execute the VertexProgram.terminate() method once; if it returns false, repeat VertexProgram.execute().
+. When VertexProgram.terminate() returns true, move to MapReduce job execution.
+. MapReduce jobs are not required to be executed in any specified order.
+. For each Vertex, execute MapReduce.map(). Then (if defined) execute MapReduce.combine() and MapReduce.reduce().
+. Update Memory with runtime information.
+. Construct a new `ComputerResult` containing the compute Graph and Memory.
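+
+The following is a highly simplified sketch of that control flow (the names and loop structure are illustrative and
+not the TinkerGraph source; feature validation, threading, memory completion, and error handling are omitted):
+
+[source,groovy]
+----
+// illustrative control-flow sketch of GraphComputer.submit()
+def submitSketch = { vertexProgram, vertices, messenger, memory, mapReducers ->
+    vertexProgram.setup(memory)                                              // once and only once
+    while (true) {
+        vertices.each { v -> vertexProgram.execute(v, messenger, memory) }   // each vertex (in parallel in practice)
+        if (vertexProgram.terminate(memory)) break                           // true halts the vertex program
+    }
+    mapReducers.each { mapReduce ->                                          // jobs may run in any order
+        // map each vertex, optionally combine/reduce, then store the results
+        // via mapReduce.addResultToMemory(memory, results)
+    }
+    // finally: update Memory with runtime information and build a ComputerResult
+}
+----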
+
+Implementing Memory
++++++++++++++++++++
+
+image:gremlin-brain.png[width=175,float=left] The Memory object is initially defined by `VertexProgram.setup()`.
+The memory data is available in the first round of the `VertexProgram.execute()` method. Each Vertex, when executing
+the VertexProgram, can update the Memory in its round. However, the update is not seen by the other vertices until
+the next round. At the end of the first round, all the updates are aggregated and the new memory data is available
+on the second round. This process repeats until the VertexProgram terminates.
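+
+As an illustrative fragment of these semantics (using the ANDing/ORing/INCRing/SETing operations named earlier; this
+assumes a `memory` reference inside a `VertexProgram` and is not runnable on its own):
+
+[source,groovy]
+----
+// inside VertexProgram.setup(): define the blackboard's initial values
+memory.set('counter', 0L)
+// inside VertexProgram.execute() during round N: updates are staged, not immediate
+memory.incr('counter', 1L)    // becomes visible to all vertices in round N+1
+memory.get('counter')         // still returns the aggregate from the previous round
+----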
+
+Implementing Messenger
+++++++++++++++++++++++
+
+The Messenger object is similar to the Memory object in that a vertex can read and write to the Messenger. However,
+the data it reads are the messages sent to the vertex in the previous step and the data it writes are the messages
+that will be readable by the receiving vertices in the subsequent round.
+
+Implementing MapReduce Emitters
++++++++++++++++++++++++++++++++
+
+image:hadoop-logo-notext.png[width=150,float=left] The MapReduce framework in TinkerPop3 is similar to the model
+popularized by link:http://hadoop.apache.org[Hadoop]. The primary difference is that all Mappers process the vertices
+of the graph, not arbitrary key/value pairs. However, the vertices' edges cannot be accessed -- only their
+properties. This greatly reduces the amount of data that needs to be pushed through the MapReduce engine as any edge
+information required can be computed in the VertexProgram.execute() method. Moreover, at this stage, vertices cannot
+be mutated, only their token and property data read. A Gremlin OLAP system needs to provide implementations for
+two particular classes: `MapReduce.MapEmitter` and `MapReduce.ReduceEmitter`. TinkerGraph's implementation is provided
+below, which demonstrates the simplicity of the algorithm (especially when the data is all within the same JVM).
+
+[source,java]
+----
+public class TinkerMapEmitter<K, V> implements MapReduce.MapEmitter<K, V> {
+
+    public Map<K, Queue<V>> reduceMap;
+    public Queue<KeyValue<K, V>> mapQueue;
+    private final boolean doReduce;
+
+    public TinkerMapEmitter(final boolean doReduce) { <1>
+        this.doReduce = doReduce;
+        if (this.doReduce)
+            this.reduceMap = new ConcurrentHashMap<>();
+        else
+            this.mapQueue = new ConcurrentLinkedQueue<>();
+    }
+
+    @Override
+    public void emit(K key, V value) {
+        if (this.doReduce)
+            this.reduceMap.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(value); <2>
+        else
+            this.mapQueue.add(new KeyValue<>(key, value)); <3>
+    }
+
+    protected void complete(final MapReduce<K, V, ?, ?, ?> mapReduce) {
+        if (!this.doReduce && mapReduce.getMapKeySort().isPresent()) { <4>
+            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
+            final List<KeyValue<K, V>> list = new ArrayList<>(this.mapQueue);
+            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
+            this.mapQueue.clear();
+            this.mapQueue.addAll(list);
+        } else if (mapReduce.getMapKeySort().isPresent()) {
+            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
+            final List<Map.Entry<K, Queue<V>>> list = new ArrayList<>();
+            list.addAll(this.reduceMap.entrySet());
+            Collections.sort(list, Comparator.comparing(Map.Entry::getKey, comparator));
+            this.reduceMap = new LinkedHashMap<>();
+            list.forEach(entry -> this.reduceMap.put(entry.getKey(), entry.getValue()));
+        }
+    }
+}
+----
+
+<1> If the MapReduce job has a reduce, then use one data structure (`reduceMap`), else use another (`mapQueue`). The
+difference is that a reduction requires a grouping by key and therefore, the `Map<K,Queue<V>>` definition. If no
+reduction/grouping is required, then a simple `Queue<KeyValue<K,V>>` can be leveraged.
+<2> If reduce is to follow, then increment the Map with a new value for the key. `MapHelper` is a TinkerPop3 class
+with static methods for adding data to a Map.
+<3> If no reduce is to follow, then simply append a KeyValue to the queue.
+<4> When the map phase is complete, any map-result sorting required can be executed at this point.
+
+[source,java]
+----
+public class TinkerReduceEmitter<OK, OV> implements MapReduce.ReduceEmitter<OK, OV> {
+
+    protected Queue<KeyValue<OK, OV>> reduceQueue = new ConcurrentLinkedQueue<>();
+
+    @Override
+    public void emit(final OK key, final OV value) {
+        this.reduceQueue.add(new KeyValue<>(key, value));
+    }
+
+    protected void complete(final MapReduce<?, ?, OK, OV, ?> mapReduce) {
+        if (mapReduce.getReduceKeySort().isPresent()) {
+            final Comparator<OK> comparator = mapReduce.getReduceKeySort().get();
+            final List<KeyValue<OK, OV>> list = new ArrayList<>(this.reduceQueue);
+            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
+            this.reduceQueue.clear();
+            this.reduceQueue.addAll(list);
+        }
+    }
+}
+----
+
+The method `MapReduce.reduce()` is defined as:
+
+[source,java]
+public void reduce(final OK key, final Iterator<OV> values, final ReduceEmitter<OK, OV> emitter) { ... }
+
+In other words, for the TinkerGraph implementation, iterate through the entrySet of the `reduceMap` and call the
+`reduce()` method on each entry. The `reduce()` method can emit key/value pairs which are simply aggregated into a
+`Queue<KeyValue<OK,OV>>` in an analogous fashion to `TinkerMapEmitter` when no reduce is to follow. These two emitters
+are tied together in `TinkerGraphComputer.submit()`.
+
+[source,java]
+----
+...
+for (final MapReduce mapReduce : mapReducers) {
+    if (mapReduce.doStage(MapReduce.Stage.MAP)) {
+        final TinkerMapEmitter<?, ?> mapEmitter = new TinkerMapEmitter<>(mapReduce.doStage(MapReduce.Stage.REDUCE));
+        final SynchronizedIterator<Vertex> vertices = new SynchronizedIterator<>(this.graph.vertices());
+        workers.setMapReduce(mapReduce);
+        workers.mapReduceWorkerStart(MapReduce.Stage.MAP);
+        workers.executeMapReduce(workerMapReduce -> {
+            while (true) {
+                final Vertex vertex = vertices.next();
+                if (null == vertex) return;
+                workerMapReduce.map(ComputerGraph.mapReduce(vertex), mapEmitter);
+            }
+        });
+        workers.mapReduceWorkerEnd(MapReduce.Stage.MAP);
+
+        // sort results if a map output sort is defined
+        mapEmitter.complete(mapReduce);
+
+        // no need to run combiners as this is single machine
+        if (mapReduce.doStage(MapReduce.Stage.REDUCE)) {
+            final TinkerReduceEmitter<?, ?> reduceEmitter = new TinkerReduceEmitter<>();
+            final SynchronizedIterator<Map.Entry<?, Queue<?>>> keyValues = new SynchronizedIterator((Iterator) mapEmitter.reduceMap.entrySet().iterator());
+            workers.mapReduceWorkerStart(MapReduce.Stage.REDUCE);
+            workers.executeMapReduce(workerMapReduce -> {
+                while (true) {
+                    final Map.Entry<?, Queue<?>> entry = keyValues.next();
+                    if (null == entry) return;
+                    workerMapReduce.reduce(entry.getKey(), entry.getValue().iterator(), reduceEmitter);
+                }
+            });
+            workers.mapReduceWorkerEnd(MapReduce.Stage.REDUCE);
+            reduceEmitter.complete(mapReduce); // sort results if a reduce output sort is defined
+            mapReduce.addResultToMemory(this.memory, reduceEmitter.reduceQueue.iterator()); <1>
+        } else {
+            mapReduce.addResultToMemory(this.memory, mapEmitter.mapQueue.iterator()); <2>
+        }
+    }
+}
+...
+----
+
+<1> Note that the final results of the reducer are provided to the Memory as specified by the application developer's
+`MapReduce.addResultToMemory()` implementation.
+<2> If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application
+developer's `MapReduce.addResultToMemory()` implementation.
+
+[[io-implementations]]
+IO Implementations
+^^^^^^^^^^^^^^^^^^
+
+If a `Graph` requires custom serializers for IO to work properly, implement the `Graph.io` method.  A typical example
+of where a `Graph` would require such custom serializers is if its identifier system uses non-primitive values,
+such as OrientDB's `Rid` class.  From basic serialization of a single `Vertex` all the way up the stack to Gremlin
+Server, the need to know how to handle these complex identifiers is an important requirement.
+
+The first step to implementing custom serializers is to implement the `IoRegistry` interface and register the
+custom classes and serializers to it. Each `Io` implementation has different requirements for what it expects from the
+`IoRegistry`:
+
+* *GraphML* - No custom serializers expected/allowed.
+* *GraphSON* - Register a Jackson `SimpleModule`.  The `SimpleModule` encapsulates specific classes to be serialized,
+so it does not need to be registered to a specific class in the `IoRegistry` (use `null`).
+* *Gryo* - Expects registration of one of three objects:
+** Register just the custom class with a `null` Kryo `Serializer` implementation - this class will use default "field-level" Kryo serialization.
+** Register the custom class with a specific Kryo `Serializer` implementation.
+** Register the custom class with a `Function<Kryo, Serializer>` for those cases where the Kryo `Serializer` requires the `Kryo` instance to get constructed.
+
+This implementation should provide a zero-arg constructor as the stack may require instantiation via reflection.
+Consider extending `AbstractIoRegistry` for convenience as follows:
+
+[source,java]
+----
+public class MyGraphIoRegistry extends AbstractIoRegistry {
+    public MyGraphIoRegistry() {
+        register(GraphSONIo.class, null, new MyGraphSimpleModule());
+        register(GryoIo.class, MyGraphIdClass.class, new MyGraphIdSerializer());
+    }
+}
+----
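+
+For the third Gryo case, where construction of the `Serializer` requires the `Kryo` instance, register a
+`Function<Kryo, Serializer>` instead. A minimal sketch is shown below -- the `MyGraphIdKryoSerializer` class is
+hypothetical:
+
+[source,java]
+----
+public class MyGraphGryoRegistry extends AbstractIoRegistry {
+    public MyGraphGryoRegistry() {
+        // the function is applied when Gryo constructs its Kryo instance
+        register(GryoIo.class, MyGraphIdClass.class,
+                (Function<Kryo, Serializer>) kryo -> new MyGraphIdKryoSerializer(kryo));
+    }
+}
+----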
+
+In the `Graph.io` method, provide the `IoRegistry` object to the supplied `Builder` and call the `create` method to
+return that `Io` instance as follows:
+
+[source,java]
+----
+public <I extends Io> I io(final Io.Builder<I> builder) {
+    return (I) builder.graph(this).registry(myGraphIoRegistry).create();
+}
+----
+
+In this way, `Graph` implementations can pre-configure custom serializers for IO interactions and users will not need
+to know about those details. Following this pattern will ensure proper execution of the test suite as well as
+simplified usage for end-users.
+
+IMPORTANT: Proper implementation of IO is critical to successful `Graph` operations in Gremlin Server.  The Test Suite
+does have "serialization" tests that provide some assurance that an implementation is working properly, but those
+tests cannot make assertions against any specifics of a custom serializer.  It is the responsibility of the
+implementer to test the specifics of their custom serializers.
+
+TIP: Consider separating serializer code into its own module, if possible, so that clients that use the `Graph`
+implementation remotely don't need a full dependency on the entire `Graph` - just the IO components and related
+classes being serialized.
+
+[[validating-with-gremlin-test]]
+Validating with Gremlin-Test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-edumacated.png[width=225]
+
+[source,xml]
+<dependency>
+  <groupId>org.apache.tinkerpop</groupId>
+  <artifactId>gremlin-test</artifactId>
+  <version>x.y.z</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.tinkerpop</groupId>
+  <artifactId>gremlin-groovy-test</artifactId>
+  <version>x.y.z</version>
+</dependency>
+
+The operational semantics of any OLTP or OLAP implementation are validated by `gremlin-test` and functional
+interoperability with the Groovy environment is ensured by `gremlin-groovy-test`. To implement these tests, provide
+test case implementations as shown below, where `XXX` denotes the name of the graph implementation (e.g.
+TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
+
+[source,java]
+----
+// Structure API tests
+@RunWith(StructureStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXStructureStandardTest {}
+
+// Process API tests
+@RunWith(ProcessComputerSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXProcessComputerTest {}
+
+@RunWith(ProcessStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXProcessStandardTest {}
+
+@RunWith(GroovyEnvironmentSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXGroovyEnvironmentTest {}
+
+@RunWith(GroovyProcessStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXGroovyProcessStandardTest {}
+
+@RunWith(GroovyProcessComputerSuite.class)
+@GraphProviderClass(provider = XXXGraphComputerProvider.class, graph = XXXGraph.class)
+public class XXXGroovyProcessComputerTest {}
+----
+
+The above set of tests represents the minimum test suite to implement.  There are other "integration" and
+"performance" tests that should be considered optional.  Implementing those tests requires the same pattern as shown above.
+
+IMPORTANT: It is as important to look at "ignored" tests as it is to look at ones that fail.  The `gremlin-test`
+suite utilizes the `Feature` implementation exposed by the `Graph` to determine which tests to execute.  If a test
+utilizes features that are not supported by the graph, the suite will ignore it.  While that may be fine, implementers
+should validate that the ignored tests are appropriately bypassed and that there are no mistakes in their feature
+definitions.  Moreover, implementers should consider filling gaps in their own test suites, especially when
+IO-related tests are being ignored.
+
+The only test-class that requires any code investment is the `GraphProvider` implementation class. This class is
+used by the test suite to construct `Graph` configurations and instances and provides information about the
+implementation itself.  In most cases, it is best to simply extend `AbstractGraphProvider` as it provides many
+default implementations of the `GraphProvider` interface.
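+
+As a minimal sketch, assuming the hypothetical `XXXGraph` implementation from the examples above, a provider built
+on `AbstractGraphProvider` might look like the following (the `AbstractGraphProvider` javadoc describes the full
+set of methods that may be overridden):
+
+[source,java]
+----
+public class XXXGraphProvider extends AbstractGraphProvider {
+
+    // the structure implementation classes for this provider
+    private static final Set<Class> IMPLEMENTATIONS = new HashSet<Class>() {{
+        add(XXXGraph.class);
+        add(XXXVertex.class);
+        add(XXXEdge.class);
+    }};
+
+    @Override
+    public Map<String, Object> getBaseConfiguration(final String graphName, final Class<?> test,
+                                                    final String testMethodName,
+                                                    final LoadGraphWith.GraphData loadGraphWith) {
+        // the minimum configuration needed for GraphFactory to construct the Graph
+        return new HashMap<String, Object>() {{
+            put(Graph.GRAPH, XXXGraph.class.getName());
+        }};
+    }
+
+    @Override
+    public void clear(final Graph graph, final Configuration configuration) throws Exception {
+        // release resources between test executions
+        if (graph != null) graph.close();
+    }
+
+    @Override
+    public Set<Class> getImplementations() {
+        return IMPLEMENTATIONS;
+    }
+}
+----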
+
+Finally, specify the test suites that will be supported by the `Graph` implementation using the `@Graph.OptIn`
+annotation.  See the `TinkerGraph` implementation below as an example:
+
+[source,java]
+----
+@Graph.OptIn(Graph.OptIn.SUITE_STRUCTURE_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_COMPUTER)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_ENVIRONMENT)
+public class TinkerGraph implements Graph {
+----
+
+Only include annotations for the suites the implementation will support.  Note that implementing the suite but
+not specifying the appropriate annotation will prevent the suite from running (an obvious error message will appear
+in this case when running the mis-configured suite).
+
+There are times when there may be a specific test in the suite that the implementation cannot support (despite the
+features it implements) or should not otherwise be executed.  It is possible for implementers to "opt-out" of a test
+by using the `@Graph.OptOut` annotation.  The following is an example of this annotation usage as taken from
+`HadoopGraph`:
+
+[source, java]
+----
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
+        method = "g_V_matchXa_hasXname_GarciaX__a_inXwrittenByX_b__a_inXsungByX_bX",
+        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
+        method = "g_V_matchXa_inXsungByX_b__a_inXsungByX_c__b_outXwrittenByX_d__c_outXwrittenByX_e__d_hasXname_George_HarisonX__e_hasXname_Bob_MarleyXX",
+        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
+        method = "shouldNotAllowBadMemoryKeys",
+        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
+        method = "shouldRequireRegisteringMemoryKeys",
+        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
+public class HadoopGraph implements Graph {
+----
+
+The above examples show how to ignore individual tests.  It is also possible to:
+
+* Ignore an entire test case (i.e. all the methods within the test) by setting the `method` to "*".
+* Ignore a "base" test class such that test that extend from those classes will all be ignored.  This style of
+ignoring is useful for Gremlin "process" tests that have bases classes that are extended by various Gremlin flavors (e.g. groovy).
+* Ignore a `GraphComputer` test based on the type of `GraphComputer` being used.  Specify the "computer" attribute on
+the `OptOut` (which is an array specification) which should have a value of the `GraphComputer` implementation class
+that should ignore that test. This attribute should be left empty for "standard" execution and by default all
+`GraphComputer` implementations will be included in the `OptOut` so if there are multiple implementations, explicitly
+specify the ones that should be excluded.
+
+Also note that some of the tests in the Gremlin Test Suite are parameterized tests and require an additional level of
+specificity to be properly ignored.  To ignore these types of tests, examine the name template of the parameterized
+tests.  It is defined by a Java annotation that looks like this:
+
+[source, java]
+@Parameterized.Parameters(name = "expect({0})")
+
+The annotation above shows that the name of each parameterized test will be prefixed with "expect" and have
+parentheses wrapped around the first parameter (at index 0) value supplied to each test.  This information can
+only be garnered by studying the test set up itself.  Once the pattern is determined and the specific unique name of
+the parameterized test is identified, add it to the `specific` property on the `OptOut` annotation in addition to
+the other arguments.
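+
+For example, a hypothetical opt-out of a single parameterized test (the test class, method and parameter value here
+are illustrative only) would add the `specific` attribute as follows:
+
+[source,java]
+----
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.SomeParameterizedTest",
+        method = "shouldExpectValue",
+        specific = "expect(xyz)",
+        reason = "The implementation cannot support the xyz case.")
+----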
+
+These annotations help provide users a level of transparency into test suite compliance (via the
+xref:describe-graph[describeGraph()] utility function). They also allow implementers a lot of flexibility in
+terms of how they wish to support TinkerPop.  For example, maybe there is a single test case that prevents an
+implementer from claiming support of a `Feature`.  The implementer could choose to either not support the `Feature`
+or to support it but "opt-out" of the test with a "reason" as to why so that users understand the limitation.
+
+IMPORTANT: Before using `OptOut`, be sure that the reason for using it is sound; it should be treated as a last resort.
+It is possible that a test from the suite doesn't properly represent the expectations of a feature, is too broad or
+narrow for the semantics it is trying to enforce, or simply contains a bug.  Please consider raising issues on the
+developer mailing list with such concerns before assuming `OptOut` is the only answer.
+
+IMPORTANT: There are no tests that specifically validate complete compliance with Gremlin Server.  Generally speaking,
+a `Graph` that passes the full Test Suite should be compliant with Gremlin Server.  The one area where problems can
+occur is in serialization.  Always ensure that IO is properly implemented, that custom serializers are tested fully
+and ultimately integration test the `Graph` with an actual Gremlin Server instance.
+
+CAUTION: Configuring tests to run in parallel might result in errors that are difficult to debug as there is some
+shared state in test execution around graph configuration.  It is therefore recommended that parallelism be turned
+off for the test suite (the Maven Surefire Plugin is configured this way by default).  It may also be important to
+include this setting, `<reuseForks>false</reuseForks>`, in the Surefire configuration if tests are failing in an
+unexplainable way.
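+
+As a sketch, the relevant portion of a provider's `pom.xml` might configure the Maven Surefire Plugin as shown
+below (the plugin version and the surrounding build configuration are assumed):
+
+[source,xml]
+----
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-surefire-plugin</artifactId>
+  <configuration>
+    <!-- run the suite serially and use a fresh fork per test class -->
+    <forkCount>1</forkCount>
+    <reuseForks>false</reuseForks>
+  </configuration>
+</plugin>
+----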
+
+Accessibility via GremlinPlugin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-plugin.png[width=100,float=left] The applications distributed with TinkerPop3 do not include
+any graph system implementations besides TinkerGraph. If your implementation is stored in a Maven repository (e.g.
+Maven Central Repository), then it is best to provide a `GremlinPlugin` implementation so the respective jars can be
+downloaded as and when required by the user. Neo4j's GremlinPlugin is provided below for reference.
+
+[source,java]
+----
+public class Neo4jGremlinPlugin implements GremlinPlugin {
+
+    private static final String IMPORT = "import ";
+    private static final String DOT_STAR = ".*";
+
+    private static final Set<String> IMPORTS = new HashSet<String>() {{
+        add(IMPORT + Neo4jGraph.class.getPackage().getName() + DOT_STAR);
+    }};
+
+    @Override
+    public String getName() {
+        return "neo4j";
+    }
+
+    @Override
+    public void pluginTo(final PluginAcceptor pluginAcceptor) {
+        pluginAcceptor.addImports(IMPORTS);
+    }
+}
+----
+
+With the above plugin implementation, users can now download the respective binaries for Gremlin Console, Gremlin Server, etc.
+
+[source,groovy]
+gremlin> g = Neo4jGraph.open('/tmp/neo4j')
+No such property: Neo4jGraph for class: groovysh_evaluate
+Display stack trace? [yN]
+gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
+==>loaded: [org.apache.tinkerpop, neo4j-gremlin, …]
+gremlin> :plugin use tinkerpop.neo4j
+==>tinkerpop.neo4j activated
+gremlin> g = Neo4jGraph.open('/tmp/neo4j')
+==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
+
+In-Depth Implementations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-painting.png[width=200,float=right] The graph system implementation details presented thus far are
+minimum requirements necessary to yield a valid TinkerPop3 implementation. However, there are other areas that a
+graph system provider can tweak to provide an implementation more optimized for their underlying graph engine. Typical
+areas of focus include:
+
+* Traversal Strategies: A <<traversalstrategy,TraversalStrategy>> can be used to alter a traversal prior to its
+execution. A typical example is converting a pattern of `g.V().has('name','marko')` into a global index lookup for
+all vertices with name "marko". In this way, a `O(|V|)` lookup becomes an `O(log(|V|))`. Please review
+`TinkerGraphStepStrategy` for ideas.
+* Step Implementations: Every <<graph-traversal-steps,step>> is ultimately referenced by the `GraphTraversal`
+interface. It is possible to extend `GraphTraversal` to use a graph system specific step implementation.
+
+
+[[tinkergraph-gremlin]]
+TinkerGraph-Gremlin
+-------------------
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>tinkergraph-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+----
+
+image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single machine, in-memory (with optional
+persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with
+TinkerPop3 and serves as the reference implementation for other providers to study in order to understand the
+semantics of the various methods of the TinkerPop3 API. Constructing a simple graph in Java 8 is presented below.
+
+[source,java]
+Graph g = TinkerGraph.open();
+Vertex marko = g.addVertex("name","marko","age",29);
+Vertex lop = g.addVertex("name","lop","lang","java");
+marko.addEdge("created",lop,"weight",0.6d);
+
+The above code creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6
+property. Next, the graph can be queried as follows.
+
+[source,java]
+g.V().has("name","marko").out("created").values("name")
+
+The `g.V().has("name","marko")` part of the query can be executed in two ways.
+
+ * A linear scan of all vertices filtering out those vertices that don't have the name "marko"
+ * A `O(log(|V|))` index lookup for all vertices with the name "marko"
+
+Given the initial graph construction in the first code block, no index was defined and thus, a linear scan is executed.
+However, if the graph were constructed as shown below, then an index lookup would be used.
+
+[source,java]
+Graph g = TinkerGraph.open();
+g.createIndex("name",Vertex.class)
+
+The execution times for a vertex lookup by property are provided below for both the non-indexed and indexed versions
+of TinkerGraph over the Grateful Dead graph.
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+g = graph.traversal()
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+clock(1000) {g.V().has('name','Garcia').iterate()} <1>
+graph = TinkerGraph.open()
+g = graph.traversal()
+graph.createIndex('name',Vertex.class)
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+clock(1000){g.V().has('name','Garcia').iterate()} <2>
+----
+
+<1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined.
+<2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined.
+
+IMPORTANT: Each graph system will have a different mechanism by which indices and schemas are defined. TinkerPop3
+does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other
+graph systems, property value types, indices, edge labels, etc. may be required to be defined _a priori_ to adding
+data to the graph.
+
+NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration.
+
+Configuration
+~~~~~~~~~~~~~
+
+TinkerGraph has several settings that can be provided on creation via `Configuration` object:
+
+[width="100%",cols="2,10",options="header"]
+|=========================================================
+|Property |Description
+|gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph`
+|gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices.
+|gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges.
+|gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties.
+|gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called.
+|gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a
+value is specified here, then the `gremlin.tinkergraph.graphFormat` should also be specified.  If this value is not
+included (default), then the graph will stay in-memory and not be loaded/persisted to disk.
+|gremlin.tinkergraph.graphFormat |The format to use to serialize the graph which may be one of the following:
+`graphml`, `graphson`, `gryo`, or a fully qualified class name that implements the `Io.Builder` interface (which allows for
+external third party graph reader/writer formats to be used for persistence).
+If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should
+also be specified.  If this value is not included (default), then the graph will stay in-memory and not be
+loaded/persisted to disk.
+|=========================================================
+
+The `IdManager` settings above refer to how TinkerGraph will control identifiers for vertices, edges and vertex
+properties.  There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully
+qualified class name of an `IdManager` implementation on the classpath.  When not specified, the default value
+for all settings is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will
+generate new identifiers from `Long` when the identifier is not user supplied.  TinkerGraph will also expect the
+user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return
+two different vertices.  `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected
+type as well as generate new identifiers with that specified type.
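+
+For example, a minimal sketch of configuring TinkerGraph to coerce and generate `Long` identifiers:
+
+[source,groovy]
+----
+conf = new BaseConfiguration()
+conf.setProperty('gremlin.tinkergraph.vertexIdManager','LONG')
+conf.setProperty('gremlin.tinkergraph.edgeIdManager','LONG')
+graph = TinkerGraph.open(conf)
+v = graph.addVertex('name','marko')
+v.id().getClass()  // java.lang.Long -- g.V(1) and g.V(1L) now resolve to the same vertex
+----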
+
+If the TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and
+`gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location with the specified
+format when `Graph.close()` is called.  In addition, if these settings are present, TinkerGraph will attempt to
+load the graph from the specified location.
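+
+For example, a graph configured as sketched below will attempt to load from `/tmp/tinkergraph.kryo` on `open()`
+and will write the graph back to that location on `close()`:
+
+[source,groovy]
+----
+conf = new BaseConfiguration()
+conf.setProperty('gremlin.tinkergraph.graphLocation','/tmp/tinkergraph.kryo')
+conf.setProperty('gremlin.tinkergraph.graphFormat','gryo')
+graph = TinkerGraph.open(conf)   // loads the file if it already exists
+graph.addVertex('name','marko')
+graph.close()                    // persists the graph to /tmp/tinkergraph.kryo
+----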
+
+IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various
+`IdManager` settings as well to ensure that identifiers are properly coerced to the appropriate types as GraphSON
+can lose the identifier's type during serialization (i.e. it will assume `Integer` when the default for TinkerGraph
+is `Long`, which could lead to load errors that result in a message like, "Vertex with id already exists").
+
+It is important to consider the data being imported to TinkerGraph with respect to the `defaultVertexPropertyCardinality`
+setting.  For example, if a `.gryo` file is known to contain multi-property data, be sure to set the default
+cardinality to `list` or else the data will import as `single`.  Consider the following:
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
+g = graph.traversal()
+g.V().properties()
+conf = new BaseConfiguration()
+conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list")
+graph = TinkerGraph.open(conf)
+graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
+g = graph.traversal()
+g.V().properties()
+----
+
+[[neo4j-gremlin]]
+Neo4j-Gremlin
+-------------
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>neo4j-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+<!-- neo4j-tinkerpop-api-impl is NOT Apache 2 licensed - more information below -->
+<dependency>
+  <groupId>org.neo4j</groupId>
+  <artifactId>neo4j-tinkerpop-api-impl</artifactId>
+  <version>0.1-2.2</version>
+</dependency>
+----
+
+link:http://neotechnology.com[Neo Technology] are the developers of the OLTP-based link:http://neo4j.org[Neo4j graph database].
+
+CAUTION: Unless under a commercial agreement with Neo Technology, Neo4j is licensed
+link:http://en.wikipedia.org/wiki/Affero_General_Public_License[AGPL]. The `neo4j-gremlin` module is licensed Apache2
+because it only references the Apache2-licensed Neo4j API (not its implementation). Note that neither the
+<<gremlin-console,Gremlin Console>> nor <<gremlin-server,Gremlin Server>> distribute with the Neo4j implementation
+binaries. To access the binaries, use the `:install` command to download binaries from
+link:http://search.maven.org/[Maven Central Repository].
+
+[source,groovy]
+----
+gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
+==>Loaded: [org.apache.tinkerpop, neo4j-gremlin, x.y.z] - restart the console to use [tinkerpop.neo4j]
+gremlin> :q
+...
+gremlin> :plugin use tinkerpop.neo4j
+==>tinkerpop.neo4j activated
+gremlin> graph = Neo4jGraph.open('/tmp/neo4j')
+==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
+----
+
+NOTE: Neo4j link:http://docs.neo4j.org/chunked/stable/ha.html[High Availability] is currently not supported by
+Neo4j-Gremlin.
+
+TIP: To host Neo4j in <<gremlin-server,Gremlin Server>>, the dependencies must first be "installed" or otherwise
+copied to the Gremlin Server path. The automated method for doing this would be to execute
+`bin/gremlin-server.sh -i org.apache.tinkerpop neo4j-gremlin x.y.z`.
+
+Indices
+~~~~~~~
+
+Neo4j 2.x indices leverage vertex labels to partition the index space. TinkerPop3 does not provide method interfaces
+for defining schemas/indices for the underlying graph system. Thus, in order to create indices, it is important to
+call the Neo4j API directly.
+
+NOTE: `Neo4jGraphStep` will attempt to discern which indices to use when executing a traversal of the form `g.V().has()`.
+
+The Gremlin-Console session below demonstrates Neo4j indices. For more information, please refer to the Neo4j documentation:
+
+* Manipulating indices with link:http://docs.neo4j.org/chunked/stable/query-schema-index.html[Cypher].
+* Manipulating indices with the Neo4j link:http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-new-index.html[Java API].
+
+[gremlin-groovy]
+----
+graph = Neo4jGraph.open('/tmp/neo4j')
+graph.cypher("CREATE INDEX ON :person(name)")
+graph.tx().commit()  <1>
+graph.addVertex(label,'person','name','marko')
+graph.addVertex(label,'dog','name','puppy')
+g = graph.traversal()
+g.V().hasLabel('person').has('name','marko').values('name')
+graph.close()
+----
+
+<1> Schema mutations must happen in a different transaction than graph mutations
+
+The example below demonstrates the runtime benefits of indices and shows that, even with no defined index (only vertex
+labels), a linear scan of the vertex-label partition is still faster than a linear scan of all vertices.
+
+[gremlin-groovy]
+----
+graph = Neo4jGraph.open('/tmp/neo4j')
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+g = graph.traversal()
+g.tx().commit()
+clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()}  <1>
+graph.cypher("CREATE INDEX ON :artist(name)") <2>
+g.tx().commit()
+Thread.sleep(5000) <3>
+clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <4>
+clock(1000) {g.V().has('name','Garcia').iterate()} <5>
+graph.cypher("DROP INDEX ON :artist(name)") <6>
+g.tx().commit()
+graph.close()
+----
+
+<1> Find all artists whose name is Garcia which does a linear scan of the artist vertex-label partition.
+<2> Create an index for all artist vertices on their name property.
+<3> Neo4j indices are eventually consistent so this stalls to give the index time to populate itself.
+<4> Find all artists whose name is Garcia which uses the pre-defined schema index.
+<5> Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
+<6> Drop the created index.
+
+Multi/Meta-Properties
+~~~~~~~~~~~~~~~~~~~~~
+
+`Neo4jGraph` supports both multi- and meta-properties (see <<_vertex_properties,vertex properties>>). These features
+are not native to Neo4j and are implemented using "hidden" Neo4j nodes. For example, when a vertex has multiple
+"name" properties, each property is a new node (multi-properties) which can have properties attached to it
+(meta-properties). As such, the native, underlying representation may become difficult to query directly using
+another graph language such as <<_cypher,Cypher>>. The default setting is to disable multi- and meta-properties.
+However, if this feature is desired, then it can be activated via `gremlin.neo4j.metaProperties` and
+`gremlin.neo4j.multiProperties` configurations being set to `true`. Once the configuration is set, it cannot be
+changed for the lifetime of the graph.
+
+[gremlin-groovy]
+----
+conf = new BaseConfiguration()
+conf.setProperty('gremlin.neo4j.directory','/tmp/neo4j')
+conf.setProperty('gremlin.neo4j.multiProperties',true)
+conf.setProperty('gremlin.neo4j.metaProperties',true)
+graph = Neo4jGraph.open(conf)
+g = graph.traversal()
+g.addV('name','michael','name','michael hunger','name','mhunger')
+g.V().properties('name').property('acl', 'public')
+g.V(0).valueMap()
+g.V(0).properties()
+g.V(0).properties().valueMap()
+graph.close()
+----
+
+WARNING: `Neo4jGraph` without multi- and meta-properties is in 1-to-1 correspondence with the native, underlying Neo4j
+representation. It is recommended that if the user does not require multi/meta-properties, then they should not
+enable them. Without multi- and meta-properties enabled, Neo4j can be interacted with using other tools and technologies
+that do not leverage TinkerPop.
+
+IMPORTANT: When using a multi-property enabled `Neo4jGraph`, vertices may represent their properties on "hidden
+nodes" adjacent to the vertex. If a vertex property key/value is required for indexing, then two indices are
+required -- e.g. `CREATE INDEX ON :person(name)` and `CREATE INDEX ON :vertexProperty(name)`
+(see <<_indices,Neo4j indices>>).
+
+Cypher
+~~~~~~
+
+image::gremlin-loves-cypher.png[width=400]
+
+NeoTechnology are the creators of the graph pattern-match query language link:http://www.neo4j.org/learn/cypher[Cypher].
+It is possible to leverage Cypher from within Gremlin by using the `Neo4jGraph.cypher()` graph traversal method.
+
+[gremlin-groovy]
+----
+graph = Neo4jGraph.open('/tmp/neo4j')
+graph.io(gryo()).readGraph('data/tinkerpop-modern.kryo')
+graph.cypher('MATCH (a {name:"marko"}) RETURN a')
+graph.cypher('MATCH (a {name:"marko"}) RETURN a').select('a').out('knows').values('name')
+graph.close()
+----
+
+Thus, like <<match-step,`match()`>>-step in Gremlin, it is possible to do a declarative pattern match and then move
+back into imperative Gremlin.
+
+TIP: For those developers using <<gremlin-server,Gremlin Server>> against Neo4j, it is possible to do Cypher queries
+by simply placing the Cypher string in `graph.cypher(...)` before submission to the server.
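+
+For example, assuming a Gremlin Console `:remote` connection to a Gremlin Server hosting a `Neo4jGraph` bound to the
+variable `graph`, the submission might look like this (hypothetical session):
+
+[source,groovy]
+gremlin> :> graph.cypher('MATCH (a {name:"marko"}) RETURN a')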
+
+Multi-Label
+~~~~~~~~~~~
+
+TinkerPop3 requires every `Element` to have a single, immutable string label (i.e. a `Vertex`, `Edge`, and
+`VertexProperty`). In Neo4j, a `Node` (vertex) can have an
+link:http://neo4j.com/docs/stable/graphdb-neo4j-labels.html[arbitrary number of labels] while a `Relationship`
+(edge) can have one and only one. Furthermore, in Neo4j, `Node` labels are mutable while `Relationship` labels are
+not. In order to handle this mismatch, three `Neo4jVertex` specific methods exist in Neo4j-Gremlin.
+
+[source,java]
+public Set<String> labels() // get all the labels of the vertex
+public void addLabel(String label) // add a label to the vertex
+public void removeLabel(String label) // remove a label from the vertex
+
+An example use case is presented below.
+
+[gremlin-groovy]
+----
+graph = Neo4jGraph.open('/tmp/neo4j')
+vertex = (Neo4jVertex) graph.addVertex('human::animal') <1>
+vertex.label() <2>
+vertex.labels() <3>
+vertex.addLabel('organism') <4>
+vertex.label()
+vertex.removeLabel('human') <5>
+vertex.labels()
+vertex.addLabel('organism') <6>
+vertex.labels()
+vertex.removeLabel('human') <7>
+vertex.label()
+g = graph.traversal()
+g.V().has(label,'organism') <8>
+g.V().has(label,of('organism')) <9>
+g.V().has(label,of('organism')).has(label,of('animal'))
+g.V().has(label,of('organism').and(of('animal')))
+graph.close()
+----
+
+<1> Typecasting to a `Neo4jVertex` is only required in Java.
+<2> The standard `Vertex.label()` method returns all the labels in alphabetical order concatenated using `::`.
+<3> `Neo4jVertex.labels()` method returns the individual labels as a set.
+<4> `Neo4jVertex.addLabel()` method adds a single label.
+<5> `Neo4jVertex.removeLabel()` method removes a single label.
+<6> Labels are unique and thus duplicate labels don't exist.
+<7> If a label that does not exist is removed, nothing happens.
+<8> `P.eq()` does a full string match and should only be used if multi-labels are not leveraged.
+<9> `LabelP.of()` is specific to `Neo4jGraph` and used for multi-label matching.
+
+IMPORTANT: `LabelP.of()` is only required if multi-labels are leveraged. `LabelP.of()` is used when
+filtering/looking-up vertices by their label(s) as the standard `P.eq()` does a direct match on the `::`-representation
+of `vertex.label()`.
+
+Loading with BulkLoaderVertexProgram
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load
+large amounts of data to and from Neo4j. The following code demonstrates how to load the modern graph from TinkerGraph
+into Neo4j:
+
+[gremlin-groovy]
+----
+wgConf = 'conf/neo4j-standalone.properties'
+modern = TinkerFactory.createModern()
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(modern)
+modern.compute().workers(1).program(blvp).submit().get()
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# neo4j-standalone.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
+gremlin.neo4j.directory=/tmp/neo4j
+gremlin.neo4j.conf.node_auto_indexing=true
+gremlin.neo4j.conf.relationship_auto_indexing=true
+----



[02/19] incubator-tinkerpop git commit: Minor tweak to artifact deployment in release docs.

Posted by dk...@apache.org.
Minor tweak to artifact deployment in release docs.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/fea3e313
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/fea3e313
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/fea3e313

Branch: refs/heads/master
Commit: fea3e313cc5c8f93b413478998ee20ae9f662a01
Parents: ba0b7b7
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Feb 15 15:18:59 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 07:45:59 2016 -0500

----------------------------------------------------------------------
 docs/src/dev/developer/release.asciidoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/fea3e313/docs/src/dev/developer/release.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/dev/developer/release.asciidoc b/docs/src/dev/developer/release.asciidoc
index af2e24e..30749bf 100644
--- a/docs/src/dev/developer/release.asciidoc
+++ b/docs/src/dev/developer/release.asciidoc
@@ -148,9 +148,10 @@ A positive vote for a particular release from the Apache Incubator is required t
 Release & Promote
 -----------------
 
-. Close the staging repository at link:https://repository.apache.org/[Apache Nexus]) and then release.
+. Login to link:https://repository.apache.org/[Apache Nexus] and release the previously closed repository.
 . `svn co --depth empty https://dist.apache.org/repos/dist/dev/incubator/tinkerpop dev; svn up dev/xx.yy.zz`
 . `svn co --depth empty https://dist.apache.org/repos/dist/release/incubator/tinkerpop release; mkdir release/xx.yy.zz`
+. Copy release files from `dev/xx.yy.zz` to `release/xx.yy.zz`.
 . `cd release; svn add xx.yy.zz/; svn ci -m "TinkerPop xx.yy.zz release"`
 . If there are releases present in SVN that represent lines of code that are no longer under development, then remove those releases. In other words, if `3.1.0-incubating` is present and `3.1.1-incubating` is released then remove `3.1.0-incubating`.  However, if `3.0.2-incubating` is present and that line of code is still under potential development, it may stay.
 . Update homepage with references to latest distribution and to other internal links elsewhere on the page.


[13/19] incubator-tinkerpop git commit: splitted implementations.asciidoc into implementations-hadoop.asciidoc and implementations-neo4j.asciidoc

Posted by dk...@apache.org.
splitted implementations.asciidoc into implementations-hadoop.asciidoc and implementations-neo4j.asciidoc


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/bbf5b3f4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/bbf5b3f4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/bbf5b3f4

Branch: refs/heads/master
Commit: bbf5b3f4d61c0aa2266b7430a1e083e8f4c01920
Parents: 70a3065
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Thu Feb 18 23:52:55 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Thu Feb 18 23:52:55 2016 +0100

----------------------------------------------------------------------
 .../reference/implementations-hadoop.asciidoc   |  929 +++++++++
 .../reference/implementations-neo4j.asciidoc    |  921 +++++++++
 docs/src/reference/implementations.asciidoc     | 1835 ------------------
 docs/src/reference/index.asciidoc               |    3 +-
 4 files changed, 1852 insertions(+), 1836 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/bbf5b3f4/docs/src/reference/implementations-hadoop.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations-hadoop.asciidoc b/docs/src/reference/implementations-hadoop.asciidoc
new file mode 100644
index 0000000..376f377
--- /dev/null
+++ b/docs/src/reference/implementations-hadoop.asciidoc
@@ -0,0 +1,929 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[hadoop-gremlin]]
+Hadoop-Gremlin
+--------------
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>hadoop-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+----
+
+image:hadoop-logo-notext.png[width=100,float=left] link:http://hadoop.apache.org/[Hadoop] is a distributed
+computing framework that is used to process data represented across a multi-machine compute cluster. When the
+data in the Hadoop cluster represents a TinkerPop3 graph, then Hadoop-Gremlin can be used to process the graph
+using both TinkerPop3's OLTP and OLAP graph computing models.
+
+IMPORTANT: This section assumes that the user has a Hadoop 2.x cluster functioning. For more information on getting
+started with Hadoop, please see the
+link:http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/SingleCluster.html[Single Node Setup]
+tutorial. Moreover, if using `GiraphGraphComputer` or `SparkGraphComputer` it is advisable that the reader also
+familiarize their self with Giraph (link:http://giraph.apache.org/quick_start.html[Getting Started]) and Spark
+(link:http://spark.apache.org/docs/latest/quick-start.html[Quick Start]).
+
+Installing Hadoop-Gremlin
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `HADOOP_GREMLIN_LIBS` environment variable references locations that contain jars that should be uploaded to a
+respective distributed cache (link:http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html[YARN] or SparkServer).
+Note that `HADOOP_GREMLIN_LIBS` can be a colon-separated (`:`) list of locations and all jars from all locations will
+be loaded into the cluster. Typically, only the jars of the respective GraphComputer are required to be loaded (e.g.
+the `GiraphGraphComputer` plugin lib directory).
+
+[source,shell]
+export HADOOP_GREMLIN_LIBS=/usr/local/gremlin-console/ext/giraph-gremlin/lib
+
+If using <<gremlin-console,Gremlin Console>>, it is important to install the Hadoop-Gremlin plugin. Note that
+Hadoop-Gremlin requires a Gremlin Console restart after installing.
+
+[source,text]
+----
+$ bin/gremlin.sh
+
+         \,,,/
+         (o o)
+-----oOOo-(3)-oOOo-----
+plugin activated: tinkerpop.server
+plugin activated: tinkerpop.utilities
+plugin activated: tinkerpop.tinkergraph
+gremlin> :install org.apache.tinkerpop hadoop-gremlin x.y.z
+==>loaded: [org.apache.tinkerpop, hadoop-gremlin, x.y.z] - restart the console to use [tinkerpop.hadoop]
+gremlin> :q
+$ bin/gremlin.sh
+
+         \,,,/
+         (o o)
+-----oOOo-(3)-oOOo-----
+plugin activated: tinkerpop.server
+plugin activated: tinkerpop.utilities
+plugin activated: tinkerpop.tinkergraph
+gremlin> :plugin use tinkerpop.hadoop
+==>tinkerpop.hadoop activated
+gremlin>
+----
+
+Properties Files
+~~~~~~~~~~~~~~~~
+
+`HadoopGraph` makes use of properties files which ultimately get turned into Apache Commons configurations and/or
+Hadoop configurations. The example properties file presented below is located at `conf/hadoop/hadoop-gryo.properties`.
+
+[source,text]
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
+gremlin.hadoop.jarsInDistributedCache=true
+####################################
+# Spark Configuration              #
+####################################
+spark.master=local[4]
+spark.executor.memory=1g
+spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+####################################
+# SparkGraphComputer Configuration #
+####################################
+gremlin.spark.graphInputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.InputRDDFormat
+gremlin.spark.graphOutputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.OutputRDDFormat
+gremlin.spark.persistContext=true
+#####################################
+# GiraphGraphComputer Configuration #
+#####################################
+giraph.minWorkers=2
+giraph.maxWorkers=2
+giraph.useOutOfCoreGraph=true
+giraph.useOutOfCoreMessages=true
+mapreduce.map.java.opts=-Xmx1024m
+mapreduce.reduce.java.opts=-Xmx1024m
+giraph.numInputThreads=2
+giraph.numComputeThreads=2
+
+A review of the Hadoop-Gremlin specific properties is provided in the table below. For the respective OLAP
+engines (<<sparkgraphcomputer,`SparkGraphComputer`>> or <<giraphgraphcomputer,`GiraphGraphComputer`>>) refer
+to their respective documentation for configuration options.
+
+[width="100%",cols="2,10",options="header"]
+|=========================================================
+|Property |Description
+|gremlin.graph |The class of the graph to construct using GraphFactory.
+|gremlin.hadoop.inputLocation |The location of the input file(s) for Hadoop-Gremlin to read the graph from.
+|gremlin.hadoop.graphInputFormat |The format that the graph input file(s) are represented in.
+|gremlin.hadoop.outputLocation |The location to write the computed HadoopGraph to.
+|gremlin.hadoop.graphOutputFormat |The format that the output file(s) should be represented in.
+|gremlin.hadoop.jarsInDistributedCache |Whether to upload the Hadoop-Gremlin jars to a distributed cache (necessary if jars are not on the machines' classpaths).
+|=========================================================
+
+
+
+Along with the properties above, the numerous link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[Hadoop specific properties]
+can be added as needed to tune and parameterize the executed Hadoop-Gremlin job on the respective Hadoop cluster.
+
+IMPORTANT: As the size of the graphs being processed becomes large, it is important to fully understand how the
+underlying OLAP engine (e.g. Spark, Giraph, etc.) works and understand the numerous parameterizations offered by
+these systems. Such knowledge can help alleviate out of memory exceptions, slow load times, slow processing times,
+garbage collection issues, etc.
+
+OLTP Hadoop-Gremlin
+~~~~~~~~~~~~~~~~~~~
+
+image:hadoop-pipes.png[width=180,float=left] It is possible to execute OLTP operations over a `HadoopGraph`.
+However, realize that the underlying HDFS files are not random access and thus, to retrieve a vertex, a linear scan
+is required. OLTP operations are useful for peeking into the graph prior to executing a long running OLAP job -- e.g.
+`g.V().valueMap().limit(10)`.
+
+CAUTION: OLTP operations on `HadoopGraph` are not efficient. They require linear scans to execute and are unreasonable
+for large graphs. In such large graph situations, make use of <<traversalvertexprogram,TraversalVertexProgram>>
+which is the OLAP Gremlin machine.
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
+hdfs.ls()
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal()
+g.V().count()
+g.V().out().out().values('name')
+g.V().group().by{it.value('name')[1]}.by('name').next()
+----
+
+OLAP Hadoop-Gremlin
+~~~~~~~~~~~~~~~~~~~
+
+image:hadoop-furnace.png[width=180,float=left] Hadoop-Gremlin was designed to execute OLAP operations via
+`GraphComputer`. The OLTP examples presented previously are reproduced below, but using `TraversalVertexProgram`
+for the execution of the Gremlin traversal.
+
+A `Graph` in TinkerPop3 can support any number of `GraphComputer` implementations. Out of the box, Hadoop-Gremlin
+supports the following three implementations.
+
+* <<mapreducegraphcomputer,`MapReduceGraphComputer`>>: Leverages Hadoop's MapReduce engine to execute TinkerPop3 OLAP
+computations. (*coming soon*)
+** The graph must fit within the total disk space of the Hadoop cluster (supports massive graphs). Message passing is
+coordinated via MapReduce jobs over the on-disk graph (slow traversals).
+* <<sparkgraphcomputer,`SparkGraphComputer`>>: Leverages Apache Spark to execute TinkerPop3 OLAP computations.
+** The graph may fit within the total RAM of the cluster (supports larger graphs). Message passing is coordinated via
+Spark map/reduce/join operations on in-memory and disk-cached data (average speed traversals).
+* <<giraphgraphcomputer,`GiraphGraphComputer`>>: Leverages Apache Giraph to execute TinkerPop3 OLAP computations.
+** The graph should fit within the total RAM of the Hadoop cluster (graph size restriction), though "out-of-core"
+processing is possible. Message passing is coordinated via ZooKeeper for the in-memory graph (speedy traversals).
+
+TIP: image:gremlin-sugar.png[width=50,float=left] For those wanting to use the <<sugar-plugin,SugarPlugin>> with
+their submitted traversal, do `:remote config useSugar true` as well as `:plugin use tinkerpop.sugar` at the start of
+the Gremlin Console session if it is not already activated.
+
+Note that `SparkGraphComputer` and `GiraphGraphComputer` are loaded via their respective plugins. Typically only
+one plugin or the other is loaded depending on the desired `GraphComputer` to use.
+
+[source,text]
+----
+$ bin/gremlin.sh
+
+         \,,,/
+         (o o)
+-----oOOo-(3)-oOOo-----
+plugin activated: tinkerpop.server
+plugin activated: tinkerpop.utilities
+plugin activated: tinkerpop.tinkergraph
+plugin activated: tinkerpop.hadoop
+gremlin> :install org.apache.tinkerpop giraph-gremlin x.y.z
+==>loaded: [org.apache.tinkerpop, giraph-gremlin, x.y.z] - restart the console to use [tinkerpop.giraph]
+gremlin> :install org.apache.tinkerpop spark-gremlin x.y.z
+==>loaded: [org.apache.tinkerpop, spark-gremlin, x.y.z] - restart the console to use [tinkerpop.spark]
+gremlin> :q
+$ bin/gremlin.sh
+
+         \,,,/
+         (o o)
+-----oOOo-(3)-oOOo-----
+plugin activated: tinkerpop.server
+plugin activated: tinkerpop.utilities
+plugin activated: tinkerpop.tinkergraph
+plugin activated: tinkerpop.hadoop
+gremlin> :plugin use tinkerpop.giraph
+==>tinkerpop.giraph activated
+gremlin> :plugin use tinkerpop.spark
+==>tinkerpop.spark activated
+----
+
+WARNING: Hadoop, Spark, and Giraph all depend on many of the same libraries (e.g. ZooKeeper, Snappy, Netty, Guava,
+etc.). Unfortunately, these dependencies typically do not resolve to the same versions of the respective libraries. As such,
+it is best to *not* have both Spark and Giraph plugins loaded in the same console session nor in the same Java
+project (though intelligent `<exclusion>`-usage can help alleviate conflicts in a Java project).
+
+CAUTION: It is important to note that when doing an OLAP traversal, any resulting vertices, edges, or properties will be
+attached to the source graph. For Hadoop-based graphs, this may lead to linear search times on massive graphs. Thus,
+if vertex, edge, or property objects are to be returned (as a final result), it is best to use `.id()` to get the id
+of the object rather than the actual attached object.
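+
+For example, a sketch of the safer pattern in Gremlin-Groovy:
+
+[source,groovy]
+----
+g.V().has('name','marko').id()  // returns ids -- cheap to return from the cluster
+g.V().has('name','marko')       // returns vertices attached to the source graph (avoid as a final result)
+----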
+
+[[mapreducegraphcomputer]]
+MapReduceGraphComputer
+^^^^^^^^^^^^^^^^^^^^^^
+
+*COMING SOON*
+
+[[sparkgraphcomputer]]
+SparkGraphComputer
+^^^^^^^^^^^^^^^^^^
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>spark-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+----
+
+image:spark-logo.png[width=175,float=left] link:http://spark.apache.org[Spark] is an Apache Software Foundation
+project focused on general-purpose OLAP data processing. Spark provides a hybrid in-memory/disk-based distributed
+computing model that is similar to Hadoop's MapReduce model. Spark maintains a fluent function chaining DSL that is
+arguably easier for developers to work with than native Hadoop MapReduce. Spark-Gremlin provides an implementation of
+the bulk-synchronous parallel, distributed message passing algorithm within Spark and thus, any `VertexProgram` can be
+executed over `SparkGraphComputer`.
+
+If `SparkGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
+specified in `HADOOP_GREMLIN_LIBS`.
+
+[source,shell]
+export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/spark-gremlin/lib
+
+Furthermore, the `lib/` directory should be distributed across all machines in the SparkServer cluster. For this purpose TinkerPop
+provides a helper script, which takes the Spark installation directory and the Spark machines as input:
+
+[source,shell]
+bin/init-tp-spark.sh /usr/local/spark spark@10.0.0.1 spark@10.0.0.2 spark@10.0.0.3
+
+Once the `lib/` directory is distributed, `SparkGraphComputer` can be used as follows.
+
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal(computer(SparkGraphComputer))
+g.V().count()
+g.V().out().out().values('name')
+----
+
+For using lambdas in Gremlin-Groovy, simply provide `:remote connect` with a `TraversalSource` which leverages SparkGraphComputer.
+
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal(computer(SparkGraphComputer))
+:remote connect tinkerpop.hadoop graph g
+:> g.V().group().by{it.value('name')[1]}.by('name')
+----
+
+The `SparkGraphComputer` algorithm leverages Spark's caching abilities to reduce the amount of data shuffled across
+the wire on each iteration of the <<vertexprogram,`VertexProgram`>>. When the graph is loaded as a Spark RDD
+(Resilient Distributed Dataset) it is immediately cached as `graphRDD`. The `graphRDD` is a distributed adjacency
+list which encodes the vertex, its properties, and all its incident edges. On the first iteration, each vertex
+(in parallel) is passed through `VertexProgram.execute()`. This yields an output of the vertex's mutated state
+(i.e. updated compute keys -- `propertyX`) and its outgoing messages. This `viewOutgoingRDD` is then reduced to
+`viewIncomingRDD` where the outgoing messages are sent to their respective vertices. If a `MessageCombiner` exists
+for the vertex program, then messages are aggregated locally and globally to ultimately yield one incoming message
+for the vertex. This reduce sequence is the "message pass." If the vertex program does not terminate on this
+iteration, then the `viewIncomingRDD` is joined with the cached `graphRDD` and the process continues. When there
+are no more iterations, there is a final join and the resultant RDD is stripped of its edges and messages. This
+`mapReduceRDD` is cached and is processed by each <<mapreduce,`MapReduce`>> job in the
+<<graphcomputer,`GraphComputer`>> computation.
+
+image::spark-algorithm.png[width=775]
+
+[width="100%",cols="2,10",options="header"]
+|========================================================
+|Property |Description
+|gremlin.spark.graphInputRDD |A class for creating RDD's from underlying graph data, defaults to Hadoop `InputFormat`.
+|gremlin.spark.graphOutputRDD |A class for output RDD's, defaults to Hadoop `OutputFormat`.
+|gremlin.spark.graphStorageLevel |What `StorageLevel` to use for the cached graph during job execution (default `MEMORY_ONLY`).
+|gremlin.spark.persistContext |Whether to create a new `SparkContext` for every `SparkGraphComputer` or to reuse an existing one.
+|gremlin.spark.persistStorageLevel |What `StorageLevel` to use when persisted RDDs via `PersistedOutputRDD` (default `MEMORY_ONLY`).
+|========================================================
+
+InputRDD and OutputRDD
+++++++++++++++++++++++
+
+If the provider/user does not want to use Hadoop `InputFormats`, it is possible to leverage Spark's RDD
+constructs directly. There is a `gremlin.spark.graphInputRDD` configuration that references a `Class<? extends
+InputRDD>`. An `InputRDD` provides a read method that takes a `SparkContext` and returns a graphRDD. Likewise, use
+`gremlin.spark.graphOutputRDD` and the respective `OutputRDD`.
+
+If the graph system provider uses an `InputRDD`, the RDD should maintain an associated `org.apache.spark.Partitioner`. By doing so,
+`SparkGraphComputer` will not partition the loaded graph across the cluster as it has already been partitioned by the graph system provider.
+This can save a significant amount of time and space resources.
+If the `InputRDD` does not have a registered partitioner, `SparkGraphComputer` will partition the graph using
+an `org.apache.spark.HashPartitioner` with the number of partitions being either the number of existing partitions in the input (e.g. input splits)
+or the user-specified number of `GraphComputer.workers()`.
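+
+As a sketch, a trivial in-memory `InputRDD` might look like the following -- the class name and sample data are
+illustrative only, and the exact `InputRDD` method signature should be confirmed against the version in use:
+
+[source,java]
+----
+public final class ExampleInputRDD implements InputRDD {
+    @Override
+    public JavaPairRDD<Object, VertexWritable> readGraphRDD(final Configuration configuration,
+                                                            final JavaSparkContext sparkContext) {
+        // a real implementation would read from the underlying graph system,
+        // ideally registering a Partitioner to avoid a repartition by SparkGraphComputer
+        final List<Vertex> vertices = new ArrayList<>();
+        vertices.add(StarGraph.open().addVertex(T.id, 1L, T.label, "person", "name", "marko"));
+        vertices.add(StarGraph.open().addVertex(T.id, 2L, T.label, "person", "name", "vadas"));
+        return sparkContext.parallelize(vertices)
+                .mapToPair(vertex -> new Tuple2<>(vertex.id(), new VertexWritable(vertex)));
+    }
+}
+----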
+
+Using a Persisted Context
++++++++++++++++++++++++++
+
+It is possible to persist the graph RDD between jobs within the `SparkContext` (e.g. SparkServer) by leveraging `PersistedOutputRDD`.
+Note that `gremlin.spark.persistContext` should be set to `true` or else the persisted RDD will be destroyed when the `SparkContext` closes.
+The persisted RDD is named by the `gremlin.hadoop.outputLocation` configuration. Similarly, `PersistedInputRDD` is used with the respective
+`gremlin.hadoop.inputLocation` to retrieve the persisted RDD from the `SparkContext`.
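+
+A hypothetical properties file fragment for persisting the graph RDD between jobs might look like this:
+
+[source,properties]
+----
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.inputLocation=tinkerpop-modern.kryo
+gremlin.spark.graphOutputRDD=org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD
+gremlin.hadoop.outputLocation=myPersistedGraphRDD
+gremlin.spark.persistContext=true
+----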
+
+When using a persistent `SparkContext`, the configuration of the original Spark Configuration will be inherited by all threaded
+references to that Spark Context. The exceptions to this rule are those properties which have a specific thread-local effect.
+
+.Thread Local Properties
+. spark.jobGroup.id
+. spark.job.description
+. spark.job.interruptOnCancel
+. spark.scheduler.pool
+
+Finally, there is a `spark` object that can be used to manage persisted RDDs (see <<interacting-with-spark, Interacting with Spark>>).
+
+[[bulkdumpervertexprogramusingspark]]
+Exporting with BulkDumperVertexProgram
+++++++++++++++++++++++++++++++++++++++
+
+The <<bulkdumpervertexprogram, BulkDumperVertexProgram>> exports a whole graph in any of the supported Hadoop GraphOutputFormats (`GraphSONOutputFormat`,
+`GryoOutputFormat` or `ScriptOutputFormat`). The example below takes a Hadoop graph as the input (in `GryoInputFormat`) and exports it as a GraphSON file
+(`GraphSONOutputFormat`).
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/tinkerpop-modern.kryo', 'tinkerpop-modern.kryo')
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+graph.configuration().setProperty('gremlin.hadoop.graphOutputFormat', 'org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat')
+graph.compute(SparkGraphComputer).program(BulkDumperVertexProgram.build().create()).submit().get()
+hdfs.ls('output')
+hdfs.head('output/~g')
+----
+
+Loading with BulkLoaderVertexProgram
+++++++++++++++++++++++++++++++++++++
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large
+amounts of data to and from different `Graph` implementations. The following code demonstrates how to load the
+Grateful Dead graph from HadoopGraph into TinkerGraph over Spark:
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
+readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+writeGraph = 'conf/tinkergraph-gryo.properties'
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(writeGraph).create(readGraph)
+readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(writeGraph)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# hadoop-grateful-gryo.properties
+
+#
+# Hadoop Graph Configuration
+#
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.inputLocation=grateful-dead.kryo
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.jarsInDistributedCache=true
+
+#
+# SparkGraphComputer Configuration
+#
+spark.master=local[1]
+spark.executor.memory=1g
+spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+----
+
+[source,properties]
+----
+# tinkergraph-gryo.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+----
+
+IMPORTANT: The path to the TinkerGraph jars needs to be included in `HADOOP_GREMLIN_LIBS` for the above example to work.
+
+[[giraphgraphcomputer]]
+GiraphGraphComputer
+^^^^^^^^^^^^^^^^^^^
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>giraph-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+----
+
+image:giraph-logo.png[width=100,float=left] link:http://giraph.apache.org[Giraph] is an Apache Software Foundation
+project focused on OLAP-based graph processing. Giraph makes use of the distributed graph computing paradigm made
+popular by Google's Pregel. In Giraph, developers write "vertex programs" that get executed at each vertex in
+parallel. These programs communicate with one another in a bulk synchronous parallel (BSP) manner. This model aligns
+with TinkerPop3's `GraphComputer` API. TinkerPop3 provides an implementation of `GraphComputer` that works for Giraph
+called `GiraphGraphComputer`. Moreover, with TinkerPop3's <<mapreduce,MapReduce>>-framework, the standard
+Giraph/Pregel model is extended to support an arbitrary number of MapReduce phases to aggregate and yield results
+from the graph. Below are examples using `GiraphGraphComputer` from the <<gremlin-console,Gremlin-Console>>.
+
+WARNING: Giraph uses a large number of Hadoop counters. The default for Hadoop is 120. In `mapred-site.xml` it is
+possible to increase the limit via the `mapreduce.job.counters.max` property. A good value to use is 1000. This
+is a cluster-wide property, so be sure to restart the cluster after updating it.
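+
+For example, the following `mapred-site.xml` fragment raises the limit to the suggested value (a minimal sketch of
+just the relevant property):
+
+[source,xml]
+----
+<property>
+  <name>mapreduce.job.counters.max</name>
+  <value>1000</value>
+</property>
+----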
+
+WARNING: The maximum number of workers can be no larger than the number of map-slots in the Hadoop cluster minus 1.
+For example, if the Hadoop cluster has 4 map slots, then `giraph.maxWorkers` can not be larger than 3. One map-slot
+is reserved for the master compute node and all other slots can be allocated as workers to execute the VertexPrograms
+on the vertices of the graph.
+
+If `GiraphGraphComputer` will be used as the `GraphComputer` for `HadoopGraph` then its `lib` directory should be
+specified in `HADOOP_GREMLIN_LIBS`.
+
+[source,shell]
+export HADOOP_GREMLIN_LIBS=$HADOOP_GREMLIN_LIBS:/usr/local/gremlin-console/ext/giraph-gremlin/lib
+
+Or, the user can specify the directory in the Gremlin Console.
+
+[source,groovy]
+System.setProperty('HADOOP_GREMLIN_LIBS',System.getProperty('HADOOP_GREMLIN_LIBS') + ':' + '/usr/local/gremlin-console/ext/giraph-gremlin/lib')
+
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal(computer(GiraphGraphComputer))
+g.V().count()
+g.V().out().out().values('name')
+----
+
+IMPORTANT: The examples above do not use lambdas (i.e. closures in Gremlin-Groovy). This makes the traversal
+serializable and thus, able to be distributed to all machines in the Hadoop cluster. If a lambda is required in a
+traversal, then the traversal must be sent as a `String` and compiled locally at each machine in the cluster. The
+following example demonstrates the `:remote` command which allows for submitting Gremlin traversals as a `String`.
+
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+g = graph.traversal(computer(GiraphGraphComputer))
+:remote connect tinkerpop.hadoop graph g
+:> g.V().group().by{it.value('name')[1]}.by('name')
+result
+result.memory.runtime
+result.memory.keys()
+result.memory.get('~reducing')
+----
+
+NOTE: If the user explicitly specifies `giraph.maxWorkers` and/or `giraph.numComputeThreads` in the configuration,
+then these values will be used by Giraph. However, if these are not specified and the user never calls
+`GraphComputer.workers()` then `GiraphGraphComputer` will try to compute the number of workers/threads to use based
+on the cluster's profile.
+
+Loading with BulkLoaderVertexProgram
+++++++++++++++++++++++++++++++++++++
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load
+large amounts of data to and from different `Graph` implementations. The following code demonstrates how to load
+the Grateful Dead graph from HadoopGraph into TinkerGraph over Giraph:
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/grateful-dead.kryo', 'grateful-dead.kryo')
+readGraph = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+writeGraph = 'conf/tinkergraph-gryo.properties'
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(writeGraph).create(readGraph)
+readGraph.compute(GiraphGraphComputer).workers(1).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(writeGraph)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# hadoop-grateful-gryo.properties
+
+#
+# Hadoop Graph Configuration
+#
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
+gremlin.hadoop.inputLocation=grateful-dead.kryo
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.jarsInDistributedCache=true
+
+#
+# GiraphGraphComputer Configuration
+#
+giraph.minWorkers=1
+giraph.maxWorkers=1
+giraph.useOutOfCoreGraph=true
+giraph.useOutOfCoreMessages=true
+mapred.map.child.java.opts=-Xmx1024m
+mapred.reduce.child.java.opts=-Xmx1024m
+giraph.numInputThreads=4
+giraph.numComputeThreads=4
+giraph.maxMessagesInMemory=100000
+----
+
+[source,properties]
+----
+# tinkergraph-gryo.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+----
+
+NOTE: The path to the TinkerGraph jars needs to be included in `HADOOP_GREMLIN_LIBS` for the above example to work.
+
+Input/Output Formats
+~~~~~~~~~~~~~~~~~~~~
+
+image:adjacency-list.png[width=300,float=right] Hadoop-Gremlin provides various I/O formats -- i.e. Hadoop
+`InputFormat` and `OutputFormat`. All of the formats make use of an link:http://en.wikipedia.org/wiki/Adjacency_list[adjacency list]
+representation of the graph where each "row" represents a single vertex, its properties, and its incoming and
+outgoing edges.
+
+{empty} +
+
+[[gryo-io-format]]
+Gryo I/O Format
+^^^^^^^^^^^^^^^
+
+* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat`
+* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat`
+
+<<gryo-reader-writer,Gryo>> is a binary graph format that leverages link:https://github.com/EsotericSoftware/kryo[Kryo]
+to make a compact, binary representation of a vertex. It is recommended that users leverage Gryo given its space/time
+savings over text-based representations.
+
+NOTE: The `GryoInputFormat` is splittable.
+
+[[graphson-io-format]]
+GraphSON I/O Format
+^^^^^^^^^^^^^^^^^^^
+
+* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat`
+* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat`
+
+<<graphson-reader-writer,GraphSON>> is a JSON-based graph format. GraphSON is a space-expensive graph format in that
+it is a text-based markup language. However, it is convenient for many developers to work with as its structure is
+simple (easy to create and parse).
+
+The data below represents an adjacency list representation of the classic TinkerGraph toy graph in GraphSON format.
+
+[source,json]
+----
+{"id":1,"label":"person","outE":{"created":[{"id":9,"inV":3,"properties":{"weight":0.4}}],"knows":[{"id":7,"inV":2,"properties":{"weight":0.5}},{"id":8,"inV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":0,"value":"marko"}],"age":[{"id":1,"value":29}]}}
+{"id":2,"label":"person","inE":{"knows":[{"id":7,"outV":1,"properties":{"weight":0.5}}]},"properties":{"name":[{"id":2,"value":"vadas"}],"age":[{"id":3,"value":27}]}}
+{"id":3,"label":"software","inE":{"created":[{"id":9,"outV":1,"properties":{"weight":0.4}},{"id":11,"outV":4,"properties":{"weight":0.4}},{"id":12,"outV":6,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":4,"value":"lop"}],"lang":[{"id":5,"value":"java"}]}}
+{"id":4,"label":"person","inE":{"knows":[{"id":8,"outV":1,"properties":{"weight":1.0}}]},"outE":{"created":[{"id":10,"inV":5,"properties":{"weight":1.0}},{"id":11,"inV":3,"properties":{"weight":0.4}}]},"properties":{"name":[{"id":6,"value":"josh"}],"age":[{"id":7,"value":32}]}}
+{"id":5,"label":"software","inE":{"created":[{"id":10,"outV":4,"properties":{"weight":1.0}}]},"properties":{"name":[{"id":8,"value":"ripple"}],"lang":[{"id":9,"value":"java"}]}}
+{"id":6,"label":"person","outE":{"created":[{"id":12,"inV":3,"properties":{"weight":0.2}}]},"properties":{"name":[{"id":10,"value":"peter"}],"age":[{"id":11,"value":35}]}}
+----
+
+[[script-io-format]]
+Script I/O Format
+^^^^^^^^^^^^^^^^^
+
+* **InputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat`
+* **OutputFormat**: `org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptOutputFormat`
+
+`ScriptInputFormat` and `ScriptOutputFormat` take an arbitrary script and use that script to read or write
+`Vertex` objects, respectively. This can be considered the most general `InputFormat`/`OutputFormat` possible in that
+Hadoop-Gremlin uses the user provided script for all reading/writing.
+
+ScriptInputFormat
++++++++++++++++++
+
+The data below represents an adjacency list representation of the classic TinkerGraph toy graph. The first line reads,
+"vertex `1`, labeled `person`, having 2 property values (`marko` and `29`), has 3 outgoing edges; the first edge is
+labeled `knows`, connects the current vertex `1` with vertex `2` and has a property value of `0.5`, and so on."
+
+[source]
+1:person:marko:29 knows:2:0.5,knows:4:1.0,created:3:0.4
+2:person:vadas:27
+3:project:lop:java
+4:person:josh:32 created:3:0.4,created:5:1.0
+5:project:ripple:java
+6:person:peter:35 created:3:0.2
+
+There is no corresponding `InputFormat` that can parse this particular file (or some adjacency list variant of it).
+As such, `ScriptInputFormat` can be used. With `ScriptInputFormat` a script is stored in HDFS and leveraged by each
+mapper in the Hadoop job. The script must have the following method defined:
+
+[source,groovy]
+def parse(String line, ScriptElementFactory factory) { ... }
+
+`ScriptElementFactory` is a legacy from previous versions and, although it's still functional, it should no longer be used.
+In order to create vertices and edges, the `parse()` method gets access to a global variable named `graph`, which holds
+the local `StarGraph` for the current line/vertex.
+
+An appropriate `parse()` for the above adjacency list file is:
+
+[source,groovy]
+def parse(line, factory) {
+    def parts = line.split(/ /)
+    def (id, label, name, x) = parts[0].split(/:/).toList()
+    def v1 = graph.addVertex(T.id, id, T.label, label)
+    if (name != null) v1.property('name', name) // first value is always the name
+    if (x != null) {
+        // second value depends on the vertex label; it's either
+        // the age of a person or the language of a project
+        if (label.equals('project')) v1.property('lang', x)
+        else v1.property('age', Integer.valueOf(x))
+    }
+    if (parts.length == 2) {
+        parts[1].split(/,/).grep { !it.isEmpty() }.each {
+            def (eLabel, refId, weight) = it.split(/:/).toList()
+            def v2 = graph.addVertex(T.id, refId)
+            v1.addOutEdge(eLabel, v2, 'weight', Double.valueOf(weight))
+        }
+    }
+    return v1
+}
+
+The resultant `Vertex` denotes whether the parsed line yielded a valid vertex. If the line is not valid
+(e.g. a comment line, a skip line, etc.), simply return `null`.
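+
+For instance, a `parse()` implementation that tolerates such lines might begin as follows (a minimal sketch; treating
+`#` as a comment marker is an assumption, not part of the format):
+
+[source,groovy]
+def parse(line, factory) {
+    if (line.trim().isEmpty() || line.startsWith('#')) return null // skip blank and comment lines
+    // ... otherwise parse the line as shown above ...
+}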
+
+ScriptOutputFormat Support
+++++++++++++++++++++++++++
+
+The principle above can also be used to convert a vertex to an arbitrary `String` representation that is ultimately
+streamed back to a file in HDFS. This is the role of `ScriptOutputFormat`. `ScriptOutputFormat` requires that the
+provided script maintains a method with the following signature:
+
+[source,groovy]
+def stringify(Vertex vertex) { ... }
+
+An appropriate `stringify()` to produce output in the same format that was shown in the `ScriptInputFormat` sample is:
+
+[source,groovy]
+def stringify(vertex) {
+    def v = vertex.values('name', 'age', 'lang').inject(vertex.id(), vertex.label()).join(':')
+    def outE = vertex.outE().map {
+        def e = it.get()
+        e.values('weight').inject(e.label(), e.inV().next().id()).join(':')
+    }.join(',')
+    return [v, outE].join('\t')
+}
+
+
+
+Storage Systems
+~~~~~~~~~~~~~~~
+
+Hadoop-Gremlin provides two implementations of the `Storage` API:
+
+* `FileSystemStorage`: Access HDFS and local file system data.
+* `SparkContextStorage`: Access Spark persisted RDD data.
+
+[[interacting-with-hdfs]]
+Interacting with HDFS
+^^^^^^^^^^^^^^^^^^^^^
+
+The distributed file system of Hadoop is called link:http://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_distributed_file_system[HDFS].
+The results of any OLAP operation are stored in HDFS accessible via `hdfs`. For local file system access, there is `local`.
+
+[gremlin-groovy]
+----
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
+hdfs.ls()
+hdfs.ls('output')
+hdfs.head('output', GryoInputFormat)
+hdfs.head('output', 'clusterCount', SequenceFileInputFormat)
+hdfs.rm('output')
+hdfs.ls()
+----
+
+[[interacting-with-spark]]
+Interacting with Spark
+^^^^^^^^^^^^^^^^^^^^^^
+
+If a Spark context is persisted, then Spark RDDs will remain in the Spark cache and be accessible over subsequent jobs.
+RDDs are retrieved from and saved to the `SparkContext` via `PersistedInputRDD` and `PersistedOutputRDD` respectively.
+Persisted RDDs can be accessed using `spark`.
+
+[gremlin-groovy]
+----
+Spark.create('local[4]')
+graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
+graph.configuration().setProperty('gremlin.spark.graphOutputRDD', PersistedOutputRDD.class.getCanonicalName())
+graph.configuration().clearProperty('gremlin.hadoop.graphOutputFormat')
+graph.configuration().setProperty('gremlin.spark.persistContext',true)
+graph.compute(SparkGraphComputer).program(PeerPressureVertexProgram.build().create(graph)).mapReduce(ClusterCountMapReduce.build().memoryKey('clusterCount').create()).submit().get();
+spark.ls()
+spark.ls('output')
+spark.head('output', PersistedInputRDD)
+spark.head('output', 'clusterCount', PersistedInputRDD)
+spark.rm('output')
+spark.ls()
+----
+
+A Command Line Example
+~~~~~~~~~~~~~~~~~~~~~~
+
+image::pagerank-logo.png[width=300]
+
+The classic link:http://en.wikipedia.org/wiki/PageRank[PageRank] centrality algorithm can be executed over the
+TinkerPop graph from the command line using `GiraphGraphComputer`.
+
+WARNING: Be sure that the `HADOOP_GREMLIN_LIBS` references the location `lib` directory of the respective
+`GraphComputer` engine being used or else the requisite dependencies will not be uploaded to the Hadoop cluster.
+
+[source,text]
+----
+$ hdfs dfs -copyFromLocal data/tinkerpop-modern.json tinkerpop-modern.json
+$ hdfs dfs -ls
+Found 2 items
+-rw-r--r--   1 marko supergroup       2356 2014-07-28 13:00 /user/marko/tinkerpop-modern.json
+$ hadoop jar target/giraph-gremlin-x.y.z-job.jar org.apache.tinkerpop.gremlin.giraph.process.computer.GiraphGraphComputer ../hadoop-gremlin/conf/hadoop-graphson.properties
+15/09/11 08:02:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+15/09/11 08:02:11 INFO computer.GiraphGraphComputer: HadoopGremlin(Giraph): PageRankVertexProgram[alpha=0.85,iterations=30]
+15/09/11 08:02:12 INFO mapreduce.JobSubmitter: number of splits:3
+15/09/11 08:02:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441915907347_0028
+15/09/11 08:02:12 INFO impl.YarnClientImpl: Submitted application application_1441915907347_0028
+15/09/11 08:02:12 INFO job.GiraphJob: Tracking URL: http://markos-macbook:8088/proxy/application_1441915907347_0028/
+15/09/11 08:02:12 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 3 mappers
+15/09/11 08:03:54 INFO mapreduce.Job: Running job: job_1441915907347_0028
+15/09/11 08:03:55 INFO mapreduce.Job: Job job_1441915907347_0028 running in uber mode : false
+15/09/11 08:03:55 INFO mapreduce.Job:  map 33% reduce 0%
+15/09/11 08:03:57 INFO mapreduce.Job:  map 67% reduce 0%
+15/09/11 08:04:01 INFO mapreduce.Job:  map 100% reduce 0%
+15/09/11 08:06:17 INFO mapreduce.Job: Job job_1441915907347_0028 completed successfully
+15/09/11 08:06:17 INFO mapreduce.Job: Counters: 80
+    File System Counters
+        FILE: Number of bytes read=0
+        FILE: Number of bytes written=483918
+        FILE: Number of read operations=0
+        FILE: Number of large read operations=0
+        FILE: Number of write operations=0
+        HDFS: Number of bytes read=1465
+        HDFS: Number of bytes written=1760
+        HDFS: Number of read operations=39
+        HDFS: Number of large read operations=0
+        HDFS: Number of write operations=20
+    Job Counters
+        Launched map tasks=3
+        Other local map tasks=3
+        Total time spent by all maps in occupied slots (ms)=458105
+        Total time spent by all reduces in occupied slots (ms)=0
+        Total time spent by all map tasks (ms)=458105
+        Total vcore-seconds taken by all map tasks=458105
+        Total megabyte-seconds taken by all map tasks=469099520
+    Map-Reduce Framework
+        Map input records=3
+        Map output records=0
+        Input split bytes=132
+        Spilled Records=0
+        Failed Shuffles=0
+        Merged Map outputs=0
+        GC time elapsed (ms)=1594
+        CPU time spent (ms)=0
+        Physical memory (bytes) snapshot=0
+        Virtual memory (bytes) snapshot=0
+        Total committed heap usage (bytes)=527958016
+    Giraph Stats
+        Aggregate edges=0
+        Aggregate finished vertices=0
+        Aggregate sent message message bytes=13535
+        Aggregate sent messages=186
+        Aggregate vertices=6
+        Current master task partition=0
+        Current workers=2
+        Last checkpointed superstep=0
+        Sent message bytes=438
+        Sent messages=6
+        Superstep=31
+    Giraph Timers
+        Initialize (ms)=2996
+        Input superstep (ms)=5209
+        Setup (ms)=59
+        Shutdown (ms)=9324
+        Superstep 0 GiraphComputation (ms)=3861
+        Superstep 1 GiraphComputation (ms)=4027
+        Superstep 10 GiraphComputation (ms)=4000
+        Superstep 11 GiraphComputation (ms)=4004
+        Superstep 12 GiraphComputation (ms)=3999
+        Superstep 13 GiraphComputation (ms)=4000
+        Superstep 14 GiraphComputation (ms)=4005
+        Superstep 15 GiraphComputation (ms)=4003
+        Superstep 16 GiraphComputation (ms)=4001
+        Superstep 17 GiraphComputation (ms)=4007
+        Superstep 18 GiraphComputation (ms)=3998
+        Superstep 19 GiraphComputation (ms)=4006
+        Superstep 2 GiraphComputation (ms)=4007
+        Superstep 20 GiraphComputation (ms)=3996
+        Superstep 21 GiraphComputation (ms)=4006
+        Superstep 22 GiraphComputation (ms)=4002
+        Superstep 23 GiraphComputation (ms)=3998
+        Superstep 24 GiraphComputation (ms)=4003
+        Superstep 25 GiraphComputation (ms)=4001
+        Superstep 26 GiraphComputation (ms)=4003
+        Superstep 27 GiraphComputation (ms)=4005
+        Superstep 28 GiraphComputation (ms)=4002
+        Superstep 29 GiraphComputation (ms)=4001
+        Superstep 3 GiraphComputation (ms)=3988
+        Superstep 30 GiraphComputation (ms)=4248
+        Superstep 4 GiraphComputation (ms)=4010
+        Superstep 5 GiraphComputation (ms)=3998
+        Superstep 6 GiraphComputation (ms)=3996
+        Superstep 7 GiraphComputation (ms)=4005
+        Superstep 8 GiraphComputation (ms)=4009
+        Superstep 9 GiraphComputation (ms)=3994
+        Total (ms)=138788
+    File Input Format Counters
+        Bytes Read=0
+    File Output Format Counters
+        Bytes Written=0
+$ hdfs dfs -cat output/~g/*
+{"id":1,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.15000000000000002}],"name":[{"id":0,"value":"marko"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":3.0}],"age":[{"id":1,"value":29}]}}
+{"id":5,"label":"software","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.23181250000000003}],"name":[{"id":8,"value":"ripple"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":0.0}],"lang":[{"id":9,"value":"java"}]}}
+{"id":3,"label":"software","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.4018125}],"name":[{"id":4,"value":"lop"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":0.0}],"lang":[{"id":5,"value":"java"}]}}
+{"id":4,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.19250000000000003}],"name":[{"id":6,"value":"josh"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":2.0}],"age":[{"id":7,"value":32}]}}
+{"id":2,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.19250000000000003}],"name":[{"id":2,"value":"vadas"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":0.0}],"age":[{"id":3,"value":27}]}}
+{"id":6,"label":"person","properties":{"gremlin.pageRankVertexProgram.pageRank":[{"id":35,"value":0.15000000000000002}],"name":[{"id":10,"value":"peter"}],"gremlin.pageRankVertexProgram.edgeCount":[{"id":6,"value":1.0}],"age":[{"id":11,"value":35}]}}
+----
+
+Vertex 4 ("josh") is isolated below:
+
+[source,js]
+----
+{
+  "id":4,
+  "label":"person",
+  "properties": {
+    "gremlin.pageRankVertexProgram.pageRank":[{"id":39,"value":0.19250000000000003}],
+    "name":[{"id":6,"value":"josh"}],
+    "gremlin.pageRankVertexProgram.edgeCount":[{"id":10,"value":2.0}],
+    "age":[{"id":7,"value":32}]
+  }
+}
+----
+
+Hadoop-Gremlin for Graph System Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a 3rd-party graph system provider wishes to
+leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then they need to provide, at minimum, a
+Hadoop2 `InputFormat<NullWritable,VertexWritable>` for their graph system. If the provider wishes to persist computed
+results back to their graph system (and not just to HDFS via a `FileOutputFormat`), then a graph system specific
+`OutputFormat<NullWritable,VertexWritable>` must be developed as well.
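+
+A skeletal example of such a provider `InputFormat` is sketched below -- `XyzGraphInputFormat` is a hypothetical
+class name and the method bodies are placeholders, not a working implementation:
+
+[source,java]
+----
+public class XyzGraphInputFormat extends InputFormat<NullWritable, VertexWritable> {
+  @Override
+  public List<InputSplit> getSplits(final JobContext context) {
+    // typically one split per storage partition of the underlying graph system
+    return Collections.emptyList(); // placeholder
+  }
+
+  @Override
+  public RecordReader<NullWritable, VertexWritable> createRecordReader(final InputSplit split, final TaskAttemptContext context) {
+    // should return a reader that streams each vertex of the split as a VertexWritable
+    throw new UnsupportedOperationException("provider-specific reader goes here"); // placeholder
+  }
+}
+----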
+
+Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as
+the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a
+small object with little overhead. Graph system providers should realize `HadoopGraph` as the gateway to the OLAP
+features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer>
+graphComputerClass)`-method may look as follows:
+
+[source,java]
+----
+public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException {
+  try {
+    if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass))
+      return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this);
+    else
+      throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass);
+  } catch (final Exception e) {
+    throw new IllegalArgumentException(e.getMessage(),e);
+  }
+}
+----
+
+Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the
+case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the
+`compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which
+determine how to read and write data to and from Hadoop, for instance, `gremlin.hadoop.graphInputFormat` and
+`gremlin.hadoop.graphOutputFormat`.
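+
+A hedged sketch of building such a configuration on the fly (the `XyzGraphInputFormat` and `XyzGraphOutputFormat`
+class names are hypothetical placeholders for the provider's formats):
+
+[source,java]
+----
+final BaseConfiguration conf = new BaseConfiguration();
+conf.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
+conf.setProperty("gremlin.hadoop.graphInputFormat", XyzGraphInputFormat.class.getName());
+conf.setProperty("gremlin.hadoop.graphOutputFormat", XyzGraphOutputFormat.class.getName());
+final HadoopGraph hadoopGraph = HadoopGraph.open(conf);
+----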
+
+IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which
+determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided
+by Hadoop-Gremlin (e.g. <<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>,
+and <<script-io-format,`ScriptOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph
+data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support
+`ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum.
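+
+A minimal sketch of declaring persistence support (the `XyzOutputFormat` name is hypothetical and the writer is a
+placeholder):
+
+[source,java]
+----
+public class XyzOutputFormat extends FileOutputFormat<NullWritable, VertexWritable> implements PersistResultGraphAware {
+
+  @Override
+  public RecordWriter<NullWritable, VertexWritable> getRecordWriter(final TaskAttemptContext context) {
+    throw new UnsupportedOperationException("provider-specific writer goes here"); // placeholder
+  }
+
+  @Override
+  public boolean supportsResultGraphPersistCombination(final GraphComputer.ResultGraph resultGraph,
+                                                       final GraphComputer.Persist persist) {
+    // file-based output is not random access, so only a NEW result graph can be persisted
+    return resultGraph == GraphComputer.ResultGraph.NEW;
+  }
+}
+----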
+



[08/19] incubator-tinkerpop git commit: Uncommented some tests that were disabled by accident.

Posted by dk...@apache.org.
Uncommented some tests that were disabled by accident.

Tests all seem to pass - CTR


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/70a3065a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/70a3065a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/70a3065a

Branch: refs/heads/master
Commit: 70a3065a60c40d0d57d750303c23a74655c1a0e7
Parents: b0dfde3
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Wed Feb 17 09:00:30 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Wed Feb 17 09:00:30 2016 -0500

----------------------------------------------------------------------
 .../tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java      | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/70a3065a/gremlin-groovy/src/test/java/org/apache/tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java
----------------------------------------------------------------------
diff --git a/gremlin-groovy/src/test/java/org/apache/tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java b/gremlin-groovy/src/test/java/org/apache/tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java
index b5f186b..6d6abb4 100644
--- a/gremlin-groovy/src/test/java/org/apache/tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java
+++ b/gremlin-groovy/src/test/java/org/apache/tinkerpop/gremlin/groovy/engine/GremlinExecutorTest.java
@@ -83,7 +83,7 @@ public class GremlinExecutorTest {
             e.printStackTrace();
         }
     }
-    /*
+
     @Test
     public void shouldEvalScript() throws Exception {
         final GremlinExecutor gremlinExecutor = GremlinExecutor.build().create();
@@ -600,7 +600,6 @@ public class GremlinExecutorTest {
         assertSame(service, gremlinExecutor.getScheduledExecutorService());
         gremlinExecutor.close();
     }
-    */
 
     @Test
     public void shouldAllowVariableReuseAcrossThreads() throws Exception {


[10/19] incubator-tinkerpop git commit: split implementations.asciidoc into implementations-hadoop.asciidoc and implementations-neo4j.asciidoc

Posted by dk...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/bbf5b3f4/docs/src/reference/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/index.asciidoc b/docs/src/reference/index.asciidoc
index 5a01291..8546063 100644
--- a/docs/src/reference/index.asciidoc
+++ b/docs/src/reference/index.asciidoc
@@ -32,7 +32,8 @@ include::the-graphcomputer.asciidoc[]
 
 include::gremlin-applications.asciidoc[]
 
-include::implementations.asciidoc[]
+include::implementations-neo4j.asciidoc[]
+include::implementations-hadoop.asciidoc[]
 
 include::conclusion.asciidoc[]
 


[04/19] incubator-tinkerpop git commit: TINKERPOP-1159 Prevented multiple close requests on call to Client.close()

Posted by dk...@apache.org.
TINKERPOP-1159 Prevented multiple close requests on call to Client.close()

This problem occurred around sessions specifically, as there would be one call to shutdown() from both Client.closeAsync() and the callback to Client.write(). Just needed to change it so that the code of shutdown() only executed once no matter how many times it was called. CTR.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/888ac6a2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/888ac6a2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/888ac6a2

Branch: refs/heads/master
Commit: 888ac6a23ee0d26bf8c2c3ab43b72a58120beffd
Parents: 7b8ce4b
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Feb 16 12:10:29 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 12:10:29 2016 -0500

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |  1 +
 .../tinkerpop/gremlin/driver/Connection.java    | 62 ++++++++++++--------
 2 files changed, 37 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/888ac6a2/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 3e4cea3..b35efcb 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -26,6 +26,7 @@ image::https://raw.githubusercontent.com/apache/incubator-tinkerpop/master/docs/
 TinkerPop 3.1.2 (NOT OFFICIALLY RELEASED YET)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+* Fixed a bug where multiple "close" requests were being sent by the driver on `Client.close()`.
 * Fixed an `Property` attach bug that shows up in serialization-based `GraphComputer` implementations.
 * Fixed a pom.xml bug where Gremlin Console/Server were not pulling the latest Neo4j 2.3.2.
 * Fixed bug in "round robin" load balancing in `gremlin-driver` where requests were wrongly being sent to the same host.

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/888ac6a2/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Connection.java
----------------------------------------------------------------------
diff --git a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Connection.java b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Connection.java
index f3fbf0d..cb26781 100644
--- a/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Connection.java
+++ b/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Connection.java
@@ -33,6 +33,7 @@ import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicReference;
 
@@ -75,6 +76,7 @@ final class Connection {
     private final Channelizer channelizer;
 
     private final AtomicReference<CompletableFuture<Void>> closeFuture = new AtomicReference<>();
+    private final AtomicBoolean shutdownInitiated = new AtomicBoolean(false);
 
     public Connection(final URI uri, final ConnectionPool pool, final int maxInProcess) throws ConnectionException {
         this.uri = uri;
@@ -185,6 +187,9 @@ final class Connection {
                         final CompletableFuture<Void> readCompleted = new CompletableFuture<>();
                         readCompleted.thenAcceptAsync(v -> {
                             thisConnection.returnToPool();
+
+                            // close was signaled in closeAsync() but there were pending messages at that time. attempt
+                            // the shutdown if the returned result cleared up the last pending message
                             if (isClosed() && pending.isEmpty())
                                 shutdown(closeFuture.get());
                         }, cluster.executor());
@@ -209,35 +214,40 @@ final class Connection {
     }
 
     private void shutdown(final CompletableFuture<Void> future) {
-        if (client instanceof Client.SessionedClient) {
-            // maybe this should be delegated back to the Client implementation???
-            final RequestMessage closeMessage = client.buildMessage(RequestMessage.build(Tokens.OPS_CLOSE));
-            final CompletableFuture<ResultSet> closed = new CompletableFuture<>();
-            write(closeMessage, closed);
-
-            try {
-                // make sure we get a response here to validate that things closed as expected.  on error, we'll let
-                // the server try to clean up on its own.  the primary error here should probably be related to
-                // protocol issues which should not be something a user has to fuss with.
-                closed.get();
-            } catch (Exception ex) {
-                final String msg = String.format(
-                    "Encountered an error trying to close connection on %s - force closing - server will close session on shutdown or timeout.",
-                    ((Client.SessionedClient) client).getSessionId());
-                logger.warn(msg, ex);
+        // shutdown can be called directly from closeAsync() or after write() and therefore this method should only
+        // be called once. once shutdown is initiated, it shouldn't be executed a second time or else it sends more
+        // messages at the server and leads to ugly log messages over there.
+        if (shutdownInitiated.compareAndSet(false, true)) {
+            if (client instanceof Client.SessionedClient) {
+                // maybe this should be delegated back to the Client implementation???
+                final RequestMessage closeMessage = client.buildMessage(RequestMessage.build(Tokens.OPS_CLOSE));
+                final CompletableFuture<ResultSet> closed = new CompletableFuture<>();
+                write(closeMessage, closed);
+
+                try {
+                    // make sure we get a response here to validate that things closed as expected.  on error, we'll let
+                    // the server try to clean up on its own.  the primary error here should probably be related to
+                    // protocol issues which should not be something a user has to fuss with.
+                    closed.get();
+                } catch (Exception ex) {
+                    final String msg = String.format(
+                            "Encountered an error trying to close connection on %s - force closing - server will close session on shutdown or timeout.",
+                            ((Client.SessionedClient) client).getSessionId());
+                    logger.warn(msg, ex);
+                }
             }
-        }
 
-        channelizer.close(channel);
-        final ChannelPromise promise = channel.newPromise();
-        promise.addListener(f -> {
-            if (f.cause() != null)
-                future.completeExceptionally(f.cause());
-            else
-                future.complete(null);
-        });
+            channelizer.close(channel);
+            final ChannelPromise promise = channel.newPromise();
+            promise.addListener(f -> {
+                if (f.cause() != null)
+                    future.completeExceptionally(f.cause());
+                else
+                    future.complete(null);
+            });
 
-        channel.close(promise);
+            channel.close(promise);
+        }
     }
 
     public String getConnectionInfo() {


[05/19] incubator-tinkerpop git commit: More fixes for in-session binding concurrency.

Posted by dk...@apache.org.
More fixes for in-session binding concurrency.

Tried again to ensure that multiple threads don't act on bindings at the same time. Fixed a logging message for session close that was logging a rollback that may not have actually happened.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/a5a00bf1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/a5a00bf1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/a5a00bf1

Branch: refs/heads/master
Commit: a5a00bf1c42c286b5c1d042003c2521f7dc317d3
Parents: 888ac6a
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Feb 15 16:23:38 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 12:17:14 2016 -0500

----------------------------------------------------------------------
 .../server/op/AbstractEvalOpProcessor.java      | 49 ++++++++++++--------
 .../gremlin/server/op/session/Session.java      |  6 ++-
 .../server/op/session/SessionOpProcessor.java   |  4 +-
 .../server/GremlinServerIntegrateTest.java      | 27 +++++++++++
 4 files changed, 64 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/a5a00bf1/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
index 1e37fa0..b8f48eb 100644
--- a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
+++ b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
@@ -46,6 +46,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import javax.script.Bindings;
+import javax.script.SimpleBindings;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.HashSet;
@@ -188,30 +189,40 @@ public abstract class AbstractEvalOpProcessor implements OpProcessor {
 
         final String script = (String) args.get(Tokens.ARGS_GREMLIN);
         final String language = args.containsKey(Tokens.ARGS_LANGUAGE) ? (String) args.get(Tokens.ARGS_LANGUAGE) : null;
-        final Bindings bindings = bindingsSupplier.get();
+        final Bindings bindings = new SimpleBindings();
 
         // sessionless requests are always transaction managed, but in-session requests are configurable.
         final boolean managedTransactionsForRequest = manageTransactions ?
                 true : (Boolean) args.getOrDefault(Tokens.ARGS_MANAGE_TRANSACTION, false);
 
-        final CompletableFuture<Object> evalFuture = gremlinExecutor.eval(script, language, bindings, null, o -> {
-            final Iterator itty = IteratorUtils.asIterator(o);
-
-            logger.debug("Preparing to iterate results from - {} - in thread [{}]", msg, Thread.currentThread().getName());
-
-            try {
-                handleIterator(context, itty);
-            } catch (TimeoutException ex) {
-                final String errorMessage = String.format("Response iteration exceeded the configured threshold for request [%s] - %s", msg, ex.getMessage());
-                logger.warn(errorMessage);
-                ctx.writeAndFlush(ResponseMessage.build(msg).code(ResponseStatusCode.SERVER_ERROR_TIMEOUT).statusMessage(errorMessage).create());
-                if (managedTransactionsForRequest) attemptRollback(msg, context.getGraphManager(), settings.strictTransactionManagement);
-            } catch (Exception ex) {
-                logger.warn(String.format("Exception processing a script on request [%s].", msg), ex);
-                ctx.writeAndFlush(ResponseMessage.build(msg).code(ResponseStatusCode.SERVER_ERROR).statusMessage(ex.getMessage()).create());
-                if (managedTransactionsForRequest) attemptRollback(msg, context.getGraphManager(), settings.strictTransactionManagement);
-            }
-        });
+        final GremlinExecutor.LifeCycle lifeCycle = GremlinExecutor.LifeCycle.build()
+                .beforeEval(b -> {
+                    try {
+                        b.putAll(bindingsSupplier.get());
+                    } catch (OpProcessorException ope) {
+                        ope.printStackTrace();
+                    }
+                })
+                .withResult(o -> {
+                    final Iterator itty = IteratorUtils.asIterator(o);
+
+                    logger.debug("Preparing to iterate results from - {} - in thread [{}]", msg, Thread.currentThread().getName());
+
+                    try {
+                        handleIterator(context, itty);
+                    } catch (TimeoutException ex) {
+                        final String errorMessage = String.format("Response iteration exceeded the configured threshold for request [%s] - %s", msg, ex.getMessage());
+                        logger.warn(errorMessage);
+                        ctx.writeAndFlush(ResponseMessage.build(msg).code(ResponseStatusCode.SERVER_ERROR_TIMEOUT).statusMessage(errorMessage).create());
+                        if (managedTransactionsForRequest) attemptRollback(msg, context.getGraphManager(), settings.strictTransactionManagement);
+                    } catch (Exception ex) {
+                        logger.warn(String.format("Exception processing a script on request [%s].", msg), ex);
+                        ctx.writeAndFlush(ResponseMessage.build(msg).code(ResponseStatusCode.SERVER_ERROR).statusMessage(ex.getMessage()).create());
+                        if (managedTransactionsForRequest) attemptRollback(msg, context.getGraphManager(), settings.strictTransactionManagement);
+                    }
+                }).create();
+
+        final CompletableFuture<Object> evalFuture = gremlinExecutor.eval(script, language, bindings, lifeCycle);
 
         evalFuture.handle((v, t) -> {
             timerContext.stop();

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/a5a00bf1/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/Session.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/Session.java b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/Session.java
index 70fa593..c29b45a 100644
--- a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/Session.java
+++ b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/Session.java
@@ -122,8 +122,10 @@ public class Session {
                 // that thread of execution from this session
                 try {
                     executor.submit(() -> {
-                        logger.info("Rolling back open transactions on {} before killing session: {}", kv.getKey(), session);
-                        if (g.tx().isOpen()) g.tx().rollback();
+                        if (g.tx().isOpen()) {
+                            logger.info("Rolling back open transactions on {} before killing session: {}", kv.getKey(), session);
+                            g.tx().rollback();
+                        }
                     }).get(30000, TimeUnit.MILLISECONDS);
                 } catch (Exception ex) {
                     logger.warn("An error occurred while attempting rollback when closing session: " + session, ex);

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/a5a00bf1/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/SessionOpProcessor.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/SessionOpProcessor.java b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/SessionOpProcessor.java
index d5e5e7c..b7e0d06 100644
--- a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/SessionOpProcessor.java
+++ b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/session/SessionOpProcessor.java
@@ -34,6 +34,7 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import javax.script.Bindings;
+import javax.script.SimpleBindings;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Optional;
@@ -149,7 +150,8 @@ public class SessionOpProcessor extends AbstractEvalOpProcessor {
         context.getChannelHandlerContext().channel().attr(StateKey.SESSION).set(session);
 
         evalOpInternal(context, session::getGremlinExecutor, () -> {
-            final Bindings bindings = session.getBindings();
+            final Bindings bindings = new SimpleBindings();
+            bindings.putAll(session.getBindings());
 
             // parameter bindings override session bindings if present
             Optional.ofNullable((Map<String, Object>) msg.getArgs().get(Tokens.ARGS_BINDINGS)).ifPresent(bindings::putAll);

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/a5a00bf1/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
index 6ca7dff..7714a93 100644
--- a/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
+++ b/gremlin-server/src/test/java/org/apache/tinkerpop/gremlin/server/GremlinServerIntegrateTest.java
@@ -780,4 +780,31 @@ public class GremlinServerIntegrateTest extends AbstractGremlinServerIntegration
             assertEquals("jason", v.value("name"));
         }
     }
+
+    @Test
+    public void shouldEnsureSessionBindingsAreThreadSafe() throws Exception {
+        final Cluster cluster = Cluster.open();
+        final Client client = cluster.connect(name.getMethodName());
+
+        client.submitAsync("a=100;b=1000;c=10000;null");
+        final int requests = 1000;
+        final List<CompletableFuture<ResultSet>> futures = new ArrayList<>(requests);
+        IntStream.range(0, requests).forEach(i -> {
+            try {
+                futures.add(client.submitAsync("a+b+c"));
+            } catch (Exception ex) {
+                throw new RuntimeException(ex);
+            }
+        });
+
+        assertEquals(requests, futures.size());
+
+        for(CompletableFuture<ResultSet> f : futures) {
+            final Result r = f.get().one();
+            assertEquals(11100, r.getInt());
+        }
+
+        client.close();
+        cluster.close();
+    }
 }


[16/19] incubator-tinkerpop git commit: split implementations-neo4j.asciidoc further into implementations-intro.asciidoc, implementations-tinkergraph.asciidoc and implementations-neo4j.asciidoc

Posted by dk...@apache.org.
split implementations-neo4j.asciidoc further into implementations-intro.asciidoc, implementations-tinkergraph.asciidoc and implementations-neo4j.asciidoc


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/c9149c23
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/c9149c23
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/c9149c23

Branch: refs/heads/master
Commit: c9149c2350fe45df60e0bbed9a6e6257f911a152
Parents: e9c6465
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Fri Feb 19 17:54:30 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Fri Feb 19 17:54:30 2016 +0100

----------------------------------------------------------------------
 .../reference/implementations-intro.asciidoc    | 545 +++++++++++++++
 .../reference/implementations-neo4j.asciidoc    | 660 -------------------
 .../implementations-tinkergraph.asciidoc        | 144 ++++
 docs/src/reference/index.asciidoc               |   2 +
 4 files changed, 691 insertions(+), 660 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c9149c23/docs/src/reference/implementations-intro.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations-intro.asciidoc b/docs/src/reference/implementations-intro.asciidoc
new file mode 100644
index 0000000..5357ce3
--- /dev/null
+++ b/docs/src/reference/implementations-intro.asciidoc
@@ -0,0 +1,545 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[implementations]]
+Implementations
+===============
+
+image::gremlin-racecar.png[width=325]
+
+[[graph-system-provider-requirements]]
+Graph System Provider Requirements
+----------------------------------
+
+image:tinkerpop-enabled.png[width=140,float=left] At the core of TinkerPop3 is a Java8 API. The implementation of this
+core API and its validation via the `gremlin-test` suite is all that is required of a graph system provider wishing to
+provide a TinkerPop3-enabled graph engine. Once a graph system has a valid implementation, then all the applications
+provided by TinkerPop (e.g. Gremlin Console, Gremlin Server, etc.) and 3rd-party developers (e.g. Gremlin-Scala,
+Gremlin-JS, etc.) will integrate properly. Finally, please feel free to use the logo on the left to promote your
+TinkerPop3 implementation.
+
+Implementing Gremlin-Core
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The classes that a graph system provider should focus on implementing are itemized below. It is a good idea to study
+the <<tinkergraph-gremlin,TinkerGraph>> (in-memory OLTP and OLAP in `tinkergraph-gremlin`), <<neo4j-gremlin,Neo4jGraph>>
+(OLTP w/ transactions in `neo4j-gremlin`) and/or <<hadoop-gremlin,HadoopGraph>> (OLAP in `hadoop-gremlin`)
+implementations for ideas and patterns.
+
+. Online Transactional Processing Graph Systems (*OLTP*)
+ .. Structure API: `Graph`, `Element`, `Vertex`, `Edge`, `Property` and `Transaction` (if transactions are supported).
+ .. Process API: `TraversalStrategy` instances for optimizing Gremlin traversals to the provider's graph system (i.e. `TinkerGraphStepStrategy`).
+. Online Analytics Processing Graph Systems (*OLAP*)
+ .. Everything required of OLTP is required of OLAP (but not vice versa).
+ .. GraphComputer API: `GraphComputer`, `Messenger`, `Memory`.
+
+Please consider the following implementation notes:
+
+* Be sure your `Graph` implementation is named as `XXXGraph` (e.g. TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
+* Use `StringHelper` to ensure that the `toString()` representation of classes is consistent with other implementations.
+* Ensure that your implementation's `Features` (Graph, Vertex, etc.) are correct so that test cases handle particulars accordingly.
+* Use the numerous static method helper classes such as `ElementHelper`, `GraphComputerHelper`, `VertexProgramHelper`, etc.
+* There are a number of default methods on the provided interfaces that are semantically correct. However, if they are
+not efficient for the implementation, override them.
+* Implement the `structure/` package interfaces first and then, if desired, the interfaces in the `process/` package.
+* `ComputerGraph` is a `Wrapper` system that ensures proper semantics during a GraphComputer computation.
+
+[[oltp-implementations]]
+OLTP Implementations
+^^^^^^^^^^^^^^^^^^^^
+
+image:pipes-character-1.png[width=110,float=right] The most important interfaces to implement are in the `structure/`
+package. These include interfaces like Graph, Vertex, Edge, Property, Transaction, etc. The `StructureStandardSuite`
+will ensure that the semantics of the methods implemented are correct. Moreover, there are numerous `Exceptions`
+classes with static exceptions that should be thrown by the graph system so that all the exceptions and their
+messages are consistent amongst all TinkerPop3 implementations.
+
+[[olap-implementations]]
+OLAP Implementations
+^^^^^^^^^^^^^^^^^^^^
+
+image:furnace-character-1.png[width=110,float=right] Implementing the OLAP interfaces may be a bit more complicated.
+Note that before OLAP interfaces are implemented, it is necessary for the OLTP interfaces to be, at minimum,
+implemented as specified in <<oltp-implementations,OLTP Implementations>>. A summary of each required interface
+implementation is presented below:
+
+. `GraphComputer`: A fluent builder for specifying an isolation level, a VertexProgram, and any number of MapReduce jobs to be submitted.
+. `Memory`: A global blackboard for ANDing, ORing, INCRing, and SETing values for specified keys.
+. `Messenger`: The system that collects and distributes messages being propagated by vertices executing the VertexProgram application.
+. `MapReduce.MapEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's map-phase.
+. `MapReduce.ReduceEmitter`: The system that collects key/value pairs being emitted by the MapReduce application's combine- and reduce-phases.
+
+NOTE: The VertexProgram and MapReduce interfaces in the `process/computer/` package are not required by the graph
+system. Instead, these are interfaces to be implemented by application developers writing VertexPrograms and MapReduce jobs.
+
+IMPORTANT: TinkerPop3 provides three OLAP implementations: <<tinkergraph-gremlin,TinkerGraphComputer>> (TinkerGraph),
+<<giraphgraphcomputer,GiraphGraphComputer>> (HadoopGraph), and <<sparkgraphcomputer,`SparkGraphComputer`>> (Hadoop).
+Given the complexity of the OLAP system, it is good to study and copy many of the patterns used in these reference
+implementations.
+
+Implementing GraphComputer
+++++++++++++++++++++++++++
+
+image:furnace-character-3.png[width=150,float=right] The most complex method in GraphComputer is the `submit()`-method. The method must do the following (a sketch follows the list):
+
+. Ensure that the GraphComputer has not already been executed.
+. Ensure that there is at least one VertexProgram or one MapReduce job.
+. If there is a VertexProgram, validate that it can execute on the GraphComputer given the respectively defined features.
+. Create the Memory to be used for the computation.
+. Execute the VertexProgram.setup() method once and only once.
+. Execute the VertexProgram.execute() method for each vertex.
+. Execute the VertexProgram.terminate() method once and, if it returns false, repeat VertexProgram.execute().
+. When VertexProgram.terminate() returns true, move to MapReduce job execution.
+. MapReduce jobs are not required to be executed in any specified order.
+. For each Vertex, execute MapReduce.map(). Then (if defined) execute MapReduce.combine() and MapReduce.reduce().
+. Update Memory with runtime information.
+. Construct a new `ComputerResult` containing the compute Graph and Memory.
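+
+A hedged, single-threaded sketch of the control flow above (validation, threading, memory aggregation and the
+MapReduce phase are elided; `messengerFor()` is a hypothetical provider helper):
+
+[source,java]
+----
+private ComputerResult run(final Graph graph, final VertexProgram program, final Memory.Admin memory) {
+  program.setup(memory); // executed once and only once
+  while (true) {
+    graph.vertices().forEachRemaining(vertex -> program.execute(vertex, messengerFor(vertex), memory));
+    memory.incrIteration();
+    if (program.terminate(memory)) break; // else repeat execute()
+  }
+  // ... execute MapReduce jobs and update Memory with runtime information ...
+  return new DefaultComputerResult(graph, memory.asImmutable());
+}
+----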
+
+Implementing Memory
++++++++++++++++++++
+
+image:gremlin-brain.png[width=175,float=left] The Memory object is initially defined by `VertexProgram.setup()`.
+The memory data is available in the first round of the `VertexProgram.execute()` method. Each Vertex, when executing
+the VertexProgram, can update the Memory in its round. However, the update is not seen by the other vertices until
+the next round. At the end of the first round, all the updates are aggregated and the new memory data is available
+on the second round. This process repeats until the VertexProgram terminates.
+
+Implementing Messenger
+++++++++++++++++++++++
+
+The Messenger object is similar to the Memory object in that a vertex can read from and write to the Messenger.
+However, the data it reads are the messages sent to the vertex in the previous round and the data it writes are the
+messages that will be readable by the receiving vertices in the subsequent round.
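+
+A hedged sketch of that read/write cycle from within `VertexProgram.execute()` is shown below; the `Double` message
+type and the outgoing-edge message scope are assumptions of the example, and the exact `Messenger` method signatures
+should be confirmed against the javadoc of the release in use.
+
+[source,java]
+----
+// read the messages that were sent to this vertex in the previous round
+final Iterator<Double> incoming = messenger.receiveMessages();
+double sum = 0d;
+while (incoming.hasNext())
+    sum += incoming.next();
+// send a message along incident out-edges; it is readable by the adjacent
+// vertices in the subsequent round
+messenger.sendMessage(MessageScope.Local.of(__::outE), sum);
+----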
+
+Implementing MapReduce Emitters
++++++++++++++++++++++++++++++++
+
+image:hadoop-logo-notext.png[width=150,float=left] The MapReduce framework in TinkerPop3 is similar to the model
+popularized by link:http://apache.hadoop.org[Hadoop]. The primary difference is that all Mappers process the vertices
+of the graph, not an arbitrary key/value pair. However, the vertices' edges cannot be accessed -- only their
+properties. This greatly reduces the amount of data that needs to be pushed through the MapReduce engine, as any
+required edge information can be computed in the VertexProgram.execute() method. Moreover, at this stage, vertices
+cannot be mutated; only their token and property data can be read. A Gremlin OLAP system needs to provide
+implementations for two particular classes: `MapReduce.MapEmitter` and `MapReduce.ReduceEmitter`. TinkerGraph's
+implementation is provided below, which demonstrates the simplicity of the algorithm (especially when the data is
+all within the same JVM).
+
+[source,java]
+----
+public class TinkerMapEmitter<K, V> implements MapReduce.MapEmitter<K, V> {
+
+    public Map<K, Queue<V>> reduceMap;
+    public Queue<KeyValue<K, V>> mapQueue;
+    private final boolean doReduce;
+
+    public TinkerMapEmitter(final boolean doReduce) { <1>
+        this.doReduce = doReduce;
+        if (this.doReduce)
+            this.reduceMap = new ConcurrentHashMap<>();
+        else
+            this.mapQueue = new ConcurrentLinkedQueue<>();
+    }
+
+    @Override
+    public void emit(K key, V value) {
+        if (this.doReduce)
+            this.reduceMap.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(value); <2>
+        else
+            this.mapQueue.add(new KeyValue<>(key, value)); <3>
+    }
+
+    protected void complete(final MapReduce<K, V, ?, ?, ?> mapReduce) {
+        if (!this.doReduce && mapReduce.getMapKeySort().isPresent()) { <4>
+            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
+            final List<KeyValue<K, V>> list = new ArrayList<>(this.mapQueue);
+            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
+            this.mapQueue.clear();
+            this.mapQueue.addAll(list);
+        } else if (mapReduce.getMapKeySort().isPresent()) {
+            final Comparator<K> comparator = mapReduce.getMapKeySort().get();
+            final List<Map.Entry<K, Queue<V>>> list = new ArrayList<>();
+            list.addAll(this.reduceMap.entrySet());
+            Collections.sort(list, Comparator.comparing(Map.Entry::getKey, comparator));
+            this.reduceMap = new LinkedHashMap<>();
+            list.forEach(entry -> this.reduceMap.put(entry.getKey(), entry.getValue()));
+        }
+    }
+}
+----
+
+<1> If the MapReduce job has a reduce, then use one data structure (`reduceMap`), else use another (`mapQueue`). The
+difference is that a reduction requires grouping by key, hence the `Map<K,Queue<V>>` definition. If no
+reduction/grouping is required, then a simple `Queue<KeyValue<K,V>>` can be leveraged.
+<2> If a reduce is to follow, then add the new value to the Map under its key; `computeIfAbsent` creates the backing
+queue for the key on first use.
+<3> If no reduce is to follow, then simply append a KeyValue to the queue.
+<4> When the map phase is complete, any map-result sorting required can be executed at this point.
+
+[source,java]
+----
+public class TinkerReduceEmitter<OK, OV> implements MapReduce.ReduceEmitter<OK, OV> {
+
+    protected Queue<KeyValue<OK, OV>> reduceQueue = new ConcurrentLinkedQueue<>();
+
+    @Override
+    public void emit(final OK key, final OV value) {
+        this.reduceQueue.add(new KeyValue<>(key, value));
+    }
+
+    protected void complete(final MapReduce<?, ?, OK, OV, ?> mapReduce) {
+        if (mapReduce.getReduceKeySort().isPresent()) {
+            final Comparator<OK> comparator = mapReduce.getReduceKeySort().get();
+            final List<KeyValue<OK, OV>> list = new ArrayList<>(this.reduceQueue);
+            Collections.sort(list, Comparator.comparing(KeyValue::getKey, comparator));
+            this.reduceQueue.clear();
+            this.reduceQueue.addAll(list);
+        }
+    }
+}
+----
+
+The method `MapReduce.reduce()` is defined as:
+
+[source,java]
+public void reduce(final OK key, final Iterator<OV> values, final ReduceEmitter<OK, OV> emitter) { ... }
+
+In other words, for the TinkerGraph implementation, iterate through the entrySet of the `reduceMap` and call the
+`reduce()` method on each entry. The `reduce()` method can emit key/value pairs which are simply aggregated into a
+`Queue<KeyValue<OK,OV>>` in an analogous fashion to `TinkerMapEmitter` when no reduce is to follow. These two emitters
+are tied together in `TinkerGraphComputer.submit()`.
+
+[source,java]
+----
+...
+for (final MapReduce mapReduce : mapReducers) {
+    if (mapReduce.doStage(MapReduce.Stage.MAP)) {
+        final TinkerMapEmitter<?, ?> mapEmitter = new TinkerMapEmitter<>(mapReduce.doStage(MapReduce.Stage.REDUCE));
+        final SynchronizedIterator<Vertex> vertices = new SynchronizedIterator<>(this.graph.vertices());
+        workers.setMapReduce(mapReduce);
+        workers.mapReduceWorkerStart(MapReduce.Stage.MAP);
+        workers.executeMapReduce(workerMapReduce -> {
+            while (true) {
+                final Vertex vertex = vertices.next();
+                if (null == vertex) return;
+                workerMapReduce.map(ComputerGraph.mapReduce(vertex), mapEmitter);
+            }
+        });
+        workers.mapReduceWorkerEnd(MapReduce.Stage.MAP);
+
+        // sort results if a map output sort is defined
+        mapEmitter.complete(mapReduce);
+
+        // no need to run combiners as this is single machine
+        if (mapReduce.doStage(MapReduce.Stage.REDUCE)) {
+            final TinkerReduceEmitter<?, ?> reduceEmitter = new TinkerReduceEmitter<>();
+            final SynchronizedIterator<Map.Entry<?, Queue<?>>> keyValues = new SynchronizedIterator((Iterator) mapEmitter.reduceMap.entrySet().iterator());
+            workers.mapReduceWorkerStart(MapReduce.Stage.REDUCE);
+            workers.executeMapReduce(workerMapReduce -> {
+                while (true) {
+                    final Map.Entry<?, Queue<?>> entry = keyValues.next();
+                    if (null == entry) return;
+                    workerMapReduce.reduce(entry.getKey(), entry.getValue().iterator(), reduceEmitter);
+                }
+            });
+            workers.mapReduceWorkerEnd(MapReduce.Stage.REDUCE);
+            reduceEmitter.complete(mapReduce); // sort results if a reduce output sort is defined
+            mapReduce.addResultToMemory(this.memory, reduceEmitter.reduceQueue.iterator()); <1>
+        } else {
+            mapReduce.addResultToMemory(this.memory, mapEmitter.mapQueue.iterator()); <2>
+        }
+    }
+}
+...
+----
+
+<1> Note that the final results of the reducer are provided to the Memory as specified by the application developer's
+`MapReduce.addResultToMemory()` implementation.
+<2> If there is no reduce stage, then the map-stage results are inserted into Memory as specified by the application
+developer's `MapReduce.addResultToMemory()` implementation.
+
+[[io-implementations]]
+IO Implementations
+^^^^^^^^^^^^^^^^^^
+
+If a `Graph` requires custom serializers for IO to work properly, implement the `Graph.io` method.  A typical example
+of where a `Graph` would require such custom serializers is if its identifier system uses non-primitive values,
+such as OrientDB's `Rid` class.  From basic serialization of a single `Vertex` all the way up the stack to Gremlin
+Server, the need to know how to handle these complex identifiers is an important requirement.
+
+The first step to implementing custom serializers is to first implement the `IoRegistry` interface and register the
+custom classes and serializers to it. Each `Io` implementation has different requirements for what it expects from the
+`IoRegistry`:
+
+* *GraphML* - No custom serializers expected/allowed.
+* *GraphSON* - Register a Jackson `SimpleModule`.  The `SimpleModule` encapsulates specific classes to be serialized,
+so it does not need to be registered to a specific class in the `IoRegistry` (use `null`).
+* *Gryo* - Expects registration of one of three objects:
+** Register just the custom class with a `null` Kryo `Serializer` implementation - this class will use default "field-level" Kryo serialization.
+** Register the custom class with a specific Kryo `Serializer` implementation.
+** Register the custom class with a `Function<Kryo, Serializer>` for those cases where the Kryo `Serializer` requires the `Kryo` instance in order to be constructed.
+
+The `IoRegistry` implementation should provide a zero-arg constructor, as the stack may require instantiation via reflection.
+Consider extending `AbstractIoRegistry` for convenience as follows:
+
+[source,java]
+----
+public class MyGraphIoRegistry extends AbstractIoRegistry {
+    public MyGraphIoRegistry() {
+        register(GraphSONIo.class, null, new MyGraphSimpleModule());
+        register(GryoIo.class, MyGraphIdClass.class, new MyGraphIdSerializer());
+    }
+}
+----
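+
+For the third Gryo option above, a registration might look like the following sketch (the `MyGraphIdSerializer(Kryo)`
+constructor is an assumption of the example):
+
+[source,java]
+----
+// register a Function<Kryo, Serializer> so the serializer can be constructed
+// from the Kryo instance itself
+register(GryoIo.class, MyGraphIdClass.class,
+        (Function<Kryo, Serializer>) kryo -> new MyGraphIdSerializer(kryo));
+----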
+
+In the `Graph.io` method, provide the `IoRegistry` object to the supplied `Builder` and call the `create` method to
+return that `Io` instance as follows:
+
+[source,java]
+----
+public <I extends Io> I io(final Io.Builder<I> builder) {
+    return (I) builder.graph(this).registry(myGraphIoRegistry).create();
+}
+----
+
+In this way, `Graph` implementations can pre-configure custom serializers for IO interactions and users will not need
+to know about those details. Following this pattern will ensure proper execution of the test suite as well as
+simplified usage for end-users.
+
+IMPORTANT: Proper implementation of IO is critical to successful `Graph` operations in Gremlin Server.  The Test Suite
+does have "serialization" tests that provide some assurance that an implementation is working properly, but those
+tests cannot make assertions against any specifics of a custom serializer.  It is the responsibility of the
+implementer to test the specifics of their custom serializers.
+
+TIP: Consider separating serializer code into its own module, if possible, so that clients that use the `Graph`
+implementation remotely don't need a full dependency on the entire `Graph` - just the IO components and related
+classes being serialized.
+
+[[validating-with-gremlin-test]]
+Validating with Gremlin-Test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-edumacated.png[width=225]
+
+[source,xml]
+<dependency>
+  <groupId>org.apache.tinkerpop</groupId>
+  <artifactId>gremlin-test</artifactId>
+  <version>x.y.z</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.tinkerpop</groupId>
+  <artifactId>gremlin-groovy-test</artifactId>
+  <version>x.y.z</version>
+</dependency>
+
+The operational semantics of any OLTP or OLAP implementation are validated by `gremlin-test` and functional
+interoperability with the Groovy environment is ensured by `gremlin-groovy-test`. To implement these tests, provide
+test case implementations as shown below, where `XXX` denotes the name of the graph implementation (e.g.
+TinkerGraph, Neo4jGraph, HadoopGraph, etc.).
+
+[source,java]
+----
+// Structure API tests
+@RunWith(StructureStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXStructureStandardTest {}
+
+// Process API tests
+@RunWith(ProcessComputerSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXProcessComputerTest {}
+
+@RunWith(ProcessStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
+public class XXXProcessStandardTest {}
+
+@RunWith(GroovyEnvironmentSuite.class)
+@GraphProviderClass(provider = XXXProvider.class, graph = TinkerGraph.class)
+public class XXXGroovyEnvironmentTest {}
+
+@RunWith(GroovyProcessStandardSuite.class)
+@GraphProviderClass(provider = XXXGraphProvider.class, graph = TinkerGraph.class)
+public class XXXGroovyProcessStandardTest {}
+
+@RunWith(GroovyProcessComputerSuite.class)
+@GraphProviderClass(provider = XXXGraphComputerProvider.class, graph = TinkerGraph.class)
+public class XXXGroovyProcessComputerTest {}
+----
+
+The above set of tests represents the minimum to implement.  There are other "integration" and
+"performance" tests that should be considered optional.  Implementing those tests requires the same pattern as shown above.
+
+IMPORTANT: It is as important to look at "ignored" tests as it is to look at ones that fail.  The `gremlin-test`
+suite utilizes the `Feature` implementation exposed by the `Graph` to determine which tests to execute.  If a test
+utilizes features that are not supported by the graph, the suite will ignore it.  While that may be fine, implementers
+should validate that the ignored tests are appropriately bypassed and that there are no mistakes in their feature
+definitions.  Moreover, implementers should consider filling gaps in their own test suites, especially when
+IO-related tests are being ignored.
+
+The only test-class that requires any code investment is the `GraphProvider` implementation class. This class is
+used by the test suite to construct `Graph` configurations and instances and provides information about the
+implementation itself.  In most cases, it is best to simply extend `AbstractGraphProvider` as it provides many
+default implementations of the `GraphProvider` interface.
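+
+A hedged sketch of such a provider is shown below; the `XXX` names follow the convention above, and the exact
+methods to override should be confirmed against the `AbstractGraphProvider` javadoc for the release in use.
+
+[source,java]
+----
+public class XXXGraphProvider extends AbstractGraphProvider {
+
+    @Override
+    public Map<String, Object> getBaseConfiguration(final String graphName, final Class<?> test,
+                                                    final String testMethodName,
+                                                    final LoadGraphWith.GraphData loadGraphWith) {
+        // minimal configuration for constructing a test Graph instance
+        final Map<String, Object> config = new HashMap<>();
+        config.put(Graph.GRAPH, XXXGraph.class.getName());
+        return config;
+    }
+
+    @Override
+    public void clear(final Graph graph, final Configuration configuration) throws Exception {
+        // release resources between tests; delete any on-disk state here as well
+        if (null != graph) graph.close();
+    }
+
+    @Override
+    public Set<Class> getImplementations() {
+        // the structure classes the test suite should treat as this implementation
+        return new HashSet<>(Arrays.asList(XXXGraph.class, XXXVertex.class, XXXEdge.class));
+    }
+}
+----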
+
+Finally, specify the test suites that will be supported by the `Graph` implementation using the `@Graph.OptIn`
+annotation.  See the `TinkerGraph` implementation below as an example:
+
+[source,java]
+----
+@Graph.OptIn(Graph.OptIn.SUITE_STRUCTURE_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_PROCESS_COMPUTER)
+@Graph.OptIn(Graph.OptIn.SUITE_GROOVY_ENVIRONMENT)
+public class TinkerGraph implements Graph {
+----
+
+Only include annotations for the suites the implementation will support.  Note that implementing the suite, but
+not specifying the appropriate annotation will prevent the suite from running (an obvious error message will appear
+in this case when running the mis-configured suite).
+
+There are times when there may be a specific test in the suite that the implementation cannot support (despite the
+features it implements) or should not otherwise be executed.  It is possible for implementers to "opt-out" of a test
+by using the `@Graph.OptOut` annotation.  The following is an example of this annotation usage as taken from
+`HadoopGraph`:
+
+[source, java]
+----
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
+@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
+        method = "g_V_matchXa_hasXname_GarciaX__a_inXwrittenByX_b__a_inXsungByX_bX",
+        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.graph.step.map.MatchTest$Traversals",
+        method = "g_V_matchXa_inXsungByX_b__a_inXsungByX_c__b_outXwrittenByX_d__c_outXwrittenByX_e__d_hasXname_George_HarisonX__e_hasXname_Bob_MarleyXX",
+        reason = "Hadoop-Gremlin is OLAP-oriented and for OLTP operations, linear-scan joins are required. This particular tests takes many minutes to execute.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
+        method = "shouldNotAllowBadMemoryKeys",
+        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.computer.GraphComputerTest",
+        method = "shouldRequireRegisteringMemoryKeys",
+        reason = "Hadoop does a hard kill on failure and stops threads which stops test cases. Exception handling semantics are correct though.")
+public class HadoopGraph implements Graph {
+----
+
+The above examples show how to ignore individual tests.  It is also possible to:
+
+* Ignore an entire test case (i.e. all the methods within the test) by setting the `method` to "*".
+* Ignore a "base" test class such that test that extend from those classes will all be ignored.  This style of
+ignoring is useful for Gremlin "process" tests that have bases classes that are extended by various Gremlin flavors (e.g. groovy).
+* Ignore a `GraphComputer` test based on the type of `GraphComputer` being used.  Specify the "computer" attribute on
+the `OptOut` (an array) with the `GraphComputer` implementation class that should ignore the test.  This attribute
+should be left empty for "standard" execution; by default all `GraphComputer` implementations are included in the
+`OptOut`, so if there are multiple implementations, explicitly specify the ones that should be excluded.
+
+Also note that some of the tests in the Gremlin Test Suite are parameterized tests and require an additional level of
+specificity to be properly ignored.  To ignore these types of tests, examine the name template of the parameterized
+tests.  It is defined by a Java annotation that looks like this:
+
+[source, java]
+@Parameterized.Parameters(name = "expect({0})")
+
+The annotation above shows that the name of each parameterized test will be prefixed with "expect" and have
+parentheses wrapped around the first parameter (at index 0) value supplied to each test.  This information can
+only be garnered by studying the test setup itself.  Once the pattern is determined and the specific unique name of
+the parameterized test is identified, add it to the `specific` property on the `OptOut` annotation in addition to
+the other arguments.
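+
+For example, given the name template above, an opt-out for a single parameterization might look like the following
+sketch (the test and method names are placeholders, not real suite members):
+
+[source, java]
+----
+@Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.SomeParameterizedTest",
+        method = "shouldExpectValues",
+        specific = "expect(someValue)",
+        reason = "This particular parameterization is not supported.")
+----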
+
+These annotations help provide users a level of transparency into test suite compliance (via the
+xref:describe-graph[describeGraph()] utility function). They also allow implementers a lot of flexibility in
+terms of how they wish to support TinkerPop.  For example, maybe there is a single test case that prevents an
+implementer from claiming support of a `Feature`.  The implementer could choose to either not support the `Feature`
+or to support it but "opt-out" of the test with a "reason" as to why so that users understand the limitation.
+
+IMPORTANT: Before using `OptOut`, be sure that the reason for using it is sound; treat it as a last resort.
+It is possible that a test from the suite doesn't properly represent the expectations of a feature, is too broad or
+narrow for the semantics it is trying to enforce or simply contains a bug.  Please consider raising issues in the
+developer mailing list with such concerns before assuming `OptOut` is the only answer.
+
+IMPORTANT: There are no tests that specifically validate complete compliance with Gremlin Server.  Generally speaking,
+a `Graph` that passes the full Test Suite should be compliant with Gremlin Server.  The one area where problems can
+occur is in serialization.  Always ensure that IO is properly implemented, that custom serializers are tested fully
+and ultimately integration test the `Graph` with an actual Gremlin Server instance.
+
+CAUTION: Configuring tests to run in parallel might result in errors that are difficult to debug as there is some
+shared state in test execution around graph configuration.  It is therefore recommended that parallelism be turned
+off for the test suite (the Maven SureFire Plugin is configured this way by default).  It may also be important to
+include this setting, `<reuseForks>false</reuseForks>`, in the SureFire configuration if tests are failing in an
+unexplainable way.
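+
+A hedged sketch of the corresponding Surefire configuration (plugin version omitted) follows:
+
+[source,xml]
+----
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-surefire-plugin</artifactId>
+  <configuration>
+    <!-- keep test execution serial and avoid shared state leaking between forks -->
+    <forkCount>1</forkCount>
+    <reuseForks>false</reuseForks>
+  </configuration>
+</plugin>
+----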
+
+Accessibility via GremlinPlugin
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-plugin.png[width=100,float=left] The applications distributed with TinkerPop3 do not ship with
+any graph system implementations besides TinkerGraph. If your implementation is stored in a Maven repository (e.g.
+Maven Central Repository), then it is best to provide a `GremlinPlugin` implementation so the respective jars can be
+downloaded as and when required by the user. Neo4j's GremlinPlugin is provided below for reference.
+
+[source,java]
+----
+public class Neo4jGremlinPlugin implements GremlinPlugin {
+
+    private static final String IMPORT = "import ";
+    private static final String DOT_STAR = ".*";
+
+    private static final Set<String> IMPORTS = new HashSet<String>() {{
+        add(IMPORT + Neo4jGraph.class.getPackage().getName() + DOT_STAR);
+    }};
+
+    @Override
+    public String getName() {
+        return "neo4j";
+    }
+
+    @Override
+    public void pluginTo(final PluginAcceptor pluginAcceptor) {
+        pluginAcceptor.addImports(IMPORTS);
+    }
+}
+----
+
+With the above plugin implementation, users can now download the respective binaries into Gremlin Console, Gremlin Server, etc., as shown below.
+
+[source,groovy]
+gremlin> g = Neo4jGraph.open('/tmp/neo4j')
+No such property: Neo4jGraph for class: groovysh_evaluate
+Display stack trace? [yN]
+gremlin> :install org.apache.tinkerpop neo4j-gremlin x.y.z
+==>loaded: [org.apache.tinkerpop, neo4j-gremlin, …]
+gremlin> :plugin use tinkerpop.neo4j
+==>tinkerpop.neo4j activated
+gremlin> g = Neo4jGraph.open('/tmp/neo4j')
+==>neo4jgraph[EmbeddedGraphDatabase [/tmp/neo4j]]
+
+In-Depth Implementations
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+image:gremlin-painting.png[width=200,float=right] The graph system implementation details presented thus far are
+minimum requirements necessary to yield a valid TinkerPop3 implementation. However, there are other areas that a
+graph system provider can tweak to provide an implementation more optimized for their underlying graph engine. Typical
+areas of focus include:
+
+* Traversal Strategies: A <<traversalstrategy,TraversalStrategy>> can be used to alter a traversal prior to its
+execution. A typical example is converting a pattern of `g.V().has('name','marko')` into a global index lookup for
+all vertices with name "marko". In this way, an `O(|V|)` scan becomes an `O(log(|V|))` index lookup. Please review
+`TinkerGraphStepStrategy` for ideas (a hedged sketch follows this list).
+* Step Implementations: Every <<graph-traversal-steps,step>> is ultimately referenced by the `GraphTraversal`
+interface. It is possible to extend `GraphTraversal` to use a graph system specific step implementation.
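+
+The following is a minimal sketch of such a provider strategy. The `XXXGraphStep` class is an assumption of the
+example (a hypothetical index-aware replacement for the stock `GraphStep`); study `TinkerGraphStepStrategy` for the
+complete pattern, including the folding of subsequent `has()` filters into the replacement step.
+
+[source,java]
+----
+public final class XXXGraphStepStrategy extends AbstractTraversalStrategy<TraversalStrategy.ProviderOptimizationStrategy>
+        implements TraversalStrategy.ProviderOptimizationStrategy {
+
+    @Override
+    public void apply(final Traversal.Admin<?, ?> traversal) {
+        // replace each stock GraphStep with an index-aware provider step
+        for (final GraphStep originalStep : TraversalHelper.getStepsOfClass(GraphStep.class, traversal)) {
+            final XXXGraphStep<?> indexedStep = new XXXGraphStep<>(originalStep); // hypothetical step
+            TraversalHelper.replaceStep(originalStep, indexedStep, traversal);
+            // ...fold subsequent HasStep filters into the index lookup here...
+        }
+    }
+}
+----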

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c9149c23/docs/src/reference/implementations-neo4j.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations-neo4j.asciidoc b/docs/src/reference/implementations-neo4j.asciidoc
index 5602754..9accd25 100644
--- a/docs/src/reference/implementations-neo4j.asciidoc
+++ b/docs/src/reference/implementations-neo4j.asciidoc
@@ -14,666 +14,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 ////
-
-[[tinkergraph-gremlin]]
-TinkerGraph-Gremlin
--------------------
-
-[source,xml]
-----
-<dependency>
-   <groupId>org.apache.tinkerpop</groupId>
-   <artifactId>tinkergraph-gremlin</artifactId>
-   <version>x.y.z</version>
-</dependency>
-----
-
-image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single machine, in-memory (with optional
-persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with
-TinkerPop3 and serves as the reference implementation for other providers to study in order to understand the
-semantics of the various methods of the TinkerPop3 API. Constructing a simple graph in Java8 is presented below.
-
-[source,java]
-Graph g = TinkerGraph.open();
-Vertex marko = g.addVertex("name","marko","age",29);
-Vertex lop = g.addVertex("name","lop","lang","java");
-marko.addEdge("created",lop,"weight",0.6d);
-
-The above graph creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6
-property. Next, the graph can be queried as follows.
-
-[source,java]
-g.V().has("name","marko").out("created").values("name")
-
-The `g.V().has("name","marko")` part of the query can be executed in two ways.
-
- * A linear scan of all vertices filtering out those vertices that don't have the name "marko"
- * A `O(log(|V|))` index lookup for all vertices with the name "marko"
-
-Given the initial graph construction in the first code block, no index was defined and thus a linear scan is executed.
-However, if the graph were constructed as follows, then an index lookup would be used.
-
-[source,java]
-Graph g = TinkerGraph.open();
-g.createIndex("name",Vertex.class)
-
-The execution times for a vertex lookup by property are provided below for both the no-index and indexed versions of
-TinkerGraph over the Grateful Dead graph.
-
-[gremlin-groovy]
-----
-graph = TinkerGraph.open()
-g = graph.traversal()
-graph.io(graphml()).readGraph('data/grateful-dead.xml')
-clock(1000) {g.V().has('name','Garcia').iterate()} <1>
-graph = TinkerGraph.open()
-g = graph.traversal()
-graph.createIndex('name',Vertex.class)
-graph.io(graphml()).readGraph('data/grateful-dead.xml')
-clock(1000){g.V().has('name','Garcia').iterate()} <2>
-----
-
-<1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined.
-<2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined.
-
-IMPORTANT: Each graph system will have a different mechanism by which indices and schemas are defined. TinkerPop3
-does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other
-graph systems, property value types, indices, edge labels, etc. may be required to be defined _a priori_ to adding
-data to the graph.
-
-NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration.
-
-Configuration
-~~~~~~~~~~~~~
-
-TinkerGraph has several settings that can be provided on creation via `Configuration` object:
-
-[width="100%",cols="2,10",options="header"]
-|=========================================================
-|Property |Description
-|gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph`
-|gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices.
-|gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges.
-|gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties.
-|gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called.
-|gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a
-value is specified here, then `gremlin.tinkergraph.graphFormat` should also be specified.  If this value is not
-included (default), then the graph will stay in-memory and not be loaded/persisted to disk.
-|gremlin.tinkergraph.graphFormat |The format to use to serialize the graph, which may be one of the following:
-`graphml`, `graphson`, `gryo`, or a fully qualified class name that implements the `Io.Builder` interface (which
-allows third-party graph reader/writer formats to be used for persistence).
-If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should
-also be specified.  If this value is not included (default), then the graph will stay in-memory and not be
-loaded/persisted to disk.
-|=========================================================
-
-The `IdManager` settings above refer to how TinkerGraph will control identifiers for vertices, edges and vertex
-properties.  There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully
-qualified class name of an `IdManager` implementation on the classpath.  When not specified, the default value
-for all settings is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will
-generate new `Long` identifiers when the identifier is not supplied by the user.  TinkerGraph will also expect the
-user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return
-two different vertices.  `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected
-type as well as generate new identifiers with that specified type.
-
-If the TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and
-`gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location with the specified
-format when `Graph.close()` is called.  In addition, if these settings are present, TinkerGraph will attempt to
-load the graph from the specified location.
-
-IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various
-`IdManager` settings to ensure that identifiers are properly coerced to the appropriate types, as GraphSON
-can lose the identifier's type during serialization (e.g. it will assume `Integer` when the default for TinkerGraph
-is `Long`), which could lead to load errors that result in a message like "Vertex with id already exists".
-
-It is important to consider the data being imported to TinkerGraph with respect to the
-`defaultVertexPropertyCardinality` setting.  For example, if a `.gryo` file is known to contain multi-property data,
-be sure to set the default cardinality to `list` or else the data will import as `single`.  Consider the following:
-
-[gremlin-groovy]
-----
-graph = TinkerGraph.open()
-graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
-g = graph.traversal()
-g.V().properties()
-conf = new BaseConfiguration()
-conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list")
-graph = TinkerGraph.open(conf)
-graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
-g = graph.traversal()
-g.V().properties()
-----
-
 [[neo4j-gremlin]]
 Neo4j-Gremlin
 -------------

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c9149c23/docs/src/reference/implementations-tinkergraph.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/implementations-tinkergraph.asciidoc b/docs/src/reference/implementations-tinkergraph.asciidoc
new file mode 100644
index 0000000..aadec06
--- /dev/null
+++ b/docs/src/reference/implementations-tinkergraph.asciidoc
@@ -0,0 +1,144 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[tinkergraph-gremlin]]
+TinkerGraph-Gremlin
+-------------------
+
+[source,xml]
+----
+<dependency>
+   <groupId>org.apache.tinkerpop</groupId>
+   <artifactId>tinkergraph-gremlin</artifactId>
+   <version>x.y.z</version>
+</dependency>
+----
+
+image:tinkerpop-character.png[width=100,float=left] TinkerGraph is a single-machine, in-memory (with optional
+persistence), non-transactional graph engine that provides both OLTP and OLAP functionality. It is deployed with
+TinkerPop3 and serves as the reference implementation for other providers to study in order to understand the
+semantics of the various methods of the TinkerPop3 API. Constructing a simple graph in Java 8 is presented below.
+
+[source,java]
+Graph g = TinkerGraph.open();
+Vertex marko = g.addVertex("name","marko","age",29);
+Vertex lop = g.addVertex("name","lop","lang","java");
+marko.addEdge("created",lop,"weight",0.6d);
+
+The above code creates two vertices named "marko" and "lop" and connects them via a created-edge with a weight=0.6
+property. Next, the graph can be queried as follows.
+
+[source,java]
+g.V().has("name","marko").out("created").values("name")
+
+The `g.V().has("name","marko")` part of the query can be executed in two ways.
+
+ * A linear scan of all vertices filtering out those vertices that don't have the name "marko"
+ * A `O(log(|V|))` index lookup for all vertices with the name "marko"
+
+Given the initial graph construction in the first code block, no index was defined and thus a linear scan is executed.
+However, if the graph were constructed as follows, then an index lookup would be used.
+
+[source,java]
+TinkerGraph g = TinkerGraph.open();
+g.createIndex("name",Vertex.class);
+
+The execution times for a vertex lookup by property are provided below for both the non-indexed and indexed versions
+of TinkerGraph over the Grateful Dead graph.
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+g = graph.traversal()
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+clock(1000) {g.V().has('name','Garcia').iterate()} <1>
+graph = TinkerGraph.open()
+g = graph.traversal()
+graph.createIndex('name',Vertex.class)
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+clock(1000) {g.V().has('name','Garcia').iterate()} <2>
+----
+
+<1> Determine the average runtime of 1000 vertex lookups when no `name`-index is defined.
+<2> Determine the average runtime of 1000 vertex lookups when a `name`-index is defined.
+
+IMPORTANT: Each graph system will have a different mechanism by which indices and schemas are defined. TinkerPop3
+does not require any conformance in this area. In TinkerGraph, the only definitions are around indices. With other
+graph systems, property value types, indices, edge labels, etc. may need to be defined _a priori_, before adding
+data to the graph.
+
+NOTE: TinkerGraph is distributed with Gremlin Server and is therefore automatically available to it for configuration.
+
+Configuration
+~~~~~~~~~~~~~
+
+TinkerGraph has several settings that can be provided on creation via a `Configuration` object:
+
+[width="100%",cols="2,10",options="header"]
+|=========================================================
+|Property |Description
+|gremlin.graph |`org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph`
+|gremlin.tinkergraph.vertexIdManager |The `IdManager` implementation to use for vertices.
+|gremlin.tinkergraph.edgeIdManager |The `IdManager` implementation to use for edges.
+|gremlin.tinkergraph.vertexPropertyIdManager |The `IdManager` implementation to use for vertex properties.
+|gremlin.tinkergraph.defaultVertexPropertyCardinality |The default `VertexProperty.Cardinality` to use when `Vertex.property(k,v)` is called.
+|gremlin.tinkergraph.graphLocation |The path and file name for where TinkerGraph should persist the graph data. If a
+value is specified here, then `gremlin.tinkergraph.graphFormat` should also be specified.  If this value is not
+included (default), then the graph will stay in-memory and not be loaded/persisted to disk.
+|gremlin.tinkergraph.graphFormat |The format to use to serialize the graph, which may be one of the following:
+`graphml`, `graphson`, `gryo`, or a fully qualified class name that implements the `Io.Builder` interface (which
+allows third-party graph reader/writer formats to be used for persistence).
+If a value is specified here, then the `gremlin.tinkergraph.graphLocation` should
+also be specified.  If this value is not included (default), then the graph will stay in-memory and not be
+loaded/persisted to disk.
+|=========================================================
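+
+For reference, these settings are typically supplied through a properties file, for example one passed to
+`GraphFactory.open()` or referenced from the Gremlin Server YAML configuration. The following is a hypothetical
+`tinkergraph.properties` sketch using keys from the table above; the file name and values shown are illustrative only.
+
+[source,properties]
+----
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.vertexIdManager=LONG
+gremlin.tinkergraph.defaultVertexPropertyCardinality=list
+----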
+
+The `IdManager` settings above refer to how TinkerGraph will control identifiers for vertices, edges and vertex
+properties.  There are several options for each of these settings: `ANY`, `LONG`, `INTEGER`, `UUID`, or the fully
+qualified class name of an `IdManager` implementation on the classpath.  When not specified, the default value
+for all settings is `ANY`, meaning that the graph will work with any object on the JVM as the identifier and will
+generate new `Long` identifiers when the identifier is not supplied by the user.  TinkerGraph will also expect the
+
+user to understand the types used for identifiers when querying, meaning that `g.V(1)` and `g.V(1L)` could return
+two different vertices.  `LONG`, `INTEGER` and `UUID` settings will try to coerce identifier values to the expected
+type as well as generate new identifiers with that specified type.
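+
+The coercion behavior can be sketched as follows, assuming the vertex `IdManager` is set to `LONG` (the property
+values shown are illustrative):
+
+[source,java]
+----
+Configuration conf = new BaseConfiguration();
+conf.setProperty("gremlin.tinkergraph.vertexIdManager", "LONG");
+TinkerGraph graph = TinkerGraph.open(conf);
+graph.addVertex(T.id, 1, "name", "marko");
+// the LONG manager coerces both 1 and 1L to the same Long identifier,
+// so both lookups below resolve to the same vertex
+Vertex a = graph.traversal().V(1).next();
+Vertex b = graph.traversal().V(1L).next();
+----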
+
+If the TinkerGraph is configured for persistence with `gremlin.tinkergraph.graphLocation` and
+`gremlin.tinkergraph.graphFormat`, then the graph will be written to the specified location with the specified
+format when `Graph.close()` is called.  In addition, if these settings are present, TinkerGraph will attempt to
+load the graph from the specified location.
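+
+A minimal sketch of this persistence behavior follows (the file path is an arbitrary example):
+
+[source,java]
+----
+Configuration conf = new BaseConfiguration();
+conf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/tinkergraph.kryo");
+conf.setProperty("gremlin.tinkergraph.graphFormat", "gryo");
+TinkerGraph graph = TinkerGraph.open(conf);
+graph.addVertex("name", "marko");
+graph.close();  // writes the graph to /tmp/tinkergraph.kryo in Gryo format
+// re-opening with the same configuration loads the persisted data
+TinkerGraph reloaded = TinkerGraph.open(conf);
+----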
+
+IMPORTANT: If choosing `graphson` as the `gremlin.tinkergraph.graphFormat`, be sure to also establish the various
+`IdManager` settings to ensure that identifiers are properly coerced to the appropriate types, as GraphSON
+can lose the identifier's type during serialization (e.g. it will assume `Integer` when the default for TinkerGraph
+is `Long`), which could lead to load errors that result in a message like "Vertex with id already exists".
+
+It is important to consider the data being imported to TinkerGraph with respect to the
+`defaultVertexPropertyCardinality` setting.  For example, if a `.gryo` file is known to contain multi-property data,
+be sure to set the default cardinality to `list` or else the data will import as `single`.  Consider the following:
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
+g = graph.traversal()
+g.V().properties()
+conf = new BaseConfiguration()
+conf.setProperty("gremlin.tinkergraph.defaultVertexPropertyCardinality","list")
+graph = TinkerGraph.open(conf)
+graph.io(gryo()).readGraph("data/tinkerpop-crew.kryo")
+g = graph.traversal()
+g.V().properties()
+----

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c9149c23/docs/src/reference/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/index.asciidoc b/docs/src/reference/index.asciidoc
index 8546063..0006403 100644
--- a/docs/src/reference/index.asciidoc
+++ b/docs/src/reference/index.asciidoc
@@ -32,6 +32,8 @@ include::the-graphcomputer.asciidoc[]
 
 include::gremlin-applications.asciidoc[]
 
+include::implementations-intro.asciidoc[]
+include::implementations-tinkergraph.asciidoc[]
 include::implementations-neo4j.asciidoc[]
 include::implementations-hadoop.asciidoc[]
 


[14/19] incubator-tinkerpop git commit: deactivate Spark plugin for implementations-neo4j.asciidoc and Neo4j plugin for implementations-hadoop.asciidoc

Posted by dk...@apache.org.
deactivate Spark plugin for implementations-neo4j.asciidoc and Neo4j plugin for implementations-hadoop.asciidoc


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/e9c6465c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/e9c6465c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/e9c6465c

Branch: refs/heads/master
Commit: e9c6465c9060b2ad4a537bd75ab71c5bae25cf22
Parents: bbf5b3f
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Thu Feb 18 23:53:44 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Thu Feb 18 23:53:44 2016 +0100

----------------------------------------------------------------------
 docs/preprocessor/preprocess-file.sh | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/e9c6465c/docs/preprocessor/preprocess-file.sh
----------------------------------------------------------------------
diff --git a/docs/preprocessor/preprocess-file.sh b/docs/preprocessor/preprocess-file.sh
index a8bd3e3..fdc4799 100755
--- a/docs/preprocessor/preprocess-file.sh
+++ b/docs/preprocessor/preprocess-file.sh
@@ -37,7 +37,7 @@ fi
 trap cleanup INT
 
 function cleanup {
-  rm -rf ${output}
+  rm -rf ${output} ${CONSOLE_HOME}/.ext
   exit 255
 }
 
@@ -57,6 +57,27 @@ if [ $(grep -c '^\[gremlin' ${input}) -gt 0 ]; then
   fi
   pushd "${CONSOLE_HOME}" > /dev/null
 
+  doc=`basename ${input} .asciidoc`
+
+  case "${doc}" in
+    "implementations-neo4j")
+      # deactivate Spark plugin to prevent version conflicts between TinkerPop's Spark jars and Neo4j's Spark jars
+      mkdir .ext
+      mv ext/spark-gremlin .ext/
+      cat ext/plugins.txt | tee .ext/plugins.all | grep -Fv 'SparkGremlinPlugin' > .ext/plugins.txt
+      ;;
+    "implementations-hadoop")
+      # deactivate Neo4j plugin to prevent version conflicts between TinkerPop's Spark jars and Neo4j's Spark jars
+      mkdir .ext
+      mv ext/neo4j-gremlin .ext/
+      cat ext/plugins.txt | tee .ext/plugins.all | grep -Fv 'Neo4jGremlinPlugin' > .ext/plugins.txt
+      ;;
+  esac
+
+  if [ -d ".ext" ]; then
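+    # install the filtered plugin list; the original list was saved as .ext/plugins.all above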
+    mv .ext/plugins.txt ext/
+  fi
+
   awk -f ${AWK_SCRIPTS}/prepare.awk ${input} |
   awk -f ${AWK_SCRIPTS}/init-code-blocks.awk |
   awk -f ${AWK_SCRIPTS}/progressbar.awk -v tpl=${AWK_SCRIPTS}/progressbar.groovy.template | HADOOP_GREMLIN_LIBS="${CONSOLE_HOME}/ext/giraph-gremlin/lib:${CONSOLE_HOME}/ext/tinkergraph-gremlin/lib" bin/gremlin.sh |
@@ -70,6 +91,12 @@ if [ $(grep -c '^\[gremlin' ${input}) -gt 0 ]; then
     [ ${ec} -eq 0 ] || break
   done
 
+  if [ -d ".ext" ]; then
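+    # restore the original plugin list and move the parked plugin directory back into ext/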
+    mv .ext/plugins.all ext/plugins.txt
+    mv .ext/* ext/
+    rm -r .ext/
+  fi
+
   if [ ${ec} -eq 0 ]; then
     ec=`grep -c '\bpb([0-9][0-9]*);' ${output}`
   fi


[03/19] incubator-tinkerpop git commit: Added Apache Gremlin to contributing docs.

Posted by dk...@apache.org.
Added Apache Gremlin to contributing docs.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/7b8ce4ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/7b8ce4ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/7b8ce4ba

Branch: refs/heads/master
Commit: 7b8ce4bafa0b14322b0be8e3faf7869fb3757620
Parents: 4aeb4b2
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Tue Feb 16 08:28:22 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Tue Feb 16 08:28:22 2016 -0500

----------------------------------------------------------------------
 docs/src/dev/developer/contributing.asciidoc | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/7b8ce4ba/docs/src/dev/developer/contributing.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/dev/developer/contributing.asciidoc b/docs/src/dev/developer/contributing.asciidoc
index 730e7f1..88d600c 100644
--- a/docs/src/dev/developer/contributing.asciidoc
+++ b/docs/src/dev/developer/contributing.asciidoc
@@ -24,19 +24,16 @@ license and warrant that you have the legal authority to do so.
 Getting Started
 ---------------
 
-New contributors can start development with TinkerPop by first link:https://help.github.com/articles/fork-a-repo/[forking
-then cloning] the Apache TinkerPop link:https://github.com/apache/incubator-tinkerpop[GitHub repository]. Generally
-speaking it is best to tie any work done to an issue in link:https://issues.apache.org/jira/browse/TINKERPOP[JIRA].
-Either scan through JIRA for an existing open issue to work on or create a new one.
-
-NOTE: For those who are trying to find a place to start to contribute, consider looking at unresolved issues that
-have the "trivial" priority as these issues are specifically set aside as
-link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20TINKERPOP%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Trivial%20ORDER%20BY%20key%20DESC[low-hanging fruit]
-for newcomers.
+image:gremlin-apache.png[width=250,float=left] New contributors can start development with TinkerPop by first
+link:https://help.github.com/articles/fork-a-repo/[forking then cloning] the Apache TinkerPop
+link:https://github.com/apache/incubator-tinkerpop[GitHub repository]. Generally speaking it is best to tie any
+work done to an issue in link:https://issues.apache.org/jira/browse/TINKERPOP[JIRA]. Either scan through JIRA for an
+existing open issue to work on or create a new one.
 
 After making changes, submit a link:https://help.github.com/articles/using-pull-requests/[pull request] through
 GitHub, where the name of the pull request is prefixed with the JIRA issue number.  In this way, the pull request
-and its comments get tied back to the JIRA issue it references.
+and its comments get tied back to the JIRA issue it references, thus triggering automation that pushes GitHub comments
+into the JIRA ticket.
 
 Before issuing your pull request, please be sure of the following:
 
@@ -56,6 +53,11 @@ a successful `mvn clean install`, do `mvn verify -DskipIntegrationTests=false -p
 Once a pull request is submitted, it must go through <<rtc,review>> and will be merged once three TinkerPop committers
 offer a positive vote and achieve Apache consensus.
 
+NOTE: For those who are trying to find a place to start to contribute, consider looking at unresolved issues that
+have the "trivial" priority as these issues are specifically set aside as
+link:https://issues.apache.org/jira/issues/?jql=project%20%3D%20TINKERPOP%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Trivial%20ORDER%20BY%20key%20DESC[low-hanging fruit]
+for newcomers.
+
 [[building-testing]]
 Building and Testing
 --------------------


[15/19] incubator-tinkerpop git commit: Fixed a broken test from a previous commit.

Posted by dk...@apache.org.
Fixed a broken test from a previous commit.

Not sure how the test in gremlin server was failing as I thought I'd run the integration test suite prior to commit, but I suppose I somehow did not. CTR


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/23e90f81
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/23e90f81
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/23e90f81

Branch: refs/heads/master
Commit: 23e90f81939d8e875ee67654de419b5b928e778f
Parents: f19311b
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Feb 19 07:17:46 2016 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Fri Feb 19 07:18:42 2016 -0500

----------------------------------------------------------------------
 .../tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java     | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/23e90f81/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
----------------------------------------------------------------------
diff --git a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
index cc4b4ad..e168857 100644
--- a/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
+++ b/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/AbstractEvalOpProcessor.java
@@ -230,7 +230,9 @@ public abstract class AbstractEvalOpProcessor implements OpProcessor {
             timerContext.stop();
 
             if (t != null) {
-                if (t instanceof TimedInterruptTimeoutException) {
+                if (t instanceof OpProcessorException) {
+                    ctx.writeAndFlush(((OpProcessorException) t).getResponseMessage());
+                } else if (t instanceof TimedInterruptTimeoutException) {
                     // occurs when the TimedInterruptCustomizerProvider is in play
                     final String errorMessage = String.format("A timeout occurred within the script during evaluation of [%s] - consider increasing the limit given to TimedInterruptCustomizerProvider", msg);
                     logger.warn(errorMessage);


[18/19] incubator-tinkerpop git commit: Merge branch 'TINKERPOP-1168' into tp31

Posted by dk...@apache.org.
Merge branch 'TINKERPOP-1168' into tp31


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/26f81b99
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/26f81b99
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/26f81b99

Branch: refs/heads/master
Commit: 26f81b991b7fed66180949fabcff57f5c93ab6d1
Parents: 23e90f8 606c68b
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Fri Feb 19 19:38:12 2016 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Fri Feb 19 19:38:12 2016 +0100

----------------------------------------------------------------------
 docs/preprocessor/preprocess-file.sh            |   29 +-
 docs/preprocessor/preprocess.sh                 |    2 +-
 .../reference/implementations-hadoop.asciidoc   |  929 +++++++++
 .../reference/implementations-intro.asciidoc    |  545 ++++++
 .../reference/implementations-neo4j.asciidoc    |  261 +++
 .../implementations-tinkergraph.asciidoc        |  144 ++
 docs/src/reference/implementations.asciidoc     | 1835 ------------------
 docs/src/reference/index.asciidoc               |    5 +-
 8 files changed, 1912 insertions(+), 1838 deletions(-)
----------------------------------------------------------------------