You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2018/08/09 18:12:43 UTC

[01/12] tinkerpop git commit: TINKERPOP-1967 Added connectedComponent() step

Repository: tinkerpop
Updated Branches:
  refs/heads/master f88ace142 -> 0c2fc21fb


TINKERPOP-1967 Added connectedComponent() step

Deprecated the recipe for "Connected Components" but left the old content present as I felt it had educational value.


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0acceb90
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0acceb90
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0acceb90

Branch: refs/heads/master
Commit: 0acceb90efc6516df20586d756765f94efc6cbf2
Parents: f88ace1
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu May 17 14:44:01 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:40 2018 -0400

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |   1 +
 docs/src/recipes/connected-components.asciidoc  |   8 +-
 docs/src/reference/the-graphcomputer.asciidoc   |   6 +
 docs/src/reference/the-traversal.asciidoc       |  35 +++
 docs/src/upgrade/release-3.4.x.asciidoc         |  32 +++
 .../tinkerpop/gremlin/jsr223/CoreImports.java   |   4 +
 .../ConnectedComponentVertexProgram.java        | 234 +++++++++++++++++++
 .../traversal/step/map/ConnectedComponent.java  |  38 +++
 .../ConnectedComponentVertexProgramStep.java    |  98 ++++++++
 .../gremlin/process/remote/RemoteGraph.java     |   4 +
 .../traversal/dsl/graph/GraphTraversal.java     |  15 ++
 .../traversal/dsl/graph/GraphTraversalTest.java |   2 +-
 .../Process/Traversal/GraphTraversal.cs         |   9 +
 .../lib/process/graph-traversal.js              |  10 +
 .../gremlin_python/process/graph_traversal.py   |   4 +
 .../gremlin/process/ProcessComputerSuite.java   |   2 +
 .../step/map/ConnectedComponentTest.java        | 117 ++++++++++
 .../computer/SparkHadoopGraphProvider.java      |   2 +
 18 files changed, 619 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 303d963..481a7d3 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -35,6 +35,7 @@ This release also includes changes from <<release-3-3-3, 3.3.3>>.
 * Replaced `Parameterizing.addPropertyMutations()` with `Configuring.configure()`.
 * Changed interface hierarchy for `Parameterizing` and `Mutating` interfaces as they are tightly related.
 * Introduced the `with()` step modulator which can supply configuration options to `Configuring` steps.
+* Added `connectedComponent()` step and related `VertexProgram`.
 * Added `supportsUpsert()` option to `VertexFeatures` and `EdgeFeatures`.
 * `min()` and `max()` now support all types implementing `Comparable`.
 * Change the `toString()` of `Path` to be standardized as other graph elements are.

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 46d61eb..70abdbd 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -18,7 +18,13 @@ limitations under the License.
 == Connected Components
 
 Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components]
-in a graph. Consider the following graph which has three connected components:
+in a graph. As of TinkerPop 3.4.0, the process has been simplified to the `connectedComponent()`-step which is
+described in more detail in the link:
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+
+The `connectedComponent()`-step replaces the original recipe described below from earlier versions of TinkerPop,
+however the algorithm from that old recipe remains interesting for educational purposes and has thus not been removed.
+The original recipe considers the following graph which has three connected components:
 
 image:connected-components.png[width=600]
 

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-graphcomputer.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-graphcomputer.asciidoc b/docs/src/reference/the-graphcomputer.asciidoc
index 1d7586c..9cd1c76 100644
--- a/docs/src/reference/the-graphcomputer.asciidoc
+++ b/docs/src/reference/the-graphcomputer.asciidoc
@@ -403,6 +403,12 @@ g.V().peerPressure().by('cluster').valueMap()
 g.V().peerPressure().by(outE('knows')).by('cluster').valueMap()
 ----
 
+[[connectedcomponentvertexprogram]]
+=== ConnectedComponentVertexProgram
+
+The `ConnectedComponentVertexProgram` identifies link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component]
+instances in a graph. See <<connectedcomponent-step,`connectedComponent()`>>-step for more information.
+
 [[bulkdumpervertexprogram]]
 [[clonevertexprogram]]
 === CloneVertexProgram

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-traversal.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-traversal.asciidoc b/docs/src/reference/the-traversal.asciidoc
index 7d385c8..4d5b48a 100644
--- a/docs/src/reference/the-traversal.asciidoc
+++ b/docs/src/reference/the-traversal.asciidoc
@@ -625,6 +625,41 @@ g.V().coin(1.0)
 
 link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#coin-double-++[`coin(double)`]
 
+[[connectedcomponent-step]]
+=== ConnectedComponent Step
+
+The `connectedComponent()` step performs a computation to identify link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component]
+instances in a graph. When this step completes, the vertices will be labelled with a component identifier to denote
+the component to which they are associated.
+
+IMPORTANT: The `connectedComponent()`-step is a `VertexComputing`-step and as such, can only be used against a graph
+that supports `GraphComputer` (OLAP).
+
+[gremlin-groovy,modern]
+----
+g = graph.traversal().withComputer()
+g.V().
+  connectedComponent().
+    with(ConnectedComponent.propertyName, 'component').
+  project('name','component').
+    by('name').
+    by('component')
+g.V().hasLabel('person').
+  connectedComponent().
+    with(ConnectedComponent.propertyName, 'component').
+    with(ConnectedComponent.edges, outE('knows')).
+  project('name','component').
+    by('name').
+    by('component')
+----
+
+Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes
+configuration keys from the `ConnectedComponent` class and is automatically imported to the Gremlin Console.
+
+*Additional References*
+
+link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#connectedComponent--++[`connectedComponent()`]
+
 [[constant-step]]
 === Constant Step
 

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/upgrade/release-3.4.x.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/upgrade/release-3.4.x.asciidoc b/docs/src/upgrade/release-3.4.x.asciidoc
index 587e761..00f2635 100644
--- a/docs/src/upgrade/release-3.4.x.asciidoc
+++ b/docs/src/upgrade/release-3.4.x.asciidoc
@@ -65,6 +65,38 @@ release where breaking changes are allowed.
 See: link:https://issues.apache.org/jira/browse/TINKERPOP-1975[TINKERPOP-1975],
 link:http://tinkerpop.apache.org/docs/3.4.0/reference/#with-step[Reference Documentation]
 
+==== connectedComponent() Step
+
+In prior version of TinkerPop, it was recommended that the identification of
+link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component] instances in a graph be
+computed by way of a reasonably complex bit of Gremlin that looked something like this:
+
+[source,groovy]
+----
+g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
+  path().aggregate("p").
+  unfold().dedup().
+  map(__.as("v").select("p").unfold().
+         filter(unfold().where(eq("v"))).
+         unfold().dedup().order().by(id).fold()).
+  dedup()
+----
+
+The above approach had a number of drawbacks that included a large execution cost as well as incompatibilities in OLAP.
+To simplify usage of this commonly use graph algorithm, TinkerPop 3.4.0 introduces the `connectedComponent()` step
+which reduces the above operation to:
+
+[source,groovy]
+----
+g.withComputer().V().connectedComponent()
+----
+
+It is important to note that this step does require the use of a `GraphComputer` to work, as it utilizes a
+`VertexProgram` behind the scenes.
+
+See: link:https://issues.apache.org/jira/browse/TINKERPOP-1967[TINKERPOP-1967],
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation]
+
 ==== io() Step
 
 There have been some important changes to IO operations for reading and writing graph data. The use of `Graph.io()`

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
index 72ad47a..eb3b831 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
@@ -42,6 +42,7 @@ import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVerte
 import org.apache.tinkerpop.gremlin.process.computer.bulkloading.IncrementalBulkLoader;
 import org.apache.tinkerpop.gremlin.process.computer.bulkloading.OneTimeBulkLoader;
 import org.apache.tinkerpop.gremlin.process.computer.clone.CloneVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
 import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterCountMapReduce;
 import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterPopulationMapReduce;
 import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.PeerPressureVertexProgram;
@@ -49,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankMa
 import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.MemoryTraversalSideEffects;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRank;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressure;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.strategy.decoration.VertexProgramStrategy;
@@ -251,6 +253,8 @@ public final class CoreImports {
         // graph computer
         CLASS_IMPORTS.add(Computer.class);
         CLASS_IMPORTS.add(ComputerResult.class);
+        CLASS_IMPORTS.add(ConnectedComponent.class);
+        CLASS_IMPORTS.add(ConnectedComponentVertexProgram.class);
         CLASS_IMPORTS.add(GraphComputer.class);
         CLASS_IMPORTS.add(Memory.class);
         CLASS_IMPORTS.add(VertexProgram.class);

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
new file mode 100644
index 0000000..de718f1
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.computer.clustering.connected;
+
+import org.apache.commons.configuration.BaseConfiguration;
+import org.apache.commons.configuration.Configuration;
+import org.apache.commons.configuration.ConfigurationUtils;
+import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
+import org.apache.tinkerpop.gremlin.process.computer.Memory;
+import org.apache.tinkerpop.gremlin.process.computer.MemoryComputeKey;
+import org.apache.tinkerpop.gremlin.process.computer.MessageScope;
+import org.apache.tinkerpop.gremlin.process.computer.Messenger;
+import org.apache.tinkerpop.gremlin.process.computer.VertexComputeKey;
+import org.apache.tinkerpop.gremlin.process.computer.VertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.util.AbstractVertexProgramBuilder;
+import org.apache.tinkerpop.gremlin.process.traversal.Operator;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
+import org.apache.tinkerpop.gremlin.structure.Direction;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.Set;
+
+/**
+ * Identifies "Connected Component" instances in a graph by assigning a component identifier (the lexicographically
+ * least string value of the vertex in the component) to each vertex.
+ *
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ * @author Daniel Kuppitz (http://gremlin.guru)
+ */
+public class ConnectedComponentVertexProgram implements VertexProgram<String> {
+
+    public static final String COMPONENT = "gremlin.connectedComponentVertexProgram.component";
+    private static final String PROPERTY = "gremlin.connectedComponentVertexProgram.property";
+    private static final String EDGE_TRAVERSAL = "gremlin.pageRankVertexProgram.edgeTraversal";
+    private static final String HAS_HALTED = "gremlin.connectedComponentVertexProgram.hasHalted";
+    private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
+
+    private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
+    private MessageScope.Local<?> scope = MessageScope.Local.of(__::bothE);
+    private Set<MessageScope> scopes;
+    private String property = COMPONENT;
+    private boolean hasHalted = false;
+    private PureTraversal<Vertex, Edge> edgeTraversal = null;
+    private Configuration configuration;
+
+    private ConnectedComponentVertexProgram() {}
+
+    @Override
+    public void loadState(final Graph graph, final Configuration config) {
+        configuration = new BaseConfiguration();
+        if (config != null) {
+            ConfigurationUtils.copy(config, configuration);
+        }
+
+        if (configuration.containsKey(EDGE_TRAVERSAL)) {
+            this.edgeTraversal = PureTraversal.loadState(configuration, EDGE_TRAVERSAL, graph);
+            this.scope = MessageScope.Local.of(() -> this.edgeTraversal.get().clone());
+        }
+
+        scopes = new HashSet<>(Collections.singletonList(scope));
+
+        this.property = configuration.getString(PROPERTY, COMPONENT);
+        this.hasHalted = configuration.getBoolean(HAS_HALTED, false);
+    }
+
+    @Override
+    public void storeState(final Configuration config) {
+        VertexProgram.super.storeState(config);
+        if (configuration != null) {
+            ConfigurationUtils.copy(configuration, config);
+        }
+    }
+
+    @Override
+    public void setup(final Memory memory) {
+        memory.set(VOTE_TO_HALT, true);
+    }
+
+    @Override
+    public void execute(final Vertex vertex, final Messenger<String> messenger, final Memory memory) {
+        if (memory.isInitialIteration()) {
+            // on the first pass, just initialize the component to its own id then pass it to all adjacent vertices
+            // for evaluation
+            vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
+
+            // no need to send messages from traversers that were filtered - happens for stuff like
+            // g.V().hasLabel('software').connectedComponent() (i.e. when there is a TraversalVertexProgram in the mix
+            // halting traversers. only want to send messages from traversers that are still hanging about after
+            // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
+            // lead to weirdness in results. 
+            if (vertex.edges(Direction.BOTH).hasNext() && !(hasHalted && !vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent())) {
+                // since there was message passing we don't want to halt on the first round. this should only trigger
+                // a single pass finish if the graph is completely disconnected (technically, it won't even really
+                // work in cases where halted traversers come into play
+                messenger.sendMessage(scope, vertex.id().toString());
+                memory.add(VOTE_TO_HALT, false);
+            }
+        } else {
+            // by the second iteration all vertices that matter should have a component assigned
+            String currentComponent = vertex.value(property);
+            boolean different = false;
+
+            // iterate through messages received and determine if there is a component that has a lesser value than
+            // the currently assigned one
+            final Iterator<String> componentIterator = messenger.receiveMessages();
+            while(componentIterator.hasNext()) {
+                final String candidateComponent = componentIterator.next();
+                if (candidateComponent.compareTo(currentComponent) < 0) {
+                    currentComponent = candidateComponent;
+                    different = true;
+                }
+            }
+
+            // if there is a better component then assign it and notify adjacent vertices. triggering the message
+            // passing should not halt future executions
+            if (different) {
+                vertex.property(VertexProperty.Cardinality.single, property, currentComponent);
+                messenger.sendMessage(scope, currentComponent);
+                memory.add(VOTE_TO_HALT, false);
+            }
+        }
+    }
+
+    @Override
+    public Set<VertexComputeKey> getVertexComputeKeys() {
+        return new HashSet<>(Collections.singletonList(VertexComputeKey.of(property, false)));
+    }
+
+    @Override
+    public Set<MemoryComputeKey> getMemoryComputeKeys() {
+        return MEMORY_COMPUTE_KEYS;
+    }
+
+    @Override
+    public boolean terminate(final Memory memory) {
+        final boolean voteToHalt = memory.<Boolean>get(VOTE_TO_HALT);
+        if (voteToHalt) {
+            return true;
+        } else {
+            // it is basically always assumed that the program will want to halt, but if message passing occurs, the
+            // program will want to continue, thus reset false values to true for future iterations
+            memory.set(VOTE_TO_HALT, true);
+            return false;
+        }
+    }
+
+    @Override
+    public Set<MessageScope> getMessageScopes(final Memory memory) {
+        return scopes;
+    }
+
+    @Override
+    public GraphComputer.ResultGraph getPreferredResultGraph() {
+        return GraphComputer.ResultGraph.NEW;
+    }
+
+    @Override
+    public GraphComputer.Persist getPreferredPersist() {
+        return GraphComputer.Persist.VERTEX_PROPERTIES;
+    }
+
+
+    @Override
+    @SuppressWarnings("CloneDoesntCallSuperClone,CloneDoesntDeclareCloneNotSupportedException")
+    public ConnectedComponentVertexProgram clone() {
+        return this;
+    }
+
+    @Override
+    public Features getFeatures() {
+        return new Features() {
+            @Override
+            public boolean requiresLocalMessageScopes() {
+                return true;
+            }
+
+            @Override
+            public boolean requiresVertexPropertyAddition() {
+                return true;
+            }
+        };
+    }
+
+    public static ConnectedComponentVertexProgram.Builder build() {
+        return new ConnectedComponentVertexProgram.Builder();
+    }
+
+    public static final class Builder extends AbstractVertexProgramBuilder<ConnectedComponentVertexProgram.Builder> {
+
+        private Builder() {
+            super(ConnectedComponentVertexProgram.class);
+        }
+
+        public ConnectedComponentVertexProgram.Builder hasHalted(final boolean hasHalted) {
+            this.configuration.setProperty(HAS_HALTED, hasHalted);
+            return this;
+        }
+
+        public ConnectedComponentVertexProgram.Builder edges(final Traversal.Admin<Vertex, Edge> edgeTraversal) {
+            PureTraversal.storeState(this.configuration, EDGE_TRAVERSAL, edgeTraversal);
+            return this;
+        }
+
+        public ConnectedComponentVertexProgram.Builder property(final String key) {
+            this.configuration.setProperty(PROPERTY, key);
+            return this;
+        }
+    }
+}

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
new file mode 100644
index 0000000..85558bc
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+
+/**
+ * Configuration options to be passed to the {@link GraphTraversal#with(String, Object)} step on
+ * {@link GraphTraversal#connectedComponent()} ()}.
+ */
+public class ConnectedComponent {
+    /**
+     * Configures the edge to traverse when calculating the pagerank.
+     */
+    public static final String edges = Graph.Hidden.hide("tinkerpop.connectedComponent.edges");
+
+    /**
+     * Configures the name of the property within which to store the pagerank value.
+     */
+    public static final String propertyName = Graph.Hidden.hide("tinkerpop.connectedComponent.propertyName");
+}

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
new file mode 100644
index 0000000..edeb497
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.process.computer.GraphFilter;
+import org.apache.tinkerpop.gremlin.process.computer.Memory;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategies;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.step.Configuring;
+import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
+import org.apache.tinkerpop.gremlin.process.traversal.step.util.Parameters;
+import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
+
+/**
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ */
+public final class ConnectedComponentVertexProgramStep extends VertexProgramStep implements TraversalParent, Configuring {
+
+    private Parameters parameters = new Parameters();
+    private PureTraversal<Vertex, Edge> edgeTraversal;
+    private String clusterProperty = ConnectedComponentVertexProgram.COMPONENT;
+
+    public ConnectedComponentVertexProgramStep(final Traversal.Admin traversal) {
+        super(traversal);
+        this.configure(ConnectedComponent.edges, __.<Vertex>bothE());
+    }
+
+    @Override
+    public void configure(final Object... keyValues) {
+        if (keyValues[0].equals(ConnectedComponent.edges)) {
+            if (!(keyValues[1] instanceof Traversal))
+                throw new IllegalArgumentException("ConnectedComponent.edges requires a Traversal as its argument");
+            this.edgeTraversal = new PureTraversal<>(((Traversal<Vertex,Edge>) keyValues[1]).asAdmin());
+            this.integrateChild(this.edgeTraversal.get());
+        } else if (keyValues[0].equals(ConnectedComponent.propertyName)) {
+            if (!(keyValues[1] instanceof String))
+                throw new IllegalArgumentException("ConnectedComponent.propertyName requires a String as its argument");
+            this.clusterProperty = (String) keyValues[1];
+        } else {
+            this.parameters.set(this, keyValues);
+        }
+    }
+
+    @Override
+    public Parameters getParameters() {
+        return parameters;
+    }
+
+    @Override
+    public int hashCode() {
+        return super.hashCode() ^ this.clusterProperty.hashCode();
+    }
+
+    @Override
+    public String toString() {
+        return StringFactory.stepString(this, this.clusterProperty, new GraphFilter(this.computer));
+    }
+
+    @Override
+    public ConnectedComponentVertexProgram generateProgram(final Graph graph, final Memory memory) {
+        final Traversal.Admin<Vertex, Edge> detachedTraversal = this.edgeTraversal.getPure();
+        detachedTraversal.setStrategies(TraversalStrategies.GlobalCache.getStrategies(graph.getClass()));
+        return ConnectedComponentVertexProgram.build().
+                hasHalted(memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)).
+                edges(detachedTraversal).
+                property(this.clusterProperty).create(graph);
+    }
+
+    @Override
+    public ConnectedComponentVertexProgramStep clone() {
+        return (ConnectedComponentVertexProgramStep) super.clone();
+    }
+
+}

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
index f19d5fa..9bc1771 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
@@ -48,6 +48,10 @@ import java.util.Iterator;
 @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
 @Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
 @Graph.OptOut(
+        test = "org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest",
+        method = "*",
+        reason = "hmmmm")
+@Graph.OptOut(
         test = "org.apache.tinkerpop.gremlin.process.traversal.CoreTraversalTest",
         method = "*",
         reason = "The test suite does not support profiling or lambdas and for groovy tests: 'Could not locate method: GraphTraversalSource.withStrategies([{traversalCategory=interface org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy$DecorationStrategy}])'")

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
index a675ad1..4fd247f 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
@@ -19,6 +19,8 @@
 package org.apache.tinkerpop.gremlin.process.traversal.dsl.graph;
 
 import org.apache.tinkerpop.gremlin.process.computer.VertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponentVertexProgramStep;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRankVertexProgramStep;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressureVertexProgramStep;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ProgramVertexProgramStep;
@@ -2452,6 +2454,18 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> {
     }
 
     /**
+     * Executes a Connected Component algorithm over the graph.
+     *
+     * @return the traversal with the appended {@link ConnectedComponentVertexProgram}
+     * @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#connectedcomponent-step" target="_blank">Reference Documentation - ConnectedComponent Step</a>
+     * @since 3.4.0
+     */
+    public default GraphTraversal<S, E> connectedComponent() {
+        this.asAdmin().getBytecode().addStep(Symbols.connectedComponent);
+        return this.asAdmin().addStep((Step<E, E>) new ConnectedComponentVertexProgramStep(this.asAdmin()));
+    }
+
+    /**
      * Executes an arbitrary {@link VertexProgram} over the graph.
      *
      * @return the traversal with the appended {@link ProgramVertexProgramStep}
@@ -2882,6 +2896,7 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> {
 
         public static final String pageRank = "pageRank";
         public static final String peerPressure = "peerPressure";
+        public static final String connectedComponent = "connectedComponent";
         public static final String program = "program";
 
         public static final String by = "by";

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
index 3d9a549..40ca312 100644
--- a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
+++ b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
@@ -43,7 +43,7 @@ import static org.junit.Assert.assertEquals;
 public class GraphTraversalTest {
     private static final Logger logger = LoggerFactory.getLogger(GraphTraversalTest.class);
 
-    private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "peerPressure", "program", "none"));
+    private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "connectedComponent", "peerPressure", "program", "none"));
     private static Set<String> NO_ANONYMOUS = new HashSet<>(Arrays.asList("start", "__"));
     private static Set<String> IGNORES_BYTECODE = new HashSet<>(Arrays.asList("asAdmin", "read", "write", "iterate", "mapValues", "mapKeys"));
 

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
----------------------------------------------------------------------
diff --git a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
index 361b246..e2fa3e1 100644
--- a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
+++ b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
@@ -401,6 +401,15 @@ namespace Gremlin.Net.Process.Traversal
         }
 
         /// <summary>
+        ///     Adds the connectedComponent step to this <see cref="GraphTraversal{SType, EType}" />.
+        /// </summary>
+        public GraphTraversal<S, E> ConnectedComponent ()
+        {
+            Bytecode.AddStep("connectedComponent");
+            return Wrap<S, E>(this);
+        }
+
+        /// <summary>
         ///     Adds the constant step to this <see cref="GraphTraversal{SType, EType}" />.
         /// </summary>
         public GraphTraversal<S, E2> Constant<E2> (E2 e)

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
----------------------------------------------------------------------
diff --git a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
index 4f39fa5..1c8d639 100644
--- a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
+++ b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
@@ -353,6 +353,16 @@ class GraphTraversal extends Traversal {
   }
   
   /**
+   * Graph traversal connectedComponent method.
+   * @param {...Object} args
+   * @returns {GraphTraversal}
+   */
+  connectedComponent(...args) {
+    this.bytecode.addStep('connectedComponent', args);
+    return this;
+  }
+  
+  /**
    * Graph traversal constant method.
    * @param {...Object} args
    * @returns {GraphTraversal}

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
----------------------------------------------------------------------
diff --git a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
index f8b8285..b073aa8 100644
--- a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
+++ b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
@@ -194,6 +194,10 @@ class GraphTraversal(Traversal):
         self.bytecode.add_step("coin", *args)
         return self
 
+    def connectedComponent(self, *args):
+        self.bytecode.add_step("connectedComponent", *args)
+        return self
+
     def constant(self, *args):
         self.bytecode.add_step("constant", *args)
         return self

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
index eab562d..6466ae8 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
@@ -50,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.traversal.step.filter.TailTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.filter.WhereTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.AddEdgeTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.CoalesceTest;
+import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConstantTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.CountTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapTest;
@@ -143,6 +144,7 @@ public class ProcessComputerSuite extends AbstractGremlinSuite {
 
             // map
             CoalesceTest.Traversals.class,
+            ConnectedComponentTest.Traversals.class,
             ConstantTest.Traversals.class,
             CountTest.Traversals.class,
             FlatMapTest.Traversals.class,

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
new file mode 100644
index 0000000..25e618a
--- /dev/null
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.LoadGraphWith;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.AbstractGremlinProcessTest;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.junit.Test;
+
+import static org.apache.tinkerpop.gremlin.LoadGraphWith.GraphData.MODERN;
+import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE;
+import static org.junit.Assert.assertEquals;
+
+/**
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ */
+public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest {
+
+    public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent();
+
+    public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
+
+    public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+
+    @Test
+    @LoadGraphWith(MODERN)
+    public void g_V_connectedComponent() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent();
+        printTraversalForm(traversal);
+        int counter = 0;
+        while (traversal.hasNext()) {
+            final Vertex vertex = traversal.next();
+            counter++;
+            assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+        }
+        assertEquals(6, counter);
+    }
+
+    @Test
+    @LoadGraphWith(MODERN)
+    public void g_V_hasLabelXsoftwareX_connectedComponent() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
+        printTraversalForm(traversal);
+        int counter = 0;
+        while (traversal.hasNext()) {
+            final Vertex vertex = traversal.next();
+            final String name = vertex.value("name");
+            switch (name) {
+                case "lop":
+                case "ripple":
+                    assertEquals("3", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+                    break;
+            }
+            counter++;
+        }
+        assertEquals(2, counter);
+    }
+
+    @Test
+    @LoadGraphWith(MODERN)
+    public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+        printTraversalForm(traversal);
+        int counter = 0;
+        while (traversal.hasNext()) {
+            final Vertex vertex = traversal.next();
+            final String name = vertex.value("name");
+            switch (name) {
+                case "marko":
+                case "vadas":
+                case "josh":
+                    assertEquals("1", vertex.value("cluster"));
+                    break;
+                case "peter":
+                    assertEquals("6", vertex.value("cluster"));
+                    break;
+            }
+            counter++;
+        }
+        assertEquals(4, counter);
+    }
+
+    public static class Traversals extends ConnectedComponentTest {
+        @Override
+        public Traversal<Vertex, Vertex> get_g_V_connectedComponent() {
+            return g.V().connectedComponent();
+        }
+
+        @Override
+        public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
+            return g.V().hasLabel("software").connectedComponent();
+        }
+        @Override
+        public Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX() {
+            return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster");
+        }
+    }
+}

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
index c778c6d..2f727c8 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
@@ -39,6 +39,7 @@ import org.apache.tinkerpop.gremlin.process.computer.Computer;
 import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
 import org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph;
 import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
+import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.PageRankTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.PeerPressureTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProgramTest;
@@ -115,6 +116,7 @@ public class SparkHadoopGraphProvider extends AbstractFileGraphProvider {
         if (null != loadGraphWith &&
                 !test.equals(ProgramTest.Traversals.class) &&
                 !test.equals(PageRankTest.Traversals.class) &&
+                !test.equals(ConnectedComponentTest.Traversals.class) &&
                 !test.equals(PeerPressureTest.Traversals.class) &&
                 !test.equals(FileSystemStorageCheck.class) &&
                 !testMethodName.equals("shouldSupportJobChaining") &&  // GraphComputerTest.shouldSupportJobChaining


[10/12] tinkerpop git commit: TINKERPOP-1967 Added a component field to the ConnectedComponent class

Posted by sp...@apache.org.
TINKERPOP-1967 Added a component field to the ConnectedComponent class

In this way the user can access the default more readily and is available in GLVs


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/8954c271
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/8954c271
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/8954c271

Branch: refs/heads/master
Commit: 8954c2713d6b2b2736b2892c03b0fe6dbe6bf018
Parents: 16231d6
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 08:29:13 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 docs/src/recipes/connected-components.asciidoc                | 5 ++++-
 .../computer/traversal/step/map/ConnectedComponent.java       | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/8954c271/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index e6d0f7a..c46180f 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -64,10 +64,13 @@ The traversal looks like:
 [gremlin-groovy,existing]
 ----
 g.withComputer().V().connectedComponent().
-    group().by('gremlin.connectedComponentVertexProgram.component').
+    group().by(component).
     select(values).unfold()
 ----
 
+NOTE: The `component` option passed to `by()` is statically imported from `ConnectedComponent` and refers to the
+default property key within which the result of the algorithm is stored.
+
 A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
 
 [gremlin-groovy,existing]

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/8954c271/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
index 85558bc..a2223d8 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
@@ -18,6 +18,7 @@
  */
 package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
 
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
 import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
 import org.apache.tinkerpop.gremlin.structure.Graph;
 
@@ -26,6 +27,12 @@ import org.apache.tinkerpop.gremlin.structure.Graph;
  * {@link GraphTraversal#connectedComponent()} ()}.
  */
 public class ConnectedComponent {
+
+    /**
+     * The default property key name that will hold the result of the algorithm.
+     */
+    public static final String component = ConnectedComponentVertexProgram.COMPONENT;
+
     /**
      * Configures the edge to traverse when calculating the pagerank.
      */


[02/12] tinkerpop git commit: TINKERPOP-1967 Removed some dead comments/code

Posted by sp...@apache.org.
TINKERPOP-1967 Removed some dead comments/code


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/cbae0a30
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/cbae0a30
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/cbae0a30

Branch: refs/heads/master
Commit: cbae0a30093303aeab1257c8ff344abd62f806cb
Parents: eb6a4de
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Aug 3 06:51:55 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 .../clustering/connected/ConnectedComponentVertexProgram.java | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/cbae0a30/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
index 82907eb..acc8f3c 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -62,7 +62,6 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
     public static final String COMPONENT = "gremlin.connectedComponentVertexProgram.component";
     private static final String PROPERTY = "gremlin.connectedComponentVertexProgram.property";
     private static final String EDGE_TRAVERSAL = "gremlin.pageRankVertexProgram.edgeTraversal";
-    private static final String HAS_HALTED = "gremlin.connectedComponentVertexProgram.hasHalted";
     private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
 
     private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
@@ -122,11 +121,7 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
             // for evaluation
             vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
 
-            // no need to send messages from traversers that were filtered - happens for stuff like
-            // g.V().hasLabel('software').connectedComponent() (i.e. when there is a TraversalVertexProgram in the mix
-            // halting traversers. only want to send messages from traversers that are still hanging about after
-            // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
-            // lead to weirdness in results. 
+            // vertices that have no edges remain in their own component - nothing to message pass here
             if (vertex.edges(Direction.BOTH).hasNext()) {
                 // since there was message passing we don't want to halt on the first round. this should only trigger
                 // a single pass finish if the graph is completely disconnected (technically, it won't even really


[06/12] tinkerpop git commit: Extended the connected-components recipe

Posted by sp...@apache.org.
Extended the connected-components recipe


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/a0119640
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/a0119640
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/a0119640

Branch: refs/heads/master
Commit: a011964030f36c87b1ecb22eefb1e7deeffed3b9
Parents: 002560a
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Sun Jun 10 15:17:17 2018 +0200
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 docs/src/recipes/connected-components.asciidoc |  94 ++++++++++++--------
 docs/static/images/cc-scale-ratio.png          | Bin 0 -> 14393 bytes
 docs/static/images/cc-scale-size.png           | Bin 0 -> 12220 bytes
 3 files changed, 58 insertions(+), 36 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index edbeec5..850c31f 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -31,11 +31,11 @@ Depending on the size of the graph, three solution regimes can be discriminated:
 
 1. Small graphs that fit in the memory of a single machine
 
-2. Medium graphs backed by storage for which a linear scan is still feasible. This regime is left to third party
+2. Medium-sized graphs backed by storage for which an OLTP linear scan is still feasible. This regime is left to third party
 TinkerPop implementations, since TinkerPop itself has no storage-backed reference implementations. The idea is that
 component membership is stored in the graph, rather than in memory.
 
-3. Large graphs requiring an OLAP approach to yield results in a reasonable time.
+3. Large graphs requiring an approach with `HadoopGraph` and `SparkGraphComputer` to yield results in a reasonable time.
 
 
 These regimes are discussed separately using the following graph with three weakly connected components:
@@ -55,16 +55,21 @@ g.addV().property(id, "A").as("a").
   addE("link").from("d").to("e").iterate()
 ----
 
+==== Small graph traversals
 
-===== Small graphs
-
-Connected components in a small graph can be determined with both an OLTP traversal and the OLAP
+Connected components in a small graph can be determined with either an OLTP traversal or the OLAP
 `connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
 described in more detail in the
 link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+The traversal looks like:
+[gremlin-groovy,existing]
+----
+g.withComputer().V().connectedComponent().
+    group().by('gremlin.connectedComponentVertexProgram.component').
+    select(values).unfold()
+----
 
 A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
-
 [gremlin-groovy,existing]
 ----
 g.V().emit(cyclicPath().or().not(both())).                                    <1>
@@ -73,7 +78,6 @@ g.V().emit(cyclicPath().or().not(both())).                                    <1
     by(path().unfold().dedup().fold()).                                       <4>
     select(values).unfold()                                                   <5>
 ----
-
 <1> The initial emit() step allows for output of isolated vertices, in addition to the discovery of
 components as described in (2).
 
@@ -83,7 +87,7 @@ path.  Collection `'a'` is used to keep track of visited vertices, for both subt
 and new traversals resulting from the `g.V()` linear scan.
 
 <3> While `'a'` nicely keeps track of vertices already visited, the actual components need to be extracted from the
-path information of surviving traversers. The `path().unfold().limit(1)` closure provides the starting vertex
+path information. The `path().unfold().limit(1)` closure provides the starting vertex
 of surviving traversers, which can be used to group the components.
 
 <4> This clause collects the unique vertices from all paths with the same starting vertex, thus from the same
@@ -91,39 +95,57 @@ weak component.
 
 <5> The values of the groupby map contain the lists of vertices making up the requested components.
 
-This algorithm completes in linear time with the number of vertices and edges, because a traversal is started for each
-vertex and each edge with its associated out-vertex is visited exactly once.
-
 
-==== Large graphs
 
-Large graphs require an OLAP solution with a custom VertexProgram that can be run using a graph implementation's
-GraphComputer, in particular `SparkGraphComputer` on a `HadoopGraph`. The OLAP solution also runs faster for most
-medium-sized graphs, that is when these graph have a 'natural' structure with a limited maximum path length.
+==== Small graph scalability
 
-The TinkerPop library of vertex programs contains the `WeakComponentsVertexProgram` which can be run in the same
-way as the link:http://tinkerpop.apache.org/docs/x.y.z/reference/#peerpressurevertexprogram[PeerPressureVertexProgram]:
+The scalability of the OLTP traversal and the `connectedComponent()`-step for in-memory graphs is shown in the figures
+below.
 
-[gremlin-groovy,existing]
-----
-result = g.getGraph().compute().
-    program(WeakComponentsVertexProgram.build().maxIterations(100).create()).
-    mapReduce(ClusterPopulationMapReduce.build().create()).
-    mapReduce(ClusterCountMapReduce.build().create()).
-    submit().get()
-result.memory().clusterPopulation
-gResult = result.graph().traversal()
-gResult.V().valueMap(true)
-----
+[[cc-scale-size]]
+.Run times for finding connected components in a randomly generated graph with 10 components of equal size and with an edge/vertex ratio of 6
+image::cc-scale-size.png[width=600, side=bottom]
 
-The vertex program has interconnected vertices exchange id's and store the lowest id until no vertex receives a
-lower id. This algorithm is commonly applied in
+In general, the `connectedComponent()`-step is almost a factor two faster than the OLTP traversal. Only, for very
+small graphs the overhead of running the ConnectedComponentVertexProgram is larger than that of the OLTP traversal.
+The vertex program works by having interconnected vertices exchange id's and store the lowest id until no vertex
+receives a lower id. This algorithm is commonly applied in
 link:https://en.wikipedia.org/wiki/Bulk_synchronous_parallel[bulk synchronous parallel] systems, e.g. in
-link:https://spark.apache.org/graphx[Apache Spark GraphX].
+link:https://spark.apache.org/graphx[Apache Spark GraphX]. Overhead for the vertex program arises because it has to run
+as many cycles as the largest length of the shortest paths between any two vertices in a component of the graph. In
+every cycle each vertex has to be checked for being
+"halted". Overhead of the OLTP traversal consists of each traverser having to carry complete path information. For
+pure depth-first-search or breadth-first-search implementations, connected-component algotithms should scale
+as [.big]##O##(V+E). For the traversals in the figure above this is almost the case.
+
+
+[[cc-scale-ratio]]
+.Run times for finding connected components in a randomly generated graph with 10 components, each consisting of 6400 vertices
+image::cc-scale-ratio.png[width=600]
+
+The random graphs used for the scalability tests can be modulated with the edge/vertex ratio. For small ratios the
+components generated are more lint-like and harder to process by the `connectedComponent()`-step. For high ratios
+the components are more mesh-like and the ConnectedComponentVertexProgram needs few cycles to process the graph. These
+characteristics show clearly from the graph. Indeed, for a given number of vertices, the run time of the
+`connectedComponent()`-step does not depend on the number of edges, but rather on the maximum shortest path length in
+the graph.
+
+
+==== Large graphs
+
+Large graphs in TinkerPop require distributed processing by `SparkGraphComputer` to get results in a reasonable time (OLAP
+approach). This means that the graph must be available as `HadoopGraph` (third party TinkerPop implementations often
+allow to make a graph available as an `HadoopGraph` by providing an Hadoop `InputFormat`). Running the
+`connectedComponent()`-step on
+an `HadoopGraph` works the same as for a small graph, provided that `SparkGraphComputer` is specified as the graph computer,
+either with the `gremlin.hadoop.defaultGraphComputer` property or as part of the `withComputer()`-step.
+
+Scalability of the the `connectedComponent()`-step with `SparkGraphComputer` is high, but note that:
+
+* the graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
+intermediate results to disk and loosing most of the gains from the distributed processing
 
-==== Scalability
+* as discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
+any pair of vertices. Overcoming this limitation is still a
+link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
 
-ToDo:
- - limits and run time regime 1
- - test of friendster graph regime 3
- - discuss: link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[http://www.vldb.org/pvldb/vol7/p1821-yan.pdf]
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/static/images/cc-scale-ratio.png
----------------------------------------------------------------------
diff --git a/docs/static/images/cc-scale-ratio.png b/docs/static/images/cc-scale-ratio.png
new file mode 100644
index 0000000..33a842d
Binary files /dev/null and b/docs/static/images/cc-scale-ratio.png differ

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/static/images/cc-scale-size.png
----------------------------------------------------------------------
diff --git a/docs/static/images/cc-scale-size.png b/docs/static/images/cc-scale-size.png
new file mode 100644
index 0000000..2b08a89
Binary files /dev/null and b/docs/static/images/cc-scale-size.png differ


[12/12] tinkerpop git commit: TINKERPOP-1967 Added GLV tests for connectedComponent()

Posted by sp...@apache.org.
TINKERPOP-1967 Added GLV tests for connectedComponent()


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0c2fc21f
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0c2fc21f
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0c2fc21f

Branch: refs/heads/master
Commit: 0c2fc21fb9534060aa8758406e5e38d553ea12be
Parents: 2cc2aed
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 9 12:29:40 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 12:29:40 2018 -0400

----------------------------------------------------------------------
 .../test/cucumber/feature-steps.js              |  4 ++
 .../features/map/ConnectedComponent.feature     | 76 ++++++++++++++++++++
 .../step/map/ConnectedComponentTest.java        |  4 +-
 3 files changed, 82 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
----------------------------------------------------------------------
diff --git a/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js b/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
index 96eeaaf..a0ae7ca 100644
--- a/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
+++ b/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
@@ -63,6 +63,10 @@ const ignoreReason = {
 
 const ignoredScenarios = {
   // An associative array containing the scenario name as key, for example:
+  'g_V_connectedComponent_hasXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+  'g_V_dedup_connectedComponent_hasXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+  'g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+  'g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX': new IgnoreError(ignoreReason.computerNotSupported),
   'g_V_pageRank_hasXpageRankX': new IgnoreError(ignoreReason.computerNotSupported),
   'g_V_outXcreatedX_pageRank_byXbothEX_byXprojectRankX_timesX0X_valueMapXname_projectRankX': new IgnoreError(ignoreReason.computerNotSupported),
   'g_V_pageRank_order_byXpageRank_decrX_byXnameX_name': new IgnoreError(ignoreReason.computerNotSupported),

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-test/features/map/ConnectedComponent.feature
----------------------------------------------------------------------
diff --git a/gremlin-test/features/map/ConnectedComponent.feature b/gremlin-test/features/map/ConnectedComponent.feature
new file mode 100644
index 0000000..0f95488
--- /dev/null
+++ b/gremlin-test/features/map/ConnectedComponent.feature
@@ -0,0 +1,76 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+Feature: Step - connectedComponent()
+                
+  Scenario: g_V_connectedComponent_hasXcomponentX
+    Given the modern graph
+    And the traversal of
+      """
+      g.withComputer().V().connectedComponent().has("gremlin.connectedComponentVertexProgram.component")
+      """
+    When iterated to list
+    Then the result should be unordered
+      | result |
+      | v[marko] |
+      | v[vadas] |
+      | v[lop] |
+      | v[josh] |
+      | v[ripple] |
+      | v[peter] |
+
+  Scenario: g_V_dedup_connectedComponent_hasXcomponentX
+    Given the modern graph
+    And the traversal of
+      """
+      g.withComputer().V().dedup().connectedComponent().has("gremlin.connectedComponentVertexProgram.component")
+      """
+    When iterated to list
+    Then the result should be unordered
+      | result |
+      | v[marko] |
+      | v[vadas] |
+      | v[lop] |
+      | v[josh] |
+      | v[ripple] |
+      | v[peter] |
+
+  Scenario: g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX
+    Given the modern graph
+    And the traversal of
+      """
+      g.withComputer().V().hasLabel("software").connectedComponent().project("name","component").by("name").by("gremlin.connectedComponentVertexProgram.component")
+      """
+    When iterated to list
+    Then the result should be unordered
+      | result |
+      | m[{"name": "lop", "component": "1"}] |
+      | m[{"name": "ripple", "component": "1"}] |
+
+  Scenario: g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX
+    Given the modern graph
+    And the traversal of
+      """
+      g.withComputer().V().hasLabel("person").connectedComponent().with("~tinkerpop.connectedComponent.edges", __.bothE("knows")).with("~tinkerpop.connectedComponent.propertyName", "cluster").project("name","cluster").by("name").by("cluster")
+      """
+    When iterated to list
+    Then the result should be unordered
+      | result |
+      | m[{"name": "marko", "cluster": "1"}] |
+      | m[{"name": "vadas", "cluster": "1"}] |
+      | m[{"name": "josh", "cluster": "1"}] |
+      | m[{"name": "peter", "cluster": "6"}] |
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index a542bed..bf0d047 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -108,12 +108,12 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
     public static class Traversals extends ConnectedComponentTest {
         @Override
         public Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX() {
-            return g.V().connectedComponent();
+            return g.V().connectedComponent().has(ConnectedComponent.component);
         }
 
         @Override
         public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX() {
-            return g.V().dedup().connectedComponent();
+            return g.V().dedup().connectedComponent().has(ConnectedComponent.component);
         }
 
         @Override


[08/12] tinkerpop git commit: TINKERPOP-1967 Added ConnectedComponentVertexProgram to core javadoc

Posted by sp...@apache.org.
TINKERPOP-1967 Added ConnectedComponentVertexProgram to core javadoc


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/eb6a4de6
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/eb6a4de6
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/eb6a4de6

Branch: refs/heads/master
Commit: eb6a4de63ac62181d8237c0ad7ec5b3aa072e0b7
Parents: 0eb6cb1
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 2 08:20:17 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 pom.xml | 3 +++
 1 file changed, 3 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/eb6a4de6/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 4badaa1..ea42197 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1262,6 +1262,9 @@ limitations under the License.
                                         <include>org/apache/tinkerpop/gremlin/process/computer/*.java
                                         </include>
                                         <include>
+                                            org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+                                        </include>
+                                        <include>
                                             org/apache/tinkerpop/gremlin/process/computer/clustering/peerpressure/PeerPressureVertexProgram.java
                                         </include>
                                         <include>


[04/12] tinkerpop git commit: TINKERPOP-1967 Referenced TINKERPOP-1976 in OptOut for connectedComponent on RemoteGraph

Posted by sp...@apache.org.
TINKERPOP-1967 Referenced TINKERPOP-1976 in OptOut for connectedComponent on RemoteGraph


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/dd7fc261
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/dd7fc261
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/dd7fc261

Branch: refs/heads/master
Commit: dd7fc261fde9837bde954962b4e7777a57a01327
Parents: b2cb187
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 11:22:25 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 .../org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/dd7fc261/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
index 9bc1771..bf59651 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
@@ -50,7 +50,7 @@ import java.util.Iterator;
 @Graph.OptOut(
         test = "org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest",
         method = "*",
-        reason = "hmmmm")
+        reason = "https://issues.apache.org/jira/browse/TINKERPOP-1976")
 @Graph.OptOut(
         test = "org.apache.tinkerpop.gremlin.process.traversal.CoreTraversalTest",
         method = "*",


[09/12] tinkerpop git commit: TINKERPOP-1967 Added a test for regressions on halted traversers

Posted by sp...@apache.org.
TINKERPOP-1967 Added a test for regressions on halted traversers


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0eb6cb1a
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0eb6cb1a
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0eb6cb1a

Branch: refs/heads/master
Commit: 0eb6cb1a0d7df0790cb5c2f82a792cd80bef6f3f
Parents: dd7fc26
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 14:13:33 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 .../step/map/ConnectedComponentTest.java        | 21 ++++++++++++++++++++
 1 file changed, 21 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0eb6cb1a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 8b1904f..3133382 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -39,6 +39,8 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
 
     public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
 
+    public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent();
+
     public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
 
     @Test
@@ -57,6 +59,20 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
 
     @Test
     @LoadGraphWith(MODERN)
+    public void g_V_dedup_connectedComponent() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent();
+        printTraversalForm(traversal);
+        int counter = 0;
+        while (traversal.hasNext()) {
+            final Vertex vertex = traversal.next();
+            counter++;
+            assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+        }
+        assertEquals(6, counter);
+    }
+
+    @Test
+    @LoadGraphWith(MODERN)
     public void g_V_hasLabelXsoftwareX_connectedComponent() {
         final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
         printTraversalForm(traversal);
@@ -106,6 +122,11 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
         }
 
         @Override
+        public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent() {
+            return g.V().dedup().connectedComponent();
+        }
+
+        @Override
         public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
             return g.V().hasLabel("software").connectedComponent();
         }


[07/12] tinkerpop git commit: TINKERPOP-1967 Minor text cleanup for connectedComponent() docs

Posted by sp...@apache.org.
TINKERPOP-1967 Minor text cleanup for connectedComponent() docs


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/16231d6e
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/16231d6e
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/16231d6e

Branch: refs/heads/master
Commit: 16231d6ed9b6c5f00a6f28da6f03a96d664fcf99
Parents: a011964
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Jun 15 08:24:22 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 docs/src/recipes/connected-components.asciidoc | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/16231d6e/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 850c31f..e6d0f7a 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -37,7 +37,6 @@ component membership is stored in the graph, rather than in memory.
 
 3. Large graphs requiring an approach with `HadoopGraph` and `SparkGraphComputer` to yield results in a reasonable time.
 
-
 These regimes are discussed separately using the following graph with three weakly connected components:
 
 image:connected-components.png[width=600]
@@ -70,6 +69,7 @@ g.withComputer().V().connectedComponent().
 ----
 
 A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
+
 [gremlin-groovy,existing]
 ----
 g.V().emit(cyclicPath().or().not(both())).                                    <1>
@@ -95,8 +95,6 @@ weak component.
 
 <5> The values of the groupby map contain the lists of vertices making up the requested components.
 
-
-
 ==== Small graph scalability
 
 The scalability of the OLTP traversal and the `connectedComponent()`-step for in-memory graphs is shown in the figures
@@ -118,7 +116,6 @@ every cycle each vertex has to be checked for being
 pure depth-first-search or breadth-first-search implementations, connected-component algotithms should scale
 as [.big]##O##(V+E). For the traversals in the figure above this is almost the case.
 
-
 [[cc-scale-ratio]]
 .Run times for finding connected components in a randomly generated graph with 10 components, each consisting of 6400 vertices
 image::cc-scale-ratio.png[width=600]
@@ -130,7 +127,6 @@ characteristics show clearly from the graph. Indeed, for a given number of verti
 `connectedComponent()`-step does not depend on the number of edges, but rather on the maximum shortest path length in
 the graph.
 
-
 ==== Large graphs
 
 Large graphs in TinkerPop require distributed processing by `SparkGraphComputer` to get results in a reasonable time (OLAP
@@ -142,10 +138,8 @@ either with the `gremlin.hadoop.defaultGraphComputer` property or as part of the
 
 Scalability of the the `connectedComponent()`-step with `SparkGraphComputer` is high, but note that:
 
-* the graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
-intermediate results to disk and loosing most of the gains from the distributed processing
-
-* as discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
+* The graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
+intermediate results to disk and loosing most of the gains from the distributed processing.
+* As discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
 any pair of vertices. Overcoming this limitation is still a
-link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
-
+link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
\ No newline at end of file


[05/12] tinkerpop git commit: Merged vtslab recipe for connected components

Posted by sp...@apache.org.
Merged vtslab recipe for connected components


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/002560a5
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/002560a5
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/002560a5

Branch: refs/heads/master
Commit: 002560a5d9584119d5a5af7f874d110583390741
Parents: 0acceb9
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Mon May 21 14:03:54 2018 +0200
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 docs/src/recipes/connected-components.asciidoc | 123 +++++++++++++-------
 1 file changed, 83 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/002560a5/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 70abdbd..edbeec5 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -14,17 +14,31 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 ////
+
+// @author Daniel Kuppitz (anwer on gremlin user list)
+// @author Robert Dale (answer on gremlin user list)
+// @author Marc de Lignie
+
 [[connected-components]]
 == Connected Components
 
 Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components]
-in a graph. As of TinkerPop 3.4.0, the process has been simplified to the `connectedComponent()`-step which is
-described in more detail in the link:
-link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+in a graph. In a directed graph like in TinkerPop, components can be weakly or strongly connected. This recipe is
+restricted to finding link:https://en.wikipedia.org/wiki/Directed_graph#Directed_graph_connectivity[weakly
+connected components], in which the direction of edges is not taken into account.
+
+Depending on the size of the graph, three solution regimes can be discriminated:
+
+1. Small graphs that fit in the memory of a single machine
+
+2. Medium graphs backed by storage for which a linear scan is still feasible. This regime is left to third party
+TinkerPop implementations, since TinkerPop itself has no storage-backed reference implementations. The idea is that
+component membership is stored in the graph, rather than in memory.
 
-The `connectedComponent()`-step replaces the original recipe described below from earlier versions of TinkerPop,
-however the algorithm from that old recipe remains interesting for educational purposes and has thus not been removed.
-The original recipe considers the following graph which has three connected components:
+3. Large graphs requiring an OLAP approach to yield results in a reasonable time.
+
+
+These regimes are discussed separately using the following graph with three weakly connected components:
 
 image:connected-components.png[width=600]
 
@@ -41,46 +55,75 @@ g.addV().property(id, "A").as("a").
   addE("link").from("d").to("e").iterate()
 ----
 
-One way to detect the various subgraphs would be to do something like this:
+
+===== Small graphs
+
+Connected components in a small graph can be determined with both an OLTP traversal and the OLAP
+`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
+described in more detail in the
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+
+A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
 
 [gremlin-groovy,existing]
 ----
-g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).  <1>
-  path().aggregate("p").                                                       <2>
-  unfold().dedup().                                                            <3>
-  map(__.as("v").select("p").unfold().                                         <4>
-         filter(unfold().where(eq("v"))).
-         unfold().dedup().order().by(id).fold()).
-  dedup()                                                                      <5>
+g.V().emit(cyclicPath().or().not(both())).                                    <1>
+    repeat(__.where(without('a')).store('a').both()).until(cyclicPath()).     <2>
+    group().by(path().unfold().limit(1)).                                     <3>
+    by(path().unfold().dedup().fold()).                                       <4>
+    select(values).unfold()                                                   <5>
 ----
 
-<1> Iterate all vertices and repeatedly traverse over both incoming and outgoing edges (TinkerPop doesn't support
-unidirectional graphs directly so it must be simulated by ignoring the direction with `both`). Note the use of `emit`
-prior to `repeat` as this allows for return of a single length path.
-<2> Aggregate the `path()` of the emitted vertices to "p". It is within these paths that the list of connected
-components will be identified. Obviously the paths list are duplicative in the sense that they contains different
-paths traveled over the same vertices.
-<3> Unroll the elements in the path list with `unfold` and `dedup`.
-<4> Use the first vertex in each path to filter against the paths stored in "p". When a path is found that has the
-vertex in it, dedup the vertices in the path, order it by the identifier. Each path output from this `map` step
-represents a connected component.
-<5> The connected component list is duplicative given the nature of the paths in "p", but now that the vertices within
-the paths are ordered, a final `dedup` will make the list of connective components unique.
-
-NOTE: This is a nice example of where running smaller pieces of a large Gremlin statement make it easier to see what
-is happening at each step. Consider running this example one line at a time (or perhaps even in a step at a time) to
-see the output at each point.
-
-While the above approach returns results nicely, the traversal doesn't appear to work with OLAP. A less efficient
-approach, but one more suited for OLAP execution looks quite similar but does not use `dedup` as heavily (thus
-`GraphComputer` is forced to analyze far more paths):
+<1> The initial emit() step allows for output of isolated vertices, in addition to the discovery of
+components as described in (2).
+
+<2> The entire component to which the first returned vertex belongs, is visited. To allow for components of any
+structure, a repeat loop is applied that only stops for a particular branch of the component when it detects a cyclic
+path.  Collection `'a'` is used to keep track of visited vertices, for both subtraversals within a component
+and new traversals resulting from the `g.V()` linear scan.
+
+<3> While `'a'` nicely keeps track of vertices already visited, the actual components need to be extracted from the
+path information of surviving traversers. The `path().unfold().limit(1)` closure provides the starting vertex
+of surviving traversers, which can be used to group the components.
+
+<4> This clause collects the unique vertices from all paths with the same starting vertex, thus from the same
+weak component.
+
+<5> The values of the groupby map contain the lists of vertices making up the requested components.
+
+This algorithm completes in linear time with the number of vertices and edges, because a traversal is started for each
+vertex and each edge with its associated out-vertex is visited exactly once.
+
+
+==== Large graphs
+
+Large graphs require an OLAP solution with a custom VertexProgram that can be run using a graph implementation's
+GraphComputer, in particular `SparkGraphComputer` on a `HadoopGraph`. The OLAP solution also runs faster for most
+medium-sized graphs, that is when these graph have a 'natural' structure with a limited maximum path length.
+
+The TinkerPop library of vertex programs contains the `WeakComponentsVertexProgram` which can be run in the same
+way as the link:http://tinkerpop.apache.org/docs/x.y.z/reference/#peerpressurevertexprogram[PeerPressureVertexProgram]:
 
 [gremlin-groovy,existing]
 ----
-g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
-  aggregate("p").by(path()).cap("p").unfold().limit(local, 1).
-  map(__.as("v").select("p").unfold().
-         filter(unfold().where(eq("v"))).
-         unfold().dedup().order().by(id).fold()
-  ).toSet()
+result = g.getGraph().compute().
+    program(WeakComponentsVertexProgram.build().maxIterations(100).create()).
+    mapReduce(ClusterPopulationMapReduce.build().create()).
+    mapReduce(ClusterCountMapReduce.build().create()).
+    submit().get()
+result.memory().clusterPopulation
+gResult = result.graph().traversal()
+gResult.V().valueMap(true)
 ----
+
+The vertex program has interconnected vertices exchange id's and store the lowest id until no vertex receives a
+lower id. This algorithm is commonly applied in
+link:https://en.wikipedia.org/wiki/Bulk_synchronous_parallel[bulk synchronous parallel] systems, e.g. in
+link:https://spark.apache.org/graphx[Apache Spark GraphX].
+
+==== Scalability
+
+ToDo:
+ - limits and run time regime 1
+ - test of friendster graph regime 3
+ - discuss: link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[http://www.vldb.org/pvldb/vol7/p1821-yan.pdf]
\ No newline at end of file


[11/12] tinkerpop git commit: TINKERPOP-1967 Rewrote connectedComponent() test to be more GLV friendly

Posted by sp...@apache.org.
TINKERPOP-1967 Rewrote connectedComponent() test to be more GLV friendly


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/2cc2aed5
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/2cc2aed5
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/2cc2aed5

Branch: refs/heads/master
Commit: 2cc2aed50d95f0f943fb012b421d4bcd0761a24d
Parents: cbae0a3
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 9 11:25:55 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 11:25:55 2018 -0400

----------------------------------------------------------------------
 .../step/map/ConnectedComponentTest.java        | 70 +++++++++-----------
 1 file changed, 30 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/2cc2aed5/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 3133382..a542bed 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -19,13 +19,15 @@
 package org.apache.tinkerpop.gremlin.process.traversal.step.map;
 
 import org.apache.tinkerpop.gremlin.LoadGraphWith;
-import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
 import org.apache.tinkerpop.gremlin.process.AbstractGremlinProcessTest;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
 import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
 import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils;
 import org.junit.Test;
 
+import java.util.Map;
+
 import static org.apache.tinkerpop.gremlin.LoadGraphWith.GraphData.MODERN;
 import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE;
 import static org.junit.Assert.assertEquals;
@@ -35,55 +37,43 @@ import static org.junit.Assert.assertEquals;
  */
 public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest {
 
-    public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent();
+    public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX();
 
-    public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
+    public abstract Traversal<Vertex, Map<String,Object>> get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX();
 
-    public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent();
+    public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX();
 
-    public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+    public abstract Traversal<Vertex, Map<String,Object>> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX();
 
     @Test
     @LoadGraphWith(MODERN)
-    public void g_V_connectedComponent() {
-        final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent();
+    public void g_V_connectedComponent_hasXcomponentX() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_hasXcomponentX();
         printTraversalForm(traversal);
-        int counter = 0;
-        while (traversal.hasNext()) {
-            final Vertex vertex = traversal.next();
-            counter++;
-            assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
-        }
-        assertEquals(6, counter);
+        assertEquals(6, IteratorUtils.count(traversal));
     }
 
     @Test
     @LoadGraphWith(MODERN)
-    public void g_V_dedup_connectedComponent() {
-        final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent();
+    public void g_V_dedup_connectedComponent_hasXcomponentX() {
+        final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent_hasXcomponentX();
         printTraversalForm(traversal);
-        int counter = 0;
-        while (traversal.hasNext()) {
-            final Vertex vertex = traversal.next();
-            counter++;
-            assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
-        }
-        assertEquals(6, counter);
+        assertEquals(6, IteratorUtils.count(traversal));
     }
 
     @Test
     @LoadGraphWith(MODERN)
-    public void g_V_hasLabelXsoftwareX_connectedComponent() {
-        final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
+    public void g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX() {
+        final Traversal<Vertex, Map<String,Object>> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX();
         printTraversalForm(traversal);
         int counter = 0;
         while (traversal.hasNext()) {
-            final Vertex vertex = traversal.next();
-            final String name = vertex.value("name");
+            final Map<String,Object> m = traversal.next();
+            final String name = m.get("name").toString();
             switch (name) {
                 case "lop":
                 case "ripple":
-                    assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+                    assertEquals(convertToVertexId("marko").toString(), m.get("component"));
                     break;
             }
             counter++;
@@ -93,21 +83,21 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
 
     @Test
     @LoadGraphWith(MODERN)
-    public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX() {
-        final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+    public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX() {
+        final Traversal<Vertex, Map<String,Object>> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX();
         printTraversalForm(traversal);
         int counter = 0;
         while (traversal.hasNext()) {
-            final Vertex vertex = traversal.next();
-            final String name = vertex.value("name");
+            final Map<String,Object> m = traversal.next();
+            final String name = m.get("name").toString();
             switch (name) {
                 case "marko":
                 case "vadas":
                 case "josh":
-                    assertEquals("1", vertex.value("cluster"));
+                    assertEquals(convertToVertexId("marko").toString(), m.get("cluster"));
                     break;
                 case "peter":
-                    assertEquals("6", vertex.value("cluster"));
+                    assertEquals(convertToVertexId("peter").toString(), m.get("cluster"));
                     break;
             }
             counter++;
@@ -117,22 +107,22 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
 
     public static class Traversals extends ConnectedComponentTest {
         @Override
-        public Traversal<Vertex, Vertex> get_g_V_connectedComponent() {
+        public Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX() {
             return g.V().connectedComponent();
         }
 
         @Override
-        public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent() {
+        public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX() {
             return g.V().dedup().connectedComponent();
         }
 
         @Override
-        public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
-            return g.V().hasLabel("software").connectedComponent();
+        public Traversal<Vertex, Map<String,Object>> get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX() {
+            return g.V().hasLabel("software").connectedComponent().project("name","component").by("name").by(ConnectedComponent.component);
         }
         @Override
-        public Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX() {
-            return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster");
+        public Traversal<Vertex, Map<String,Object>> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX() {
+            return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster").project("name","cluster").by("name").by("cluster");
         }
     }
 }


[03/12] tinkerpop git commit: TINKERPOP-1967 fixed up halted traversers

Posted by sp...@apache.org.
TINKERPOP-1967 fixed up halted traversers


Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/b2cb1874
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/b2cb1874
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/b2cb1874

Branch: refs/heads/master
Commit: b2cb187470794bfca352c8a9c7d0c444d102e46b
Parents: 8954c27
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 10:51:35 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400

----------------------------------------------------------------------
 .../ConnectedComponentVertexProgram.java        | 42 +++++++++++++++-----
 .../ConnectedComponentVertexProgramStep.java    | 26 ++++++++++--
 .../step/map/ConnectedComponentTest.java        |  2 +-
 3 files changed, 57 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
index de718f1..82907eb 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -32,7 +32,10 @@ import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexPr
 import org.apache.tinkerpop.gremlin.process.computer.util.AbstractVertexProgramBuilder;
 import org.apache.tinkerpop.gremlin.process.traversal.Operator;
 import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
 import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.IndexedTraverserSet;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
 import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
 import org.apache.tinkerpop.gremlin.structure.Direction;
 import org.apache.tinkerpop.gremlin.structure.Edge;
@@ -40,6 +43,8 @@ import org.apache.tinkerpop.gremlin.structure.Graph;
 import org.apache.tinkerpop.gremlin.structure.Vertex;
 import org.apache.tinkerpop.gremlin.structure.VertexProperty;
 
+import java.util.Arrays;
+import java.util.Collection;
 import java.util.Collections;
 import java.util.HashSet;
 import java.util.Iterator;
@@ -61,12 +66,14 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
     private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
 
     private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
+
     private MessageScope.Local<?> scope = MessageScope.Local.of(__::bothE);
     private Set<MessageScope> scopes;
     private String property = COMPONENT;
-    private boolean hasHalted = false;
     private PureTraversal<Vertex, Edge> edgeTraversal = null;
     private Configuration configuration;
+    private TraverserSet<Vertex> haltedTraversers;
+    private IndexedTraverserSet<Vertex, Vertex> haltedTraversersIndex;
 
     private ConnectedComponentVertexProgram() {}
 
@@ -85,7 +92,12 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
         scopes = new HashSet<>(Collections.singletonList(scope));
 
         this.property = configuration.getString(PROPERTY, COMPONENT);
-        this.hasHalted = configuration.getBoolean(HAS_HALTED, false);
+
+        this.haltedTraversers = TraversalVertexProgram.loadHaltedTraversers(configuration);
+        this.haltedTraversersIndex = new IndexedTraverserSet<>(v -> v);
+        for (final Traverser.Admin<Vertex> traverser : this.haltedTraversers) {
+            this.haltedTraversersIndex.add(traverser.split());
+        }
     }
 
     @Override
@@ -104,6 +116,8 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
     @Override
     public void execute(final Vertex vertex, final Messenger<String> messenger, final Memory memory) {
         if (memory.isInitialIteration()) {
+            copyHaltedTraversersFromMemory(vertex);
+
             // on the first pass, just initialize the component to its own id then pass it to all adjacent vertices
             // for evaluation
             vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
@@ -113,7 +127,7 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
             // halting traversers. only want to send messages from traversers that are still hanging about after
             // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
             // lead to weirdness in results. 
-            if (vertex.edges(Direction.BOTH).hasNext() && !(hasHalted && !vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent())) {
+            if (vertex.edges(Direction.BOTH).hasNext()) {
                 // since there was message passing we don't want to halt on the first round. this should only trigger
                 // a single pass finish if the graph is completely disconnected (technically, it won't even really
                 // work in cases where halted traversers come into play
@@ -148,7 +162,9 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
 
     @Override
     public Set<VertexComputeKey> getVertexComputeKeys() {
-        return new HashSet<>(Collections.singletonList(VertexComputeKey.of(property, false)));
+        return new HashSet<>(Arrays.asList(
+                VertexComputeKey.of(property, false),
+                VertexComputeKey.of(TraversalVertexProgram.HALTED_TRAVERSERS, false)));
     }
 
     @Override
@@ -158,6 +174,10 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
 
     @Override
     public boolean terminate(final Memory memory) {
+        if (memory.isInitialIteration() && this.haltedTraversersIndex != null) {
+            this.haltedTraversersIndex.clear();
+        }
+
         final boolean voteToHalt = memory.<Boolean>get(VOTE_TO_HALT);
         if (voteToHalt) {
             return true;
@@ -206,6 +226,15 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
         };
     }
 
+    private void copyHaltedTraversersFromMemory(final Vertex vertex) {
+        final Collection<Traverser.Admin<Vertex>> traversers = this.haltedTraversersIndex.get(vertex);
+        if (traversers != null) {
+            final TraverserSet<Vertex> newHaltedTraversers = new TraverserSet<>();
+            newHaltedTraversers.addAll(traversers);
+            vertex.property(VertexProperty.Cardinality.single, TraversalVertexProgram.HALTED_TRAVERSERS, newHaltedTraversers);
+        }
+    }
+
     public static ConnectedComponentVertexProgram.Builder build() {
         return new ConnectedComponentVertexProgram.Builder();
     }
@@ -216,11 +245,6 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
             super(ConnectedComponentVertexProgram.class);
         }
 
-        public ConnectedComponentVertexProgram.Builder hasHalted(final boolean hasHalted) {
-            this.configuration.setProperty(HAS_HALTED, hasHalted);
-            return this;
-        }
-
         public ConnectedComponentVertexProgram.Builder edges(final Traversal.Admin<Vertex, Edge> edgeTraversal) {
             PureTraversal.storeState(this.configuration, EDGE_TRAVERSAL, edgeTraversal);
             return this;

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
index edeb497..c222cfa 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
@@ -29,11 +29,16 @@ import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
 import org.apache.tinkerpop.gremlin.process.traversal.step.Configuring;
 import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
 import org.apache.tinkerpop.gremlin.process.traversal.step.util.Parameters;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
 import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
 import org.apache.tinkerpop.gremlin.structure.Edge;
 import org.apache.tinkerpop.gremlin.structure.Graph;
 import org.apache.tinkerpop.gremlin.structure.Vertex;
 import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
+import org.apache.tinkerpop.gremlin.util.Serializer;
+
+import java.io.IOException;
+import java.util.Base64;
 
 /**
  * @author Stephen Mallette (http://stephen.genoprime.com)
@@ -84,10 +89,25 @@ public final class ConnectedComponentVertexProgramStep extends VertexProgramStep
     public ConnectedComponentVertexProgram generateProgram(final Graph graph, final Memory memory) {
         final Traversal.Admin<Vertex, Edge> detachedTraversal = this.edgeTraversal.getPure();
         detachedTraversal.setStrategies(TraversalStrategies.GlobalCache.getStrategies(graph.getClass()));
-        return ConnectedComponentVertexProgram.build().
-                hasHalted(memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)).
+
+        final ConnectedComponentVertexProgram.Builder builder = ConnectedComponentVertexProgram.build().
                 edges(detachedTraversal).
-                property(this.clusterProperty).create(graph);
+                property(this.clusterProperty);
+
+        if (memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)) {
+            final TraverserSet<?> haltedTraversers = memory.get(TraversalVertexProgram.HALTED_TRAVERSERS);
+            if (!haltedTraversers.isEmpty()) {
+                Object haltedTraversersValue;
+                try {
+                    haltedTraversersValue = Base64.getEncoder().encodeToString(Serializer.serializeObject(haltedTraversers));
+                } catch (final IOException ignored) {
+                    haltedTraversersValue = haltedTraversers;
+                }
+                builder.configure(TraversalVertexProgram.HALTED_TRAVERSERS, haltedTraversersValue);
+            }
+        }
+
+        return builder.create(graph);
     }
 
     @Override

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 25e618a..8b1904f 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -67,7 +67,7 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
             switch (name) {
                 case "lop":
                 case "ripple":
-                    assertEquals("3", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+                    assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
                     break;
             }
             counter++;