You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2018/08/09 18:12:43 UTC
[01/12] tinkerpop git commit: TINKERPOP-1967 Added
connectedComponent() step
Repository: tinkerpop
Updated Branches:
refs/heads/master f88ace142 -> 0c2fc21fb
TINKERPOP-1967 Added connectedComponent() step
Deprecated the recipe for "Connected Components" but left the old content present as I felt it had educational value.
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0acceb90
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0acceb90
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0acceb90
Branch: refs/heads/master
Commit: 0acceb90efc6516df20586d756765f94efc6cbf2
Parents: f88ace1
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu May 17 14:44:01 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:40 2018 -0400
----------------------------------------------------------------------
CHANGELOG.asciidoc | 1 +
docs/src/recipes/connected-components.asciidoc | 8 +-
docs/src/reference/the-graphcomputer.asciidoc | 6 +
docs/src/reference/the-traversal.asciidoc | 35 +++
docs/src/upgrade/release-3.4.x.asciidoc | 32 +++
.../tinkerpop/gremlin/jsr223/CoreImports.java | 4 +
.../ConnectedComponentVertexProgram.java | 234 +++++++++++++++++++
.../traversal/step/map/ConnectedComponent.java | 38 +++
.../ConnectedComponentVertexProgramStep.java | 98 ++++++++
.../gremlin/process/remote/RemoteGraph.java | 4 +
.../traversal/dsl/graph/GraphTraversal.java | 15 ++
.../traversal/dsl/graph/GraphTraversalTest.java | 2 +-
.../Process/Traversal/GraphTraversal.cs | 9 +
.../lib/process/graph-traversal.js | 10 +
.../gremlin_python/process/graph_traversal.py | 4 +
.../gremlin/process/ProcessComputerSuite.java | 2 +
.../step/map/ConnectedComponentTest.java | 117 ++++++++++
.../computer/SparkHadoopGraphProvider.java | 2 +
18 files changed, 619 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 303d963..481a7d3 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -35,6 +35,7 @@ This release also includes changes from <<release-3-3-3, 3.3.3>>.
* Replaced `Parameterizing.addPropertyMutations()` with `Configuring.configure()`.
* Changed interface hierarchy for `Parameterizing` and `Mutating` interfaces as they are tightly related.
* Introduced the `with()` step modulator which can supply configuration options to `Configuring` steps.
+* Added `connectedComponent()` step and related `VertexProgram`.
* Added `supportsUpsert()` option to `VertexFeatures` and `EdgeFeatures`.
* `min()` and `max()` now support all types implementing `Comparable`.
* Change the `toString()` of `Path` to be standardized as other graph elements are.
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 46d61eb..70abdbd 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -18,7 +18,13 @@ limitations under the License.
== Connected Components
Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components]
-in a graph. Consider the following graph which has three connected components:
+in a graph. As of TinkerPop 3.4.0, the process has been simplified to the `connectedComponent()`-step which is
+described in more detail in the link:
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+
+The `connectedComponent()`-step replaces the original recipe described below from earlier versions of TinkerPop,
+however the algorithm from that old recipe remains interesting for educational purposes and has thus not been removed.
+The original recipe considers the following graph which has three connected components:
image:connected-components.png[width=600]
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-graphcomputer.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-graphcomputer.asciidoc b/docs/src/reference/the-graphcomputer.asciidoc
index 1d7586c..9cd1c76 100644
--- a/docs/src/reference/the-graphcomputer.asciidoc
+++ b/docs/src/reference/the-graphcomputer.asciidoc
@@ -403,6 +403,12 @@ g.V().peerPressure().by('cluster').valueMap()
g.V().peerPressure().by(outE('knows')).by('cluster').valueMap()
----
+[[connectedcomponentvertexprogram]]
+=== ConnectedComponentVertexProgram
+
+The `ConnectedComponentVertexProgram` identifies link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component]
+instances in a graph. See <<connectedcomponent-step,`connectedComponent()`>>-step for more information.
+
[[bulkdumpervertexprogram]]
[[clonevertexprogram]]
=== CloneVertexProgram
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/reference/the-traversal.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/reference/the-traversal.asciidoc b/docs/src/reference/the-traversal.asciidoc
index 7d385c8..4d5b48a 100644
--- a/docs/src/reference/the-traversal.asciidoc
+++ b/docs/src/reference/the-traversal.asciidoc
@@ -625,6 +625,41 @@ g.V().coin(1.0)
link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#coin-double-++[`coin(double)`]
+[[connectedcomponent-step]]
+=== ConnectedComponent Step
+
+The `connectedComponent()` step performs a computation to identify link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component]
+instances in a graph. When this step completes, the vertices will be labelled with a component identifier to denote
+the component to which they are associated.
+
+IMPORTANT: The `connectedComponent()`-step is a `VertexComputing`-step and as such, can only be used against a graph
+that supports `GraphComputer` (OLAP).
+
+[gremlin-groovy,modern]
+----
+g = graph.traversal().withComputer()
+g.V().
+ connectedComponent().
+ with(ConnectedComponent.propertyName, 'component').
+ project('name','component').
+ by('name').
+ by('component')
+g.V().hasLabel('person').
+ connectedComponent().
+ with(ConnectedComponent.propertyName, 'component').
+ with(ConnectedComponent.edges, outE('knows')).
+ project('name','component').
+ by('name').
+ by('component')
+----
+
+Note the use of the `with()` modulating step which provides configuration options to the algorithm. It takes
+configuration keys from the `ConnectedComponent` class and is automatically imported to the Gremlin Console.
+
+*Additional References*
+
+link:++http://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#connectedComponent--++[`connectedComponent()`]
+
[[constant-step]]
=== Constant Step
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/docs/src/upgrade/release-3.4.x.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/upgrade/release-3.4.x.asciidoc b/docs/src/upgrade/release-3.4.x.asciidoc
index 587e761..00f2635 100644
--- a/docs/src/upgrade/release-3.4.x.asciidoc
+++ b/docs/src/upgrade/release-3.4.x.asciidoc
@@ -65,6 +65,38 @@ release where breaking changes are allowed.
See: link:https://issues.apache.org/jira/browse/TINKERPOP-1975[TINKERPOP-1975],
link:http://tinkerpop.apache.org/docs/3.4.0/reference/#with-step[Reference Documentation]
+==== connectedComponent() Step
+
+In prior version of TinkerPop, it was recommended that the identification of
+link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[Connected Component] instances in a graph be
+computed by way of a reasonably complex bit of Gremlin that looked something like this:
+
+[source,groovy]
+----
+g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
+ path().aggregate("p").
+ unfold().dedup().
+ map(__.as("v").select("p").unfold().
+ filter(unfold().where(eq("v"))).
+ unfold().dedup().order().by(id).fold()).
+ dedup()
+----
+
+The above approach had a number of drawbacks that included a large execution cost as well as incompatibilities in OLAP.
+To simplify usage of this commonly use graph algorithm, TinkerPop 3.4.0 introduces the `connectedComponent()` step
+which reduces the above operation to:
+
+[source,groovy]
+----
+g.withComputer().V().connectedComponent()
+----
+
+It is important to note that this step does require the use of a `GraphComputer` to work, as it utilizes a
+`VertexProgram` behind the scenes.
+
+See: link:https://issues.apache.org/jira/browse/TINKERPOP-1967[TINKERPOP-1967],
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation]
+
==== io() Step
There have been some important changes to IO operations for reading and writing graph data. The use of `Graph.io()`
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
index 72ad47a..eb3b831 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/CoreImports.java
@@ -42,6 +42,7 @@ import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVerte
import org.apache.tinkerpop.gremlin.process.computer.bulkloading.IncrementalBulkLoader;
import org.apache.tinkerpop.gremlin.process.computer.bulkloading.OneTimeBulkLoader;
import org.apache.tinkerpop.gremlin.process.computer.clone.CloneVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterCountMapReduce;
import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.ClusterPopulationMapReduce;
import org.apache.tinkerpop.gremlin.process.computer.clustering.peerpressure.PeerPressureVertexProgram;
@@ -49,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankMa
import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
import org.apache.tinkerpop.gremlin.process.computer.traversal.MemoryTraversalSideEffects;
import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRank;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressure;
import org.apache.tinkerpop.gremlin.process.computer.traversal.strategy.decoration.VertexProgramStrategy;
@@ -251,6 +253,8 @@ public final class CoreImports {
// graph computer
CLASS_IMPORTS.add(Computer.class);
CLASS_IMPORTS.add(ComputerResult.class);
+ CLASS_IMPORTS.add(ConnectedComponent.class);
+ CLASS_IMPORTS.add(ConnectedComponentVertexProgram.class);
CLASS_IMPORTS.add(GraphComputer.class);
CLASS_IMPORTS.add(Memory.class);
CLASS_IMPORTS.add(VertexProgram.class);
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
new file mode 100644
index 0000000..de718f1
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.computer.clustering.connected;
+
+import org.apache.commons.configuration.BaseConfiguration;
+import org.apache.commons.configuration.Configuration;
+import org.apache.commons.configuration.ConfigurationUtils;
+import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
+import org.apache.tinkerpop.gremlin.process.computer.Memory;
+import org.apache.tinkerpop.gremlin.process.computer.MemoryComputeKey;
+import org.apache.tinkerpop.gremlin.process.computer.MessageScope;
+import org.apache.tinkerpop.gremlin.process.computer.Messenger;
+import org.apache.tinkerpop.gremlin.process.computer.VertexComputeKey;
+import org.apache.tinkerpop.gremlin.process.computer.VertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.util.AbstractVertexProgramBuilder;
+import org.apache.tinkerpop.gremlin.process.traversal.Operator;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
+import org.apache.tinkerpop.gremlin.structure.Direction;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.Set;
+
+/**
+ * Identifies "Connected Component" instances in a graph by assigning a component identifier (the lexicographically
+ * least string value of the vertex in the component) to each vertex.
+ *
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ * @author Daniel Kuppitz (http://gremlin.guru)
+ */
+public class ConnectedComponentVertexProgram implements VertexProgram<String> {
+
+ public static final String COMPONENT = "gremlin.connectedComponentVertexProgram.component";
+ private static final String PROPERTY = "gremlin.connectedComponentVertexProgram.property";
+ private static final String EDGE_TRAVERSAL = "gremlin.pageRankVertexProgram.edgeTraversal";
+ private static final String HAS_HALTED = "gremlin.connectedComponentVertexProgram.hasHalted";
+ private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
+
+ private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
+ private MessageScope.Local<?> scope = MessageScope.Local.of(__::bothE);
+ private Set<MessageScope> scopes;
+ private String property = COMPONENT;
+ private boolean hasHalted = false;
+ private PureTraversal<Vertex, Edge> edgeTraversal = null;
+ private Configuration configuration;
+
+ private ConnectedComponentVertexProgram() {}
+
+ @Override
+ public void loadState(final Graph graph, final Configuration config) {
+ configuration = new BaseConfiguration();
+ if (config != null) {
+ ConfigurationUtils.copy(config, configuration);
+ }
+
+ if (configuration.containsKey(EDGE_TRAVERSAL)) {
+ this.edgeTraversal = PureTraversal.loadState(configuration, EDGE_TRAVERSAL, graph);
+ this.scope = MessageScope.Local.of(() -> this.edgeTraversal.get().clone());
+ }
+
+ scopes = new HashSet<>(Collections.singletonList(scope));
+
+ this.property = configuration.getString(PROPERTY, COMPONENT);
+ this.hasHalted = configuration.getBoolean(HAS_HALTED, false);
+ }
+
+ @Override
+ public void storeState(final Configuration config) {
+ VertexProgram.super.storeState(config);
+ if (configuration != null) {
+ ConfigurationUtils.copy(configuration, config);
+ }
+ }
+
+ @Override
+ public void setup(final Memory memory) {
+ memory.set(VOTE_TO_HALT, true);
+ }
+
+ @Override
+ public void execute(final Vertex vertex, final Messenger<String> messenger, final Memory memory) {
+ if (memory.isInitialIteration()) {
+ // on the first pass, just initialize the component to its own id then pass it to all adjacent vertices
+ // for evaluation
+ vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
+
+ // no need to send messages from traversers that were filtered - happens for stuff like
+ // g.V().hasLabel('software').connectedComponent() (i.e. when there is a TraversalVertexProgram in the mix
+ // halting traversers. only want to send messages from traversers that are still hanging about after
+ // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
+ // lead to weirdness in results.
+ if (vertex.edges(Direction.BOTH).hasNext() && !(hasHalted && !vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent())) {
+ // since there was message passing we don't want to halt on the first round. this should only trigger
+ // a single pass finish if the graph is completely disconnected (technically, it won't even really
+ // work in cases where halted traversers come into play
+ messenger.sendMessage(scope, vertex.id().toString());
+ memory.add(VOTE_TO_HALT, false);
+ }
+ } else {
+ // by the second iteration all vertices that matter should have a component assigned
+ String currentComponent = vertex.value(property);
+ boolean different = false;
+
+ // iterate through messages received and determine if there is a component that has a lesser value than
+ // the currently assigned one
+ final Iterator<String> componentIterator = messenger.receiveMessages();
+ while(componentIterator.hasNext()) {
+ final String candidateComponent = componentIterator.next();
+ if (candidateComponent.compareTo(currentComponent) < 0) {
+ currentComponent = candidateComponent;
+ different = true;
+ }
+ }
+
+ // if there is a better component then assign it and notify adjacent vertices. triggering the message
+ // passing should not halt future executions
+ if (different) {
+ vertex.property(VertexProperty.Cardinality.single, property, currentComponent);
+ messenger.sendMessage(scope, currentComponent);
+ memory.add(VOTE_TO_HALT, false);
+ }
+ }
+ }
+
+ @Override
+ public Set<VertexComputeKey> getVertexComputeKeys() {
+ return new HashSet<>(Collections.singletonList(VertexComputeKey.of(property, false)));
+ }
+
+ @Override
+ public Set<MemoryComputeKey> getMemoryComputeKeys() {
+ return MEMORY_COMPUTE_KEYS;
+ }
+
+ @Override
+ public boolean terminate(final Memory memory) {
+ final boolean voteToHalt = memory.<Boolean>get(VOTE_TO_HALT);
+ if (voteToHalt) {
+ return true;
+ } else {
+ // it is basically always assumed that the program will want to halt, but if message passing occurs, the
+ // program will want to continue, thus reset false values to true for future iterations
+ memory.set(VOTE_TO_HALT, true);
+ return false;
+ }
+ }
+
+ @Override
+ public Set<MessageScope> getMessageScopes(final Memory memory) {
+ return scopes;
+ }
+
+ @Override
+ public GraphComputer.ResultGraph getPreferredResultGraph() {
+ return GraphComputer.ResultGraph.NEW;
+ }
+
+ @Override
+ public GraphComputer.Persist getPreferredPersist() {
+ return GraphComputer.Persist.VERTEX_PROPERTIES;
+ }
+
+
+ @Override
+ @SuppressWarnings("CloneDoesntCallSuperClone,CloneDoesntDeclareCloneNotSupportedException")
+ public ConnectedComponentVertexProgram clone() {
+ return this;
+ }
+
+ @Override
+ public Features getFeatures() {
+ return new Features() {
+ @Override
+ public boolean requiresLocalMessageScopes() {
+ return true;
+ }
+
+ @Override
+ public boolean requiresVertexPropertyAddition() {
+ return true;
+ }
+ };
+ }
+
+ public static ConnectedComponentVertexProgram.Builder build() {
+ return new ConnectedComponentVertexProgram.Builder();
+ }
+
+ public static final class Builder extends AbstractVertexProgramBuilder<ConnectedComponentVertexProgram.Builder> {
+
+ private Builder() {
+ super(ConnectedComponentVertexProgram.class);
+ }
+
+ public ConnectedComponentVertexProgram.Builder hasHalted(final boolean hasHalted) {
+ this.configuration.setProperty(HAS_HALTED, hasHalted);
+ return this;
+ }
+
+ public ConnectedComponentVertexProgram.Builder edges(final Traversal.Admin<Vertex, Edge> edgeTraversal) {
+ PureTraversal.storeState(this.configuration, EDGE_TRAVERSAL, edgeTraversal);
+ return this;
+ }
+
+ public ConnectedComponentVertexProgram.Builder property(final String key) {
+ this.configuration.setProperty(PROPERTY, key);
+ return this;
+ }
+ }
+}
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
new file mode 100644
index 0000000..85558bc
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+
+/**
+ * Configuration options to be passed to the {@link GraphTraversal#with(String, Object)} step on
+ * {@link GraphTraversal#connectedComponent()} ()}.
+ */
+public class ConnectedComponent {
+ /**
+ * Configures the edge to traverse when calculating the pagerank.
+ */
+ public static final String edges = Graph.Hidden.hide("tinkerpop.connectedComponent.edges");
+
+ /**
+ * Configures the name of the property within which to store the pagerank value.
+ */
+ public static final String propertyName = Graph.Hidden.hide("tinkerpop.connectedComponent.propertyName");
+}
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
new file mode 100644
index 0000000..edeb497
--- /dev/null
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.process.computer.GraphFilter;
+import org.apache.tinkerpop.gremlin.process.computer.Memory;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategies;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.step.Configuring;
+import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
+import org.apache.tinkerpop.gremlin.process.traversal.step.util.Parameters;
+import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
+
+/**
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ */
+public final class ConnectedComponentVertexProgramStep extends VertexProgramStep implements TraversalParent, Configuring {
+
+ private Parameters parameters = new Parameters();
+ private PureTraversal<Vertex, Edge> edgeTraversal;
+ private String clusterProperty = ConnectedComponentVertexProgram.COMPONENT;
+
+ public ConnectedComponentVertexProgramStep(final Traversal.Admin traversal) {
+ super(traversal);
+ this.configure(ConnectedComponent.edges, __.<Vertex>bothE());
+ }
+
+ @Override
+ public void configure(final Object... keyValues) {
+ if (keyValues[0].equals(ConnectedComponent.edges)) {
+ if (!(keyValues[1] instanceof Traversal))
+ throw new IllegalArgumentException("ConnectedComponent.edges requires a Traversal as its argument");
+ this.edgeTraversal = new PureTraversal<>(((Traversal<Vertex,Edge>) keyValues[1]).asAdmin());
+ this.integrateChild(this.edgeTraversal.get());
+ } else if (keyValues[0].equals(ConnectedComponent.propertyName)) {
+ if (!(keyValues[1] instanceof String))
+ throw new IllegalArgumentException("ConnectedComponent.propertyName requires a String as its argument");
+ this.clusterProperty = (String) keyValues[1];
+ } else {
+ this.parameters.set(this, keyValues);
+ }
+ }
+
+ @Override
+ public Parameters getParameters() {
+ return parameters;
+ }
+
+ @Override
+ public int hashCode() {
+ return super.hashCode() ^ this.clusterProperty.hashCode();
+ }
+
+ @Override
+ public String toString() {
+ return StringFactory.stepString(this, this.clusterProperty, new GraphFilter(this.computer));
+ }
+
+ @Override
+ public ConnectedComponentVertexProgram generateProgram(final Graph graph, final Memory memory) {
+ final Traversal.Admin<Vertex, Edge> detachedTraversal = this.edgeTraversal.getPure();
+ detachedTraversal.setStrategies(TraversalStrategies.GlobalCache.getStrategies(graph.getClass()));
+ return ConnectedComponentVertexProgram.build().
+ hasHalted(memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)).
+ edges(detachedTraversal).
+ property(this.clusterProperty).create(graph);
+ }
+
+ @Override
+ public ConnectedComponentVertexProgramStep clone() {
+ return (ConnectedComponentVertexProgramStep) super.clone();
+ }
+
+}
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
index f19d5fa..9bc1771 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
@@ -48,6 +48,10 @@ import java.util.Iterator;
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_STANDARD)
@Graph.OptIn(Graph.OptIn.SUITE_PROCESS_COMPUTER)
@Graph.OptOut(
+ test = "org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest",
+ method = "*",
+ reason = "hmmmm")
+@Graph.OptOut(
test = "org.apache.tinkerpop.gremlin.process.traversal.CoreTraversalTest",
method = "*",
reason = "The test suite does not support profiling or lambdas and for groovy tests: 'Could not locate method: GraphTraversalSource.withStrategies([{traversalCategory=interface org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy$DecorationStrategy}])'")
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
index a675ad1..4fd247f 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
@@ -19,6 +19,8 @@
package org.apache.tinkerpop.gremlin.process.traversal.dsl.graph;
import org.apache.tinkerpop.gremlin.process.computer.VertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponentVertexProgramStep;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PageRankVertexProgramStep;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.PeerPressureVertexProgramStep;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ProgramVertexProgramStep;
@@ -2452,6 +2454,18 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> {
}
/**
+ * Executes a Connected Component algorithm over the graph.
+ *
+ * @return the traversal with the appended {@link ConnectedComponentVertexProgram}
+ * @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#connectedcomponent-step" target="_blank">Reference Documentation - ConnectedComponent Step</a>
+ * @since 3.4.0
+ */
+ public default GraphTraversal<S, E> connectedComponent() {
+ this.asAdmin().getBytecode().addStep(Symbols.connectedComponent);
+ return this.asAdmin().addStep((Step<E, E>) new ConnectedComponentVertexProgramStep(this.asAdmin()));
+ }
+
+ /**
* Executes an arbitrary {@link VertexProgram} over the graph.
*
* @return the traversal with the appended {@link ProgramVertexProgramStep}
@@ -2882,6 +2896,7 @@ public interface GraphTraversal<S, E> extends Traversal<S, E> {
public static final String pageRank = "pageRank";
public static final String peerPressure = "peerPressure";
+ public static final String connectedComponent = "connectedComponent";
public static final String program = "program";
public static final String by = "by";
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
index 3d9a549..40ca312 100644
--- a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
+++ b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalTest.java
@@ -43,7 +43,7 @@ import static org.junit.Assert.assertEquals;
public class GraphTraversalTest {
private static final Logger logger = LoggerFactory.getLogger(GraphTraversalTest.class);
- private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "peerPressure", "program", "none"));
+ private static Set<String> NO_GRAPH = new HashSet<>(Arrays.asList("asAdmin", "by", "read", "write", "with", "option", "iterate", "to", "from", "profile", "pageRank", "connectedComponent", "peerPressure", "program", "none"));
private static Set<String> NO_ANONYMOUS = new HashSet<>(Arrays.asList("start", "__"));
private static Set<String> IGNORES_BYTECODE = new HashSet<>(Arrays.asList("asAdmin", "read", "write", "iterate", "mapValues", "mapKeys"));
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
----------------------------------------------------------------------
diff --git a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
index 361b246..e2fa3e1 100644
--- a/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
+++ b/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/GraphTraversal.cs
@@ -401,6 +401,15 @@ namespace Gremlin.Net.Process.Traversal
}
/// <summary>
+ /// Adds the connectedComponent step to this <see cref="GraphTraversal{SType, EType}" />.
+ /// </summary>
+ public GraphTraversal<S, E> ConnectedComponent ()
+ {
+ Bytecode.AddStep("connectedComponent");
+ return Wrap<S, E>(this);
+ }
+
+ /// <summary>
/// Adds the constant step to this <see cref="GraphTraversal{SType, EType}" />.
/// </summary>
public GraphTraversal<S, E2> Constant<E2> (E2 e)
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
----------------------------------------------------------------------
diff --git a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
index 4f39fa5..1c8d639 100644
--- a/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
+++ b/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/graph-traversal.js
@@ -353,6 +353,16 @@ class GraphTraversal extends Traversal {
}
/**
+ * Graph traversal connectedComponent method.
+ * @param {...Object} args
+ * @returns {GraphTraversal}
+ */
+ connectedComponent(...args) {
+ this.bytecode.addStep('connectedComponent', args);
+ return this;
+ }
+
+ /**
* Graph traversal constant method.
* @param {...Object} args
* @returns {GraphTraversal}
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
----------------------------------------------------------------------
diff --git a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
index f8b8285..b073aa8 100644
--- a/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
+++ b/gremlin-python/src/main/jython/gremlin_python/process/graph_traversal.py
@@ -194,6 +194,10 @@ class GraphTraversal(Traversal):
self.bytecode.add_step("coin", *args)
return self
+ def connectedComponent(self, *args):
+ self.bytecode.add_step("connectedComponent", *args)
+ return self
+
def constant(self, *args):
self.bytecode.add_step("constant", *args)
return self
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
index eab562d..6466ae8 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/ProcessComputerSuite.java
@@ -50,6 +50,7 @@ import org.apache.tinkerpop.gremlin.process.traversal.step.filter.TailTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.filter.WhereTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.AddEdgeTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.CoalesceTest;
+import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConstantTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.CountTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapTest;
@@ -143,6 +144,7 @@ public class ProcessComputerSuite extends AbstractGremlinSuite {
// map
CoalesceTest.Traversals.class,
+ ConnectedComponentTest.Traversals.class,
ConstantTest.Traversals.class,
CountTest.Traversals.class,
FlatMapTest.Traversals.class,
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
new file mode 100644
index 0000000..25e618a
--- /dev/null
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.tinkerpop.gremlin.process.traversal.step.map;
+
+import org.apache.tinkerpop.gremlin.LoadGraphWith;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
+import org.apache.tinkerpop.gremlin.process.AbstractGremlinProcessTest;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
+import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.junit.Test;
+
+import static org.apache.tinkerpop.gremlin.LoadGraphWith.GraphData.MODERN;
+import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE;
+import static org.junit.Assert.assertEquals;
+
+/**
+ * @author Stephen Mallette (http://stephen.genoprime.com)
+ */
+public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest {
+
+ public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent();
+
+ public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
+
+ public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+
+ @Test
+ @LoadGraphWith(MODERN)
+ public void g_V_connectedComponent() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent();
+ printTraversalForm(traversal);
+ int counter = 0;
+ while (traversal.hasNext()) {
+ final Vertex vertex = traversal.next();
+ counter++;
+ assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+ }
+ assertEquals(6, counter);
+ }
+
+ @Test
+ @LoadGraphWith(MODERN)
+ public void g_V_hasLabelXsoftwareX_connectedComponent() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
+ printTraversalForm(traversal);
+ int counter = 0;
+ while (traversal.hasNext()) {
+ final Vertex vertex = traversal.next();
+ final String name = vertex.value("name");
+ switch (name) {
+ case "lop":
+ case "ripple":
+ assertEquals("3", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+ break;
+ }
+ counter++;
+ }
+ assertEquals(2, counter);
+ }
+
+ @Test
+ @LoadGraphWith(MODERN)
+ public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+ printTraversalForm(traversal);
+ int counter = 0;
+ while (traversal.hasNext()) {
+ final Vertex vertex = traversal.next();
+ final String name = vertex.value("name");
+ switch (name) {
+ case "marko":
+ case "vadas":
+ case "josh":
+ assertEquals("1", vertex.value("cluster"));
+ break;
+ case "peter":
+ assertEquals("6", vertex.value("cluster"));
+ break;
+ }
+ counter++;
+ }
+ assertEquals(4, counter);
+ }
+
+ public static class Traversals extends ConnectedComponentTest {
+ @Override
+ public Traversal<Vertex, Vertex> get_g_V_connectedComponent() {
+ return g.V().connectedComponent();
+ }
+
+ @Override
+ public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
+ return g.V().hasLabel("software").connectedComponent();
+ }
+ @Override
+ public Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX() {
+ return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster");
+ }
+ }
+}
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0acceb90/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
index c778c6d..2f727c8 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkHadoopGraphProvider.java
@@ -39,6 +39,7 @@ import org.apache.tinkerpop.gremlin.process.computer.Computer;
import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
import org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
+import org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.PageRankTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.PeerPressureTest;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProgramTest;
@@ -115,6 +116,7 @@ public class SparkHadoopGraphProvider extends AbstractFileGraphProvider {
if (null != loadGraphWith &&
!test.equals(ProgramTest.Traversals.class) &&
!test.equals(PageRankTest.Traversals.class) &&
+ !test.equals(ConnectedComponentTest.Traversals.class) &&
!test.equals(PeerPressureTest.Traversals.class) &&
!test.equals(FileSystemStorageCheck.class) &&
!testMethodName.equals("shouldSupportJobChaining") && // GraphComputerTest.shouldSupportJobChaining
[10/12] tinkerpop git commit: TINKERPOP-1967 Added a component field
to the ConnectedComponent class
Posted by sp...@apache.org.
TINKERPOP-1967 Added a component field to the ConnectedComponent class
In this way the user can access the default more readily and is available in GLVs
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/8954c271
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/8954c271
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/8954c271
Branch: refs/heads/master
Commit: 8954c2713d6b2b2736b2892c03b0fe6dbe6bf018
Parents: 16231d6
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 08:29:13 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
docs/src/recipes/connected-components.asciidoc | 5 ++++-
.../computer/traversal/step/map/ConnectedComponent.java | 7 +++++++
2 files changed, 11 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/8954c271/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index e6d0f7a..c46180f 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -64,10 +64,13 @@ The traversal looks like:
[gremlin-groovy,existing]
----
g.withComputer().V().connectedComponent().
- group().by('gremlin.connectedComponentVertexProgram.component').
+ group().by(component).
select(values).unfold()
----
+NOTE: The `component` option passed to `by()` is statically imported from `ConnectedComponent` and refers to the
+default property key within which the result of the algorithm is stored.
+
A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
[gremlin-groovy,existing]
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/8954c271/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
index 85558bc..a2223d8 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponent.java
@@ -18,6 +18,7 @@
*/
package org.apache.tinkerpop.gremlin.process.computer.traversal.step.map;
+import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.structure.Graph;
@@ -26,6 +27,12 @@ import org.apache.tinkerpop.gremlin.structure.Graph;
* {@link GraphTraversal#connectedComponent()} ()}.
*/
public class ConnectedComponent {
+
+ /**
+ * The default property key name that will hold the result of the algorithm.
+ */
+ public static final String component = ConnectedComponentVertexProgram.COMPONENT;
+
/**
* Configures the edge to traverse when calculating the pagerank.
*/
[02/12] tinkerpop git commit: TINKERPOP-1967 Removed some dead
comments/code
Posted by sp...@apache.org.
TINKERPOP-1967 Removed some dead comments/code
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/cbae0a30
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/cbae0a30
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/cbae0a30
Branch: refs/heads/master
Commit: cbae0a30093303aeab1257c8ff344abd62f806cb
Parents: eb6a4de
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Aug 3 06:51:55 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
.../clustering/connected/ConnectedComponentVertexProgram.java | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/cbae0a30/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
index 82907eb..acc8f3c 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -62,7 +62,6 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
public static final String COMPONENT = "gremlin.connectedComponentVertexProgram.component";
private static final String PROPERTY = "gremlin.connectedComponentVertexProgram.property";
private static final String EDGE_TRAVERSAL = "gremlin.pageRankVertexProgram.edgeTraversal";
- private static final String HAS_HALTED = "gremlin.connectedComponentVertexProgram.hasHalted";
private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
@@ -122,11 +121,7 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
// for evaluation
vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
- // no need to send messages from traversers that were filtered - happens for stuff like
- // g.V().hasLabel('software').connectedComponent() (i.e. when there is a TraversalVertexProgram in the mix
- // halting traversers. only want to send messages from traversers that are still hanging about after
- // the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
- // lead to weirdness in results.
+ // vertices that have no edges remain in their own component - nothing to message pass here
if (vertex.edges(Direction.BOTH).hasNext()) {
// since there was message passing we don't want to halt on the first round. this should only trigger
// a single pass finish if the graph is completely disconnected (technically, it won't even really
[06/12] tinkerpop git commit: Extended the connected-components recipe
Posted by sp...@apache.org.
Extended the connected-components recipe
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/a0119640
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/a0119640
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/a0119640
Branch: refs/heads/master
Commit: a011964030f36c87b1ecb22eefb1e7deeffed3b9
Parents: 002560a
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Sun Jun 10 15:17:17 2018 +0200
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
docs/src/recipes/connected-components.asciidoc | 94 ++++++++++++--------
docs/static/images/cc-scale-ratio.png | Bin 0 -> 14393 bytes
docs/static/images/cc-scale-size.png | Bin 0 -> 12220 bytes
3 files changed, 58 insertions(+), 36 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index edbeec5..850c31f 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -31,11 +31,11 @@ Depending on the size of the graph, three solution regimes can be discriminated:
1. Small graphs that fit in the memory of a single machine
-2. Medium graphs backed by storage for which a linear scan is still feasible. This regime is left to third party
+2. Medium-sized graphs backed by storage for which an OLTP linear scan is still feasible. This regime is left to third party
TinkerPop implementations, since TinkerPop itself has no storage-backed reference implementations. The idea is that
component membership is stored in the graph, rather than in memory.
-3. Large graphs requiring an OLAP approach to yield results in a reasonable time.
+3. Large graphs requiring an approach with `HadoopGraph` and `SparkGraphComputer` to yield results in a reasonable time.
These regimes are discussed separately using the following graph with three weakly connected components:
@@ -55,16 +55,21 @@ g.addV().property(id, "A").as("a").
addE("link").from("d").to("e").iterate()
----
+==== Small graph traversals
-===== Small graphs
-
-Connected components in a small graph can be determined with both an OLTP traversal and the OLAP
+Connected components in a small graph can be determined with either an OLTP traversal or the OLAP
`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
described in more detail in the
link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+The traversal looks like:
+[gremlin-groovy,existing]
+----
+g.withComputer().V().connectedComponent().
+ group().by('gremlin.connectedComponentVertexProgram.component').
+ select(values).unfold()
+----
A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
-
[gremlin-groovy,existing]
----
g.V().emit(cyclicPath().or().not(both())). <1>
@@ -73,7 +78,6 @@ g.V().emit(cyclicPath().or().not(both())). <1
by(path().unfold().dedup().fold()). <4>
select(values).unfold() <5>
----
-
<1> The initial emit() step allows for output of isolated vertices, in addition to the discovery of
components as described in (2).
@@ -83,7 +87,7 @@ path. Collection `'a'` is used to keep track of visited vertices, for both subt
and new traversals resulting from the `g.V()` linear scan.
<3> While `'a'` nicely keeps track of vertices already visited, the actual components need to be extracted from the
-path information of surviving traversers. The `path().unfold().limit(1)` closure provides the starting vertex
+path information. The `path().unfold().limit(1)` closure provides the starting vertex
of surviving traversers, which can be used to group the components.
<4> This clause collects the unique vertices from all paths with the same starting vertex, thus from the same
@@ -91,39 +95,57 @@ weak component.
<5> The values of the groupby map contain the lists of vertices making up the requested components.
-This algorithm completes in linear time with the number of vertices and edges, because a traversal is started for each
-vertex and each edge with its associated out-vertex is visited exactly once.
-
-==== Large graphs
-Large graphs require an OLAP solution with a custom VertexProgram that can be run using a graph implementation's
-GraphComputer, in particular `SparkGraphComputer` on a `HadoopGraph`. The OLAP solution also runs faster for most
-medium-sized graphs, that is when these graph have a 'natural' structure with a limited maximum path length.
+==== Small graph scalability
-The TinkerPop library of vertex programs contains the `WeakComponentsVertexProgram` which can be run in the same
-way as the link:http://tinkerpop.apache.org/docs/x.y.z/reference/#peerpressurevertexprogram[PeerPressureVertexProgram]:
+The scalability of the OLTP traversal and the `connectedComponent()`-step for in-memory graphs is shown in the figures
+below.
-[gremlin-groovy,existing]
-----
-result = g.getGraph().compute().
- program(WeakComponentsVertexProgram.build().maxIterations(100).create()).
- mapReduce(ClusterPopulationMapReduce.build().create()).
- mapReduce(ClusterCountMapReduce.build().create()).
- submit().get()
-result.memory().clusterPopulation
-gResult = result.graph().traversal()
-gResult.V().valueMap(true)
-----
+[[cc-scale-size]]
+.Run times for finding connected components in a randomly generated graph with 10 components of equal size and with an edge/vertex ratio of 6
+image::cc-scale-size.png[width=600, side=bottom]
-The vertex program has interconnected vertices exchange id's and store the lowest id until no vertex receives a
-lower id. This algorithm is commonly applied in
+In general, the `connectedComponent()`-step is almost a factor two faster than the OLTP traversal. Only, for very
+small graphs the overhead of running the ConnectedComponentVertexProgram is larger than that of the OLTP traversal.
+The vertex program works by having interconnected vertices exchange id's and store the lowest id until no vertex
+receives a lower id. This algorithm is commonly applied in
link:https://en.wikipedia.org/wiki/Bulk_synchronous_parallel[bulk synchronous parallel] systems, e.g. in
-link:https://spark.apache.org/graphx[Apache Spark GraphX].
+link:https://spark.apache.org/graphx[Apache Spark GraphX]. Overhead for the vertex program arises because it has to run
+as many cycles as the largest length of the shortest paths between any two vertices in a component of the graph. In
+every cycle each vertex has to be checked for being
+"halted". Overhead of the OLTP traversal consists of each traverser having to carry complete path information. For
+pure depth-first-search or breadth-first-search implementations, connected-component algotithms should scale
+as [.big]##O##(V+E). For the traversals in the figure above this is almost the case.
+
+
+[[cc-scale-ratio]]
+.Run times for finding connected components in a randomly generated graph with 10 components, each consisting of 6400 vertices
+image::cc-scale-ratio.png[width=600]
+
+The random graphs used for the scalability tests can be modulated with the edge/vertex ratio. For small ratios the
+components generated are more lint-like and harder to process by the `connectedComponent()`-step. For high ratios
+the components are more mesh-like and the ConnectedComponentVertexProgram needs few cycles to process the graph. These
+characteristics show clearly from the graph. Indeed, for a given number of vertices, the run time of the
+`connectedComponent()`-step does not depend on the number of edges, but rather on the maximum shortest path length in
+the graph.
+
+
+==== Large graphs
+
+Large graphs in TinkerPop require distributed processing by `SparkGraphComputer` to get results in a reasonable time (OLAP
+approach). This means that the graph must be available as `HadoopGraph` (third party TinkerPop implementations often
+allow to make a graph available as an `HadoopGraph` by providing an Hadoop `InputFormat`). Running the
+`connectedComponent()`-step on
+an `HadoopGraph` works the same as for a small graph, provided that `SparkGraphComputer` is specified as the graph computer,
+either with the `gremlin.hadoop.defaultGraphComputer` property or as part of the `withComputer()`-step.
+
+Scalability of the the `connectedComponent()`-step with `SparkGraphComputer` is high, but note that:
+
+* the graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
+intermediate results to disk and loosing most of the gains from the distributed processing
-==== Scalability
+* as discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
+any pair of vertices. Overcoming this limitation is still a
+link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
-ToDo:
- - limits and run time regime 1
- - test of friendster graph regime 3
- - discuss: link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[http://www.vldb.org/pvldb/vol7/p1821-yan.pdf]
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/static/images/cc-scale-ratio.png
----------------------------------------------------------------------
diff --git a/docs/static/images/cc-scale-ratio.png b/docs/static/images/cc-scale-ratio.png
new file mode 100644
index 0000000..33a842d
Binary files /dev/null and b/docs/static/images/cc-scale-ratio.png differ
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/a0119640/docs/static/images/cc-scale-size.png
----------------------------------------------------------------------
diff --git a/docs/static/images/cc-scale-size.png b/docs/static/images/cc-scale-size.png
new file mode 100644
index 0000000..2b08a89
Binary files /dev/null and b/docs/static/images/cc-scale-size.png differ
[12/12] tinkerpop git commit: TINKERPOP-1967 Added GLV tests for
connectedComponent()
Posted by sp...@apache.org.
TINKERPOP-1967 Added GLV tests for connectedComponent()
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0c2fc21f
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0c2fc21f
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0c2fc21f
Branch: refs/heads/master
Commit: 0c2fc21fb9534060aa8758406e5e38d553ea12be
Parents: 2cc2aed
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 9 12:29:40 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 12:29:40 2018 -0400
----------------------------------------------------------------------
.../test/cucumber/feature-steps.js | 4 ++
.../features/map/ConnectedComponent.feature | 76 ++++++++++++++++++++
.../step/map/ConnectedComponentTest.java | 4 +-
3 files changed, 82 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
----------------------------------------------------------------------
diff --git a/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js b/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
index 96eeaaf..a0ae7ca 100644
--- a/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
+++ b/gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/feature-steps.js
@@ -63,6 +63,10 @@ const ignoreReason = {
const ignoredScenarios = {
// An associative array containing the scenario name as key, for example:
+ 'g_V_connectedComponent_hasXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+ 'g_V_dedup_connectedComponent_hasXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+ 'g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX': new IgnoreError(ignoreReason.computerNotSupported),
+ 'g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX': new IgnoreError(ignoreReason.computerNotSupported),
'g_V_pageRank_hasXpageRankX': new IgnoreError(ignoreReason.computerNotSupported),
'g_V_outXcreatedX_pageRank_byXbothEX_byXprojectRankX_timesX0X_valueMapXname_projectRankX': new IgnoreError(ignoreReason.computerNotSupported),
'g_V_pageRank_order_byXpageRank_decrX_byXnameX_name': new IgnoreError(ignoreReason.computerNotSupported),
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-test/features/map/ConnectedComponent.feature
----------------------------------------------------------------------
diff --git a/gremlin-test/features/map/ConnectedComponent.feature b/gremlin-test/features/map/ConnectedComponent.feature
new file mode 100644
index 0000000..0f95488
--- /dev/null
+++ b/gremlin-test/features/map/ConnectedComponent.feature
@@ -0,0 +1,76 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+Feature: Step - connectedComponent()
+
+ Scenario: g_V_connectedComponent_hasXcomponentX
+ Given the modern graph
+ And the traversal of
+ """
+ g.withComputer().V().connectedComponent().has("gremlin.connectedComponentVertexProgram.component")
+ """
+ When iterated to list
+ Then the result should be unordered
+ | result |
+ | v[marko] |
+ | v[vadas] |
+ | v[lop] |
+ | v[josh] |
+ | v[ripple] |
+ | v[peter] |
+
+ Scenario: g_V_dedup_connectedComponent_hasXcomponentX
+ Given the modern graph
+ And the traversal of
+ """
+ g.withComputer().V().dedup().connectedComponent().has("gremlin.connectedComponentVertexProgram.component")
+ """
+ When iterated to list
+ Then the result should be unordered
+ | result |
+ | v[marko] |
+ | v[vadas] |
+ | v[lop] |
+ | v[josh] |
+ | v[ripple] |
+ | v[peter] |
+
+ Scenario: g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX
+ Given the modern graph
+ And the traversal of
+ """
+ g.withComputer().V().hasLabel("software").connectedComponent().project("name","component").by("name").by("gremlin.connectedComponentVertexProgram.component")
+ """
+ When iterated to list
+ Then the result should be unordered
+ | result |
+ | m[{"name": "lop", "component": "1"}] |
+ | m[{"name": "ripple", "component": "1"}] |
+
+ Scenario: g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX
+ Given the modern graph
+ And the traversal of
+ """
+ g.withComputer().V().hasLabel("person").connectedComponent().with("~tinkerpop.connectedComponent.edges", __.bothE("knows")).with("~tinkerpop.connectedComponent.propertyName", "cluster").project("name","cluster").by("name").by("cluster")
+ """
+ When iterated to list
+ Then the result should be unordered
+ | result |
+ | m[{"name": "marko", "cluster": "1"}] |
+ | m[{"name": "vadas", "cluster": "1"}] |
+ | m[{"name": "josh", "cluster": "1"}] |
+ | m[{"name": "peter", "cluster": "6"}] |
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0c2fc21f/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index a542bed..bf0d047 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -108,12 +108,12 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
public static class Traversals extends ConnectedComponentTest {
@Override
public Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX() {
- return g.V().connectedComponent();
+ return g.V().connectedComponent().has(ConnectedComponent.component);
}
@Override
public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX() {
- return g.V().dedup().connectedComponent();
+ return g.V().dedup().connectedComponent().has(ConnectedComponent.component);
}
@Override
[08/12] tinkerpop git commit: TINKERPOP-1967 Added
ConnectedComponentVertexProgram to core javadoc
Posted by sp...@apache.org.
TINKERPOP-1967 Added ConnectedComponentVertexProgram to core javadoc
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/eb6a4de6
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/eb6a4de6
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/eb6a4de6
Branch: refs/heads/master
Commit: eb6a4de63ac62181d8237c0ad7ec5b3aa072e0b7
Parents: 0eb6cb1
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 2 08:20:17 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
pom.xml | 3 +++
1 file changed, 3 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/eb6a4de6/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 4badaa1..ea42197 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1262,6 +1262,9 @@ limitations under the License.
<include>org/apache/tinkerpop/gremlin/process/computer/*.java
</include>
<include>
+ org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+ </include>
+ <include>
org/apache/tinkerpop/gremlin/process/computer/clustering/peerpressure/PeerPressureVertexProgram.java
</include>
<include>
[04/12] tinkerpop git commit: TINKERPOP-1967 Referenced
TINKERPOP-1976 in OptOut for connectedComponent on RemoteGraph
Posted by sp...@apache.org.
TINKERPOP-1967 Referenced TINKERPOP-1976 in OptOut for connectedComponent on RemoteGraph
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/dd7fc261
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/dd7fc261
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/dd7fc261
Branch: refs/heads/master
Commit: dd7fc261fde9837bde954962b4e7777a57a01327
Parents: b2cb187
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 11:22:25 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
.../org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/dd7fc261/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
index 9bc1771..bf59651 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/remote/RemoteGraph.java
@@ -50,7 +50,7 @@ import java.util.Iterator;
@Graph.OptOut(
test = "org.apache.tinkerpop.gremlin.process.traversal.step.map.ConnectedComponentTest",
method = "*",
- reason = "hmmmm")
+ reason = "https://issues.apache.org/jira/browse/TINKERPOP-1976")
@Graph.OptOut(
test = "org.apache.tinkerpop.gremlin.process.traversal.CoreTraversalTest",
method = "*",
[09/12] tinkerpop git commit: TINKERPOP-1967 Added a test for
regressions on halted traversers
Posted by sp...@apache.org.
TINKERPOP-1967 Added a test for regressions on halted traversers
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/0eb6cb1a
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/0eb6cb1a
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/0eb6cb1a
Branch: refs/heads/master
Commit: 0eb6cb1a0d7df0790cb5c2f82a792cd80bef6f3f
Parents: dd7fc26
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 14:13:33 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
.../step/map/ConnectedComponentTest.java | 21 ++++++++++++++++++++
1 file changed, 21 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/0eb6cb1a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 8b1904f..3133382 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -39,6 +39,8 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
+ public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent();
+
public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
@Test
@@ -57,6 +59,20 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
@Test
@LoadGraphWith(MODERN)
+ public void g_V_dedup_connectedComponent() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent();
+ printTraversalForm(traversal);
+ int counter = 0;
+ while (traversal.hasNext()) {
+ final Vertex vertex = traversal.next();
+ counter++;
+ assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+ }
+ assertEquals(6, counter);
+ }
+
+ @Test
+ @LoadGraphWith(MODERN)
public void g_V_hasLabelXsoftwareX_connectedComponent() {
final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
printTraversalForm(traversal);
@@ -106,6 +122,11 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
}
@Override
+ public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent() {
+ return g.V().dedup().connectedComponent();
+ }
+
+ @Override
public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
return g.V().hasLabel("software").connectedComponent();
}
[07/12] tinkerpop git commit: TINKERPOP-1967 Minor text cleanup for
connectedComponent() docs
Posted by sp...@apache.org.
TINKERPOP-1967 Minor text cleanup for connectedComponent() docs
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/16231d6e
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/16231d6e
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/16231d6e
Branch: refs/heads/master
Commit: 16231d6ed9b6c5f00a6f28da6f03a96d664fcf99
Parents: a011964
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Fri Jun 15 08:24:22 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
docs/src/recipes/connected-components.asciidoc | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/16231d6e/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 850c31f..e6d0f7a 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -37,7 +37,6 @@ component membership is stored in the graph, rather than in memory.
3. Large graphs requiring an approach with `HadoopGraph` and `SparkGraphComputer` to yield results in a reasonable time.
-
These regimes are discussed separately using the following graph with three weakly connected components:
image:connected-components.png[width=600]
@@ -70,6 +69,7 @@ g.withComputer().V().connectedComponent().
----
A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
+
[gremlin-groovy,existing]
----
g.V().emit(cyclicPath().or().not(both())). <1>
@@ -95,8 +95,6 @@ weak component.
<5> The values of the groupby map contain the lists of vertices making up the requested components.
-
-
==== Small graph scalability
The scalability of the OLTP traversal and the `connectedComponent()`-step for in-memory graphs is shown in the figures
@@ -118,7 +116,6 @@ every cycle each vertex has to be checked for being
pure depth-first-search or breadth-first-search implementations, connected-component algotithms should scale
as [.big]##O##(V+E). For the traversals in the figure above this is almost the case.
-
[[cc-scale-ratio]]
.Run times for finding connected components in a randomly generated graph with 10 components, each consisting of 6400 vertices
image::cc-scale-ratio.png[width=600]
@@ -130,7 +127,6 @@ characteristics show clearly from the graph. Indeed, for a given number of verti
`connectedComponent()`-step does not depend on the number of edges, but rather on the maximum shortest path length in
the graph.
-
==== Large graphs
Large graphs in TinkerPop require distributed processing by `SparkGraphComputer` to get results in a reasonable time (OLAP
@@ -142,10 +138,8 @@ either with the `gremlin.hadoop.defaultGraphComputer` property or as part of the
Scalability of the the `connectedComponent()`-step with `SparkGraphComputer` is high, but note that:
-* the graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
-intermediate results to disk and loosing most of the gains from the distributed processing
-
-* as discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
+* The graph should fit in the memory of the Spark cluster to allow the VertexProgram to run its cycles without spilling
+intermediate results to disk and loosing most of the gains from the distributed processing.
+* As discussed for small graphs, the BSP algorithm does not play well with graphs having a large shortest path between
any pair of vertices. Overcoming this limitation is still a
-link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
-
+link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[subject of academic research].
\ No newline at end of file
[05/12] tinkerpop git commit: Merged vtslab recipe for connected
components
Posted by sp...@apache.org.
Merged vtslab recipe for connected components
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/002560a5
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/002560a5
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/002560a5
Branch: refs/heads/master
Commit: 002560a5d9584119d5a5af7f874d110583390741
Parents: 0acceb9
Author: HadoopMarc <vt...@xs4all.nl>
Authored: Mon May 21 14:03:54 2018 +0200
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
docs/src/recipes/connected-components.asciidoc | 123 +++++++++++++-------
1 file changed, 83 insertions(+), 40 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/002560a5/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc
index 70abdbd..edbeec5 100644
--- a/docs/src/recipes/connected-components.asciidoc
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -14,17 +14,31 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
+
+// @author Daniel Kuppitz (anwer on gremlin user list)
+// @author Robert Dale (answer on gremlin user list)
+// @author Marc de Lignie
+
[[connected-components]]
== Connected Components
Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components]
-in a graph. As of TinkerPop 3.4.0, the process has been simplified to the `connectedComponent()`-step which is
-described in more detail in the link:
-link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+in a graph. In a directed graph like in TinkerPop, components can be weakly or strongly connected. This recipe is
+restricted to finding link:https://en.wikipedia.org/wiki/Directed_graph#Directed_graph_connectivity[weakly
+connected components], in which the direction of edges is not taken into account.
+
+Depending on the size of the graph, three solution regimes can be discriminated:
+
+1. Small graphs that fit in the memory of a single machine
+
+2. Medium graphs backed by storage for which a linear scan is still feasible. This regime is left to third party
+TinkerPop implementations, since TinkerPop itself has no storage-backed reference implementations. The idea is that
+component membership is stored in the graph, rather than in memory.
-The `connectedComponent()`-step replaces the original recipe described below from earlier versions of TinkerPop,
-however the algorithm from that old recipe remains interesting for educational purposes and has thus not been removed.
-The original recipe considers the following graph which has three connected components:
+3. Large graphs requiring an OLAP approach to yield results in a reasonable time.
+
+
+These regimes are discussed separately using the following graph with three weakly connected components:
image:connected-components.png[width=600]
@@ -41,46 +55,75 @@ g.addV().property(id, "A").as("a").
addE("link").from("d").to("e").iterate()
----
-One way to detect the various subgraphs would be to do something like this:
+
+===== Small graphs
+
+Connected components in a small graph can be determined with both an OLTP traversal and the OLAP
+`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
+described in more detail in the
+link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
+
+A straightforward way to detect the various subgraphs with an OLTP traversal is to do this:
[gremlin-groovy,existing]
----
-g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1>
- path().aggregate("p"). <2>
- unfold().dedup(). <3>
- map(__.as("v").select("p").unfold(). <4>
- filter(unfold().where(eq("v"))).
- unfold().dedup().order().by(id).fold()).
- dedup() <5>
+g.V().emit(cyclicPath().or().not(both())). <1>
+ repeat(__.where(without('a')).store('a').both()).until(cyclicPath()). <2>
+ group().by(path().unfold().limit(1)). <3>
+ by(path().unfold().dedup().fold()). <4>
+ select(values).unfold() <5>
----
-<1> Iterate all vertices and repeatedly traverse over both incoming and outgoing edges (TinkerPop doesn't support
-unidirectional graphs directly so it must be simulated by ignoring the direction with `both`). Note the use of `emit`
-prior to `repeat` as this allows for return of a single length path.
-<2> Aggregate the `path()` of the emitted vertices to "p". It is within these paths that the list of connected
-components will be identified. Obviously the paths list are duplicative in the sense that they contains different
-paths traveled over the same vertices.
-<3> Unroll the elements in the path list with `unfold` and `dedup`.
-<4> Use the first vertex in each path to filter against the paths stored in "p". When a path is found that has the
-vertex in it, dedup the vertices in the path, order it by the identifier. Each path output from this `map` step
-represents a connected component.
-<5> The connected component list is duplicative given the nature of the paths in "p", but now that the vertices within
-the paths are ordered, a final `dedup` will make the list of connective components unique.
-
-NOTE: This is a nice example of where running smaller pieces of a large Gremlin statement make it easier to see what
-is happening at each step. Consider running this example one line at a time (or perhaps even in a step at a time) to
-see the output at each point.
-
-While the above approach returns results nicely, the traversal doesn't appear to work with OLAP. A less efficient
-approach, but one more suited for OLAP execution looks quite similar but does not use `dedup` as heavily (thus
-`GraphComputer` is forced to analyze far more paths):
+<1> The initial emit() step allows for output of isolated vertices, in addition to the discovery of
+components as described in (2).
+
+<2> The entire component to which the first returned vertex belongs, is visited. To allow for components of any
+structure, a repeat loop is applied that only stops for a particular branch of the component when it detects a cyclic
+path. Collection `'a'` is used to keep track of visited vertices, for both subtraversals within a component
+and new traversals resulting from the `g.V()` linear scan.
+
+<3> While `'a'` nicely keeps track of vertices already visited, the actual components need to be extracted from the
+path information of surviving traversers. The `path().unfold().limit(1)` closure provides the starting vertex
+of surviving traversers, which can be used to group the components.
+
+<4> This clause collects the unique vertices from all paths with the same starting vertex, thus from the same
+weak component.
+
+<5> The values of the groupby map contain the lists of vertices making up the requested components.
+
+This algorithm completes in linear time with the number of vertices and edges, because a traversal is started for each
+vertex and each edge with its associated out-vertex is visited exactly once.
+
+
+==== Large graphs
+
+Large graphs require an OLAP solution with a custom VertexProgram that can be run using a graph implementation's
+GraphComputer, in particular `SparkGraphComputer` on a `HadoopGraph`. The OLAP solution also runs faster for most
+medium-sized graphs, that is when these graph have a 'natural' structure with a limited maximum path length.
+
+The TinkerPop library of vertex programs contains the `WeakComponentsVertexProgram` which can be run in the same
+way as the link:http://tinkerpop.apache.org/docs/x.y.z/reference/#peerpressurevertexprogram[PeerPressureVertexProgram]:
[gremlin-groovy,existing]
----
-g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
- aggregate("p").by(path()).cap("p").unfold().limit(local, 1).
- map(__.as("v").select("p").unfold().
- filter(unfold().where(eq("v"))).
- unfold().dedup().order().by(id).fold()
- ).toSet()
+result = g.getGraph().compute().
+ program(WeakComponentsVertexProgram.build().maxIterations(100).create()).
+ mapReduce(ClusterPopulationMapReduce.build().create()).
+ mapReduce(ClusterCountMapReduce.build().create()).
+ submit().get()
+result.memory().clusterPopulation
+gResult = result.graph().traversal()
+gResult.V().valueMap(true)
----
+
+The vertex program has interconnected vertices exchange id's and store the lowest id until no vertex receives a
+lower id. This algorithm is commonly applied in
+link:https://en.wikipedia.org/wiki/Bulk_synchronous_parallel[bulk synchronous parallel] systems, e.g. in
+link:https://spark.apache.org/graphx[Apache Spark GraphX].
+
+==== Scalability
+
+ToDo:
+ - limits and run time regime 1
+ - test of friendster graph regime 3
+ - discuss: link:http://www.vldb.org/pvldb/vol7/p1821-yan.pdf[http://www.vldb.org/pvldb/vol7/p1821-yan.pdf]
\ No newline at end of file
[11/12] tinkerpop git commit: TINKERPOP-1967 Rewrote
connectedComponent() test to be more GLV friendly
Posted by sp...@apache.org.
TINKERPOP-1967 Rewrote connectedComponent() test to be more GLV friendly
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/2cc2aed5
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/2cc2aed5
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/2cc2aed5
Branch: refs/heads/master
Commit: 2cc2aed50d95f0f943fb012b421d4bcd0761a24d
Parents: cbae0a3
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Thu Aug 9 11:25:55 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 11:25:55 2018 -0400
----------------------------------------------------------------------
.../step/map/ConnectedComponentTest.java | 70 +++++++++-----------
1 file changed, 30 insertions(+), 40 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/2cc2aed5/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 3133382..a542bed 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -19,13 +19,15 @@
package org.apache.tinkerpop.gremlin.process.traversal.step.map;
import org.apache.tinkerpop.gremlin.LoadGraphWith;
-import org.apache.tinkerpop.gremlin.process.computer.clustering.connected.ConnectedComponentVertexProgram;
import org.apache.tinkerpop.gremlin.process.AbstractGremlinProcessTest;
import org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ConnectedComponent;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils;
import org.junit.Test;
+import java.util.Map;
+
import static org.apache.tinkerpop.gremlin.LoadGraphWith.GraphData.MODERN;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.bothE;
import static org.junit.Assert.assertEquals;
@@ -35,55 +37,43 @@ import static org.junit.Assert.assertEquals;
*/
public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest {
- public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent();
+ public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX();
- public abstract Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent();
+ public abstract Traversal<Vertex, Map<String,Object>> get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX();
- public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent();
+ public abstract Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX();
- public abstract Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+ public abstract Traversal<Vertex, Map<String,Object>> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX();
@Test
@LoadGraphWith(MODERN)
- public void g_V_connectedComponent() {
- final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent();
+ public void g_V_connectedComponent_hasXcomponentX() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_hasXcomponentX();
printTraversalForm(traversal);
- int counter = 0;
- while (traversal.hasNext()) {
- final Vertex vertex = traversal.next();
- counter++;
- assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
- }
- assertEquals(6, counter);
+ assertEquals(6, IteratorUtils.count(traversal));
}
@Test
@LoadGraphWith(MODERN)
- public void g_V_dedup_connectedComponent() {
- final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent();
+ public void g_V_dedup_connectedComponent_hasXcomponentX() {
+ final Traversal<Vertex, Vertex> traversal = get_g_V_dedup_connectedComponent_hasXcomponentX();
printTraversalForm(traversal);
- int counter = 0;
- while (traversal.hasNext()) {
- final Vertex vertex = traversal.next();
- counter++;
- assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
- }
- assertEquals(6, counter);
+ assertEquals(6, IteratorUtils.count(traversal));
}
@Test
@LoadGraphWith(MODERN)
- public void g_V_hasLabelXsoftwareX_connectedComponent() {
- final Traversal<Vertex, Vertex> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent();
+ public void g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX() {
+ final Traversal<Vertex, Map<String,Object>> traversal = get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX();
printTraversalForm(traversal);
int counter = 0;
while (traversal.hasNext()) {
- final Vertex vertex = traversal.next();
- final String name = vertex.value("name");
+ final Map<String,Object> m = traversal.next();
+ final String name = m.get("name").toString();
switch (name) {
case "lop":
case "ripple":
- assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+ assertEquals(convertToVertexId("marko").toString(), m.get("component"));
break;
}
counter++;
@@ -93,21 +83,21 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
@Test
@LoadGraphWith(MODERN)
- public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX() {
- final Traversal<Vertex, Vertex> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX();
+ public void g_V_connectedComponent_withXEDGES_bothEXknowsXX_withXPROPERTY_NAME_clusterX_project_byXnameX_byXclusterX() {
+ final Traversal<Vertex, Map<String,Object>> traversal = get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX();
printTraversalForm(traversal);
int counter = 0;
while (traversal.hasNext()) {
- final Vertex vertex = traversal.next();
- final String name = vertex.value("name");
+ final Map<String,Object> m = traversal.next();
+ final String name = m.get("name").toString();
switch (name) {
case "marko":
case "vadas":
case "josh":
- assertEquals("1", vertex.value("cluster"));
+ assertEquals(convertToVertexId("marko").toString(), m.get("cluster"));
break;
case "peter":
- assertEquals("6", vertex.value("cluster"));
+ assertEquals(convertToVertexId("peter").toString(), m.get("cluster"));
break;
}
counter++;
@@ -117,22 +107,22 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
public static class Traversals extends ConnectedComponentTest {
@Override
- public Traversal<Vertex, Vertex> get_g_V_connectedComponent() {
+ public Traversal<Vertex, Vertex> get_g_V_connectedComponent_hasXcomponentX() {
return g.V().connectedComponent();
}
@Override
- public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent() {
+ public Traversal<Vertex, Vertex> get_g_V_dedup_connectedComponent_hasXcomponentX() {
return g.V().dedup().connectedComponent();
}
@Override
- public Traversal<Vertex, Vertex> get_g_V_hasLabelXsoftwareX_connectedComponent() {
- return g.V().hasLabel("software").connectedComponent();
+ public Traversal<Vertex, Map<String,Object>> get_g_V_hasLabelXsoftwareX_connectedComponent_project_byXnameX_byXcomponentX() {
+ return g.V().hasLabel("software").connectedComponent().project("name","component").by("name").by(ConnectedComponent.component);
}
@Override
- public Traversal<Vertex, Vertex> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX() {
- return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster");
+ public Traversal<Vertex, Map<String,Object>> get_g_V_connectedComponent_withXedges_bothEXknowsXX_withXpropertyName_clusterX_project_byXnameX_byXclusterX() {
+ return g.V().hasLabel("person").connectedComponent().with(ConnectedComponent.edges, bothE("knows")).with(ConnectedComponent.propertyName, "cluster").project("name","cluster").by("name").by("cluster");
}
}
}
[03/12] tinkerpop git commit: TINKERPOP-1967 fixed up halted
traversers
Posted by sp...@apache.org.
TINKERPOP-1967 fixed up halted traversers
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/b2cb1874
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/b2cb1874
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/b2cb1874
Branch: refs/heads/master
Commit: b2cb187470794bfca352c8a9c7d0c444d102e46b
Parents: 8954c27
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jul 30 10:51:35 2018 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Thu Aug 9 10:54:41 2018 -0400
----------------------------------------------------------------------
.../ConnectedComponentVertexProgram.java | 42 +++++++++++++++-----
.../ConnectedComponentVertexProgramStep.java | 26 ++++++++++--
.../step/map/ConnectedComponentTest.java | 2 +-
3 files changed, 57 insertions(+), 13 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
index de718f1..82907eb 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/clustering/connected/ConnectedComponentVertexProgram.java
@@ -32,7 +32,10 @@ import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexPr
import org.apache.tinkerpop.gremlin.process.computer.util.AbstractVertexProgramBuilder;
import org.apache.tinkerpop.gremlin.process.traversal.Operator;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.IndexedTraverserSet;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Edge;
@@ -40,6 +43,8 @@ import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+import java.util.Arrays;
+import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
@@ -61,12 +66,14 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
private static final String VOTE_TO_HALT = "gremlin.connectedComponentVertexProgram.voteToHalt";
private static final Set<MemoryComputeKey> MEMORY_COMPUTE_KEYS = Collections.singleton(MemoryComputeKey.of(VOTE_TO_HALT, Operator.and, false, true));
+
private MessageScope.Local<?> scope = MessageScope.Local.of(__::bothE);
private Set<MessageScope> scopes;
private String property = COMPONENT;
- private boolean hasHalted = false;
private PureTraversal<Vertex, Edge> edgeTraversal = null;
private Configuration configuration;
+ private TraverserSet<Vertex> haltedTraversers;
+ private IndexedTraverserSet<Vertex, Vertex> haltedTraversersIndex;
private ConnectedComponentVertexProgram() {}
@@ -85,7 +92,12 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
scopes = new HashSet<>(Collections.singletonList(scope));
this.property = configuration.getString(PROPERTY, COMPONENT);
- this.hasHalted = configuration.getBoolean(HAS_HALTED, false);
+
+ this.haltedTraversers = TraversalVertexProgram.loadHaltedTraversers(configuration);
+ this.haltedTraversersIndex = new IndexedTraverserSet<>(v -> v);
+ for (final Traverser.Admin<Vertex> traverser : this.haltedTraversers) {
+ this.haltedTraversersIndex.add(traverser.split());
+ }
}
@Override
@@ -104,6 +116,8 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
@Override
public void execute(final Vertex vertex, final Messenger<String> messenger, final Memory memory) {
if (memory.isInitialIteration()) {
+ copyHaltedTraversersFromMemory(vertex);
+
// on the first pass, just initialize the component to its own id then pass it to all adjacent vertices
// for evaluation
vertex.property(VertexProperty.Cardinality.single, property, vertex.id().toString());
@@ -113,7 +127,7 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
// halting traversers. only want to send messages from traversers that are still hanging about after
// the filter. the unfiltered vertices can only react to messages sent to them. of course, this can
// lead to weirdness in results.
- if (vertex.edges(Direction.BOTH).hasNext() && !(hasHalted && !vertex.property(TraversalVertexProgram.HALTED_TRAVERSERS).isPresent())) {
+ if (vertex.edges(Direction.BOTH).hasNext()) {
// since there was message passing we don't want to halt on the first round. this should only trigger
// a single pass finish if the graph is completely disconnected (technically, it won't even really
// work in cases where halted traversers come into play
@@ -148,7 +162,9 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
@Override
public Set<VertexComputeKey> getVertexComputeKeys() {
- return new HashSet<>(Collections.singletonList(VertexComputeKey.of(property, false)));
+ return new HashSet<>(Arrays.asList(
+ VertexComputeKey.of(property, false),
+ VertexComputeKey.of(TraversalVertexProgram.HALTED_TRAVERSERS, false)));
}
@Override
@@ -158,6 +174,10 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
@Override
public boolean terminate(final Memory memory) {
+ if (memory.isInitialIteration() && this.haltedTraversersIndex != null) {
+ this.haltedTraversersIndex.clear();
+ }
+
final boolean voteToHalt = memory.<Boolean>get(VOTE_TO_HALT);
if (voteToHalt) {
return true;
@@ -206,6 +226,15 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
};
}
+ private void copyHaltedTraversersFromMemory(final Vertex vertex) {
+ final Collection<Traverser.Admin<Vertex>> traversers = this.haltedTraversersIndex.get(vertex);
+ if (traversers != null) {
+ final TraverserSet<Vertex> newHaltedTraversers = new TraverserSet<>();
+ newHaltedTraversers.addAll(traversers);
+ vertex.property(VertexProperty.Cardinality.single, TraversalVertexProgram.HALTED_TRAVERSERS, newHaltedTraversers);
+ }
+ }
+
public static ConnectedComponentVertexProgram.Builder build() {
return new ConnectedComponentVertexProgram.Builder();
}
@@ -216,11 +245,6 @@ public class ConnectedComponentVertexProgram implements VertexProgram<String> {
super(ConnectedComponentVertexProgram.class);
}
- public ConnectedComponentVertexProgram.Builder hasHalted(final boolean hasHalted) {
- this.configuration.setProperty(HAS_HALTED, hasHalted);
- return this;
- }
-
public ConnectedComponentVertexProgram.Builder edges(final Traversal.Admin<Vertex, Edge> edgeTraversal) {
PureTraversal.storeState(this.configuration, EDGE_TRAVERSAL, edgeTraversal);
return this;
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
index edeb497..c222cfa 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ConnectedComponentVertexProgramStep.java
@@ -29,11 +29,16 @@ import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.Configuring;
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.Parameters;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.TraverserSet;
import org.apache.tinkerpop.gremlin.process.traversal.util.PureTraversal;
import org.apache.tinkerpop.gremlin.structure.Edge;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.structure.util.StringFactory;
+import org.apache.tinkerpop.gremlin.util.Serializer;
+
+import java.io.IOException;
+import java.util.Base64;
/**
* @author Stephen Mallette (http://stephen.genoprime.com)
@@ -84,10 +89,25 @@ public final class ConnectedComponentVertexProgramStep extends VertexProgramStep
public ConnectedComponentVertexProgram generateProgram(final Graph graph, final Memory memory) {
final Traversal.Admin<Vertex, Edge> detachedTraversal = this.edgeTraversal.getPure();
detachedTraversal.setStrategies(TraversalStrategies.GlobalCache.getStrategies(graph.getClass()));
- return ConnectedComponentVertexProgram.build().
- hasHalted(memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)).
+
+ final ConnectedComponentVertexProgram.Builder builder = ConnectedComponentVertexProgram.build().
edges(detachedTraversal).
- property(this.clusterProperty).create(graph);
+ property(this.clusterProperty);
+
+ if (memory.exists(TraversalVertexProgram.HALTED_TRAVERSERS)) {
+ final TraverserSet<?> haltedTraversers = memory.get(TraversalVertexProgram.HALTED_TRAVERSERS);
+ if (!haltedTraversers.isEmpty()) {
+ Object haltedTraversersValue;
+ try {
+ haltedTraversersValue = Base64.getEncoder().encodeToString(Serializer.serializeObject(haltedTraversers));
+ } catch (final IOException ignored) {
+ haltedTraversersValue = haltedTraversers;
+ }
+ builder.configure(TraversalVertexProgram.HALTED_TRAVERSERS, haltedTraversersValue);
+ }
+ }
+
+ return builder.create(graph);
}
@Override
http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/b2cb1874/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
----------------------------------------------------------------------
diff --git a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
index 25e618a..8b1904f 100644
--- a/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
+++ b/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ConnectedComponentTest.java
@@ -67,7 +67,7 @@ public abstract class ConnectedComponentTest extends AbstractGremlinProcessTest
switch (name) {
case "lop":
case "ripple":
- assertEquals("3", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
+ assertEquals("1", vertex.value(ConnectedComponentVertexProgram.COMPONENT));
break;
}
counter++;