You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by dk...@apache.org on 2015/11/04 20:09:03 UTC

[1/9] incubator-tinkerpop git commit: TINKERPOP-911: Lets SparkGraphComputer Set ThreadLocal Properties

Repository: incubator-tinkerpop
Updated Branches:
  refs/heads/TINKERPOP3-904 f2b908333 -> c4fcae6a9


TINKERPOP-911: Lets SparkGraphComputer Set ThreadLocal Properties

When running in persistent context mode, the configuration for the Spark
Context is always inherited from the origin configuration. This can be
overridden for select properties using setLocalProperty. By enabling
this for SparkGraphComputer a user will be able to take advantage of the
FairScheduler system with spark or add custom Spark JobGroupIDs to jobs in the
same JVM.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/82229d0b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/82229d0b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/82229d0b

Branch: refs/heads/TINKERPOP3-904
Commit: 82229d0bef3da7bb125e9b31fc5ef559a8868af0
Parents: aadd422
Author: Russell Spitzer <Ru...@gmail.com>
Authored: Mon Nov 2 15:22:45 2015 -0800
Committer: Russell Spitzer <Ru...@gmail.com>
Committed: Mon Nov 2 18:55:36 2015 -0800

----------------------------------------------------------------------
 .../process/computer/SparkGraphComputer.java    |  32 ++++++
 .../process/computer/LocalPropertyTest.java     | 102 +++++++++++++++++++
 2 files changed, 134 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/82229d0b/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java b/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java
index b25073b..6425aab 100644
--- a/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java
+++ b/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/process/computer/SparkGraphComputer.java
@@ -135,6 +135,7 @@ public final class SparkGraphComputer extends AbstractHadoopGraphComputer {
             JavaSparkContext sparkContext = null;
             try {
                 sparkContext = new JavaSparkContext(SparkContext.getOrCreate(sparkConfiguration));
+                updateLocalConfiguration(sparkContext, sparkConfiguration);
                 // add the project jars to the cluster
                 this.loadJars(sparkContext, hadoopConfiguration);
                 // create a message-passing friendly rdd from the input rdd
@@ -246,6 +247,37 @@ public final class SparkGraphComputer extends AbstractHadoopGraphComputer {
         }
     }
 
+    /**
+     * When using a persistent context the running Context's configuration will override a passed
+     * in configuration. Spark allows us to override these inherited properties via
+     * SparkContext.setLocalProperty
+     */
+    private void updateLocalConfiguration(final JavaSparkContext sparkContext, final SparkConf sparkConfiguration) {
+        /*
+         * While we could enumerate over the entire SparkConfiguration and copy into the Thread
+         * Local properties of the Spark Context this could cause adverse effects with future
+         * versions of Spark. Since the api for setting multiple local properties at once is
+         * restricted as private, we will only set those properties we know can effect SparkGraphComputer
+         * Execution rather than applying the entire configuration.
+         */
+        final String[] validPropertyNames = {
+            "spark.job.description",
+            "spark.jobGroup.id",
+            "spark.job.interruptOnCancel",
+            "spark.scheduler.pool"
+        };
+
+        for (String propertyName: validPropertyNames){
+            if (sparkConfiguration.contains(propertyName)){
+                String propertyValue = sparkConfiguration.get(propertyName);
+                this.logger.info("Setting Thread Local SparkContext Property - "
+                        + propertyName + " : " + propertyValue);
+
+                sparkContext.setLocalProperty(propertyName, sparkConfiguration.get(propertyName));
+            }
+        }
+    }
+
     public static void main(final String[] args) throws Exception {
         final FileConfiguration configuration = new PropertiesConfiguration(args[0]);
         new SparkGraphComputer(HadoopGraph.open(configuration)).program(VertexProgram.createVertexProgram(HadoopGraph.open(configuration), configuration)).submit().get();

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/82229d0b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
new file mode 100644
index 0000000..57c3526
--- /dev/null
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.spark.process.computer;
+
+import org.apache.commons.configuration.BaseConfiguration;
+import org.apache.commons.configuration.Configuration;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.JavaSparkStatusTracker;
+import org.apache.tinkerpop.gremlin.hadoop.Constants;
+import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
+import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat;
+import org.apache.tinkerpop.gremlin.hadoop.structure.util.ConfUtil;
+import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
+import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
+import org.apache.tinkerpop.gremlin.process.traversal.engine.ComputerTraversalEngine;
+import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD;
+import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD;
+import org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer;
+import org.apache.tinkerpop.gremlin.structure.Graph;
+import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
+import org.junit.Test;
+
+import java.util.UUID;
+
+import static org.junit.Assert.assertTrue;
+
+public class LocalPropertyTest
+{
+
+
+    @Test
+    public void shouldSetThreadLocalProperties() throws Exception
+    {
+        final String testName = "ThreadLocalProperties";
+        final String rddName = "target/test-output/" + UUID.randomUUID();
+        final Configuration configuration = new BaseConfiguration();
+        configuration.setProperty("spark.master", "local[4]");
+        configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+        configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
+        configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
+        configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
+        configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, PersistedOutputRDD.class.getCanonicalName());
+        configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, rddName);
+        configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
+        configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
+        configuration.setProperty("spark.jobGroup.id", "22");
+        Graph graph = GraphFactory.open(configuration);
+        graph.compute(SparkGraphComputer.class)
+                .result(GraphComputer.ResultGraph.NEW)
+                .persist(GraphComputer.Persist.EDGES)
+                .program(TraversalVertexProgram.build()
+                        .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
+                                "gremlin-groovy",
+                                "g.V()").create(graph)).submit().get();
+        ////////
+        SparkConf sparkConfiguration = new SparkConf();
+        sparkConfiguration.setAppName(testName);
+        ConfUtil.makeHadoopConfiguration(configuration).forEach(entry -> sparkConfiguration.set(entry.getKey(), entry.getValue()));
+        JavaSparkContext sparkContext = new JavaSparkContext(SparkContext.getOrCreate(sparkConfiguration));
+        JavaSparkStatusTracker statusTracker = sparkContext.statusTracker();
+        assertTrue(statusTracker.getJobIdsForGroup("22").length >= 1);
+        assertTrue(PersistedInputRDD.getPersistedRDD(sparkContext, rddName).isPresent());
+        ///////
+        configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD, PersistedInputRDD.class.getCanonicalName());
+        configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, rddName);
+        configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, null);
+        configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, null);
+        configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, false);
+        configuration.setProperty("spark.jobGroup.id", "44");
+        graph = GraphFactory.open(configuration);
+        graph.compute(SparkGraphComputer.class)
+                .result(GraphComputer.ResultGraph.NEW)
+                .persist(GraphComputer.Persist.NOTHING)
+                .program(TraversalVertexProgram.build()
+                        .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
+                                "gremlin-groovy",
+                                "g.V()").create(graph)).submit().get();
+        ///////
+        assertTrue(statusTracker.getJobIdsForGroup("44").length >= 1);
+    }
+}
+


[7/9] incubator-tinkerpop git commit: fixed LocalPropertyTest to use KryoSerializer instead of GryoSerializer.

Posted by dk...@apache.org.
fixed LocalPropertyTest to use KryoSerializer instead of GryoSerializer.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/2edefa60
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/2edefa60
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/2edefa60

Branch: refs/heads/TINKERPOP3-904
Commit: 2edefa604c30c5f93d51f0f624241e4e80ac0725
Parents: ebec095 caf0e2a
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Wed Nov 4 11:39:13 2015 -0700
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Wed Nov 4 11:39:13 2015 -0700

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |   1 +
 docs/src/implementations.asciidoc               |   9 ++
 .../process/computer/SparkGraphComputer.java    |  32 ++++++
 .../process/computer/LocalPropertyTest.java     | 100 +++++++++++++++++++
 4 files changed, 142 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/2edefa60/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --cc CHANGELOG.asciidoc
index 31671c0,331ad93..085bed7
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@@ -25,9 -25,7 +25,10 @@@ image::https://raw.githubusercontent.co
  TinkerPop 3.1.0 (NOT OFFICIALLY RELEASED YET)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
 -* Added the ability to set thread local properties in `SparkGraphComputer` when using a persistent context
 +* Integrated `NumberHelper` in `SumStep`, `MinStep`, `MaxStep` and `MeanStep` (local and global step variants).
 +* `CountMatchAlgorithm`, in OLAP, now biases traversal selection towards those traversals that start at the current traverser location to reduce message passing.
 +* Fixed a file stream bug in Hadoop OLTP that showed up if the streamed file was more than 2G of data.
++* Added the ability to set thread local properties in `SparkGraphComputer` when using a persistent context.
  * Bumped to Neo4j 2.3.0.
  * Added `PersistedInputRDD` and `PersistedOutputRDD` which enables `SparkGraphComputer` to store the graph RDD in the context between jobs (no HDFS serialization required).
  * Renamed the `public static String` configuration variable names of TinkerGraph (deprecated old variables).

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/2edefa60/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
----------------------------------------------------------------------
diff --cc spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
index 0000000,57c3526..e0fe796
mode 000000,100644..100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/LocalPropertyTest.java
@@@ -1,0 -1,102 +1,100 @@@
+ /*
+  * Licensed to the Apache Software Foundation (ASF) under one
+  * or more contributor license agreements.  See the NOTICE file
+  * distributed with this work for additional information
+  * regarding copyright ownership.  The ASF licenses this file
+  * to you under the Apache License, Version 2.0 (the
+  * "License"); you may not use this file except in compliance
+  * with the License.  You may obtain a copy of the License at
+  *
+  * http://www.apache.org/licenses/LICENSE-2.0
+  *
+  * Unless required by applicable law or agreed to in writing,
+  * software distributed under the License is distributed on an
+  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  * KIND, either express or implied.  See the License for the
+  * specific language governing permissions and limitations
+  * under the License.
+  */
+ 
+ package org.apache.tinkerpop.gremlin.spark.process.computer;
+ 
+ import org.apache.commons.configuration.BaseConfiguration;
+ import org.apache.commons.configuration.Configuration;
+ import org.apache.spark.SparkConf;
+ import org.apache.spark.SparkContext;
+ import org.apache.spark.api.java.JavaSparkContext;
+ import org.apache.spark.api.java.JavaSparkStatusTracker;
++import org.apache.spark.serializer.KryoSerializer;
+ import org.apache.tinkerpop.gremlin.hadoop.Constants;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat;
+ import org.apache.tinkerpop.gremlin.hadoop.structure.util.ConfUtil;
+ import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
+ import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
+ import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
+ import org.apache.tinkerpop.gremlin.process.traversal.engine.ComputerTraversalEngine;
+ import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD;
+ import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD;
 -import org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer;
+ import org.apache.tinkerpop.gremlin.structure.Graph;
+ import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
+ import org.junit.Test;
+ 
+ import java.util.UUID;
+ 
+ import static org.junit.Assert.assertTrue;
+ 
 -public class LocalPropertyTest
 -{
++public class LocalPropertyTest {
+ 
+ 
+     @Test
 -    public void shouldSetThreadLocalProperties() throws Exception
 -    {
++    public void shouldSetThreadLocalProperties() throws Exception {
+         final String testName = "ThreadLocalProperties";
+         final String rddName = "target/test-output/" + UUID.randomUUID();
+         final Configuration configuration = new BaseConfiguration();
+         configuration.setProperty("spark.master", "local[4]");
 -        configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
++        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
+         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
+         configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
+         configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
+         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, PersistedOutputRDD.class.getCanonicalName());
+         configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, rddName);
+         configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
+         configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);
+         configuration.setProperty("spark.jobGroup.id", "22");
+         Graph graph = GraphFactory.open(configuration);
+         graph.compute(SparkGraphComputer.class)
+                 .result(GraphComputer.ResultGraph.NEW)
+                 .persist(GraphComputer.Persist.EDGES)
+                 .program(TraversalVertexProgram.build()
+                         .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
+                                 "gremlin-groovy",
+                                 "g.V()").create(graph)).submit().get();
+         ////////
+         SparkConf sparkConfiguration = new SparkConf();
+         sparkConfiguration.setAppName(testName);
+         ConfUtil.makeHadoopConfiguration(configuration).forEach(entry -> sparkConfiguration.set(entry.getKey(), entry.getValue()));
+         JavaSparkContext sparkContext = new JavaSparkContext(SparkContext.getOrCreate(sparkConfiguration));
+         JavaSparkStatusTracker statusTracker = sparkContext.statusTracker();
+         assertTrue(statusTracker.getJobIdsForGroup("22").length >= 1);
+         assertTrue(PersistedInputRDD.getPersistedRDD(sparkContext, rddName).isPresent());
+         ///////
+         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD, PersistedInputRDD.class.getCanonicalName());
+         configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, rddName);
+         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, null);
+         configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, null);
+         configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, false);
+         configuration.setProperty("spark.jobGroup.id", "44");
+         graph = GraphFactory.open(configuration);
+         graph.compute(SparkGraphComputer.class)
+                 .result(GraphComputer.ResultGraph.NEW)
+                 .persist(GraphComputer.Persist.NOTHING)
+                 .program(TraversalVertexProgram.build()
+                         .traversal(GraphTraversalSource.build().engine(ComputerTraversalEngine.build().computer(SparkGraphComputer.class)),
+                                 "gremlin-groovy",
+                                 "g.V()").create(graph)).submit().get();
+         ///////
+         assertTrue(statusTracker.getJobIdsForGroup("44").length >= 1);
+     }
+ }
+ 


[6/9] incubator-tinkerpop git commit: relaxed a test condition to use only KryoSerializer as it seems that GryoSerializer is causing problems on certain OSs. CTR.

Posted by dk...@apache.org.
relaxed a test condition to use only KryoSerializer as it seems that GryoSerializer is causing problems on certain OSs. CTR.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/ebec0953
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/ebec0953
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/ebec0953

Branch: refs/heads/TINKERPOP3-904
Commit: ebec0953d86ea748e8b1cacebb921642f19954a5
Parents: b338f96
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Wed Nov 4 09:39:23 2015 -0700
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Wed Nov 4 09:39:23 2015 -0700

----------------------------------------------------------------------
 .../gremlin/spark/structure/io/InputOutputRDDTest.java |  3 ++-
 .../gremlin/spark/structure/io/InputRDDTest.java       |  3 ++-
 .../gremlin/spark/structure/io/OutputRDDTest.java      |  5 +++--
 .../structure/io/PersistedInputOutputRDDTest.java      | 13 ++++++-------
 4 files changed, 13 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ebec0953/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputOutputRDDTest.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputOutputRDDTest.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputOutputRDDTest.java
index 1feeb61..3691aba 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputOutputRDDTest.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputOutputRDDTest.java
@@ -20,6 +20,7 @@ package org.apache.tinkerpop.gremlin.spark.structure.io;
 
 import org.apache.commons.configuration.BaseConfiguration;
 import org.apache.commons.configuration.Configuration;
+import org.apache.spark.serializer.KryoSerializer;
 import org.apache.tinkerpop.gremlin.hadoop.Constants;
 import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
 import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
@@ -40,7 +41,7 @@ public class InputOutputRDDTest {
     public void shouldReadFromWriteToArbitraryRDD() throws Exception {
         final Configuration configuration = new BaseConfiguration();
         configuration.setProperty("spark.master", "local[4]");
-        configuration.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
+        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD, ExampleInputRDD.class.getCanonicalName());
         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_OUTPUT_RDD, ExampleOutputRDD.class.getCanonicalName());

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ebec0953/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputRDDTest.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputRDDTest.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputRDDTest.java
index e8bd44a..5ba3b12 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputRDDTest.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/InputRDDTest.java
@@ -20,6 +20,7 @@ package org.apache.tinkerpop.gremlin.spark.structure.io;
 
 import org.apache.commons.configuration.BaseConfiguration;
 import org.apache.commons.configuration.Configuration;
+import org.apache.spark.serializer.KryoSerializer;
 import org.apache.tinkerpop.gremlin.hadoop.Constants;
 import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
 import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat;
@@ -40,7 +41,7 @@ public class InputRDDTest {
     public void shouldReadFromArbitraryRDD() {
         final Configuration configuration = new BaseConfiguration();
         configuration.setProperty("spark.master", "local[4]");
-        configuration.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
+        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         configuration.setProperty(Constants.GREMLIN_SPARK_GRAPH_INPUT_RDD, ExampleInputRDD.class.getCanonicalName());
         configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_OUTPUT_FORMAT, GryoOutputFormat.class.getCanonicalName());

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ebec0953/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/OutputRDDTest.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/OutputRDDTest.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/OutputRDDTest.java
index 43dcdbd..10eecb3 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/OutputRDDTest.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/OutputRDDTest.java
@@ -20,6 +20,7 @@ package org.apache.tinkerpop.gremlin.spark.structure.io;
 
 import org.apache.commons.configuration.BaseConfiguration;
 import org.apache.commons.configuration.Configuration;
+import org.apache.spark.serializer.KryoSerializer;
 import org.apache.tinkerpop.gremlin.hadoop.Constants;
 import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
 import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat;
@@ -27,8 +28,8 @@ import org.apache.tinkerpop.gremlin.process.computer.GraphComputer;
 import org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram;
 import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
 import org.apache.tinkerpop.gremlin.process.traversal.engine.ComputerTraversalEngine;
-import org.apache.tinkerpop.gremlin.spark.process.computer.SparkHadoopGraphProvider;
 import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
+import org.apache.tinkerpop.gremlin.spark.process.computer.SparkHadoopGraphProvider;
 import org.apache.tinkerpop.gremlin.structure.Graph;
 import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
 import org.junit.Test;
@@ -42,7 +43,7 @@ public class OutputRDDTest {
     public void shouldWriteToArbitraryRDD() throws Exception {
         final Configuration configuration = new BaseConfiguration();
         configuration.setProperty("spark.master", "local[4]");
-        configuration.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
+        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
         configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ebec0953/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java
----------------------------------------------------------------------
diff --git a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java
index 6ee7aaa..6aeb864 100644
--- a/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java
+++ b/spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedInputOutputRDDTest.java
@@ -24,6 +24,7 @@ import org.apache.commons.configuration.Configuration;
 import org.apache.spark.SparkConf;
 import org.apache.spark.SparkContext;
 import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.serializer.KryoSerializer;
 import org.apache.tinkerpop.gremlin.hadoop.Constants;
 import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
 import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat;
@@ -36,9 +37,6 @@ import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSo
 import org.apache.tinkerpop.gremlin.process.traversal.engine.ComputerTraversalEngine;
 import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
 import org.apache.tinkerpop.gremlin.spark.process.computer.SparkHadoopGraphProvider;
-import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedInputRDD;
-import org.apache.tinkerpop.gremlin.spark.structure.io.PersistedOutputRDD;
-import org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer;
 import org.apache.tinkerpop.gremlin.structure.Graph;
 import org.apache.tinkerpop.gremlin.structure.io.IoCore;
 import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
@@ -55,11 +53,12 @@ import static org.junit.Assert.*;
 public class PersistedInputOutputRDDTest {
 
     @Test
+
     public void shouldNotPersistRDDAcrossJobs() throws Exception {
         final String rddName = "target/test-output/" + UUID.randomUUID();
         final Configuration configuration = new BaseConfiguration();
         configuration.setProperty("spark.master", "local[4]");
-        configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
         configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
@@ -88,7 +87,7 @@ public class PersistedInputOutputRDDTest {
         final String rddName = "target/test-output/" + UUID.randomUUID();
         final Configuration configuration = new BaseConfiguration();
         configuration.setProperty("spark.master", "local[4]");
-        configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+        configuration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
         configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
@@ -130,7 +129,7 @@ public class PersistedInputOutputRDDTest {
         final String rddName = "target/test-output/" + UUID.randomUUID().toString();
         final Configuration readConfiguration = new BaseConfiguration();
         readConfiguration.setProperty("spark.master", "local[4]");
-        readConfiguration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+        readConfiguration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         readConfiguration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         readConfiguration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
         readConfiguration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));
@@ -177,7 +176,7 @@ public class PersistedInputOutputRDDTest {
         final String rddName = "target/test-output/" + UUID.randomUUID().toString();
         final Configuration readConfiguration = new BaseConfiguration();
         readConfiguration.setProperty("spark.master", "local[4]");
-        readConfiguration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
+        readConfiguration.setProperty("spark.serializer", KryoSerializer.class.getCanonicalName());
         readConfiguration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
         readConfiguration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_INPUT_FORMAT, GryoInputFormat.class.getCanonicalName());
         readConfiguration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, SparkHadoopGraphProvider.PATHS.get("tinkerpop-modern.kryo"));


[8/9] incubator-tinkerpop git commit: added a note for BulkLoader implementers

Posted by dk...@apache.org.
added a note for BulkLoader implementers


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/77fac190
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/77fac190
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/77fac190

Branch: refs/heads/TINKERPOP3-904
Commit: 77fac190dfedd107a38086e201334298e0bd3829
Parents: f2b9083
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Wed Nov 4 20:07:27 2015 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Wed Nov 4 20:07:27 2015 +0100

----------------------------------------------------------------------
 docs/src/the-graphcomputer.asciidoc | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/77fac190/docs/src/the-graphcomputer.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/the-graphcomputer.asciidoc b/docs/src/the-graphcomputer.asciidoc
index 4fd0e8c..af65f07 100644
--- a/docs/src/the-graphcomputer.asciidoc
+++ b/docs/src/the-graphcomputer.asciidoc
@@ -368,6 +368,10 @@ will work for the most use-cases, but has one limitation though: It doesn't supp
 `IncrementalBulkLoader` will handle every property as a single-valued property. A custom `BulkLoader` implementation
 has to be used if the default behavior is not sufficient.
 
+NOTE: A custom `BulkLoader` implementation for incremental loading should use `GraphTraversal` methods to create/update
+elements (e.g. `g.addV()` instead of `graph.addVertex()`). This way the `BulkLoaderVertexProgram` is able to efficiently
+track changes in the underlying graph and can apply several optimization techniques.
+
 [[traversalvertexprogram]]
 TraversalVertexProgram
 ~~~~~~~~~~~~~~~~~~~~~~


[4/9] incubator-tinkerpop git commit: TINKERPOP-911: Documentation for Thread Local Properties for Spark

Posted by dk...@apache.org.
TINKERPOP-911: Documentation for Thread Local Properties for Spark


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/caf0e2a4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/caf0e2a4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/caf0e2a4

Branch: refs/heads/TINKERPOP3-904
Commit: caf0e2a434126efdb882680e8fa0bbd752cde989
Parents: 82229d0
Author: Russell Spitzer <Ru...@gmail.com>
Authored: Tue Nov 3 09:18:14 2015 -0800
Committer: Russell Spitzer <Ru...@gmail.com>
Committed: Tue Nov 3 09:18:14 2015 -0800

----------------------------------------------------------------------
 CHANGELOG.asciidoc                | 1 +
 docs/src/implementations.asciidoc | 9 +++++++++
 2 files changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/caf0e2a4/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index e79a0a0..331ad93 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -25,6 +25,7 @@ image::https://raw.githubusercontent.com/apache/incubator-tinkerpop/master/docs/
 TinkerPop 3.1.0 (NOT OFFICIALLY RELEASED YET)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+* Added the ability to set thread local properties in `SparkGraphComputer` when using a persistent context
 * Bumped to Neo4j 2.3.0.
 * Added `PersistedInputRDD` and `PersistedOutputRDD` which enables `SparkGraphComputer` to store the graph RDD in the context between jobs (no HDFS serialization required).
 * Renamed the `public static String` configuration variable names of TinkerGraph (deprecated old variables).

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/caf0e2a4/docs/src/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/implementations.asciidoc b/docs/src/implementations.asciidoc
index 7792665..b5c2c4d 100644
--- a/docs/src/implementations.asciidoc
+++ b/docs/src/implementations.asciidoc
@@ -1206,6 +1206,15 @@ Note that `gremlin.spark.persistContext` should be set to `true` or else the per
 The persisted RDD is named by the `gremlin.hadoop.outputLocation` configuration (i.e. named in `SparkContext.getPersistedRDDs()`).
 Finally, `PersistedInputRDD` is used with respective  `gremlin.hadoop.inputLocation` to retrieve the persisted RDD from the `SparkContext`.
 
+When using a persistent `Spark Context` the configuration used by the original Spark Configuration will be inherited by all threaded
+references to that Spark Context. The exception to this rule are those properties which have a specific thread local effect.
+
+.Thread Local Properties
+. spark.jobGroup.id
+. spark.job.description
+. spark.job.interruptOnCancel
+. spark.scheduler.pool
+
 Loading with BulkLoaderVertexProgram
 ++++++++++++++++++++++++++++++++++++
 


[3/9] incubator-tinkerpop git commit: clean up tweaks to MatchStep optimization.

Posted by dk...@apache.org.
clean up tweaks to MatchStep optimization.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/f644fb42
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/f644fb42
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/f644fb42

Branch: refs/heads/TINKERPOP3-904
Commit: f644fb42b9f5e0f9216deb3f5e3088293e3ca13a
Parents: 956d797
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Tue Nov 3 10:05:25 2015 -0700
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Tue Nov 3 10:05:25 2015 -0700

----------------------------------------------------------------------
 CHANGELOG.asciidoc                                  |  2 ++
 .../process/traversal/step/map/MatchStep.java       | 16 +++++++---------
 .../process/traversal/step/map/MatchStepTest.java   |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/f644fb42/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index e79a0a0..e4eb554 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -25,6 +25,8 @@ image::https://raw.githubusercontent.com/apache/incubator-tinkerpop/master/docs/
 TinkerPop 3.1.0 (NOT OFFICIALLY RELEASED YET)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+* `CountMatchAlgorithm`, in OLAP, now biases traversal selection towards those traversals that start at the current traverser location to reduce message passing.
+* Fixed a file stream bug in Hadoop OLTP that showed up if the streamed file was more than 2G of data.
 * Bumped to Neo4j 2.3.0.
 * Added `PersistedInputRDD` and `PersistedOutputRDD` which enables `SparkGraphComputer` to store the graph RDD in the context between jobs (no HDFS serialization required).
 * Renamed the `public static String` configuration variable names of TinkerGraph (deprecated old variables).

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/f644fb42/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
index 1ed202f..38da656 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
@@ -67,7 +67,6 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
 
     public enum TraversalType {WHERE_PREDICATE, WHERE_TRAVERSAL, MATCH_TRAVERSAL}
 
-    private TraversalEngine.Type traversalEngineType;
     private List<Traversal.Admin<Object, Object>> matchTraversals = new ArrayList<>();
     private boolean first = true;
     private Set<String> matchStartLabels = new HashSet<>();
@@ -155,12 +154,6 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         }
     }
 
-    @Override
-    public void onEngine(final TraversalEngine engine) {
-        super.onEngine(engine);
-        this.traversalEngineType = engine.getType();
-    }
-
     public ConnectiveStep.Connective getConnective() {
         return this.connective;
     }
@@ -213,6 +206,8 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
     }
 
     public MatchAlgorithm getMatchAlgorithm() {
+        if(null == this.matchAlgorithm)
+            this.initializeMatchAlgorithm(this.traverserStepIdAndLabelsSetByChild ? TraversalEngine.Type.COMPUTER : TraversalEngine.Type.STANDARD);
         return this.matchAlgorithm;
     }
 
@@ -224,7 +219,6 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
             clone.matchTraversals.add(clone.integrateChild(traversal.clone()));
         }
         if (this.dedups != null) clone.dedups = new HashSet<>();
-        clone.initializeMatchAlgorithm(this.traversalEngineType);
         return clone;
     }
 
@@ -304,7 +298,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
             Traverser.Admin traverser = null;
             if (this.first) {
                 this.first = false;
-                this.initializeMatchAlgorithm(this.traversalEngineType);
+                this.initializeMatchAlgorithm(TraversalEngine.Type.STANDARD);
             } else {
                 for (final Traversal.Admin<?, ?> matchTraversal : this.matchTraversals) {
                     if (matchTraversal.hasNext()) {
@@ -339,6 +333,10 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
     @Override
     protected Iterator<Traverser<Map<String, E>>> computerAlgorithm() throws NoSuchElementException {
         while (true) {
+            if (this.first) {
+                this.first = false;
+                this.initializeMatchAlgorithm(TraversalEngine.Type.COMPUTER);
+            }
             final Traverser.Admin traverser = this.starts.next();
             final Path path = traverser.path();
             if (!this.matchStartLabels.stream().filter(path::hasLabel).findAny().isPresent())

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/f644fb42/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
index 529e717..4f7c231 100644
--- a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
+++ b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
@@ -314,7 +314,7 @@ public class MatchStepTest extends StepTest {
 
     @Test
     public void testComputerAwareCountMatchAlgorithm() {
-        // MAKE SURE THE SORT ORDER CHANGES AS MORE RESULTS ARE RETURNED BY ONE OR THE OTHER TRAVERSAL
+        // MAKE SURE OLAP JOBS ARE BIASED TOWARDS STAR GRAPH DATA
         final Consumer doNothing = s -> {
         };
         Traversal.Admin<?, ?> traversal = __.match(


[9/9] incubator-tinkerpop git commit: Merge branch 'master' into TINKERPOP3-904

Posted by dk...@apache.org.
Merge branch 'master' into TINKERPOP3-904


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/c4fcae6a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/c4fcae6a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/c4fcae6a

Branch: refs/heads/TINKERPOP3-904
Commit: c4fcae6a942fa67b457d14ef6ac10d98b732164b
Parents: 77fac19 2edefa6
Author: Daniel Kuppitz <da...@hotmail.com>
Authored: Wed Nov 4 20:08:20 2015 +0100
Committer: Daniel Kuppitz <da...@hotmail.com>
Committed: Wed Nov 4 20:08:20 2015 +0100

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |   3 +
 docs/src/implementations.asciidoc               |   9 ++
 .../process/traversal/step/map/MatchStep.java   |  46 ++++++---
 .../traversal/step/map/MatchStepTest.java       |  82 ++++++++++++++-
 .../process/computer/SparkGraphComputer.java    |  32 ++++++
 .../process/computer/LocalPropertyTest.java     | 100 +++++++++++++++++++
 .../spark/structure/io/InputOutputRDDTest.java  |   3 +-
 .../spark/structure/io/InputRDDTest.java        |   3 +-
 .../spark/structure/io/OutputRDDTest.java       |   5 +-
 .../io/PersistedInputOutputRDDTest.java         |  13 ++-
 10 files changed, 270 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/c4fcae6a/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --cc CHANGELOG.asciidoc
index 8d74212,085bed7..9f0e18e
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@@ -25,8 -25,10 +25,11 @@@ image::https://raw.githubusercontent.co
  TinkerPop 3.1.0 (NOT OFFICIALLY RELEASED YET)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
 +* Optimized `BulkLoaderVertexProgram`. It now uses `EventStrategy` to monitor what the underlying `BulkLoader` implementation does (e.g. whether it creates a new vertex or returns an existing).
  * Integrated `NumberHelper` in `SumStep`, `MinStep`, `MaxStep` and `MeanStep` (local and global step variants).
+ * `CountMatchAlgorithm`, in OLAP, now biases traversal selection towards those traversals that start at the current traverser location to reduce message passing.
+ * Fixed a file stream bug in Hadoop OLTP that showed up if the streamed file was more than 2G of data.
+ * Added the ability to set thread local properties in `SparkGraphComputer` when using a persistent context.
  * Bumped to Neo4j 2.3.0.
  * Added `PersistedInputRDD` and `PersistedOutputRDD` which enables `SparkGraphComputer` to store the graph RDD in the context between jobs (no HDFS serialization required).
  * Renamed the `public static String` configuration variable names of TinkerGraph (deprecated old variables).


[2/9] incubator-tinkerpop git commit: MatchStep is smart to bias patterns towards the local star graph to reduce inter-machine communication.

Posted by dk...@apache.org.
MatchStep is smart to bias patterns towards the local star graph to reduce inter-machine communication.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/956d7977
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/956d7977
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/956d7977

Branch: refs/heads/TINKERPOP3-904
Commit: 956d7977f4507db8d51b0468604f3b214e0681b3
Parents: aadd422
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Tue Nov 3 09:16:39 2015 -0700
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Tue Nov 3 09:16:39 2015 -0700

----------------------------------------------------------------------
 .../process/traversal/step/map/MatchStep.java   | 48 ++++++++----
 .../traversal/step/map/MatchStepTest.java       | 82 +++++++++++++++++++-
 2 files changed, 115 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/956d7977/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
index 724ab8a..1ed202f 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
@@ -22,6 +22,7 @@ import org.apache.tinkerpop.gremlin.process.traversal.Path;
 import org.apache.tinkerpop.gremlin.process.traversal.Pop;
 import org.apache.tinkerpop.gremlin.process.traversal.Step;
 import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.TraversalEngine;
 import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
 import org.apache.tinkerpop.gremlin.process.traversal.step.Scoping;
 import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
@@ -64,8 +65,9 @@ import java.util.stream.Stream;
  */
 public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>> implements TraversalParent, Scoping {
 
-    public static enum TraversalType {WHERE_PREDICATE, WHERE_TRAVERSAL, MATCH_TRAVERSAL}
+    public enum TraversalType {WHERE_PREDICATE, WHERE_TRAVERSAL, MATCH_TRAVERSAL}
 
+    private TraversalEngine.Type traversalEngineType;
     private List<Traversal.Admin<Object, Object>> matchTraversals = new ArrayList<>();
     private boolean first = true;
     private Set<String> matchStartLabels = new HashSet<>();
@@ -153,6 +155,12 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         }
     }
 
+    @Override
+    public void onEngine(final TraversalEngine engine) {
+        super.onEngine(engine);
+        this.traversalEngineType = engine.getType();
+    }
+
     public ConnectiveStep.Connective getConnective() {
         return this.connective;
     }
@@ -216,7 +224,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
             clone.matchTraversals.add(clone.integrateChild(traversal.clone()));
         }
         if (this.dedups != null) clone.dedups = new HashSet<>();
-        clone.initializeMatchAlgorithm();
+        clone.initializeMatchAlgorithm(this.traversalEngineType);
         return clone;
     }
 
@@ -281,13 +289,13 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         return bindings;
     }
 
-    private void initializeMatchAlgorithm() {
+    private void initializeMatchAlgorithm(final TraversalEngine.Type traversalEngineType) {
         try {
             this.matchAlgorithm = this.matchAlgorithmClass.getConstructor().newInstance();
         } catch (final NoSuchMethodException | IllegalAccessException | InvocationTargetException | InstantiationException e) {
             throw new IllegalStateException(e.getMessage(), e);
         }
-        this.matchAlgorithm.initialize(this.matchTraversals);
+        this.matchAlgorithm.initialize(traversalEngineType, this.matchTraversals);
     }
 
     @Override
@@ -296,7 +304,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
             Traverser.Admin traverser = null;
             if (this.first) {
                 this.first = false;
-                this.initializeMatchAlgorithm();
+                this.initializeMatchAlgorithm(this.traversalEngineType);
             } else {
                 for (final Traversal.Admin<?, ?> matchTraversal : this.matchTraversals) {
                     if (matchTraversal.hasNext()) {
@@ -420,7 +428,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
                 if (null != this.selectKey)
                     this.scopeKeys.add(this.selectKey);
                 if (this.getNextStep() instanceof WhereTraversalStep || this.getNextStep() instanceof WherePredicateStep)
-                   this.scopeKeys.addAll(((Scoping) this.getNextStep()).getScopeKeys());
+                    this.scopeKeys.addAll(((Scoping) this.getNextStep()).getScopeKeys());
                 this.scopeKeys = Collections.unmodifiableSet(this.scopeKeys);
             }
             return this.scopeKeys;
@@ -555,7 +563,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         public static Function<List<Traversal.Admin<Object, Object>>, IllegalStateException> UNMATCHABLE_PATTERN = traversals -> new IllegalStateException("The provided match pattern is unsolvable: " + traversals);
 
 
-        public void initialize(final List<Traversal.Admin<Object, Object>> traversals);
+        public void initialize(final TraversalEngine.Type traversalEngineType, final List<Traversal.Admin<Object, Object>> traversals);
 
         public default void recordStart(final Traverser.Admin<Object> traverser, final Traversal.Admin<Object, Object> traversal) {
 
@@ -571,7 +579,7 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         private List<Traversal.Admin<Object, Object>> traversals;
 
         @Override
-        public void initialize(final List<Traversal.Admin<Object, Object>> traversals) {
+        public void initialize(final TraversalEngine.Type traversalEngineType, final List<Traversal.Admin<Object, Object>> traversals) {
             this.traversals = traversals;
         }
 
@@ -589,14 +597,26 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
 
         protected List<Bundle> bundles;
         protected int counter = 0;
+        protected boolean onComputer;
 
-        @Override
-        public void initialize(final List<Traversal.Admin<Object, Object>> traversals) {
+        public void initialize(final TraversalEngine.Type traversalEngineType, final List<Traversal.Admin<Object, Object>> traversals) {
+            this.onComputer = traversalEngineType.equals(TraversalEngine.Type.COMPUTER);
             this.bundles = traversals.stream().map(Bundle::new).collect(Collectors.toList());
         }
 
         @Override
         public Traversal.Admin<Object, Object> apply(final Traverser.Admin<Object> traverser) {
+            // optimization to favor processing StarGraph local objects first to limit message passing (GraphComputer only)
+            // TODO: generalize this for future MatchAlgorithms (given that 3.2.0 will focus on RealTimeStrategy, it will probably go there)
+            if (this.onComputer) {
+                final List<Set<String>> labels = traverser.path().labels();
+                final Set<String> lastLabels = labels.get(labels.size() - 1);
+                Collections.sort(this.bundles,
+                        Comparator.<Bundle>comparingLong(b -> Helper.getStartLabels(b.traversal).stream().filter(startLabel -> !lastLabels.contains(startLabel)).count()).
+                                thenComparingInt(b -> b.traversalType.ordinal()).
+                                thenComparingDouble(b -> b.multiplicity));
+            }
+
             Bundle startLabelsBundle = null;
             for (final Bundle bundle : this.bundles) {
                 if (!Helper.hasExecutedTraversal(traverser, bundle.traversal) && Helper.hasStartLabels(traverser, bundle.traversal)) {
@@ -618,9 +638,11 @@ public final class MatchStep<S, E> extends ComputerAwareStep<S, Map<String, E>>
         @Override
         public void recordEnd(final Traverser.Admin<Object> traverser, final Traversal.Admin<Object, Object> traversal) {
             this.getBundle(traversal).incrementEndCount();
-            if (this.counter < 200 || this.counter % 250 == 0) // aggressively sort for the first 200 results -- after that, sort every 250
-                Collections.sort(this.bundles, Comparator.<Bundle>comparingInt(b -> b.traversalType.ordinal()).thenComparingDouble(b -> b.multiplicity));
-            this.counter++;
+            if (!this.onComputer) {  // if on computer, sort on a per traverser-basis with bias towards local star graph
+                if (this.counter < 200 || this.counter % 250 == 0) // aggressively sort for the first 200 results -- after that, sort every 250
+                    Collections.sort(this.bundles, Comparator.<Bundle>comparingInt(b -> b.traversalType.ordinal()).thenComparingDouble(b -> b.multiplicity));
+                this.counter++;
+            }
         }
 
         protected Bundle getBundle(final Traversal.Admin<Object, Object> traversal) {

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/956d7977/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
index 9c7cef1..529e717 100644
--- a/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
+++ b/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStepTest.java
@@ -20,19 +20,25 @@ package org.apache.tinkerpop.gremlin.process.traversal.step.map;
 
 import org.apache.tinkerpop.gremlin.process.traversal.P;
 import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
+import org.apache.tinkerpop.gremlin.process.traversal.TraversalEngine;
+import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
 import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
 import org.apache.tinkerpop.gremlin.process.traversal.step.StepTest;
 import org.apache.tinkerpop.gremlin.process.traversal.step.filter.CoinStep;
 import org.apache.tinkerpop.gremlin.process.traversal.step.filter.ConnectiveStep;
 import org.apache.tinkerpop.gremlin.process.traversal.step.filter.WherePredicateStep;
 import org.apache.tinkerpop.gremlin.process.traversal.step.filter.WhereTraversalStep;
+import org.apache.tinkerpop.gremlin.process.traversal.step.util.EmptyStep;
+import org.apache.tinkerpop.gremlin.process.traversal.traverser.B_LP_O_P_S_SE_SL_TraverserGenerator;
 import org.apache.tinkerpop.gremlin.process.traversal.traverser.util.EmptyTraverser;
 import org.apache.tinkerpop.gremlin.structure.T;
 import org.junit.Test;
 
 import java.util.Arrays;
+import java.util.Collections;
 import java.util.HashSet;
 import java.util.List;
+import java.util.function.Consumer;
 
 import static org.apache.tinkerpop.gremlin.process.traversal.P.eq;
 import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.*;
@@ -186,7 +192,7 @@ public class MatchStepTest extends StepTest {
         // MAKE SURE THE SORT ORDER CHANGES AS MORE RESULTS ARE RETURNED BY ONE OR THE OTHER TRAVERSAL
         Traversal.Admin<?, ?> traversal = __.match(as("a").out().as("b"), as("c").in().as("d")).asAdmin();
         MatchStep.CountMatchAlgorithm countMatchAlgorithm = new MatchStep.CountMatchAlgorithm();
-        countMatchAlgorithm.initialize(((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren());
+        countMatchAlgorithm.initialize(TraversalEngine.Type.STANDARD, ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren());
         Traversal.Admin<Object, Object> firstPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(0);
         Traversal.Admin<Object, Object> secondPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(1);
         //
@@ -232,7 +238,7 @@ public class MatchStepTest extends StepTest {
         ///////  MAKE SURE WHERE PREDICATE TRAVERSALS ARE ALWAYS FIRST AS THEY ARE SIMPLY .hasNext() CHECKS
         traversal = __.match(as("a").out().as("b"), as("c").in().as("d"), where("a", P.eq("b"))).asAdmin();
         countMatchAlgorithm = new MatchStep.CountMatchAlgorithm();
-        countMatchAlgorithm.initialize(((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren());
+        countMatchAlgorithm.initialize(TraversalEngine.Type.STANDARD, ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren());
         assertEquals(3, countMatchAlgorithm.bundles.size());
         firstPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(0);
         secondPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(1);
@@ -307,6 +313,78 @@ public class MatchStepTest extends StepTest {
     }
 
     @Test
+    public void testComputerAwareCountMatchAlgorithm() {
+        // MAKE SURE THE SORT ORDER CHANGES AS MORE RESULTS ARE RETURNED BY ONE OR THE OTHER TRAVERSAL
+        final Consumer doNothing = s -> {
+        };
+        Traversal.Admin<?, ?> traversal = __.match(
+                as("a").sideEffect(doNothing).as("b"),    // 1
+                as("b").sideEffect(doNothing).as("c"),    // 2
+                as("a").sideEffect(doNothing).as("d"),    // 5
+                as("c").sideEffect(doNothing).as("e"),    // 4
+                as("c").sideEffect(doNothing).as("f"))    // 3
+                .asAdmin();
+        traversal.applyStrategies(); // necessary to enure step ids are unique
+        MatchStep.CountMatchAlgorithm countMatchAlgorithm = new MatchStep.CountMatchAlgorithm();
+        countMatchAlgorithm.initialize(TraversalEngine.Type.COMPUTER, ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren());
+        Traversal.Admin<Object, Object> firstPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(0);
+        Traversal.Admin<Object, Object> secondPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(1);
+        Traversal.Admin<Object, Object> thirdPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(2);
+        Traversal.Admin<Object, Object> forthPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(3);
+        Traversal.Admin<Object, Object> fifthPattern = ((MatchStep<?, ?>) traversal.getStartStep()).getGlobalChildren().get(4);
+        countMatchAlgorithm.bundles.stream().forEach(bundle -> assertEquals(0.0d, bundle.multiplicity, 0.0d));
+        assertEquals(MatchStep.TraversalType.MATCH_TRAVERSAL, countMatchAlgorithm.getBundle(firstPattern).traversalType);
+        assertEquals(MatchStep.TraversalType.MATCH_TRAVERSAL, countMatchAlgorithm.getBundle(secondPattern).traversalType);
+        assertEquals(MatchStep.TraversalType.MATCH_TRAVERSAL, countMatchAlgorithm.getBundle(thirdPattern).traversalType);
+        assertEquals(MatchStep.TraversalType.MATCH_TRAVERSAL, countMatchAlgorithm.getBundle(forthPattern).traversalType);
+        assertEquals(MatchStep.TraversalType.MATCH_TRAVERSAL, countMatchAlgorithm.getBundle(fifthPattern).traversalType);
+        assertEquals(firstPattern, countMatchAlgorithm.bundles.get(0).traversal);
+        assertEquals(secondPattern, countMatchAlgorithm.bundles.get(1).traversal);
+        assertEquals(thirdPattern, countMatchAlgorithm.bundles.get(2).traversal);
+        assertEquals(forthPattern, countMatchAlgorithm.bundles.get(3).traversal);
+        assertEquals(fifthPattern, countMatchAlgorithm.bundles.get(4).traversal);
+        // MAKE THE SECOND PATTERN EXPENSIVE
+        countMatchAlgorithm.recordStart(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), secondPattern);
+        // MAKE THE THIRD PATTERN MORE EXPENSIVE THAN FORTH
+        countMatchAlgorithm.recordStart(EmptyTraverser.instance(), thirdPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), thirdPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), thirdPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), thirdPattern);
+        // MAKE THE FORTH PATTERN EXPENSIVE
+        countMatchAlgorithm.recordStart(EmptyTraverser.instance(), forthPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), forthPattern);
+        countMatchAlgorithm.recordEnd(EmptyTraverser.instance(), forthPattern);
+        //
+        Traverser.Admin traverser = B_LP_O_P_S_SE_SL_TraverserGenerator.instance().generate(1, EmptyStep.instance(), 1l);
+        traverser.addLabels(Collections.singleton("a"));
+        assertEquals(firstPattern, countMatchAlgorithm.apply(traverser));
+        traverser = traverser.split(1, EmptyStep.instance());
+        traverser.addLabels(new HashSet<>(Arrays.asList("b", firstPattern.getStartStep().getId())));
+        //
+        assertEquals(secondPattern, countMatchAlgorithm.apply(traverser));
+        traverser = traverser.split(1, EmptyStep.instance());
+        traverser.addLabels(new HashSet<>(Arrays.asList("c", secondPattern.getStartStep().getId())));
+        //
+        assertEquals(fifthPattern, countMatchAlgorithm.apply(traverser));
+        traverser = traverser.split(1, EmptyStep.instance());
+        traverser.addLabels(new HashSet<>(Arrays.asList("f", fifthPattern.getStartStep().getId())));
+        //
+        assertEquals(forthPattern, countMatchAlgorithm.apply(traverser));
+        traverser = traverser.split(1, EmptyStep.instance());
+        traverser.addLabels(new HashSet<>(Arrays.asList("e", forthPattern.getStartStep().getId())));
+        //
+        assertEquals(thirdPattern, countMatchAlgorithm.apply(traverser));
+        traverser = traverser.split(1, EmptyStep.instance());
+        traverser.addLabels(new HashSet<>(Arrays.asList("d", thirdPattern.getStartStep().getId())));
+    }
+
+    @Test
     public void shouldCalculateStartLabelCorrectly() {
         Traversal.Admin<?, ?> traversal = match(
                 where(and(


[5/9] incubator-tinkerpop git commit: MatchStep CountMatchAlgorithm is smart to bias patterns that don't reqiure message passing in OLAP.

Posted by dk...@apache.org.
MatchStep CountMatchAlgorithm is smart to bias patterns that don't reqiure message passing in OLAP.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/b338f962
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/b338f962
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/b338f962

Branch: refs/heads/TINKERPOP3-904
Commit: b338f9625fdadcb991ac8d5414641f190add24c6
Parents: 558c04e f644fb4
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Wed Nov 4 06:42:40 2015 -0700
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Wed Nov 4 06:42:40 2015 -0700

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |  2 +
 .../process/traversal/step/map/MatchStep.java   | 46 +++++++----
 .../traversal/step/map/MatchStepTest.java       | 82 +++++++++++++++++++-
 3 files changed, 115 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/b338f962/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --cc CHANGELOG.asciidoc
index 5ad89e5,e4eb554..31671c0
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@@ -25,7 -25,8 +25,9 @@@ image::https://raw.githubusercontent.co
  TinkerPop 3.1.0 (NOT OFFICIALLY RELEASED YET)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
 +* Integrated `NumberHelper` in `SumStep`, `MinStep`, `MaxStep` and `MeanStep` (local and global step variants).
+ * `CountMatchAlgorithm`, in OLAP, now biases traversal selection towards those traversals that start at the current traverser location to reduce message passing.
+ * Fixed a file stream bug in Hadoop OLTP that showed up if the streamed file was more than 2G of data.
  * Bumped to Neo4j 2.3.0.
  * Added `PersistedInputRDD` and `PersistedOutputRDD` which enables `SparkGraphComputer` to store the graph RDD in the context between jobs (no HDFS serialization required).
  * Renamed the `public static String` configuration variable names of TinkerGraph (deprecated old variables).