Posted to commits@tinkerpop.apache.org by ok...@apache.org on 2015/06/08 22:00:16 UTC

[1/2] incubator-tinkerpop git commit: added barrier() step to docs and tweaks to Neo4j docs.

Repository: incubator-tinkerpop
Updated Branches:
  refs/heads/master e17070a74 -> ce6b2e612


added barrier() step to docs and tweaks to Neo4j docs.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/ce6b2e61
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/ce6b2e61
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/ce6b2e61

Branch: refs/heads/master
Commit: ce6b2e6123c8f3999c4bcdb3950dd05fcdcf5ced
Parents: 13f12db
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Mon Jun 8 13:59:55 2015 -0600
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Mon Jun 8 14:00:05 2015 -0600

----------------------------------------------------------------------
 docs/src/implementations.asciidoc | 22 ++++++++--------
 docs/src/the-traversal.asciidoc   | 47 +++++++++++++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ce6b2e61/docs/src/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/implementations.asciidoc b/docs/src/implementations.asciidoc
index 9cc9709..8275671 100644
--- a/docs/src/implementations.asciidoc
+++ b/docs/src/implementations.asciidoc
@@ -593,24 +593,22 @@ graph.io(graphml()).readGraph('data/grateful-dead.xml')
 g = graph.traversal()
 g.tx().commit()
 clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()}  <1>
-clock(1000) {g.V().has('name','Garcia').iterate()} <2>
-graph.cypher("CREATE INDEX ON :artist(name)") <3>
+graph.cypher("CREATE INDEX ON :artist(name)") <2>
 g.tx().commit()
-Thread.sleep(5000) <4>
-clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <5>
-clock(1000) {g.V().has('name','Garcia').iterate()} <6>
-graph.cypher("DROP INDEX ON :artist(name)") <7>
+Thread.sleep(5000) <3>
+clock(1000) {g.V().hasLabel('artist').has('name','Garcia').iterate()} <4>
+clock(1000) {g.V().has('name','Garcia').iterate()} <5>
+graph.cypher("DROP INDEX ON :artist(name)") <6>
 g.tx().commit()
 graph.close()
 ----
 
 <1> Find all artists whose name is Garcia which does a linear scan of the artist vertex-label partition.
-<2> Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
-<3> Create an index for all artist vertices on their name property.
-<4> Neo4j indices are eventually consistent so this stalls to give the index to populate itself.
-<5> Find all artists whose name is Garcia which uses the pre-defined schema index.
-<6> Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
-<7> Drop the created index.
+<2> Create an index for all artist vertices on their name property.
+<3> Neo4j indices are eventually consistent so this stalls to give the index time to populate itself.
+<4> Find all artists whose name is Garcia which uses the pre-defined schema index.
+<5> Find all vertices whose name is Garcia which requires a linear scan of all the data in the graph.
+<6> Drop the created index.
 
 Multi/Meta-Properties
 ~~~~~~~~~~~~~~~~~~~~~

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/ce6b2e61/docs/src/the-traversal.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/the-traversal.asciidoc b/docs/src/the-traversal.asciidoc
index 5eaf3f4..902f55a 100644
--- a/docs/src/the-traversal.asciidoc
+++ b/docs/src/the-traversal.asciidoc
@@ -248,6 +248,51 @@ g.V().hasLabel('software').as('a','b','c').
      by(__.in('created').values('name').fold())
 ----
 
+[[barrier-step]]
+Barrier Step
+~~~~~~~~~~~~
+
+The `barrier()`-step turns the lazy traversal pipeline into a bulk-synchronous pipeline (*barrier*). This step is useful in the following situations:
+  . When everything prior to `barrier()` needs to be executed before moving onto the steps after the `barrier()` (i.e. ordering).
+  . When "stalling" the traversal may lead to a "bulking optimization" in traversals that repeatedly touch many of the same elements (i.e. optimizing).
+
+[gremlin-groovy,modern]
+----
+g.V().sideEffect{println "first: ${it}"}.sideEffect{println "second: ${it}"}.iterate()
+g.V().sideEffect{println "first: ${it}"}.barrier().sideEffect{println "second: ${it}"}.iterate()
+----
+
+The theory behind a "bulking optimization" is simple. If there are one million traversers at vertex 1, then there is no need to calculate one million `both()`-computations. Instead, represent those one million traversers as a single traverser with a `Traverser.bulk()` equal to one million and execute `both()` once. A bulking optimization example is made more salient on a larger graph. Therefore, the example below leverages the <<grateful-dead,Grateful Dead graph>>.
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+g = graph.traversal(standard())
+clockWithResult(1){g.V().both().both().both().count().next()} <1>
+clockWithResult(1){g.V().repeat(both()).times(3).count().next()} <2>
+clockWithResult(1){g.V().both().barrier().both().barrier().both().barrier().count().next()} <3>
+----
+
+<1> A non-bulking traversal where each traverser is processed.
+<2> Each traverser entering `repeat()` has its recursion bulked.
+<3> A bulking traversal where implicit traversers are not processed.
+
+If `barrier()` is provided an integer argument, then the barrier will only hold `n`-number of unique traversers in its barrier before draining the aggregated traversers to the next step. This is useful in the aforementioned bulking optimization scenario, but reduces the risk of an out-of-memory exception.
+
+The non-default `LazyBarrierStrategy` inserts `barrier()`-steps in a traversal where appropriate in order to gain the "bulking optimization."
+
+[gremlin-groovy]
+----
+graph = TinkerGraph.open()
+graph.io(graphml()).readGraph('data/grateful-dead.xml')
+g = graph.traversal(GraphTraversalSource.build().with(LazyBarrierStrategy.instance()).engine(StandardTraversalEngine.build()))
+clockWithResult(1){g.V().both().both().both().count().next()}
+g.V().both().both().both().count().iterate().toString()  <1>
+----
+
+<1> With `LazyBarrierStrategy` activated, `barrier()` steps are automatically inserted where appropriate.
+
 [[by-step]]
 By Step
 ~~~~~~~
@@ -1504,7 +1549,7 @@ A Note on Barrier Steps
 
 image:barrier.png[width=165,float=right] Gremlin is primarily a link:http://en.wikipedia.org/wiki/Lazy_evaluation[lazy], stream processing language. This means that Gremlin fully processes (to the best of its abilities) any traversers currently in the traversal pipeline before getting more data from the start/head of the traversal. However, there are numerous situations in which a completely lazy computation is not possible (or impractical). When a computation is not lazy, a "barrier step" exists. There are three types of barriers:
 
-  . `CollectingBarrierStep`: All of the traversers prior to the step are put into a collection and then processed in some way (e.g. ordered) prior to the collection being "drained" one-by-one to the next step. Examples include: <<order-step,`order()`>>, <<sample-step,`sample()`>>, <<aggregate-step,`aggregate()`>>.
+  . `CollectingBarrierStep`: All of the traversers prior to the step are put into a collection and then processed in some way (e.g. ordered) prior to the collection being "drained" one-by-one to the next step. Examples include: <<order-step,`order()`>>, <<sample-step,`sample()`>>, <<aggregate-step,`aggregate()`>>, <<barrier-step,`barrier()`>>.
   . `ReducingBarrierStep`: All of the traversers prior to the step are processed by a reduce function and once all the previous traversers are processed, a single "reduced value" traverser is emitted to the next step. Examples include: <<fold-step,`fold()`>>, <<count-step,`count()`>>, <<sum-step,`sum()`>>, <<max-step,`max()`>>, <<min-step,`min()`>>.
   . `SupplyingBarrierStep`: All of the traversers prior to the step are iterated (no processing) and then some provided supplier yields a single traverser to continue to the next step. Examples include: <<cap-step,`cap()`>>.
 

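Note on the bulking optimization described in the barrier() docs above: the idea can be sketched outside of TinkerPop with a toy model. The following Python is only an illustration under stated assumptions, not TinkerPop code. The graph, the function names, and the "expansions" counter are all hypothetical; a bulk is modeled as a plain integer count per vertex, which is a simplification of what `Traverser.bulk()` represents.

```python
from collections import Counter

# Hypothetical toy undirected graph as an adjacency list (not the Grateful Dead data).
GRAPH = {
    1: [2, 3],
    2: [1, 3],
    3: [1, 2, 4],
    4: [3],
}


def count_unbulked(hops):
    """Model of both().both()...: every traverser is expanded on its own.

    Returns (final traverser count, number of neighbour expansions performed).
    """
    traversers = list(GRAPH)  # one traverser per vertex, like g.V()
    expansions = 0
    for _ in range(hops):
        next_traversers = []
        for vertex in traversers:
            expansions += 1  # one both()-computation per traverser
            next_traversers.extend(GRAPH[vertex])
        traversers = next_traversers
    return len(traversers), expansions


def count_bulked(hops):
    """Model of both().barrier()...: traversers at the same vertex are merged.

    Each distinct vertex is expanded once and its bulk is carried to every
    neighbour. Returns (final traverser count, number of expansions performed).
    """
    bulks = Counter({vertex: 1 for vertex in GRAPH})  # vertex -> bulk
    expansions = 0
    for _ in range(hops):
        next_bulks = Counter()
        for vertex, bulk in bulks.items():
            expansions += 1  # one both()-computation per distinct vertex
            for neighbour in GRAPH[vertex]:
                next_bulks[neighbour] += bulk
        bulks = next_bulks
    return sum(bulks.values()), expansions


if __name__ == "__main__":
    print(count_unbulked(3))
    print(count_bulked(3))
```

Both functions arrive at the same count; only the number of neighbour expansions differs, and the gap widens as more traversers pile up on the same vertices. The integer-argument form of `barrier()` would correspond to flushing `bulks` whenever it holds `n` distinct vertices, which this sketch does not model.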

[2/2] incubator-tinkerpop git commit: removed unneeded OLAP warnings for various steps in the-traversal as now ComputerVerificationStrategy will explain to the user that the step is not possible in certain situations.

Posted by ok...@apache.org.
removed unneeded OLAP warnings for various steps in the-traversal as now ComputerVerificationStrategy will explain to the user that the step is not possible in certain situations.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/13f12dbf
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/13f12dbf
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/13f12dbf

Branch: refs/heads/master
Commit: 13f12dbfb22cef185fa808b064d515185a2bf051
Parents: e17070a
Author: Marko A. Rodriguez <ok...@gmail.com>
Authored: Mon Jun 8 13:05:56 2015 -0600
Committer: Marko A. Rodriguez <ok...@gmail.com>
Committed: Mon Jun 8 14:00:05 2015 -0600

----------------------------------------------------------------------
 docs/src/the-traversal.asciidoc | 6 ------
 1 file changed, 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/13f12dbf/docs/src/the-traversal.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/the-traversal.asciidoc b/docs/src/the-traversal.asciidoc
index ca573a9..5eaf3f4 100644
--- a/docs/src/the-traversal.asciidoc
+++ b/docs/src/the-traversal.asciidoc
@@ -411,8 +411,6 @@ g.V().valueMap(true, 'name')
 g.V().dedup().by(label).values('name')
 ----
 
-WARNING: The `dedup()`-step does not have a correlate in <<traversalvertexprogram,Gremlin OLAP>> when used mid-traversal. When in mid-traversal de-duplication only occurs at the the current processing vertex and thus, is not a global operation as it in Gremlin OLTP. When `dedup()` is an end step, the resultant traversers are de-duplicated by `TraverserMapReduce`.
-
 [[drop-step]]
 Drop Step
 ~~~~~~~~~
@@ -476,8 +474,6 @@ The three projection parameters available to `group()` via `by()` are:
 . Value-projection: What feature of the group to store in the key-list?
 . Reduce-projection: What feature of the key-list to ultimately return?
 
-WARNING: The `group()`-step does not have a correlate in <<traversalvertexprogram,Gremlin OLAP>> when used mid-traversal. When in mid-traversal grouping only occurs at the the current processing vertex and thus, is not a global operation as it in Gremlin OLTP. However, `GroupMapReduce` provides unified groups at the end of the traversal computation.
-
 [[groupcount-step]]
 GroupCount Step
 ~~~~~~~~~~~~~~~
@@ -507,8 +503,6 @@ g.V().repeat(both().groupCount('m').by(label)).times(10).cap('m')
 
 The above is interesting in that it demonstrates the use of referencing the internal `Map<Object,Long>` of `groupCount()` with a string variable. Given that `groupCount()` is a sideEffect-step, it simply passes the object it received to its output. Internal to `groupCount()`, the object's count is incremented.
 
-WARNING: The `groupCount()`-step does not have a correlate in <<traversalvertexprogram,Gremlin OLAP>> when used mid-traversal. When in mid-traversal grouping only occurs at the the current processing vertex and thus, is not a global operation as it in Gremlin OLTP. However, `GroupCountMapReduce` provides unified groups at the end of the traversal computation.
-
 [[has-step]]
 Has Step
 ~~~~~~~~