You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2016/04/08 19:16:25 UTC

[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

    [ https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232505#comment-15232505 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
-----------------------------------------------

We might want to go down the road of {{Traverser.setPath()}} ([~pietermartin-tinkerpop]. I never liked {{Traverser.addLabels()}}, but put it there because of traverser species that don't support labels. However, for those that don't support path, we could return {{EmptyPath.instance()}} on {{getPath()}} and do nothing on {{setPath()}}.

If we do this, then we can deprecate {{Traverser.addLabels()}} (and also make it a default "helper" method) and have the above work as follows.

{code}
// Traverser.addLabels(Set<String> labels) has a default implementation of below.
if(!labels.isEmpty()) {
  traverser.setPath(traverser.getPath().extend(labels))
}
// Traverser.removeLabels(Set<String> labels) would have a default implementation of below.
// though if we deprecate, Traverser.addLabels(), don't even create Traverser.removeLabels().
if(!labels.isEmpty()) {
  traverser.setPath(traverser.getPath().retract(labels))
}
{code}

Okay. This is clean and nice and backwards compatible blah blah. The question then becomes, what does the implementation of {{Path.retract()}} look like? This is where things get complicated, but it shouldn't be too hard. For {{MutablePath}}, simply go through each path entry and if a label is in the retracted labels, remove it. If the labels for that entry are now empty, remove the entry. This is simple {{List}} manipulations. For {{ImmutablePath}}, things are not so easy. {{ImmutablePath}} subpaths are shared amongst traversers where "real" data structure behind the scenes is a tree. This is used in OLAP/shared-memory where we greatly reduce object creation by reusing path segments amongst traversers. For OLAP, this reuse is not possible because there is no shared-memory (hence, {{MutablePath}}).

So what is the algorithm for {{ImmutablePath.retract(Set<String> labels)}}? I think we have to walk the path sequence and if a label is found, then you have to branch from that segment and remove the label. You then have to copy all the segments post that section over to the new branch... hard to describe. However, in short, just make sure that by "mutating" the {{ImmutablePath}}, the path information that is shared by familial traversers is not mutated as well.

Finally, for traverser species that do not support path-calculations, {{EmptyPath.instance(}} retract/extend methods do nothing but simply return {{this}}.

> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that can not be "bulked." There are various reasons why two traversers at the same object can not be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That is, when the history of the traverser is required, the probability of two traversers having the same history is low.
> A key to making traversals more efficient is to do as a much as possible to remove historic information from a traverser so it can get bulked. How does one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} of "a" and "b" are required up to the {{where()}} and at which point, at {{both()}}, they are no longer required. It would be smart to support:
> {code}
> traverser.dropLabels(Set<String>)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}} insert a step between {{where()}} and {{both()}} called {{PathPruneStep}} which would be a {{SideEffectStep}}. The strategy would know which labels were no longer needed (via forward lookahead) and then do:
> {code}
> public class PathPruneStep {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser<S> traverser) {
>     final Traverser<S> start = this.starts.next();
>     if(this.dropPath) start.dropPath();
>     else start.dropLabels(labels); 
>   }
> }
> {code}
> Again, the more we can prune historic path data no longer needed, the higher the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no future patterns require that label, drop it! 
> This idea is related to TINKERPOP-331, but I don't think we should deal with manipulating the species. Thus, I think 331 is too "low level."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)