You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2015/06/11 15:59:00 UTC

[jira] [Commented] (TINKERPOP3-700) WhereStep should "MatchStep" and ConjunctionP should use the BudgetAlgorithm

    [ https://issues.apache.org/jira/browse/TINKERPOP3-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581945#comment-14581945 ] 

Marko A. Rodriguez commented on TINKERPOP3-700:
-----------------------------------------------

Suppose the following pattern that needs to be matched and assume that we come with {{a}}-bound to a vertex.

{code}
a--knows-->b     // t1
a--worksFor-->c  // t2
b--worksFor-->c  // t3
{code}
 
{code}
select('a').out('knows').is(eq('b'))
select('a').out('worksFor').is(eq('c'))
select('b').out('worksFor').is(eq('c'))
{code}
 
This is what we have now for {{WhereStep}} (i.e. {{SelectOneStep .... IsStep}}). However, with {{match()}}, we can't always assume that the right hand side variable will be bound, thus, we need a new step that is a slight variation on {{IsStep}} -- basically if the variable is bound, {{eq()}}-check, else just let it flow through. The following {{as('b')}} will bind the variable.

{code} 
select('a').as('a','t1').out('knows').isOrAllow('b').as('b')
select('a').as('a','t2').out('worksFor').isOrAllow('c').as('c')
select('b').as('b','t3').out('worksFor').isOrAllow('c').as('c')
{code}

Now we use standard steps to effect a {{match()}}. Notice how the path history of the traverser IS the "{{Map<String,Object>}}"-binding. That is the crucial concept -- no external data structures, its all with respect to the {{Traverser}} (thus, no state is maintained outside of the {{Traverser}}).
 
{code}
choose(traverser -> { 
    if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t1')) return 't1';
    if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t2')) return 't2';
    if(traverser.path().hasLabel('b') && !traverser.path().hasLabel('t3')) return 't3';)
  .option('t1', select('a').as('a','t1').out('knows').isOrAllow('b').as('b'))
  .option('t2', select('a').as('a','t2').out('worksFor').isOrAllow('c').as('c'))
  .option('t3', select('b').as('b','t3').out('worksFor').isOrAllow('c').as('c'))
 {code}
 
Next, given that this is an "and"-match, we need to make sure that we try each {{option()}} to make sure we get a full binding. So, {{until}} we have tried all options, {{repeat}}.

{code} 
until(traverser -> traverser.path().hasLabel('t1') && hasLabel('t2') && hasLabel('t3')).
  repeat(
    choose(traverser -> { 
        if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t1')) return 't1';
        if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t2')) return 't2';
        if(traverser.path().hasLabel('b') && !traverser.path().hasLabel('t3')) return 't3';)
      .option('t1', select('a').as('a','t1').out('knows').isOrAllow('b').as('b'))
      .option('t2', select('a').as('a','t2').out('worksFor').isOrAllow('c').as('c'))
      .option('t3', select('b').as('b','t3').out('worksFor').isOrAllow('c').as('c')))
{code}

What is emitted from the {{repeat}}-step is simply a {{Traverser}} that "survived" the trip. That is, was able to make it through the pattern un-filtered. However, what this traverser has in him is his {{Path}} which IS the "{{Map<String,Object>}}". As it stands, what we have effected is a "where()" with pattern matching. If you want a "match()", simply {{select()}}.
 
{code}
until(traverser -> traverser.path().hasLabel('t1') && hasLabel('t2') && hasLabel('t3')).
  repeat(
    choose(traverser -> { 
        if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t1')) return 't1';
        if(traverser.path().hasLabel('a') && !traverser.path().hasLabel('t2')) return 't2';
        if(traverser.path().hasLabel('b') && !traverser.path().hasLabel('t3')) return 't3';)
      .option('t1', select('a').as('a','t1').out('knows').isOrAllow('b').as('b'))
      .option('t2', select('a').as('a','t2').out('worksFor').isOrAllow('c').as('c'))
      .option('t3', select('b').as('b','t3').out('worksFor').isOrAllow('c').as('c'))).
select('a','b','c')
{code}

Finally, the budget algorithm is now just the cleverness of your {{choose()}}-predicate. Which path to take based on statistics over time of how fast certain {{option()s}} are taking. i.e. try and filter fast! Moreover, or-ing is also a function of your {{choose()}}-predicate. Instead of {{t1 && t2 && ...}}, just do {{t1 || t2 || t3}}... etc.


> WhereStep should "MatchStep" and ConjunctionP should use the BudgetAlgorithm
> ----------------------------------------------------------------------------
>
>                 Key: TINKERPOP3-700
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-700
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>            Reporter: Marko A. Rodriguez
>
> {code}
> g.V.as('a').where(a.knows.b
>                   a.knows.c
>                   b.knows.c)
> {code}
> The above can be written as an OrP of the form:
> {code}
> g.V.as('a').or(select(a).knows.b.select(b).knows.c.select(a).knows.where(eq(c)),
>                select(a).knows.c.select(a).knows.b.select(b).knows.where(eq(c)));
> {code}
> In essence, the where-statements are rewritten in terms of every possible permutation. When these permutations are put into an OrP (via or(traversals…)), then if any branch returns a result, then the original 'a' is emitted (as {{WhereStep}} is a {{FilterStep}}). If {{OrP}} is under the BudgetAlgorithm, then {{OrP}} will "thread between" its traversals until a value is yielded. *And given that all permutations are the same semantics -- if one fails, they all fail!*
> What is nice about this, is that arbitrary nesting comes "for free."
> {code}
> g.V.as('a').where(a.knows.b
>                   a.uncle.b
>                     and(a.worksFor.c
>                           b.worksFor.c))
> {code}
> This is rewritten as:
> {code}
> g.V.as('a').or(select(a).knows.b.or(select(a).worksFor.c.select(b).worksFor.where(eq(c)),select(b).worksFor.c.select(a).worksFor.where(eq(c))).select(a).uncle.where(eq(b)),
>                select(a).uncle.b.select(a).knows.where(eq(b)).or(select(a).worksFor.c.select(b).worksFor.where(eq(c)),select(b).worksFor.c.select(a).worksFor.where(eq(c))))
> {code}
> *IMPORTANT* This is not a "match" in the {{MatchStep}} sense as it doesn't return all permutations that bind, it only filters based on a single match.
> What is interesting about this approach:
>   1. The rewrite algorithm seems simple as its just concatenation given {{select()}}-projections and {{where(eq)}}-tails.
>   2. The cool thing about the rewrite in all possible permutations is that if any one {{FastNoSuchElementException}}, its booted from the {{ConjunctionP}} analysis.
>   3. {{ConjunctionP}} has the BudgetAlgorithm and thus can be used for ANY step that has conjunctions -- {{HasStep}}, {{IsStep}}, etc.
>   4. It uses the path data structure to maintain the variable bindings. {{WhereStep}} has no state! Its all about {{OrP}}. 
>   5. Given that the path data is the variable bindings, then this also works for OLAP as the traverser contains all the information it needs (no central location of analysis!)
> 	- However, you would only pick one permutation to do as `or()` does not exist in OLAP. 
> 	- and with one permutation, {{where().select()}} is then {{MatchStep}} which would then work in OLAP!
> 		- thus, Gremlin OLAP can rewrite {{match()}} to the {{where().select()}} form and TADA!
>   
> *IMPORTANT* 4 and 5 above are pretty insane consequences. And if any, we should at least use this realization to make {{match()}} work in OLAP.
> Next, realize that how {{where()}} should work is that if an {{as()}} is NOT in the path data structure, then its a variable bindings for rewrite. Moreover, if you don't provide a start {{as()}}, it is assumed to be the incoming object (currently how {{where()}} works). For example:
> {code}
> g.V.where(knows.b
>           knows.c
>           b.knows.c)
> {code}
> This is rewritten as:
> {code}
> g.V.or(x.select(x).knows.b.or(select(b).worksFor.where(eq(a)),select(x).worksFor.where(eq(b))).select(x).uncle.where(eq(b)),
>        x.select(x).uncle.b.select(x).knows.where(eq(b)).or(select(b).worksFor.where(eq(x)),select(x).worksFor.where(eq(b))))  
> {code}
> To be sure, the {{as('a').select('a')}} fragments can of course be optimized out to just {{as('a')}}.
>           



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)