You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/06/18 00:05:33 UTC

XMatchStep -- A New MatchStep for GA?

Hello,

For the last week or so I've been working on XMatchStep w/ Kuppitz. What spurned this was my burning desire to get and/or/not settled by GA and where()/match() are the culprits that leverage it the most. In fact, along with XMatchStep, WhereStep has gotten a massive revamping for the better. Anywho, first off, XMatchStep:

..simple, elegant -- easy, breezy, cover girl:

	https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java

It is now at a point where it is passing all the MatchTest tests  (save one -- whose expected results are up to debate) and is comparable in performance to the original MatchStep developed by Josh Shinavier (cc:d) which leverages Matthias Bröcheler's budget match algorithm.

There are a few things about XMatchStep that make it "better" than the current/original MatchStep:

	1. Works for both OLAP and OLTP.
	2. Supports nested AND/OR'ing.
	3. Supports pluggable "match algorithms." (i.e. pattern execution plan)
	4. Maintains no runtime state (fully functional).
	5. Uses the more recent advances in TinkerPop's Step framework.
		e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
	6. Supports a broader range of legal patterns.
	7. Supports local barrier patterns (i.e. count(), min(), max(), etc.).
	8. Supports sideEffects (i.e. as('a').within("x"))
	9. Supports pulling out HasContainers for GraphStep index lookups.
	10. Supports not'ing patterns [coming soon].

Below are the runtimes for some pattern matches over the Grateful Dead graph. The xmatch()-times specified are using CountMatchAlgorithm which is basically "dynamically sort the match pattern execution plan according to each pattern's set reduction abilities."

	MatchStep vs. XMatchStep: https://gist.github.com/okram/d49e1abf48fdc18f77f9

	CountMatchAlgorithm: https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451

As it stands, it seems that the simple CountMatchAlgorithm is sufficient to perform better than MatchStep's budget match implementation. However, as we do more queries and learn more tricks we can always add more match algorithms. Easy breezy -- yes, cover girl.

In sum total, I would like to replace MatchStep with XMatchStep for GA. If anyone has any thoughts/arguments on the matter, please state them.

Thank you,
Marko.

http://markorodriguez.com


Re: XMatchStep -- A New MatchStep for GA?

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi,

I just realized that the examples provided in the gist don't show AND/OR nesting. Here is an example for those that are interested over the TinkerPop toy graph:

g.V().xmatch("a",
    where("a", P.neq("c")),
    as("a").out("created").as("b"),
    or(
      as("a").out("knows").has("name", "vadas"),
      as("a").in("knows").and().as("a").has(T.label,"person")
    ),
    as("b").in("created").as("c"),
    as("b").in("created").count().is(P.gt(1)))
     .select().by("name")

There result being:

{a=marko, b=lop, c=josh}
{a=marko, b=lop, c=peter}
{a=josh, b=lop, c=marko}
{a=josh, b=lop, c=peter}

There are a couple of interesting things to note. First, look how where("a",P.neq("c")) is the first pattern, but is not executed first as it is unsolvable without "c" being bound. I'm just showing that patterns can be in any arbitrary order though its best to give them in an order that is legal and efficient as it will take CountMatchAlgorithm a few clock-cylces to work out which patterns are doing the largest set reductions.

Second, note that both the prefix-notation OR and the infix-notation AND are legal and compile as expected (bolded above). Realize that without nesting, XMatchStep assumes a single nest of AND -- that is, the fist level is always AND.

	[TinkerGraphStep([],vertex)@[a], XMatchStep(AND,[[XMatchStartStep, WhereStep(global,a,neq(v[6])), XMatchEndStep], [XMatchStartStep(a), VertexStep(OUT,[created],vertex), XMatchEndStep(b)], [XMatchStep(OR,[[XMatchStartStep(a), VertexStep(OUT,[knows],vertex), HasStep([name.eq(vadas)]), XMatchEndStep], [XMatchStep(AND,[[XMatchStartStep(a), VertexStep(IN,[knows],vertex), XMatchEndStep], [XMatchStartStep(a), HasStep([~label.eq(person)]), XMatchEndStep]]), XMatchEndStep]]), XMatchEndStep], [XMatchStartStep(b), VertexStep(IN,[created],vertex), XMatchEndStep(c)], [XMatchStartStep(b), TraversalFlatMapStep([VertexStep(IN,[created],edge), RangeGlobalStep(0,2), CountGlobalStep, IsStep(gt(1))]), XMatchEndStep]]), SelectStep(local,[value(name)])]
 
	** I bolded Daniel's strategy work on count() optimization :)

Finally, I didn't show this in the last email, but this also works in OLAP:

GraphTraversalSource g = TinkerFactory.createModern().traversal(computer());
g.V().xmatch("a",
    where("a", P.neq("c")),
    as("a").out("created").as("b"),
    or(
      as("a").out("knows").has("name", "vadas"),
      as("a").in("knows").and().as("a").has(T.label,"person")
    ),
    as("b").in("created").as("c"),
    as("b").in("created").count().is(P.gt(1)))
     .select().by("name")


{a=v[1], b=v[3], c=v[4]}
{a=v[1], b=v[3], c=v[6]}
{a=v[4], b=v[3], c=v[1]}
{a=v[4], b=v[3], c=v[6]}

Realize we can't do "by("name")" as that requires reference to the vertex to get its properties and in OLAP, we don't have that…… There are a few solutions to getting data from path elements but it hasn't shook itself out yet.

Enjoy,
Marko.

http://markorodriguez.com

On Jun 17, 2015, at 4:05 PM, Marko Rodriguez <ok...@gmail.com> wrote:

> Hello,
> 
> For the last week or so I've been working on XMatchStep w/ Kuppitz. What spurned this was my burning desire to get and/or/not settled by GA and where()/match() are the culprits that leverage it the most. In fact, along with XMatchStep, WhereStep has gotten a massive revamping for the better. Anywho, first off, XMatchStep:
> 
> ..simple, elegant -- easy, breezy, cover girl:
> 
> 	https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
> 
> It is now at a point where it is passing all the MatchTest tests  (save one -- whose expected results are up to debate) and is comparable in performance to the original MatchStep developed by Josh Shinavier (cc:d) which leverages Matthias Bröcheler's budget match algorithm.
> 
> There are a few things about XMatchStep that make it "better" than the current/original MatchStep:
> 
> 	1. Works for both OLAP and OLTP.
> 	2. Supports nested AND/OR'ing.
> 	3. Supports pluggable "match algorithms." (i.e. pattern execution plan)
> 	4. Maintains no runtime state (fully functional).
> 	5. Uses the more recent advances in TinkerPop's Step framework.
> 		e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
> 	6. Supports a broader range of legal patterns.
> 	7. Supports local barrier patterns (i.e. count(), min(), max(), etc.).
> 	8. Supports sideEffects (i.e. as('a').within("x"))
> 	9. Supports pulling out HasContainers for GraphStep index lookups.
> 	10. Supports not'ing patterns [coming soon].
> 
> Below are the runtimes for some pattern matches over the Grateful Dead graph. The xmatch()-times specified are using CountMatchAlgorithm which is basically "dynamically sort the match pattern execution plan according to each pattern's set reduction abilities."
> 
> 	MatchStep vs. XMatchStep: https://gist.github.com/okram/d49e1abf48fdc18f77f9
> 
> 	CountMatchAlgorithm: https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
> 
> As it stands, it seems that the simple CountMatchAlgorithm is sufficient to perform better than MatchStep's budget match implementation. However, as we do more queries and learn more tricks we can always add more match algorithms. Easy breezy -- yes, cover girl.
> 
> In sum total, I would like to replace MatchStep with XMatchStep for GA. If anyone has any thoughts/arguments on the matter, please state them.
> 
> Thank you,
> Marko.
> 
> http://markorodriguez.com
> 


Re: XMatchStep -- A New MatchStep for GA?

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi,

One more thing. I did the full integration test suite for TinkerPop and note that the MatchTest tests worked as expected over both SparkGraphComputer and GiraphGraphComputer (OLAP) --- as well as HadoopGraph (OLTP).

Thanks,
Marko.

http://markorodriguez.com

On Jun 18, 2015, at 8:56 AM, Marko Rodriguez <ok...@gmail.com> wrote:

> Hi everyone,
> 
> @Josh -- thanks for the review. Appreciate it. Also, if you would like to collaborate on adding BudgetMatchAlgorithm, that would be sweet. Here is the current MatchAlgorithm interface. Note that for OLAP, the MatchAlgorithm's state is not global to the graph, but local to the subgraph partition (i.e. the worker).
> 	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java#L402-L422 (basically recordStart and recordEnd are called before and after a traverser is pushed into a pattern).
> 
> @Matthias -- yea. if someone has a schema and knows the best way to execute the patterns for their data, then they can easily write their own MatchAlgorithm and use it. This gets back to the work Pieter Martin is wanting to do around using schema data to optimize traversals. We need to make a MatchAlgorithmStrategy that allows the user to define which strategy to use for the GraphTraversalSource -- right now, its just hardcoded to use CountMatchAlgorithm.
> 
> @Stephen/everyone -- So in master/, I deleted Josh's MatchStep and replaced it with XMatchStep which is now called MatchStep :).
> 	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
> All the test cases pass for both OLTP and OLAP except for these two peculiar situations:
> 	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L190
> 		- Do we treat "predicate patterns" as filters only? The problem, this would require predicate patterns to NOT leave the StarGraph in OLAP. Decisions point. I will work with Kuppitz to see what he thinks is best. There is another related decision to be had around OR semantics --- to make it work in both OLAP and OLTP the semantics of OR are sorta crazy --- basically, "try all paths independently." Those that continue, continue…… eek?! This is the same problem of "predicate patterns." Without having a global blackboard, there is no way to know if a particular OR branch (or predicate pattern) failed….the traverser simply dies and the OR is never the wiser.
> 	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L308
> 		- The where()-clause when outside of match() is unable to get path metadata (i.e. name.equals("Garcia"). The solution is to fold where() into match() but I haven't updated MatchWhereStrategy yet… will do. 
> 
> Besides those two, all tests just passed and more that can now be solved (e.g. the one's that expected an illegal argument exception for being "cyclic."). 
> 
> I have some "play queries" in TinkerGraphTest that use AND/OR nesting and I will add those to MatchTest now.
> 
> Thanks everyone,
> Marko.
> 
> http://markorodriguez.com
> 
> On Jun 18, 2015, at 7:20 AM, Marko Rodriguez <ok...@gmail.com> wrote:
> 
>> Hi Stephen,
>> 
>> There are 3 failing tests. 2 of them are because MatchStep expects a IllegalArgumentException to be thrown because MatchStep can't solve the pattern -- XMatch can. The other one is because of different number of results --- again, debatable on what to actually return as both XMatch and Match return the "right" results, just if the duplicates should be allowed or not.
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Jun 18, 2015, at 4:44 AM, Stephen Mallette <sp...@gmail.com> wrote:
>> 
>>> I wanted to see what kind of test coverage existed over XMatch so I quickly
>>> swapped xmatch for match.  I'm not sure if additional work needs to be
>>> done, but several tests don't pass under xmatch - perhaps the answer there
>>> is trivial when you dig into it.  In any case, as it stands with the
>>> failing tests, XMatchStep has reasonable coverage, though a few more
>>> targeted tests to cover those areas would give me more confidence in a +1
>>> for GA.
>>> 
>>> On Wed, Jun 17, 2015 at 9:37 PM, Matthias Broecheler <me...@matthiasb.com>
>>> wrote:
>>> 
>>>> +1 - I like the consolidation around the different traversal concepts and
>>>> the resulting simplification where execution is more orthogonal from
>>>> optimization.
>>>> 
>>>> On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net>
>>>> wrote:
>>>> 
>>>>> No objection.  The code is simpler, and it seems like a good course
>>>>> correction w.r.t. traversal optimization.  Might be possible to drop in
>>>>> something close to the current MatchStep (*) as an alternative if
>>>> desired.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Josh
>>>>> 
>>>>> *) basically, keep track of the inputs and outputs of each component
>>>>> traversal of the match, build a plan which minimizes the likely cost of
>>>>> running the next element through the plan, keep updating the plan
>>>>> 
>>>>> 
>>>>> On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> For the last week or so I've been working on XMatchStep w/ Kuppitz.
>>>> What
>>>>>> spurned this was my burning desire to get and/or/not settled by GA and
>>>>>> where()/match() are the culprits that leverage it the most. In fact,
>>>>> along
>>>>>> with XMatchStep, WhereStep has gotten a massive revamping for the
>>>> better.
>>>>>> Anywho, first off, XMatchStep:
>>>>>> 
>>>>>> ..simple, elegant -- easy, breezy, cover girl:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
>>>>>> 
>>>>>> It is now at a point where it is passing all the MatchTest tests  (save
>>>>>> one -- whose expected results are up to debate) and is comparable in
>>>>>> performance to the original MatchStep developed by Josh Shinavier
>>>> (cc:d)
>>>>>> which leverages Matthias Bröcheler's budget match algorithm.
>>>>>> 
>>>>>> There are a few things about XMatchStep that make it "better" than the
>>>>>> current/original MatchStep:
>>>>>> 
>>>>>>        1. Works for both OLAP and OLTP.
>>>>>>        2. Supports nested AND/OR'ing.
>>>>>>        3. Supports pluggable "match algorithms." (i.e. pattern
>>>> execution
>>>>>> plan)
>>>>>>        4. Maintains no runtime state (fully functional).
>>>>>>        5. Uses the more recent advances in TinkerPop's Step framework.
>>>>>>                e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
>>>>>>        6. Supports a broader range of legal patterns.
>>>>>>        7. Supports local barrier patterns (i.e. count(), min(), max(),
>>>>>> etc.).
>>>>>>        8. Supports sideEffects (i.e. as('a').within("x"))
>>>>>>        9. Supports pulling out HasContainers for GraphStep index
>>>>> lookups.
>>>>>>        10. Supports not'ing patterns [coming soon].
>>>>>> 
>>>>>> Below are the runtimes for some pattern matches over the Grateful Dead
>>>>>> graph. The xmatch()-times specified are using CountMatchAlgorithm which
>>>>> is
>>>>>> basically "dynamically sort the match pattern execution plan according
>>>> to
>>>>>> each pattern's set reduction abilities."
>>>>>> 
>>>>>>        MatchStep vs. XMatchStep:
>>>>>> https://gist.github.com/okram/d49e1abf48fdc18f77f9
>>>>>> 
>>>>>>        CountMatchAlgorithm:
>>>>>> 
>>>>> 
>>>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
>>>>>> 
>>>>>> As it stands, it seems that the simple CountMatchAlgorithm is
>>>> sufficient
>>>>>> to perform better than MatchStep's budget match implementation.
>>>> However,
>>>>> as
>>>>>> we do more queries and learn more tricks we can always add more match
>>>>>> algorithms. Easy breezy -- yes, cover girl.
>>>>>> 
>>>>>> In sum total, I would like to replace MatchStep with XMatchStep for GA.
>>>>> If
>>>>>> anyone has any thoughts/arguments on the matter, please state them.
>>>>>> 
>>>>>> Thank you,
>>>>>> Marko.
>>>>>> 
>>>>>> http://markorodriguez.com
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
> 


Re: XMatchStep -- A New MatchStep for GA?

Posted by Joshua Shinavier <jo...@fortytwo.net>.
On Thu, Jun 18, 2015 at 10:56 AM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi everyone,
>
> @Josh -- thanks for the review. Appreciate it. Also, if you would like to
> collaborate on adding BudgetMatchAlgorithm, that would be sweet. Here is
> the current MatchAlgorithm interface. Note that for OLAP, the
> MatchAlgorithm's state is not global to the graph, but local to the
> subgraph partition (i.e. the worker).
>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java#L402-L422
> (basically recordStart and recordEnd are called before and after a
> traverser is pushed into a pattern).
>


I can't contribute much until August, yet I will note that old MatchStep
(adapted to the OLAP-compatible framework) may still be relevant if
optimization is a priority.  MatchStep was based on the overall idea of
BudgetMatch, but whereas BudgetMatch is very specifically designed for
pattern matching over labeled graphs, MatchStep also took into account the
other ways in which Gremlin traversals branch and filter, which have a
bearing on optimization.  recordStart() and recordEnd() seem to point in
that direction.

Josh




>
> @Matthias -- yea. if someone has a schema and knows the best way to
> execute the patterns for their data, then they can easily write their own
> MatchAlgorithm and use it. This gets back to the work Pieter Martin is
> wanting to do around using schema data to optimize traversals. We need to
> make a MatchAlgorithmStrategy that allows the user to define which strategy
> to use for the GraphTraversalSource -- right now, its just hardcoded to use
> CountMatchAlgorithm.
>
> @Stephen/everyone -- So in master/, I deleted Josh's MatchStep and
> replaced it with XMatchStep which is now called MatchStep :).
>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
> All the test cases pass for both OLTP and OLAP except for these two
> peculiar situations:
>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L190
>                 - Do we treat "predicate patterns" as filters only? The
> problem, this would require predicate patterns to NOT leave the StarGraph
> in OLAP. Decisions point. I will work with Kuppitz to see what he thinks is
> best. There is another related decision to be had around OR semantics ---
> to make it work in both OLAP and OLTP the semantics of OR are sorta crazy
> --- basically, "try all paths independently." Those that continue,
> continue…… eek?! This is the same problem of "predicate patterns." Without
> having a global blackboard, there is no way to know if a particular OR
> branch (or predicate pattern) failed….the traverser simply dies and the OR
> is never the wiser.
>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L308
>                 - The where()-clause when outside of match() is unable to
> get path metadata (i.e. name.equals("Garcia"). The solution is to fold
> where() into match() but I haven't updated MatchWhereStrategy yet… will do.
>
> Besides those two, all tests just passed and more that can now be solved
> (e.g. the one's that expected an illegal argument exception for being
> "cyclic.").
>
> I have some "play queries" in TinkerGraphTest that use AND/OR nesting and
> I will add those to MatchTest now.
>
> Thanks everyone,
> Marko.
>
> http://markorodriguez.com
>
> On Jun 18, 2015, at 7:20 AM, Marko Rodriguez <ok...@gmail.com> wrote:
>
> > Hi Stephen,
> >
> > There are 3 failing tests. 2 of them are because MatchStep expects a
> IllegalArgumentException to be thrown because MatchStep can't solve the
> pattern -- XMatch can. The other one is because of different number of
> results --- again, debatable on what to actually return as both XMatch and
> Match return the "right" results, just if the duplicates should be allowed
> or not.
> >
> > Marko.
> >
> > http://markorodriguez.com
> >
> > On Jun 18, 2015, at 4:44 AM, Stephen Mallette <sp...@gmail.com>
> wrote:
> >
> >> I wanted to see what kind of test coverage existed over XMatch so I
> quickly
> >> swapped xmatch for match.  I'm not sure if additional work needs to be
> >> done, but several tests don't pass under xmatch - perhaps the answer
> there
> >> is trivial when you dig into it.  In any case, as it stands with the
> >> failing tests, XMatchStep has reasonable coverage, though a few more
> >> targeted tests to cover those areas would give me more confidence in a
> +1
> >> for GA.
> >>
> >> On Wed, Jun 17, 2015 at 9:37 PM, Matthias Broecheler <me...@matthiasb.com>
> >> wrote:
> >>
> >>> +1 - I like the consolidation around the different traversal concepts
> and
> >>> the resulting simplification where execution is more orthogonal from
> >>> optimization.
> >>>
> >>> On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net>
> >>> wrote:
> >>>
> >>>> No objection.  The code is simpler, and it seems like a good course
> >>>> correction w.r.t. traversal optimization.  Might be possible to drop
> in
> >>>> something close to the current MatchStep (*) as an alternative if
> >>> desired.
> >>>>
> >>>> Best,
> >>>>
> >>>> Josh
> >>>>
> >>>> *) basically, keep track of the inputs and outputs of each component
> >>>> traversal of the match, build a plan which minimizes the likely cost
> of
> >>>> running the next element through the plan, keep updating the plan
> >>>>
> >>>>
> >>>> On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <
> okrammarko@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> For the last week or so I've been working on XMatchStep w/ Kuppitz.
> >>> What
> >>>>> spurned this was my burning desire to get and/or/not settled by GA
> and
> >>>>> where()/match() are the culprits that leverage it the most. In fact,
> >>>> along
> >>>>> with XMatchStep, WhereStep has gotten a massive revamping for the
> >>> better.
> >>>>> Anywho, first off, XMatchStep:
> >>>>>
> >>>>> ..simple, elegant -- easy, breezy, cover girl:
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
> >>>>>
> >>>>> It is now at a point where it is passing all the MatchTest tests
> (save
> >>>>> one -- whose expected results are up to debate) and is comparable in
> >>>>> performance to the original MatchStep developed by Josh Shinavier
> >>> (cc:d)
> >>>>> which leverages Matthias Bröcheler's budget match algorithm.
> >>>>>
> >>>>> There are a few things about XMatchStep that make it "better" than
> the
> >>>>> current/original MatchStep:
> >>>>>
> >>>>>        1. Works for both OLAP and OLTP.
> >>>>>        2. Supports nested AND/OR'ing.
> >>>>>        3. Supports pluggable "match algorithms." (i.e. pattern
> >>> execution
> >>>>> plan)
> >>>>>        4. Maintains no runtime state (fully functional).
> >>>>>        5. Uses the more recent advances in TinkerPop's Step
> framework.
> >>>>>                e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
> >>>>>        6. Supports a broader range of legal patterns.
> >>>>>        7. Supports local barrier patterns (i.e. count(), min(),
> max(),
> >>>>> etc.).
> >>>>>        8. Supports sideEffects (i.e. as('a').within("x"))
> >>>>>        9. Supports pulling out HasContainers for GraphStep index
> >>>> lookups.
> >>>>>        10. Supports not'ing patterns [coming soon].
> >>>>>
> >>>>> Below are the runtimes for some pattern matches over the Grateful
> Dead
> >>>>> graph. The xmatch()-times specified are using CountMatchAlgorithm
> which
> >>>> is
> >>>>> basically "dynamically sort the match pattern execution plan
> according
> >>> to
> >>>>> each pattern's set reduction abilities."
> >>>>>
> >>>>>        MatchStep vs. XMatchStep:
> >>>>> https://gist.github.com/okram/d49e1abf48fdc18f77f9
> >>>>>
> >>>>>        CountMatchAlgorithm:
> >>>>>
> >>>>
> >>>
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
> >>>>>
> >>>>> As it stands, it seems that the simple CountMatchAlgorithm is
> >>> sufficient
> >>>>> to perform better than MatchStep's budget match implementation.
> >>> However,
> >>>> as
> >>>>> we do more queries and learn more tricks we can always add more match
> >>>>> algorithms. Easy breezy -- yes, cover girl.
> >>>>>
> >>>>> In sum total, I would like to replace MatchStep with XMatchStep for
> GA.
> >>>> If
> >>>>> anyone has any thoughts/arguments on the matter, please state them.
> >>>>>
> >>>>> Thank you,
> >>>>> Marko.
> >>>>>
> >>>>> http://markorodriguez.com
> >>>>>
> >>>>>
> >>>>
> >>>
> >
>
>

Re: XMatchStep -- A New MatchStep for GA?

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi everyone,

@Josh -- thanks for the review. Appreciate it. Also, if you would like to collaborate on adding BudgetMatchAlgorithm, that would be sweet. Here is the current MatchAlgorithm interface. Note that for OLAP, the MatchAlgorithm's state is not global to the graph, but local to the subgraph partition (i.e. the worker).
	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java#L402-L422 (basically recordStart and recordEnd are called before and after a traverser is pushed into a pattern).

@Matthias -- yea. if someone has a schema and knows the best way to execute the patterns for their data, then they can easily write their own MatchAlgorithm and use it. This gets back to the work Pieter Martin is wanting to do around using schema data to optimize traversals. We need to make a MatchAlgorithmStrategy that allows the user to define which strategy to use for the GraphTraversalSource -- right now, its just hardcoded to use CountMatchAlgorithm.

@Stephen/everyone -- So in master/, I deleted Josh's MatchStep and replaced it with XMatchStep which is now called MatchStep :).
	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java
All the test cases pass for both OLTP and OLAP except for these two peculiar situations:
	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L190
		- Do we treat "predicate patterns" as filters only? The problem, this would require predicate patterns to NOT leave the StarGraph in OLAP. Decisions point. I will work with Kuppitz to see what he thinks is best. There is another related decision to be had around OR semantics --- to make it work in both OLAP and OLTP the semantics of OR are sorta crazy --- basically, "try all paths independently." Those that continue, continue…… eek?! This is the same problem of "predicate patterns." Without having a global blackboard, there is no way to know if a particular OR branch (or predicate pattern) failed….the traverser simply dies and the OR is never the wiser.
	https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchTest.java#L308
		- The where()-clause when outside of match() is unable to get path metadata (i.e. name.equals("Garcia"). The solution is to fold where() into match() but I haven't updated MatchWhereStrategy yet… will do. 

Besides those two, all tests just passed and more that can now be solved (e.g. the one's that expected an illegal argument exception for being "cyclic."). 

I have some "play queries" in TinkerGraphTest that use AND/OR nesting and I will add those to MatchTest now.

Thanks everyone,
Marko.

http://markorodriguez.com

On Jun 18, 2015, at 7:20 AM, Marko Rodriguez <ok...@gmail.com> wrote:

> Hi Stephen,
> 
> There are 3 failing tests. 2 of them are because MatchStep expects a IllegalArgumentException to be thrown because MatchStep can't solve the pattern -- XMatch can. The other one is because of different number of results --- again, debatable on what to actually return as both XMatch and Match return the "right" results, just if the duplicates should be allowed or not.
> 
> Marko.
> 
> http://markorodriguez.com
> 
> On Jun 18, 2015, at 4:44 AM, Stephen Mallette <sp...@gmail.com> wrote:
> 
>> I wanted to see what kind of test coverage existed over XMatch so I quickly
>> swapped xmatch for match.  I'm not sure if additional work needs to be
>> done, but several tests don't pass under xmatch - perhaps the answer there
>> is trivial when you dig into it.  In any case, as it stands with the
>> failing tests, XMatchStep has reasonable coverage, though a few more
>> targeted tests to cover those areas would give me more confidence in a +1
>> for GA.
>> 
>> On Wed, Jun 17, 2015 at 9:37 PM, Matthias Broecheler <me...@matthiasb.com>
>> wrote:
>> 
>>> +1 - I like the consolidation around the different traversal concepts and
>>> the resulting simplification where execution is more orthogonal from
>>> optimization.
>>> 
>>> On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net>
>>> wrote:
>>> 
>>>> No objection.  The code is simpler, and it seems like a good course
>>>> correction w.r.t. traversal optimization.  Might be possible to drop in
>>>> something close to the current MatchStep (*) as an alternative if
>>> desired.
>>>> 
>>>> Best,
>>>> 
>>>> Josh
>>>> 
>>>> *) basically, keep track of the inputs and outputs of each component
>>>> traversal of the match, build a plan which minimizes the likely cost of
>>>> running the next element through the plan, keep updating the plan
>>>> 
>>>> 
>>>> On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> For the last week or so I've been working on XMatchStep w/ Kuppitz.
>>> What
>>>>> spurned this was my burning desire to get and/or/not settled by GA and
>>>>> where()/match() are the culprits that leverage it the most. In fact,
>>>> along
>>>>> with XMatchStep, WhereStep has gotten a massive revamping for the
>>> better.
>>>>> Anywho, first off, XMatchStep:
>>>>> 
>>>>> ..simple, elegant -- easy, breezy, cover girl:
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
>>>>> 
>>>>> It is now at a point where it is passing all the MatchTest tests  (save
>>>>> one -- whose expected results are up to debate) and is comparable in
>>>>> performance to the original MatchStep developed by Josh Shinavier
>>> (cc:d)
>>>>> which leverages Matthias Bröcheler's budget match algorithm.
>>>>> 
>>>>> There are a few things about XMatchStep that make it "better" than the
>>>>> current/original MatchStep:
>>>>> 
>>>>>        1. Works for both OLAP and OLTP.
>>>>>        2. Supports nested AND/OR'ing.
>>>>>        3. Supports pluggable "match algorithms." (i.e. pattern
>>> execution
>>>>> plan)
>>>>>        4. Maintains no runtime state (fully functional).
>>>>>        5. Uses the more recent advances in TinkerPop's Step framework.
>>>>>                e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
>>>>>        6. Supports a broader range of legal patterns.
>>>>>        7. Supports local barrier patterns (i.e. count(), min(), max(),
>>>>> etc.).
>>>>>        8. Supports sideEffects (i.e. as('a').within("x"))
>>>>>        9. Supports pulling out HasContainers for GraphStep index
>>>> lookups.
>>>>>        10. Supports not'ing patterns [coming soon].
>>>>> 
>>>>> Below are the runtimes for some pattern matches over the Grateful Dead
>>>>> graph. The xmatch()-times specified are using CountMatchAlgorithm which
>>>> is
>>>>> basically "dynamically sort the match pattern execution plan according
>>> to
>>>>> each pattern's set reduction abilities."
>>>>> 
>>>>>        MatchStep vs. XMatchStep:
>>>>> https://gist.github.com/okram/d49e1abf48fdc18f77f9
>>>>> 
>>>>>        CountMatchAlgorithm:
>>>>> 
>>>> 
>>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
>>>>> 
>>>>> As it stands, it seems that the simple CountMatchAlgorithm is
>>> sufficient
>>>>> to perform better than MatchStep's budget match implementation.
>>> However,
>>>> as
>>>>> we do more queries and learn more tricks we can always add more match
>>>>> algorithms. Easy breezy -- yes, cover girl.
>>>>> 
>>>>> In sum total, I would like to replace MatchStep with XMatchStep for GA.
>>>> If
>>>>> anyone has any thoughts/arguments on the matter, please state them.
>>>>> 
>>>>> Thank you,
>>>>> Marko.
>>>>> 
>>>>> http://markorodriguez.com
>>>>> 
>>>>> 
>>>> 
>>> 
> 


Re: XMatchStep -- A New MatchStep for GA?

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi Stephen,

There are 3 failing tests. 2 of them are because MatchStep expects a IllegalArgumentException to be thrown because MatchStep can't solve the pattern -- XMatch can. The other one is because of different number of results --- again, debatable on what to actually return as both XMatch and Match return the "right" results, just if the duplicates should be allowed or not.

Marko.

http://markorodriguez.com

On Jun 18, 2015, at 4:44 AM, Stephen Mallette <sp...@gmail.com> wrote:

> I wanted to see what kind of test coverage existed over XMatch so I quickly
> swapped xmatch for match.  I'm not sure if additional work needs to be
> done, but several tests don't pass under xmatch - perhaps the answer there
> is trivial when you dig into it.  In any case, as it stands with the
> failing tests, XMatchStep has reasonable coverage, though a few more
> targeted tests to cover those areas would give me more confidence in a +1
> for GA.
> 
> On Wed, Jun 17, 2015 at 9:37 PM, Matthias Broecheler <me...@matthiasb.com>
> wrote:
> 
>> +1 - I like the consolidation around the different traversal concepts and
>> the resulting simplification where execution is more orthogonal from
>> optimization.
>> 
>> On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net>
>> wrote:
>> 
>>> No objection.  The code is simpler, and it seems like a good course
>>> correction w.r.t. traversal optimization.  Might be possible to drop in
>>> something close to the current MatchStep (*) as an alternative if
>> desired.
>>> 
>>> Best,
>>> 
>>> Josh
>>> 
>>> *) basically, keep track of the inputs and outputs of each component
>>> traversal of the match, build a plan which minimizes the likely cost of
>>> running the next element through the plan, keep updating the plan
>>> 
>>> 
>>> On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> For the last week or so I've been working on XMatchStep w/ Kuppitz.
>> What
>>>> spurned this was my burning desire to get and/or/not settled by GA and
>>>> where()/match() are the culprits that leverage it the most. In fact,
>>> along
>>>> with XMatchStep, WhereStep has gotten a massive revamping for the
>> better.
>>>> Anywho, first off, XMatchStep:
>>>> 
>>>> ..simple, elegant -- easy, breezy, cover girl:
>>>> 
>>>> 
>>>> 
>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
>>>> 
>>>> It is now at a point where it is passing all the MatchTest tests  (save
>>>> one -- whose expected results are up to debate) and is comparable in
>>>> performance to the original MatchStep developed by Josh Shinavier
>> (cc:d)
>>>> which leverages Matthias Bröcheler's budget match algorithm.
>>>> 
>>>> There are a few things about XMatchStep that make it "better" than the
>>>> current/original MatchStep:
>>>> 
>>>>        1. Works for both OLAP and OLTP.
>>>>        2. Supports nested AND/OR'ing.
>>>>        3. Supports pluggable "match algorithms." (i.e. pattern
>> execution
>>>> plan)
>>>>        4. Maintains no runtime state (fully functional).
>>>>        5. Uses the more recent advances in TinkerPop's Step framework.
>>>>                e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
>>>>        6. Supports a broader range of legal patterns.
>>>>        7. Supports local barrier patterns (i.e. count(), min(), max(),
>>>> etc.).
>>>>        8. Supports sideEffects (i.e. as('a').within("x"))
>>>>        9. Supports pulling out HasContainers for GraphStep index
>>> lookups.
>>>>        10. Supports not'ing patterns [coming soon].
>>>> 
>>>> Below are the runtimes for some pattern matches over the Grateful Dead
>>>> graph. The xmatch()-times specified are using CountMatchAlgorithm which
>>> is
>>>> basically "dynamically sort the match pattern execution plan according
>> to
>>>> each pattern's set reduction abilities."
>>>> 
>>>>        MatchStep vs. XMatchStep:
>>>> https://gist.github.com/okram/d49e1abf48fdc18f77f9
>>>> 
>>>>        CountMatchAlgorithm:
>>>> 
>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
>>>> 
>>>> As it stands, it seems that the simple CountMatchAlgorithm is
>> sufficient
>>>> to perform better than MatchStep's budget match implementation.
>> However,
>>> as
>>>> we do more queries and learn more tricks we can always add more match
>>>> algorithms. Easy breezy -- yes, cover girl.
>>>> 
>>>> In sum total, I would like to replace MatchStep with XMatchStep for GA.
>>> If
>>>> anyone has any thoughts/arguments on the matter, please state them.
>>>> 
>>>> Thank you,
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> 
>>> 
>> 


Re: XMatchStep -- A New MatchStep for GA?

Posted by Stephen Mallette <sp...@gmail.com>.
I wanted to see what kind of test coverage existed over XMatch so I quickly
swapped xmatch for match.  I'm not sure if additional work needs to be
done, but several tests don't pass under xmatch - perhaps the answer there
is trivial when you dig into it.  In any case, as it stands with the
failing tests, XMatchStep has reasonable coverage, though a few more
targeted tests to cover those areas would give me more confidence in a +1
for GA.

On Wed, Jun 17, 2015 at 9:37 PM, Matthias Broecheler <me...@matthiasb.com>
wrote:

> +1 - I like the consolidation around the different traversal concepts and
> the resulting simplification where execution is more orthogonal from
> optimization.
>
> On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net>
> wrote:
>
> > No objection.  The code is simpler, and it seems like a good course
> > correction w.r.t. traversal optimization.  Might be possible to drop in
> > something close to the current MatchStep (*) as an alternative if
> desired.
> >
> > Best,
> >
> > Josh
> >
> > *) basically, keep track of the inputs and outputs of each component
> > traversal of the match, build a plan which minimizes the likely cost of
> > running the next element through the plan, keep updating the plan
> >
> >
> > On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > For the last week or so I've been working on XMatchStep w/ Kuppitz.
> What
> > > spurned this was my burning desire to get and/or/not settled by GA and
> > > where()/match() are the culprits that leverage it the most. In fact,
> > along
> > > with XMatchStep, WhereStep has gotten a massive revamping for the
> better.
> > > Anywho, first off, XMatchStep:
> > >
> > > ..simple, elegant -- easy, breezy, cover girl:
> > >
> > >
> > >
> >
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
> > >
> > > It is now at a point where it is passing all the MatchTest tests  (save
> > > one -- whose expected results are up to debate) and is comparable in
> > > performance to the original MatchStep developed by Josh Shinavier
> (cc:d)
> > > which leverages Matthias Bröcheler's budget match algorithm.
> > >
> > > There are a few things about XMatchStep that make it "better" than the
> > > current/original MatchStep:
> > >
> > >         1. Works for both OLAP and OLTP.
> > >         2. Supports nested AND/OR'ing.
> > >         3. Supports pluggable "match algorithms." (i.e. pattern
> execution
> > > plan)
> > >         4. Maintains no runtime state (fully functional).
> > >         5. Uses the more recent advances in TinkerPop's Step framework.
> > >                 e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
> > >         6. Supports a broader range of legal patterns.
> > >         7. Supports local barrier patterns (i.e. count(), min(), max(),
> > > etc.).
> > >         8. Supports sideEffects (i.e. as('a').within("x"))
> > >         9. Supports pulling out HasContainers for GraphStep index
> > lookups.
> > >         10. Supports not'ing patterns [coming soon].
> > >
> > > Below are the runtimes for some pattern matches over the Grateful Dead
> > > graph. The xmatch()-times specified are using CountMatchAlgorithm which
> > is
> > > basically "dynamically sort the match pattern execution plan according
> to
> > > each pattern's set reduction abilities."
> > >
> > >         MatchStep vs. XMatchStep:
> > > https://gist.github.com/okram/d49e1abf48fdc18f77f9
> > >
> > >         CountMatchAlgorithm:
> > >
> >
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
> > >
> > > As it stands, it seems that the simple CountMatchAlgorithm is
> sufficient
> > > to perform better than MatchStep's budget match implementation.
> However,
> > as
> > > we do more queries and learn more tricks we can always add more match
> > > algorithms. Easy breezy -- yes, cover girl.
> > >
> > > In sum total, I would like to replace MatchStep with XMatchStep for GA.
> > If
> > > anyone has any thoughts/arguments on the matter, please state them.
> > >
> > > Thank you,
> > > Marko.
> > >
> > > http://markorodriguez.com
> > >
> > >
> >
>

Re: XMatchStep -- A New MatchStep for GA?

Posted by Matthias Broecheler <me...@matthiasb.com>.
+1 - I like the consolidation around the different traversal concepts and
the resulting simplification where execution is more orthogonal from
optimization.

On Wed, Jun 17, 2015 at 5:03 PM Joshua Shinavier <jo...@fortytwo.net> wrote:

> No objection.  The code is simpler, and it seems like a good course
> correction w.r.t. traversal optimization.  Might be possible to drop in
> something close to the current MatchStep (*) as an alternative if desired.
>
> Best,
>
> Josh
>
> *) basically, keep track of the inputs and outputs of each component
> traversal of the match, build a plan which minimizes the likely cost of
> running the next element through the plan, keep updating the plan
>
>
> On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
> wrote:
>
> > Hello,
> >
> > For the last week or so I've been working on XMatchStep w/ Kuppitz. What
> > spurned this was my burning desire to get and/or/not settled by GA and
> > where()/match() are the culprits that leverage it the most. In fact,
> along
> > with XMatchStep, WhereStep has gotten a massive revamping for the better.
> > Anywho, first off, XMatchStep:
> >
> > ..simple, elegant -- easy, breezy, cover girl:
> >
> >
> >
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
> >
> > It is now at a point where it is passing all the MatchTest tests  (save
> > one -- whose expected results are up to debate) and is comparable in
> > performance to the original MatchStep developed by Josh Shinavier (cc:d)
> > which leverages Matthias Bröcheler's budget match algorithm.
> >
> > There are a few things about XMatchStep that make it "better" than the
> > current/original MatchStep:
> >
> >         1. Works for both OLAP and OLTP.
> >         2. Supports nested AND/OR'ing.
> >         3. Supports pluggable "match algorithms." (i.e. pattern execution
> > plan)
> >         4. Maintains no runtime state (fully functional).
> >         5. Uses the more recent advances in TinkerPop's Step framework.
> >                 e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
> >         6. Supports a broader range of legal patterns.
> >         7. Supports local barrier patterns (i.e. count(), min(), max(),
> > etc.).
> >         8. Supports sideEffects (i.e. as('a').within("x"))
> >         9. Supports pulling out HasContainers for GraphStep index
> lookups.
> >         10. Supports not'ing patterns [coming soon].
> >
> > Below are the runtimes for some pattern matches over the Grateful Dead
> > graph. The xmatch()-times specified are using CountMatchAlgorithm which
> is
> > basically "dynamically sort the match pattern execution plan according to
> > each pattern's set reduction abilities."
> >
> >         MatchStep vs. XMatchStep:
> > https://gist.github.com/okram/d49e1abf48fdc18f77f9
> >
> >         CountMatchAlgorithm:
> >
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
> >
> > As it stands, it seems that the simple CountMatchAlgorithm is sufficient
> > to perform better than MatchStep's budget match implementation. However,
> as
> > we do more queries and learn more tricks we can always add more match
> > algorithms. Easy breezy -- yes, cover girl.
> >
> > In sum total, I would like to replace MatchStep with XMatchStep for GA.
> If
> > anyone has any thoughts/arguments on the matter, please state them.
> >
> > Thank you,
> > Marko.
> >
> > http://markorodriguez.com
> >
> >
>

Re: XMatchStep -- A New MatchStep for GA?

Posted by Joshua Shinavier <jo...@fortytwo.net>.
No objection.  The code is simpler, and it seems like a good course
correction w.r.t. traversal optimization.  Might be possible to drop in
something close to the current MatchStep (*) as an alternative if desired.

Best,

Josh

*) basically, keep track of the inputs and outputs of each component
traversal of the match, build a plan which minimizes the likely cost of
running the next element through the plan, keep updating the plan


On Wed, Jun 17, 2015 at 6:05 PM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hello,
>
> For the last week or so I've been working on XMatchStep w/ Kuppitz. What
> spurned this was my burning desire to get and/or/not settled by GA and
> where()/match() are the culprits that leverage it the most. In fact, along
> with XMatchStep, WhereStep has gotten a massive revamping for the better.
> Anywho, first off, XMatchStep:
>
> ..simple, elegant -- easy, breezy, cover girl:
>
>
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java
>
> It is now at a point where it is passing all the MatchTest tests  (save
> one -- whose expected results are up to debate) and is comparable in
> performance to the original MatchStep developed by Josh Shinavier (cc:d)
> which leverages Matthias Bröcheler's budget match algorithm.
>
> There are a few things about XMatchStep that make it "better" than the
> current/original MatchStep:
>
>         1. Works for both OLAP and OLTP.
>         2. Supports nested AND/OR'ing.
>         3. Supports pluggable "match algorithms." (i.e. pattern execution
> plan)
>         4. Maintains no runtime state (fully functional).
>         5. Uses the more recent advances in TinkerPop's Step framework.
>                 e.g. Scoping, ComputerAwareStep, ConjunctionStep, etc.
>         6. Supports a broader range of legal patterns.
>         7. Supports local barrier patterns (i.e. count(), min(), max(),
> etc.).
>         8. Supports sideEffects (i.e. as('a').within("x"))
>         9. Supports pulling out HasContainers for GraphStep index lookups.
>         10. Supports not'ing patterns [coming soon].
>
> Below are the runtimes for some pattern matches over the Grateful Dead
> graph. The xmatch()-times specified are using CountMatchAlgorithm which is
> basically "dynamically sort the match pattern execution plan according to
> each pattern's set reduction abilities."
>
>         MatchStep vs. XMatchStep:
> https://gist.github.com/okram/d49e1abf48fdc18f77f9
>
>         CountMatchAlgorithm:
> https://github.com/apache/incubator-tinkerpop/blob/ebb0fd69821857ecefecf9222983c2bd0e189650/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/exp/XMatchStep.java#L451
>
> As it stands, it seems that the simple CountMatchAlgorithm is sufficient
> to perform better than MatchStep's budget match implementation. However, as
> we do more queries and learn more tricks we can always add more match
> algorithms. Easy breezy -- yes, cover girl.
>
> In sum total, I would like to replace MatchStep with XMatchStep for GA. If
> anyone has any thoughts/arguments on the matter, please state them.
>
> Thank you,
> Marko.
>
> http://markorodriguez.com
>
>