You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Julian Feinauer <j....@pragmaticminds.de> on 2019/01/09 17:13:22 UTC

Current State of MATCH_RECOGNIZE Implementation

Hi all,

as discussed in earlier exchanges (see [1]), I started to work on implementing MATCH_RECOGNIZE based on Julian (Hydes) work [2].
I think I made some progress and resolved some of the problems but I’m at a stage now where I’d need some advice from more seasoned Calcite dev’s on how to continue.

Basically, I improved the Matching (based on a DFA now) and ensured that we have the full path of symbols (not just rows) as base for the “Emitter”. The Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
I think we could handle them pretty easily as I introduced a “MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window around the current record (history and future). I think this should work here, as we have NO unbounded windows like for regular window functions.

So, basically the Matcher gets for each step a “Memory” Object which has a “get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) row.
So, an expression of the Form “PREV(UP.$4, $5)” should be converted to something like “row.get($5).$4”.
I have no real clue how to do this the “right” way, perhaps by a custom InputGetter which automatically introduces the “.get()”?
When this is implemented the Matcher should be finished (?).

Another thing the current implementation is missing is the ordering inside the partitions (which is similar to window functions). Do you think we can simply reuse the code from there?
Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in the situation where one output row is generated for each input row, but I think this does not help us much as there  also is the other “MODE” where it outputs a single row for each (possibly arbitrary long) match.

Parallel to this mail I submitted a PR to merge my branch back to Julians work to have a common “checkpoint” for the next steps.
I would really value if someone could step in (Julian?) with either implement parts of the problems stated above or give me some hints on how to address this properly so that I can try to go further.

I don’t know what the usual way is but if it helps perhaps we can arrange a Screen sharing session or something to walk through the new code, if necessary.

Best
JulianF

[1] https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
[2] https://github.com/julianhyde/calcite/tree/1935-match-recognize

Re: Current State of MATCH_RECOGNIZE Implementation

Posted by Julian Hyde <jh...@apache.org>.

To generate the code you need, I think you need a sub-class of
InputGetter that is capable of looking at rows in the history. I
suggest

interface OffsetInputGetter extends InputGetter {
  InputGetter offset(int offset);
}

Thus, inputGetter.field(..., 1, ...) might generate "row.a", whereas
((OffsetInputGetter) inputGetter).offset(-2).field(..., 1, ...) might
generate "rows.get(rows.size() - 2).a".

Note that "row" is a variable holding the current row, and "rows" is a
variable holding all of the rows that went into the match. We already
have those variables in EnumerableMatch.implementEmitter, just no
means as yet to generate expressions from them.

Julian

On Wed, Jan 9, 2019 at 2:13 PM Julian Feinauer
<j....@pragmaticminds.de> wrote:
>
> Hi Julian,
>
> the (simple) example is not yet running.
> My main problem is how to implement the translation of a REX like “PREV(UP.$4, $5)” to an expression like “row.get($5).$4”.
> Then, simple queries should work.
>
> My main problem is that I do not know how to "swap" the PATTERN_INPUT_REF with the call to the PREV function during the RexToLix translation.
> If you give me a hint on how to implement this, I'll try to do that.
>
> JulianF
>
> Am 09.01.19, 19:23 schrieb "Julian Hyde" <jh...@apache.org>:
>
>     I saw your PR https://github.com/julianhyde/calcite/pull/16 <https://github.com/julianhyde/calcite/pull/16>. Can you please create a PR against Apache, then I’ll rebase it onto my 1935 branch.
>
>     Are we able to run a SQL query yet? My plan was to get a very basic query working end-to-end, then start adding features to the engine. I still think that’s a good plan.
>
>     Julian
>
>
>     > On Jan 9, 2019, at 9:13 AM, Julian Feinauer <j....@pragmaticminds.de> wrote:
>     >
>     > Hi all,
>     >
>     > as discussed in earlier exchanges (see [1]), I started to work on implementing MATCH_RECOGNIZE based on Julian (Hydes) work [2].
>     > I think I made some progress and resolved some of the problems but I’m at a stage now where I’d need some advice from more seasoned Calcite dev’s on how to continue.
>     >
>     > Basically, I improved the Matching (based on a DFA now) and ensured that we have the full path of symbols (not just rows) as base for the “Emitter”. The Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
>     > I think we could handle them pretty easily as I introduced a “MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window around the current record (history and future). I think this should work here, as we have NO unbounded windows like for regular window functions.
>     >
>     > So, basically the Matcher gets for each step a “Memory” Object which has a “get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) row.
>     > So, an expression of the Form “PREV(UP.$4, $5)” should be converted to something like “row.get($5).$4”.
>     > I have no real clue how to do this the “right” way, perhaps by a custom InputGetter which automatically introduces the “.get()”?
>     > When this is implemented the Matcher should be finished (?).
>     >
>     > Another thing the current implementation is missing is the ordering inside the partitions (which is similar to window functions). Do you think we can simply reuse the code from there?
>     > Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in the situation where one output row is generated for each input row, but I think this does not help us much as there  also is the other “MODE” where it outputs a single row for each (possibly arbitrary long) match.
>     >
>     > Parallel to this mail I submitted a PR to merge my branch back to Julians work to have a common “checkpoint” for the next steps.
>     > I would really value if someone could step in (Julian?) with either implement parts of the problems stated above or give me some hints on how to address this properly so that I can try to go further.
>     >
>     > I don’t know what the usual way is but if it helps perhaps we can arrange a Screen sharing session or something to walk through the new code, if necessary.
>     >
>     > Best
>     > JulianF
>     >
>     > [1] https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
>     > [2] https://github.com/julianhyde/calcite/tree/1935-match-recognize
>
>
>

Re: Current State of MATCH_RECOGNIZE Implementation

Posted by Julian Feinauer <j....@pragmaticminds.de>.

Hi Julian,

the (simple) example is not yet running.
My main problem is how to implement the translation of a REX like “PREV(UP.$4, $5)” to an expression like “row.get($5).$4”.
Then, simple queries should work.

My main problem is that I do not know how to "swap" the PATTERN_INPUT_REF with the call to the PREV function during the RexToLix translation.
If you give me a hint on how to implement this, I'll try to do that.

JulianF

Am 09.01.19, 19:23 schrieb "Julian Hyde" <jh...@apache.org>:

    I saw your PR https://github.com/julianhyde/calcite/pull/16 <https://github.com/julianhyde/calcite/pull/16>. Can you please create a PR against Apache, then I’ll rebase it onto my 1935 branch.
    
    Are we able to run a SQL query yet? My plan was to get a very basic query working end-to-end, then start adding features to the engine. I still think that’s a good plan.
    
    Julian
    
    
    > On Jan 9, 2019, at 9:13 AM, Julian Feinauer <j....@pragmaticminds.de> wrote:
    > 
    > Hi all,
    > 
    > as discussed in earlier exchanges (see [1]), I started to work on implementing MATCH_RECOGNIZE based on Julian (Hydes) work [2].
    > I think I made some progress and resolved some of the problems but I’m at a stage now where I’d need some advice from more seasoned Calcite dev’s on how to continue.
    > 
    > Basically, I improved the Matching (based on a DFA now) and ensured that we have the full path of symbols (not just rows) as base for the “Emitter”. The Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
    > I think we could handle them pretty easily as I introduced a “MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window around the current record (history and future). I think this should work here, as we have NO unbounded windows like for regular window functions.
    > 
    > So, basically the Matcher gets for each step a “Memory” Object which has a “get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) row.
    > So, an expression of the Form “PREV(UP.$4, $5)” should be converted to something like “row.get($5).$4”.
    > I have no real clue how to do this the “right” way, perhaps by a custom InputGetter which automatically introduces the “.get()”?
    > When this is implemented the Matcher should be finished (?).
    > 
    > Another thing the current implementation is missing is the ordering inside the partitions (which is similar to window functions). Do you think we can simply reuse the code from there?
    > Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in the situation where one output row is generated for each input row, but I think this does not help us much as there  also is the other “MODE” where it outputs a single row for each (possibly arbitrary long) match.
    > 
    > Parallel to this mail I submitted a PR to merge my branch back to Julians work to have a common “checkpoint” for the next steps.
    > I would really value if someone could step in (Julian?) with either implement parts of the problems stated above or give me some hints on how to address this properly so that I can try to go further.
    > 
    > I don’t know what the usual way is but if it helps perhaps we can arrange a Screen sharing session or something to walk through the new code, if necessary.
    > 
    > Best
    > JulianF
    > 
    > [1] https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
    > [2] https://github.com/julianhyde/calcite/tree/1935-match-recognize

Re: Current State of MATCH_RECOGNIZE Implementation

Posted by Julian Hyde <jh...@apache.org>.

I saw your PR https://github.com/julianhyde/calcite/pull/16 <https://github.com/julianhyde/calcite/pull/16>. Can you please create a PR against Apache, then I’ll rebase it onto my 1935 branch.

Are we able to run a SQL query yet? My plan was to get a very basic query working end-to-end, then start adding features to the engine. I still think that’s a good plan.

Julian


> On Jan 9, 2019, at 9:13 AM, Julian Feinauer <j....@pragmaticminds.de> wrote:
> 
> Hi all,
> 
> as discussed in earlier exchanges (see [1]), I started to work on implementing MATCH_RECOGNIZE based on Julian (Hydes) work [2].
> I think I made some progress and resolved some of the problems but I’m at a stage now where I’d need some advice from more seasoned Calcite dev’s on how to continue.
> 
> Basically, I improved the Matching (based on a DFA now) and ensured that we have the full path of symbols (not just rows) as base for the “Emitter”. The Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
> I think we could handle them pretty easily as I introduced a “MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window around the current record (history and future). I think this should work here, as we have NO unbounded windows like for regular window functions.
> 
> So, basically the Matcher gets for each step a “Memory” Object which has a “get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) row.
> So, an expression of the Form “PREV(UP.$4, $5)” should be converted to something like “row.get($5).$4”.
> I have no real clue how to do this the “right” way, perhaps by a custom InputGetter which automatically introduces the “.get()”?
> When this is implemented the Matcher should be finished (?).
> 
> Another thing the current implementation is missing is the ordering inside the partitions (which is similar to window functions). Do you think we can simply reuse the code from there?
> Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in the situation where one output row is generated for each input row, but I think this does not help us much as there  also is the other “MODE” where it outputs a single row for each (possibly arbitrary long) match.
> 
> Parallel to this mail I submitted a PR to merge my branch back to Julians work to have a common “checkpoint” for the next steps.
> I would really value if someone could step in (Julian?) with either implement parts of the problems stated above or give me some hints on how to address this properly so that I can try to go further.
> 
> I don’t know what the usual way is but if it helps perhaps we can arrange a Screen sharing session or something to walk through the new code, if necessary.
> 
> Best
> JulianF
> 
> [1] https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
> [2] https://github.com/julianhyde/calcite/tree/1935-match-recognize