You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Rui Wang <am...@apache.org> on 2019/09/25 20:53:24 UTC

[DISCUSS] [CALCITE-3340] Table Function TUMBLE as an operator in SqlStdOperator

Hi community,

In order to support TUMBLE as a TABLE function (see CALCITE-3272), the
first step is to add a new operator in SqlStdOperatorTable(which is
CALCITE-3340).  Note that new operator will need a name, which is the
syntax used in a query. For example, GROUP BY TUMBE(...) will lead to a
TUMBLE operator created with a name "TUMBLE".

Thus, for the new operator for TUMBLE as built-in table function, we also
need to specify a name for it. And there are options: still use "TUMBLE",
or use a new name, e.g. "TUMBLE_TABLE".

For readers' convenience, start from here I will name GROUP BY TUMBLE as
the old operator, and new table function one as the new operator.

Option 1 that reuse "TUMBLE" as new operator's name:
This is called operator overloading in Calcite because Calcite seems two
operators are overloading by checking their names. It turns out, unlike
other built-in operators, old TUMBLE is not recognized by Parser directly:
Parser leaves it as UnresolvedFunction to validator. Then it also turns out
validator is not favor of overloaded built-in operators: overloaded
built-in operators will lead to "function not found" because more than one
built-in operators are found from operator table for TUMBLE based on an
operator name lookup approach. If more than one built-in operator is found
by validator, validator will just not match either one.

I am now convinced by many of my exploration, testing, code reading and
prototyping that the resolution for option 1, is to code *both* old and new
TUMBLE direclty in Parser, and overrite "SqlOperator,deriveType" for *both*
old and new operators.

You could check one of my prototype:
https://github.com/apache/calcite/pull/1457


For option 2 that use a new name for new operator's name, e.g.
"TUMBLE_TALBE"
Because validator treats two operators as operator overloading if they have
the same name, using another name is ok. It would be very straighforward to
let the new operator go through parse, validate, rel steps if using a new
name (I have proven it in an old verrsion of PR1457
<https://github.com/apache/calcite/pull/1457> but unfortunetly I did a
force push so that version was overwritten and it cannot be seen now).



For me it seems like option 2 makes sense and require less work. However I
do want to hear more opinions from community. Please let me know if you
have any thought.


-Rui

Re: [DISCUSS] [CALCITE-3340] Table Function TUMBLE as an operator in SqlStdOperator

Posted by Rui Wang <ru...@google.com.INVALID>.

There are discussions happened in PR#1457 and we ended with a consensus:
combine both option 1 and option 2.

More specifically, because the limited operator namespace, we won't
overload operators (won't have two operators called "TUMBLE") and will only
give "TUMBLE" for the new operator. For the old one, it will be renamed,
likely to "TUMBLE_old". Meanwhile, I will add the old TUMBLE to parser and
let parser directly create an operator without caring about operator name.
By doing so GROUP BY TUMBLE will still remain being supported in the future
versions, even it's internal name is already changed.


-Rui

On Wed, Sep 25, 2019 at 1:53 PM Rui Wang <am...@apache.org> wrote:

> Hi community,
>
> In order to support TUMBLE as a TABLE function (see CALCITE-3272), the
> first step is to add a new operator in SqlStdOperatorTable(which is
> CALCITE-3340).  Note that new operator will need a name, which is the
> syntax used in a query. For example, GROUP BY TUMBE(...) will lead to a
> TUMBLE operator created with a name "TUMBLE".
>
> Thus, for the new operator for TUMBLE as built-in table function, we also
> need to specify a name for it. And there are options: still use "TUMBLE",
> or use a new name, e.g. "TUMBLE_TABLE".
>
> For readers' convenience, start from here I will name GROUP BY TUMBLE as
> the old operator, and new table function one as the new operator.
>
> Option 1 that reuse "TUMBLE" as new operator's name:
> This is called operator overloading in Calcite because Calcite seems two
> operators are overloading by checking their names. It turns out, unlike
> other built-in operators, old TUMBLE is not recognized by Parser directly:
> Parser leaves it as UnresolvedFunction to validator. Then it also turns out
> validator is not favor of overloaded built-in operators: overloaded
> built-in operators will lead to "function not found" because more than one
> built-in operators are found from operator table for TUMBLE based on an
> operator name lookup approach. If more than one built-in operator is found
> by validator, validator will just not match either one.
>
> I am now convinced by many of my exploration, testing, code reading and
> prototyping that the resolution for option 1, is to code *both* old and
> new TUMBLE direclty in Parser, and overrite "SqlOperator,deriveType" for
> *both* old and new operators.
>
> You could check one of my prototype:
> https://github.com/apache/calcite/pull/1457
>
>
> For option 2 that use a new name for new operator's name, e.g.
> "TUMBLE_TALBE"
> Because validator treats two operators as operator overloading if they
> have the same name, using another name is ok. It would be very
> straighforward to let the new operator go through parse, validate, rel
> steps if using a new name (I have proven it in an old verrsion of PR1457
> <https://github.com/apache/calcite/pull/1457> but unfortunetly I did a
> force push so that version was overwritten and it cannot be seen now).
>
>
>
> For me it seems like option 2 makes sense and require less work. However I
> do want to hear more opinions from community. Please let me know if you
> have any thought.
>
>
> -Rui
>