Posted to dev@calcite.apache.org by Viliam Durina <vi...@hazelcast.com> on 2021/10/02 18:20:52 UTC

Re: [DISCUSS] Syntax upgrade about Session Window Table Function

Btw, the table argument, according to the SQL standard, must be in
parentheses, like this:

SELECT *
FROM TABLE(SESSION(TABLE(input_table), ...

When doing a breaking change, we should also consider this.

Viliam

On Thu, 30 Sept 2021 at 18:11, Julian Hyde <jh...@gmail.com> wrote:

> Thanks for the examples. The PARTITION BY syntax is a clear improvement
> for the SESSION function and I think we should do it, even though it is
> breaking.
>
> I’ll make further comments against
> https://issues.apache.org/jira/browse/CALCITE-4337.
>
> > On Sep 29, 2021, at 9:58 PM, JING ZHANG <be...@gmail.com> wrote:
> >
> > Hi Julian,
> > Thanks for your feedback; the suggestion is very helpful.
> > I've added the discussion content to CALCITE-4337
> > <https://issues.apache.org/jira/browse/CALCITE-4337> [1]. I will continue
> > further discussion in the JIRA case.
> > As an example of a query before and after the syntax change, I will use
> > the example in the session table function documentation
> > <https://calcite.apache.org/docs/reference.html#session> [2].
> > Old syntax demo:
> > Old syntax demo:
> >
> >> SELECT * FROM TABLE(
> >>   SESSION(
> >>     TABLE orders,
> >>     DESCRIPTOR(rowtime),
> >>     DESCRIPTOR(product),
> >>     INTERVAL '20' MINUTE));
> >>
> >> -- or with the named params
> >> -- note: the DATA param must be the first
> >> SELECT * FROM TABLE(
> >>   SESSION(
> >>     DATA => TABLE orders,
> >>     TIMECOL => DESCRIPTOR(rowtime),
> >>     KEY => DESCRIPTOR(product),
> >>     SIZE => INTERVAL '20' MINUTE));
> >
> >
> > The new syntax demo is as follows; the difference is that a PARTITION BY
> > clause replaces the KEY DESCRIPTOR.
> >
> >> SELECT * FROM TABLE(
> >>   SESSION(
> >>     TABLE orders PARTITION BY product,
> >>     DESCRIPTOR(rowtime),
> >>     INTERVAL '20' MINUTE));
> >>
> >> -- or with the named params
> >> -- note: the DATA param must be the first
> >> SELECT * FROM TABLE(
> >>   SESSION(
> >>     DATA => TABLE orders PARTITION BY product,
> >>     TIMECOL => DESCRIPTOR(rowtime),
> >>     SIZE => INTERVAL '20' MINUTE));
> >
> >
> > Best,
> > JING ZHANG
> >
> > Julian Hyde <jh...@gmail.com> wrote on Thu, Sep 30, 2021 at 4:55 AM:
> >
> >> Regarding changes to the syntax of the SESSION table function: I am open
> >> to this, even though it would be a breaking change. Can you give an
> >> example of a query before and after the syntax change?
> >>
> >> I would like to support the new PARTITION BY clause for table functions.
> >> I encourage you to make the change for table functions in general, before
> >> and separately from the change to the SESSION function and window
> >> functions.
> >>
> >> Please ensure that the discussion gets added to the JIRA case. It might
> >> be best if we continue discussion in the JIRA case.
> >>
> >> Julian
> >>
> >>
> >>> On Sep 28, 2021, at 10:28 PM, JING ZHANG <be...@gmail.com> wrote:
> >>>
> >>> Hi community,
> >>> I'm now working on CALCITE-4337
> >>> <https://issues.apache.org/jira/browse/CALCITE-4337> [1], which aims to
> >>> support the PARTITION BY clause for table function arguments.
> >>> I've submitted a pull request
> >>> <https://github.com/apache/calcite/pull/2524> [2];
> >>> thanks very much to @Danny for the review.
> >>> Two points still need more discussion, so I am starting this
> >>> thread to get broader suggestions.
> >>> 1. The SQL standard's Polymorphic Table Functions document
> >>> <https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip>
> >>> [3] states:
> >>>
> >>>> Input tables have either row semantics or set semantics, as follows:
> >>>> a) Row semantics means that the result of the PTF is decided on a
> >>>> row-by-row basis. As an extreme example, the DBMS could atomize the
> >>>> input table into individual rows, and send each single row to a
> >>>> different virtual processor.
> >>>> b) Set semantics means that the outcome of the function depends on how
> >>>> the data is partitioned. A partition may not be split across virtual
> >>>> processors, nor may a virtual processor handle more than one partition.
> >>>
> >>>
> >>> A SESSION window has an input table with set semantics, which means it
> >>> requires a PARTITION BY clause.
> >>> The new syntax conflicts with the current session window table function
> >>> syntax; please take a look at the session table function
> >>> <https://calcite.apache.org/docs/reference.html#session> [4].
> >>> *Can we replace the old syntax directly, or should we take compatibility
> >>> into consideration?*
> >>> 2. According to the SQL standard, only input tables with set semantics
> >>> may be partitioned; input tables with row semantics may not be.
> >>> *Should we have a separate branch in Parser.jj for set-semantics input
> >>> tables of table functions? (Currently, only the input table of the
> >>> session window table function has set semantics.)*
> >>>
> >>> Any suggestions are appreciated. Thanks in advance.
> >>> [1] https://issues.apache.org/jira/browse/CALCITE-4337
> >>> [2] https://github.com/apache/calcite/pull/2524
> >>> [3] https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip
> >>> [4] https://calcite.apache.org/docs/reference.html#session
> >>>
> >>> Best
> >>> JING ZHANG
> >>
> >>
>
>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Re: [DISCUSS] Syntax upgrade about Session Window Table Function

Posted by JING ZHANG <be...@gmail.com>.
Hi Julian,
Thanks for responding in CALCITE-4337
<https://issues.apache.org/jira/browse/CALCITE-4337>.

> I find this feature easier to understand via syntax. Therefore I propose
> that we should implement this via changes to the DDL parser in the server
> module, including parser changes in server/.../config.fmpp and adding tests
> to ServerParserTest.java. What do you think?



*I think it's a good idea to update `server/.../config.fmpp` in order to
support defining a table function.* For example, we could use the following
SQL to define a table function named `TopNplus`.


CREATE FUNCTION TopNplus (
    Input TABLE NO PASS THROUGH WITH SET SEMANTICS PRUNE WHEN EMPTY,
    Howmany INTEGER
) RETURNS TABLE
NOT DETERMINISTIC
READS SQL DATA

The DDL defines the following characteristics:
1. Function name
2. Function arguments
3. Input table characteristics:

   - Input table semantics: ROW SEMANTICS or SET SEMANTICS; SET SEMANTICS
   is the default.
   - Prunability: this characteristic is required if the input table has
   SET SEMANTICS. If the PTF can generate a result row on empty input, the
   table is said to be "keep when empty"; the alternative is "prune when
   empty". The prunability characteristic does not apply to a ROW SEMANTICS
   input table.
   - Pass-through columns: "pass-through" columns are a mechanism enabling
   the PTF to copy every column of an input row into columns of an output row.
   The alternative is "NO PASS THROUGH".
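
The characteristics listed above could be modelled roughly as follows. This is
only a Python sketch for illustration, not Calcite code: the names
`TableParam`, `Semantics`, and `Prunability` are hypothetical, and the rules it
encodes are just the ones stated in this message (SET SEMANTICS as the default,
prunability required for SET SEMANTICS and not applicable to ROW SEMANTICS).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Semantics(Enum):
    ROW = "ROW SEMANTICS"
    SET = "SET SEMANTICS"


class Prunability(Enum):
    PRUNE_WHEN_EMPTY = "PRUNE WHEN EMPTY"
    KEEP_WHEN_EMPTY = "KEEP WHEN EMPTY"


@dataclass(frozen=True)
class TableParam:
    """Hypothetical model of one input-table parameter of a PTF."""
    name: str
    semantics: Semantics = Semantics.SET        # SET SEMANTICS is the default
    prunability: Optional[Prunability] = None   # only meaningful for SET
    pass_through: bool = False                  # False means NO PASS THROUGH

    def __post_init__(self):
        # Rules as described in this message; an assumption, not Calcite behavior.
        if self.semantics is Semantics.SET and self.prunability is None:
            raise ValueError(f"{self.name}: SET SEMANTICS requires a prunability")
        if self.semantics is Semantics.ROW and self.prunability is not None:
            raise ValueError(f"{self.name}: prunability does not apply to ROW SEMANTICS")


# The `Input` parameter of TopNplus from the DDL above:
# NO PASS THROUGH, WITH SET SEMANTICS, PRUNE WHEN EMPTY.
topnplus_input = TableParam(
    "Input", Semantics.SET, Prunability.PRUNE_WHEN_EMPTY, pass_through=False)
```

Keeping the checks in the constructor means an inconsistent declaration is
rejected when the function is defined rather than when it is first queried.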


*Besides, I still have three questions.*
1. Should we define only user-defined table functions in DDL? Do we need to
define the existing built-in window table functions (e.g. Tumbling Window
TVF, HOP Window TVF, Session Window TVF) in DDL or not?
2. I think we should also change the query parser in the core module
(Parser.jj), right? A user would write the following SQL query to use the
`TopNplus` function defined in the example above, so we need to update the
query parser to allow the `PARTITION BY` and `ORDER BY` clauses.

SELECT S.Region, T.*
FROM TABLE ( TopNplus ( Input => TABLE (My.Sales) AS S
                                   PARTITION BY Region
                                   ORDER BY Sales DESC,
                        Howmany => 3
           )
) AS T

3. In addition, we also need to validate whether a PTF query is valid. For
example, throw an exception if a `PARTITION BY` or `ORDER BY` clause is
specified for an input table parameter with ROW SEMANTICS.
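
The check described in point 3 might look roughly like this. It is a
standalone Python sketch, not the Calcite validator: `validate_table_argument`,
`Semantics`, and `PtfValidationError` are hypothetical names, and the only rule
encoded is the one stated above.

```python
from enum import Enum


class Semantics(Enum):
    ROW = "ROW SEMANTICS"
    SET = "SET SEMANTICS"


class PtfValidationError(Exception):
    """Raised when a table argument uses a clause its semantics do not allow."""


def validate_table_argument(param_name, semantics, partition_by=(), order_by=()):
    """Reject PARTITION BY / ORDER BY on a ROW SEMANTICS table argument."""
    if semantics is Semantics.ROW:
        if partition_by:
            raise PtfValidationError(
                f"PARTITION BY is not allowed for '{param_name}' (ROW SEMANTICS)")
        if order_by:
            raise PtfValidationError(
                f"ORDER BY is not allowed for '{param_name}' (ROW SEMANTICS)")
    # SET SEMANTICS arguments may carry either clause, or neither.


# Accepted: TopNplus declares Input WITH SET SEMANTICS.
validate_table_argument("Input", Semantics.SET, ["Region"], ["Sales DESC"])
```

Passing the same clauses with `Semantics.ROW` raises `PtfValidationError`,
which corresponds to the exception proposed in point 3.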

*What do you think?*


Best,

JING ZHANG



JING ZHANG <be...@gmail.com> wrote on Mon, Oct 18, 2021 at 4:25 PM:

> Hi Julian, Viliam,
> Thanks for the advice in CALCITE-4337
> <https://issues.apache.org/jira/browse/CALCITE-4337>.
> Please have another look at the issue.
> If there are no objections, I will continue development based on the
> conclusions of the discussion.
>
> Best wishes,
> JING ZHANG
>
> JING ZHANG <be...@gmail.com> wrote on Tue, Oct 12, 2021 at 5:59 PM:
>
>> Hi Julian,
>> Thanks very much for the expert comments in CALCITE-4337.
>> I have checked the SQL standard and the behavior of some database vendors
>> on PTF in order to answer your questions. I left the message in
>> https://issues.apache.org/jira/browse/CALCITE-4337.
>> Please correct me if I'm wrong. Thanks a lot.
>>
>> Best,
>> JING ZHANG
>>
>> JING ZHANG <be...@gmail.com> wrote on Mon, Oct 11, 2021 at 11:25 AM:
>>
>>> Sorry for the late reply; we were on holiday.
>>>
>>> @Julian
>>>
>>>> Thanks for the examples. The PARTITION BY syntax is a clear improvement
>>>> for the SESSION function and I think we should do it, even though it is
>>>> breaking.
>>>
>>> Thanks for the great suggestion.
>>>
>>>> I’ll make further comments against
>>>> https://issues.apache.org/jira/browse/CALCITE-4337
>>>
>>> The further comments in JIRA are great and very professional. I need to
>>> double-check some points in the SQL standard. Once I finish, I will
>>> reply in JIRA as soon as possible.
>>>
>>> @Viliam
>>>
>>>> the table argument, according to the SQL standard, must be in
>>>> parentheses, like this:
>>>>
>>>> SELECT *
>>>> FROM TABLE(SESSION(TABLE(input_table), ...
>>>
>>> Good point, I will keep it in mind.
>>>
>>> Best,
>>> JING ZHANG
>>>
>>>

Re: [DISCUSS] Syntax upgrade about Session Window Table Function

Posted by JING ZHANG <be...@gmail.com>.
Hi Julian, Viliam,
Thanks for the advice in CALCITE-4337
<https://issues.apache.org/jira/browse/CALCITE-4337>.
Please have another look at the issue.
If there are no objections, I will continue development based on the
conclusions of the discussion.

Best wishes,
JING ZHANG


Re: [DISCUSS] Syntax upgrade about Session Window Table Function

Posted by JING ZHANG <be...@gmail.com>.
Hi Julian,
Thanks very much for the expert comments in CALCITE-4337.
I have checked the SQL standard and the behavior of some database vendors
on PTF in order to answer your questions. I left the message in
https://issues.apache.org/jira/browse/CALCITE-4337.
Please correct me if I'm wrong. Thanks a lot.

Best,
JING ZHANG

JING ZHANG <be...@gmail.com> 于2021年10月11日周一 上午11:25写道:

> Sorry for late reply because we were in a vocation holiday.
>
> @Julian
>
>> Thanks for the examples. The PARTITION BY syntax is a clear improvement
>> for the SESSION function and I think we should do it, even though it is
>> breaking.
>
> Thanks for great suggestion.
>
> I’ll make further comments against
>> https://issues.apache.org/jira/browse/CALCITE-4337 <
>> https://issues.apache.org/jira/browse/CALCITE-4337>
>>
> The further comments in JIRA is great and very professional. I need double
> check in the SQL standard for some points. Once I finish it, I would reply
> in the JIRA as soon as possible.
>
> @Viliam
>
>> the table argument, according to the sql standard, must be in
>> parentheses, like this:
>>
>> SELECT *
>> FROM TABLE(SESSION(TABLE(input_table), ...
>>
> Good point, I would keep it in mind.
>
> Best,
> JING ZHANG
>
>
> Viliam Durina <vi...@hazelcast.com> 于2021年10月3日周日 上午2:21写道:
>
>> Btw, the table argument, according to the sql standard, must be in
>> parentheses, like this:
>>
>> SELECT *
>> FROM TABLE(SESSION(TABLE(input_table), ...
>>
>> When doing a breaking change, we should also consider this.
>>
>> Viliam
>>
>> On Thu, 30 Sept 2021 at 18:11, Julian Hyde <jh...@gmail.com>
>> wrote:
>>
>> > Thanks for the examples. The PARTITION BY syntax is a clear improvement
>> > for the SESSION function and I think we should do it, even though it is
>> > breaking.
>> >
>> > I’ll make further comments against
>> > https://issues.apache.org/jira/browse/CALCITE-4337 <
>> > https://issues.apache.org/jira/browse/CALCITE-4337>.
>> >
>> > > On Sep 29, 2021, at 9:58 PM, JING ZHANG <be...@gmail.com> wrote:
>> > >
>> > > Hi Julian,
>> > > Thanks for your feedback, the suggestion is very helpful.
>> > > I've added the discussion content the CALCITE-4337
>> > > <https://issues.apache.org/jira/browse/CALCITE-4337> [1]. I would
>> > continue
>> > > later discussion in the JIRA case.
>> > > About an example of a query before and after the syntax change. I
>> would
>> > use
>> > > the example in session table function document
>> > > <https://calcite.apache.org/docs/reference.html#session> [2].
>> > > Old syntax demo:
>> > >
>> > >> SELECT * FROM TABLE( SESSION( TABLE orders, DESCRIPTOR(rowtime),
>> > >> DESCRIPTOR(product), INTERVAL '20' MINUTE)); -- or with the named
>> > params --
>> > >> note: the DATA param must be the first SELECT * FROM TABLE( SESSION(
>> > DATA
>> > >> => TABLE orders, TIMECOL => DESCRIPTOR(rowtime), KEY =>
>> > DESCRIPTOR(product
>> > >> ), SIZE => INTERVAL '20' MINUTE));
>> > >
>> > >
>> > > New syntax demo is as follows, the difference is use PARTITION BY
>> clause
>> > to
>> > > replace KEY DESCRIPTOR.
>> > >
>> > >> SELECT * FROM TABLE( SESSION( TABLE orders PARTITION BY product,
>> > >> DESCRIPTOR(rowtime), INTERVAL '20' MINUTE)); -- or with the named
>> > params --
>> > >> note: the DATA param must be the first SELECT * FROM TABLE( SESSION(
>> > DATA
>> > >> => TABLE orders PARTITION BY product, TIMECOL => DESCRIPTOR(rowtime),
>> > SIZE
>> > >> => INTERVAL '20' MINUTE));
>> > >
>> > >
>> > > Best,
>> > > JING ZHANG
>> > >
>> > > Julian Hyde <jh...@gmail.com> 于2021年9月30日周四 上午4:55写道:
>> > >
>> > >> Regarding changes to the syntax of the SESSION table function. I am
>> open
>> > >> to this, even though it would be a breaking change. Can you give an
>> > example
>> > >> of a query before and after the syntax change?
>> > >>
>> > >> I would like to support the new PARTITIONED BY clause for table
>> > functions.
>> > >> I encourage you to make the change for table functions in general,
>> > before
>> > >> and separately from the change to the SESSION function and window
>> > functions.
>> > >>
>> > >> Please ensure that the discussion gets added to the JIRA case. It
>> might
>> > be
>> > >> best if we continue discussion in the JIRA case.
>> > >>
>> > >> Julian
>> > >>
>> > >>
>> > >>> On Sep 28, 2021, at 10:28 PM, JING ZHANG <be...@gmail.com>
>> wrote:
>> > >>>
>> > >>> Hi community,
>> > >>> I'm now working on CALCITE-4337
>> > >>> <https://issues.apache.org/jira/browse/CALCITE-4337> [1] which
>> aims to
>> > >>> support PARTITION BY clause for table function argument.
>> > >>> I've submitted a pull request
>> > >>> <https://github.com/apache/calcite/pull/2524> [2],
>> > >>> thanks @Danny very much for review.
>> > >>> There are two points left which need more discussion. So I fire this
>> > >>> discussion in order to get more broader suggestions.
>> > >>> 1. SQL standard Polymorphic Table Functions
>> > >>> <
>> > >>
>> >
>> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip
>> > >>>
>> > >>> [3]
>> > >>> states:
>> > >>>
>> > >>>> Input tables have either row semantics or set semantics, as
>> follows:
>> > >>>> a) Row semantics means that the the result of the PTF is decided
>> on a
>> > >>>> row-by-row basis. As an extreme example, the DBMS could atomize the
>> > >> input
>> > >>>> table into individual rows, and send each single row to a different
>> > >> virtual
>> > >>>> processor.
>> > >>>> b) Set semantics means that the outcome of the function depends on
>> how
>> > >> the
>> > >>>> data is partitioned. A partition may not be split across virtual
>> > >>>> processors, nor may a virtual processor handle more than one
>> > partition.
>> > >>>
>> > >>>
>> > >>> A SESSION window has an input table with set semantics which means
>> it
>> > >>> requires a PARTITION BY clause.
>> > >>> The new syntax conflicts with the current session window table
>> > >>> function syntax; please take a look at the session table function
>> > >>> <https://calcite.apache.org/docs/reference.html#session> [4].
>> > >>> *Can we replace the old syntax directly, or should we take
>> > >>> compatibility into consideration?*
>> > >>> 2. Based on SQL standard, only input tables with set semantics may
>> be
>> > >>> partitioned while input table with row semantics may not be
>> > partitioned.
>> > >>> *Should we have a separate branch in Parser.jj for set-semantics input
>> > >>> tables of table functions (currently, only the input table of the
>> > >>> session window table function has set semantics)?*
>> > >>>
>> > >>> Any suggestions are appreciated. Thanks in advance.
>> > >>> [1] https://issues.apache.org/jira/browse/CALCITE-4337
>> > >>> [2] https://github.com/apache/calcite/pull/2524
>> > >>> [3]
>> > >>>
>> > >>
>> >
>> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip
>> > >>> [4] https://calcite.apache.org/docs/reference.html#session
>> > >>>
>> > >>> Best
>> > >>> JING ZHANG
>> > >>
>> > >>
>> >
>> >
>>
>>
>

Re: [DISCUSS] Syntax upgrade about Session Window Table Function

Posted by JING ZHANG <be...@gmail.com>.
Sorry for the late reply; we were away on holiday.

@Julian

> Thanks for the examples. The PARTITION BY syntax is a clear improvement
> for the SESSION function and I think we should do it, even though it is
> breaking.

Thanks for the great suggestion.

> I’ll make further comments against
> https://issues.apache.org/jira/browse/CALCITE-4337 <
> https://issues.apache.org/jira/browse/CALCITE-4337>
>
The further comments in JIRA are great and very professional. I need to
double-check some points against the SQL standard. Once I have done that,
I will reply in JIRA as soon as possible.

@Viliam

> the table argument, according to the sql standard, must be in
> parentheses, like this:
>
> SELECT *
> FROM TABLE(SESSION(TABLE(input_table), ...
>
Good point, I will keep it in mind.
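
As a rough sketch only (this is an assumption about how the parenthesized
table argument might combine with the PARTITION BY clause; the final grammar
is still to be settled in CALCITE-4337), the query could look like this:

```sql
-- Hypothetical combined form; not yet accepted by the Calcite parser.
-- The table argument is wrapped in parentheses as Viliam suggests, and
-- PARTITION BY replaces the former key DESCRIPTOR argument.
SELECT *
FROM TABLE(
  SESSION(
    TABLE(orders) PARTITION BY product,
    DESCRIPTOR(rowtime),
    INTERVAL '20' MINUTE));
```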

Best,
JING ZHANG


Viliam Durina <vi...@hazelcast.com> wrote on Sun, 3 Oct 2021 at 2:21 AM:

> Btw, the table argument, according to the sql standard, must be in
> parentheses, like this:
>
> SELECT *
> FROM TABLE(SESSION(TABLE(input_table), ...
>
> When doing a breaking change, we should also consider this.
>
> Viliam
>
> On Thu, 30 Sept 2021 at 18:11, Julian Hyde <jh...@gmail.com> wrote:
>
> > Thanks for the examples. The PARTITION BY syntax is a clear improvement
> > for the SESSION function and I think we should do it, even though it is
> > breaking.
> >
> > I’ll make further comments against
> > https://issues.apache.org/jira/browse/CALCITE-4337 <
> > https://issues.apache.org/jira/browse/CALCITE-4337>.
> >
> > > On Sep 29, 2021, at 9:58 PM, JING ZHANG <be...@gmail.com> wrote:
> > >
> > > Hi Julian,
> > > Thanks for your feedback, the suggestion is very helpful.
> > > > I've added the discussion content to CALCITE-4337
> > > > <https://issues.apache.org/jira/browse/CALCITE-4337> [1]. I will
> > > > continue the discussion in the JIRA case.
> > > > For an example of a query before and after the syntax change, I will
> > > > use the example in the session table function documentation
> > > > <https://calcite.apache.org/docs/reference.html#session> [2].
> > > Old syntax demo:
> > >
> > > >> SELECT * FROM TABLE(
> > > >>   SESSION(
> > > >>     TABLE orders,
> > > >>     DESCRIPTOR(rowtime),
> > > >>     DESCRIPTOR(product),
> > > >>     INTERVAL '20' MINUTE));
> > > >>
> > > >> -- or with the named params
> > > >> -- note: the DATA param must be the first
> > > >> SELECT * FROM TABLE(
> > > >>   SESSION(
> > > >>     DATA => TABLE orders,
> > > >>     TIMECOL => DESCRIPTOR(rowtime),
> > > >>     KEY => DESCRIPTOR(product),
> > > >>     SIZE => INTERVAL '20' MINUTE));
> > >
> > >
> > > > The new syntax is as follows; the difference is that a PARTITION BY
> > > > clause replaces the KEY DESCRIPTOR.
> > >
> > > >> SELECT * FROM TABLE(
> > > >>   SESSION(
> > > >>     TABLE orders PARTITION BY product,
> > > >>     DESCRIPTOR(rowtime),
> > > >>     INTERVAL '20' MINUTE));
> > > >>
> > > >> -- or with the named params
> > > >> -- note: the DATA param must be the first
> > > >> SELECT * FROM TABLE(
> > > >>   SESSION(
> > > >>     DATA => TABLE orders PARTITION BY product,
> > > >>     TIMECOL => DESCRIPTOR(rowtime),
> > > >>     SIZE => INTERVAL '20' MINUTE));
> > >
> > >
> > > Best,
> > > JING ZHANG
> > >
> > > > Julian Hyde <jh...@gmail.com> wrote on Thu, 30 Sep 2021 at 4:55 AM:
> > >
> > >> Regarding changes to the syntax of the SESSION table function. I am
> open
> > >> to this, even though it would be a breaking change. Can you give an
> > example
> > >> of a query before and after the syntax change?
> > >>
> > >> I would like to support the new PARTITIONED BY clause for table
> > functions.
> > >> I encourage you to make the change for table functions in general,
> > before
> > >> and separately from the change to the SESSION function and window
> > functions.
> > >>
> > >> Please ensure that the discussion gets added to the JIRA case. It
> might
> > be
> > >> best if we continue discussion in the JIRA case.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Sep 28, 2021, at 10:28 PM, JING ZHANG <be...@gmail.com>
> wrote:
> > >>>
> > >>> Hi community,
> > >>> I'm now working on CALCITE-4337
> > >>> <https://issues.apache.org/jira/browse/CALCITE-4337> [1] which aims
> to
> > >>> support PARTITION BY clause for table function argument.
> > >>> I've submitted a pull request
> > >>> <https://github.com/apache/calcite/pull/2524> [2],
> > >>> thanks @Danny very much for review.
> > > >>> There are two points left that need more discussion, so I am starting
> > > >>> this discussion to gather broader suggestions.
> > >>> 1. SQL standard Polymorphic Table Functions
> > >>> <
> > >>
> >
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip
> > >>>
> > >>> [3]
> > >>> states:
> > >>>
> > >>>> Input tables have either row semantics or set semantics, as follows:
> > > >>>> a) Row semantics means that the result of the PTF is decided on a
> > >>>> row-by-row basis. As an extreme example, the DBMS could atomize the
> > >> input
> > >>>> table into individual rows, and send each single row to a different
> > >> virtual
> > >>>> processor.
> > >>>> b) Set semantics means that the outcome of the function depends on
> how
> > >> the
> > >>>> data is partitioned. A partition may not be split across virtual
> > >>>> processors, nor may a virtual processor handle more than one
> > partition.
> > >>>
> > >>>
> > >>> A SESSION window has an input table with set semantics which means it
> > >>> requires a PARTITION BY clause.
> > > >>> The new syntax conflicts with the current session window table
> > > >>> function syntax; please take a look at the session table function
> > > >>> <https://calcite.apache.org/docs/reference.html#session> [4].
> > > >>> *Can we replace the old syntax directly, or should we take
> > > >>> compatibility into consideration?*
> > >>> 2. Based on SQL standard, only input tables with set semantics may be
> > >>> partitioned while input table with row semantics may not be
> > partitioned.
> > > >>> *Should we have a separate branch in Parser.jj for set-semantics input
> > > >>> tables of table functions (currently, only the input table of the
> > > >>> session window table function has set semantics)?*
> > >>>
> > > >>> Any suggestions are appreciated. Thanks in advance.
> > >>> [1] https://issues.apache.org/jira/browse/CALCITE-4337
> > >>> [2] https://github.com/apache/calcite/pull/2524
> > >>> [3]
> > >>>
> > >>
> >
> https://standards.iso.org/ittf/PubliclyAvailableStandards/c069776_ISO_IEC_TR_19075-7_2017.zip
> > >>> [4] https://calcite.apache.org/docs/reference.html#session
> > >>>
> > >>> Best
> > >>> JING ZHANG
> > >>
> > >>
> >
> >
>
>