Posted to dev@calcite.apache.org by Jungtaek Lim <ka...@gmail.com> on 2016/09/26 00:24:21 UTC

Strange behavior on MAP and ARRAY type

Hi devs,

I'm experimenting with ARRAY and MAP type with Calcite's JaninoRexCompiler.
(with Calcite 1.9.0)

While testing I found some behaviors, and I would like to know whether
they're intentional or bugs we want to resolve.

1. Even though I set the map's key/value type or the array's element type
explicitly, the compiler ignores the value type and creates an assignment to
Object.

This makes it impossible to extract a value from a nested collection, and
also to compare an extracted value with a numeric constant.
(For example, suppose there's a row where MAPFIELD['a'] has 1 as its value.
Writing MAPFIELD['a'] = 1 in the WHERE clause throws an Exception saying
there's no SqlFunctions.eq(Object, int).)

Btw, I set a breakpoint in BinaryImplementor.implement() to inspect the
variables just before the compiler looks up SqlFunctions.eq().
The type of the expression is Object, which is not generic, but the matching
RexCall operand seems to have the value type information. We might create a
quick fix here (handling the type conversion), but I'm not sure it would
cleanly resolve this.
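The shape of the problem in point 1 can be sketched in plain Java (a minimal illustration, not Calcite's actual generated code; the `itemOf` accessor is a hypothetical simplification of what the compiler emits):

```java
import java.util.HashMap;
import java.util.Map;

public class ItemTypeSketch {
    // Hypothetical simplification of the generated accessor: the planner knows
    // the field is MAP<VARCHAR, INTEGER>, but the generated code sees only Object.
    static Object itemOf(Map<String, Object> row, String field, String key) {
        Map<?, ?> map = (Map<?, ?>) row.get(field);
        return map.get(key);  // static type is Object; the value type is erased
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        Map<String, Integer> mapField = new HashMap<>();
        mapField.put("a", 1);
        row.put("MAPFIELD", mapField);

        Object item = itemOf(row, "MAPFIELD", "a");
        // A comparison of the shape eq(Object, int) cannot be resolved directly;
        // an explicit cast to the value type, like CAST(... AS INTEGER), works:
        boolean matches = ((Integer) item).intValue() == 1;
        System.out.println(matches);  // true
    }
}
```

Once the ITEM result is statically typed as Object, only an explicit cast back to the declared value type makes the comparison compilable, which is roughly what the CAST workaround in point 2 supplies.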

2. I replaced their type with ANY and wrapped the expression with CAST, like
MAPFIELD['a'] -> CAST(MAPFIELD['a'] AS INTEGER) = 1, and it succeeded.
(With the MAP type, the above error happens even when I wrap with CAST.)

But it doesn't work when MAPFIELD['a'] is null, because CAST AS INTEGER calls
SqlFunctions.toInt(null), which throws a RuntimeException.
(This happens when MAPFIELD doesn't have an item with 'a' as its key. I guess
the same applies to a null value, and the same also applies to ARRAY and
ARRAYFIELD[<non-integer>].)

CAST(COALESCE(MAPFIELD['a'], -1) AS INTEGER) also throws an Exception, since
MAPFIELD['a'] is Object and -1 is int.
(CompileException: Incompatible expression types "java.lang.Object" and
"int")
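The null behavior can be sketched in isolation (the `toIntStrict`/`toIntOrNull` helpers below are my own stand-ins, inferred from the reported behavior, not the actual SqlFunctions signatures):

```java
public class NullSafeCastSketch {
    // Sketch of the failure mode: a strict conversion that rejects null,
    // like the SqlFunctions.toInt(null) RuntimeException described above.
    static int toIntStrict(Object o) {
        if (o == null) {
            throw new RuntimeException("cannot convert null to int");
        }
        return ((Number) o).intValue();
    }

    // A null-propagating alternative: return a boxed Integer and keep null
    // as null, which is what a nullable CAST would need.
    static Integer toIntOrNull(Object o) {
        return o == null ? null : ((Number) o).intValue();
    }

    public static void main(String[] args) {
        System.out.println(toIntOrNull(null));  // null instead of an exception
        System.out.println(toIntOrNull(1));     // 1
    }
}
```

The difference between returning a primitive `int` and a boxed, nullable `Integer` is exactly why a missing map key blows up under CAST AS INTEGER.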

3. An ArrayIndexOutOfBoundsException is thrown when a query accesses the
array with an out-of-range index. I don't know whether there's a standard
rule on this, but at least PostgreSQL doesn't throw an Exception and just
treats it as null.
https://www.postgresql.org/docs/9.1/static/arrays.html
(This means the result of the ITEM operator would be nullable.)
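The PostgreSQL-style semantics described above can be sketched as follows (the `item` helper is my own illustration, not Calcite code; SQL array subscripts are 1-based):

```java
import java.util.Arrays;
import java.util.List;

public class ArrayItemSketch {
    // PostgreSQL-style ITEM: an out-of-range subscript yields null
    // rather than throwing, so the result type is nullable.
    static Integer item(List<Integer> array, int index) {
        if (index < 1 || index > array.size()) {
            return null;  // nullable result, as in the Postgres docs linked above
        }
        return array.get(index - 1);  // SQL subscripts are 1-based
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(10, 20, 30);
        System.out.println(item(a, 2));   // 20
        System.out.println(item(a, 99));  // null
    }
}
```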

Any ideas on these behaviors?

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: New type of window semantics

Posted by Julian Hyde <jh...@gmail.com>.
As Fabian says, the key thing is to decide whether you want one-record-per-group or one-record-per-input-record. Only use GROUP BY if you want the former behavior.

Slides 16 onwards in http://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912 have some diagrams that may help understand the various kinds of windows.

Julian




Re: New type of window semantics

Posted by Fabian Hueske <fh...@gmail.com>.
Sliding windows can be partitioned, but the partitioning does not go into a
GROUP BY clause; it goes into a PARTITION BY clause added to OVER.
GROUP BY returns one record per group, which is not what you want.
Check the syntax of the OVER clause, e.g., here [1] (see windowSpec at the
end).

[1] http://calcite.apache.org/docs/reference.html
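Fabian's distinction can be sketched with keyed window state (a toy Java sketch of PARTITION BY semantics, not Flink or Calcite code; the `onRow` method and its arguments are my own illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class PartitionedSlidingAvgSketch {
    // State keyed by partition value, mirroring
    // AVG(price) OVER (PARTITION BY orderType ORDER BY orderTime ROWS 5 PRECEDING).
    final Map<String, Deque<Double>> windows = new HashMap<>();

    double onRow(String orderType, double price) {
        Deque<Double> w = windows.computeIfAbsent(orderType, k -> new ArrayDeque<>());
        w.addLast(price);
        if (w.size() > 6) {
            w.removeFirst();  // at most 5 preceding rows plus the current one
        }
        // Still exactly one output row per input row, unlike GROUP BY:
        return w.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    public static void main(String[] args) {
        PartitionedSlidingAvgSketch s = new PartitionedSlidingAvgSketch();
        System.out.println(s.onRow("limit", 10));   // 10.0
        System.out.println(s.onRow("market", 40));  // 40.0 (separate partition)
        System.out.println(s.onRow("limit", 30));   // 20.0 (avg of 10 and 30)
    }
}
```

This is also why the mapping to a Flink KeyedStream is natural: the PARTITION BY key plays the role of the stream key, while the OVER window keeps per-key state.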




RE: New type of window semantics

Posted by Radu Tudoran <ra...@huawei.com>.
Hi,

Thanks for the answer. 
As a follow up question - is it possible to use a GROUP BY clause after the previous query?

SELECT STREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
  FROM Orders
  GROUP BY order_type


I am asking this in order to know whether it would make it possible to implement this over KeyedStream(s), such as the ones in Flink, plus the windows.




Re: New type of window semantics

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Radu,

sliding windows as described by Julian will emit exactly one row for each
incoming row.
In the scenario you describe only one row will be emitted when ordN6
arrives (otherwise, each input row would result in five emitted rows).

So sliding windows seem to be what you are looking for.

Best, Fabian


RE: New type of window semantics

Posted by Radu Tudoran <ra...@huawei.com>.
Hi,

Thanks for these points.
I am not sure I really understood the implications of using this option in stream mode. I got the point that if we have 20 rows then we have 20 outputs. However, I wonder what happens when a new record comes in and we have the query you proposed:

SELECT STREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
  FROM Orders


Assume we have, up to moment T, the following 5 orders:

ordN1, ordN2, ordN3, ordN4, ordN5

and we get ordN6 at moment T+1.
Will the query provide only one result, corresponding to ordN6, and thus average over ordN2, ordN3, ordN4, ordN5, ordN6? Or, because ordN2 to ordN5 are still in the system, will the query return 5 results?


If the answer is 1 output in this case, corresponding to element ordN6, then indeed it can do the job for this scenario.







Re: New type of window semantics

Posted by Julian Hyde <jh...@apache.org>.
Have you considered the sliding window, which is already part of standard SQL?  We propose to support it in streaming SQL also. Here is an example:

  SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING)
  FROM Orders

(This is a non-streaming query, but you can add the STREAM keyword to get a streaming query.)

Given orders 1 .. 20, then order 10 would show the average for orders 5 .. 10 inclusive, order 11 would show the average for orders 6 .. 11, and so forth.

In streaming queries, windows are often used in the GROUP BY clause, but we do not use a GROUP BY here. The OVER clause with sliding windows does not aggregate rows. If 20 rows come in, then 20 rows go out. It makes sense, because each row cannot have its own window if multiple rows are squashed into one.
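Julian's ROWS 5 PRECEDING example can be sketched as a per-row computation (a toy Java sketch, not Calcite's implementation): each input row emits exactly one output row, averaged over itself plus up to five preceding rows.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingAvgSketch {
    // For each input row, average the current row plus up to 5 preceding rows,
    // as in AVG(price) OVER (ORDER BY orderTime ROWS 5 PRECEDING).
    static double[] slidingAvg(double[] prices) {
        double[] out = new double[prices.length];
        Deque<Double> window = new ArrayDeque<>();
        for (int i = 0; i < prices.length; i++) {
            window.addLast(prices[i]);
            if (window.size() > 6) {
                window.removeFirst();  // keep at most "5 PRECEDING" plus current
            }
            out[i] = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        }
        return out;  // one output per input row: 20 rows in, 20 rows out
    }

    public static void main(String[] args) {
        for (double avg : slidingAvg(new double[]{10, 20, 30, 40, 50, 60, 70})) {
            System.out.println(avg);
        }
    }
}
```

For the seven inputs above, the seventh output averages rows 2..7 only, since the first row has fallen out of the 6-row window.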

Julian





New type of window semantics

Posted by Radu Tudoran <ra...@huawei.com>.
Hi,

First of all, let me introduce myself. My name is Radu Tudoran and I am working in the field of Big Data processing, with a strong focus on streaming and, more recently, on SQL.

I wanted to raise a question/proposal for discussion in the community:

Based on our requirements, I realized that I would need to create a window (e.g. a hop window) that would slide on every incoming element. The syntax that I have in mind for it is

HOP(column_name, # EVENT , INTERVAL # )   (or should it rather be # ELEMENT instead of EVENT?)

I wanted to check with you: what do you think about such a grammar going directly into Calcite? I think it is relevant for streaming scenarios where you do not necessarily have events coming at regular time intervals but you would still like to react on every event.
As an example, consider a stock market application where, for every new offer, you would always compute the average over the last hour.
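The stock-market example can be sketched as a per-event time window (a toy sketch of the proposed semantics; the HOP arguments above are Radu's proposal, while the `onOffer` method and Offer class below are my own illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PerEventHopSketch {
    static final long HOUR_MILLIS = 60L * 60 * 1000;

    static class Offer {
        final long tsMillis;
        final double price;
        Offer(long tsMillis, double price) { this.tsMillis = tsMillis; this.price = price; }
    }

    final Deque<Offer> lastHour = new ArrayDeque<>();

    // On every incoming offer, evict offers older than one hour and emit
    // the average over what remains: one result per event, not per period.
    double onOffer(long tsMillis, double price) {
        lastHour.addLast(new Offer(tsMillis, price));
        while (lastHour.peekFirst().tsMillis < tsMillis - HOUR_MILLIS) {
            lastHour.removeFirst();
        }
        return lastHour.stream().mapToDouble(o -> o.price).average().orElse(0);
    }

    public static void main(String[] args) {
        PerEventHopSketch s = new PerEventHopSketch();
        System.out.println(s.onOffer(0, 100));                // 100.0
        System.out.println(s.onOffer(30 * 60 * 1000L, 200));  // 150.0
        System.out.println(s.onOffer(90 * 60 * 1000L, 300));  // 250.0 (first offer evicted)
    }
}
```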

Best regards,

Re: Strange behavior on MAP and ARRAY type

Posted by Jungtaek Lim <ka...@gmail.com>.
Thanks Julian.

FYI, I submitted a pull request <https://github.com/apache/calcite/pull/283>
for CALCITE-1386 <https://issues.apache.org/jira/browse/CALCITE-1386>, and
after some testing I found the patch covers CALCITE-1387
<https://issues.apache.org/jira/browse/CALCITE-1387>.

- Jungtaek Lim (HeartSaVioR)

On Wed, Sep 28, 2016 at 1:05 AM, Julian Hyde <jh...@apache.org> wrote:

> The SQL standard is not available online. Do some creative searching
> (including on this list) and you might find it.
>
> Julian
>
>
> > On Sep 27, 2016, at 12:30 AM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > Thanks Julian for detailed explanation.
> > I filed issues CALCITE-1386
> > <https://issues.apache.org/jira/browse/CALCITE-1386> and CALCITE-1387
> > <https://issues.apache.org/jira/browse/CALCITE-1387> for 1 and 2
> > respectively.
> >
> > Btw, could you link the page you referred to? I can't find it.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Tue, Sep 27, 2016 at 12:50 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> Regarding 1 and 2: these sound like bugs. Can you please log a JIRA case
> >> for each? If you can supply a test case, even better.
> >>
> >> Regarding 3. I checked the SQL standard, and the required behavior is to
> >> throw an exception. (Maybe ArrayIndexOutOfBoundsException isn’t the best
> >> exception to throw.)
> >>
> >> Julian
> >>
> >>
> >> From SQL:2014...
> >>
> >> 6.24 <array element reference> Function
> >>
> >> Return an element of an array.
> >>
> >> Format
> >>
> >> <array element reference> ::=
> >>   <array value expression>
> >>       <left bracket or trigraph> <numeric value expression> <right bracket or trigraph>
> >>
> >> Syntax Rules
> >>
> >> 1)  The declared type of an <array element reference> is the element type
> >> of the specified <array value expression>.
> >>
> >> 2)  The declared type of <numeric value expression> shall be exact numeric
> >> with scale 0 (zero).
> >>
> >> Access Rules
> >>
> >> None.
> >>
> >> General Rules
> >>
> >> 1)  If the value of <array value expression> or <numeric value expression>
> >> is the null value, then the result of <array element reference> is the null
> >> value.
> >>
> >> 2)  Let i be the value of <numeric value expression>. Case:
> >>
> >> a)  If i is greater than zero and less than or equal to the cardinality of
> >> <array value expression>, then the result of <array element reference> is
> >> the value of the i-th element of the value of <array value expression>.
> >>
> >> b)  Otherwise, an exception condition is raised: data exception — array
> >> element error.
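The SQL:2014 General Rules quoted above can be transcribed almost literally into code (a sketch only; the RuntimeException below is a placeholder for the standard's "data exception — array element error" condition):

```java
import java.util.Arrays;
import java.util.List;

public class SqlArrayElementSketch {
    // Transcription of SQL:2014 6.24 General Rules:
    // null array or null index -> null (Rule 1); index in [1, cardinality]
    // -> that element (Rule 2a); otherwise raise an exception (Rule 2b).
    static Integer arrayElement(List<Integer> array, Integer i) {
        if (array == null || i == null) {
            return null;                              // General Rule 1
        }
        if (i >= 1 && i <= array.size()) {
            return array.get(i - 1);                  // General Rule 2a (1-based)
        }
        throw new RuntimeException("data exception - array element error");  // Rule 2b
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(7, 8, 9);
        System.out.println(arrayElement(a, 3));     // 9
        System.out.println(arrayElement(null, 1));  // null
        try {
            arrayElement(a, 4);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that the standard mandates null only for a null array or null subscript; an out-of-range subscript is an error, which is where PostgreSQL's null-returning behavior diverges from the standard.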

Re: Strange behavior on MAP and ARRAY type

Posted by Julian Hyde <jh...@apache.org>.
The SQL standard is not available online. Do some creative searching (including on this list) and you might find it.

Julian


> On Sep 27, 2016, at 12:30 AM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Thanks Julian for detailed explanation.
> I filed issues CALCITE-1386
> <https://issues.apache.org/jira/browse/CALCITE-1386> and CALCITE-1387
> <https://issues.apache.org/jira/browse/CALCITE-1387> for 1 and 2
> respectively.
> 
> Btw, could you link which you referred the page? I can't find it.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> 2016년 9월 27일 (화) 오후 12:50, Julian Hyde <jh...@apache.org>님이 작성:
> 
>> Regarding 1 and 2: these sound like bugs. Can you please log a JIRA case
>> for each? If you can supply a test case, even better.
>> 
>> Regarding 3. I checked the SQL standard, and the required behavior is to
>> throw an exception. (Maybe ArrayIndexOutOfBoundsException isn’t the best
>> exception to throw.)
>> 
>> Julian
>> 
>> 
>> 
>> From SQL:2014...
>> 
>> 6.24 <array element reference> Function
>> 
>> Return an element of an array.
>> 
>> Format
>> 
>> <array element reference> ::=
>>  <array value expression>
>>      <left bracket or trigraph> <numeric value expression> <right bracket
>> or trigraph>
>> Syntax Rules
>> 
>> 1)  The declared type of an <array element reference> is the element type
>> of the specified <array value expression>.
>> 
>> 2)  The declared type of <numeric value expression> shall be exact numeric
>> with scale 0 (zero).
>> 
>> Access Rules
>> 
>> None.
>> 
>> General Rules
>> 
>> 1)  If the value of <array value expression> or <numeric value expression>
>> is the null value, then the result of <array element reference> is the null
>> value.
>> 
>> 2)  Let i be the value of <numeric value expression>. Case:
>> 
>> a)  If i is greater than zero and less than or equal to the cardinality of
>> <array value expression>, then the result of <array element reference> is
>> the value of the i-th element of the value of <array value expression>.
>> 
>> b)  Otherwise, an exception condition is raised: data exception — array
>> element error.
>> 
>> 
>> 
>> 
>>> On Sep 25, 2016, at 5:24 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>>> 
>>> Hi devs,
>>>
>>> I'm experimenting with the ARRAY and MAP types with Calcite's
>>> JaninoRexCompiler (Calcite 1.9.0).
>>>
>>> While testing I found some behaviors, and I'd like to know whether
>>> they're intentional or bugs we want to resolve.
>>>
>>> 1. Even though I set the map's key/value types or the array's element
>>> type explicitly, the compiler ignores the value type and creates an
>>> assignment to Object.
>>>
>>> This makes it impossible to extract a value from a nested collection,
>>> or to compare an extracted value with a numeric constant.
>>> (For example, suppose there's a row where MAPFIELD['a'] has 1 as its
>>> value. Writing MAPFIELD['a'] = 1 in the WHERE clause throws an
>>> Exception saying there's no SqlFunctions.eq(Object, int).)
>>>
>>> Btw, I set a breakpoint in BinaryImplementor.implement() to inspect the
>>> variables just before the compiler looks up SqlFunctions.eq().
>>> The type of the expression is Object, which is not generic, but the
>>> matching RexCall operand seems to have the value type information. We
>>> might create a quick fix here (handling the type conversion), but I'm
>>> not sure it would cleanly resolve this.
>>>
>>> 2. I replaced their types with ANY and wrapped the expression in a
>>> CAST, e.g. MAPFIELD['a'] -> CAST(MAPFIELD['a'] AS INTEGER) = 1, and it
>>> succeeded. (With the MAP type, the above error happens even when I
>>> wrap with CAST.)
>>>
>>> But it doesn't work when MAPFIELD['a'] is null, because CAST AS
>>> INTEGER calls SqlFunctions.toInt(null), which throws a
>>> RuntimeException. (This happens when MAPFIELD doesn't have an item
>>> with 'a' as its key; I guess the same applies to a null value. The
>>> same also applies to ARRAY and to ARRAYFIELD[<non integer>].)
>>>
>>> CAST(COALESCE(MAPFIELD['a'], -1) AS INTEGER) also throws an Exception,
>>> since MAPFIELD['a'] is Object and -1 is int.
>>> (CompileException: Incompatible expression types "java.lang.Object"
>>> and "int")
>>>
>>> 3. An ArrayIndexOutOfBoundsException is thrown when a query tries to
>>> access the array with an out-of-range index. I don't know whether
>>> there's a standard rule on this, but at least PostgreSQL doesn't throw
>>> an Exception and just treats it as null.
>>> https://www.postgresql.org/docs/9.1/static/arrays.html
>>> (This means the result of the ITEM operator is nullable.)
>>>
>>> Any ideas on these behaviors?
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>> 
>> 


Re: Strange behavior on MAP and ARRAY type

Posted by Jungtaek Lim <ka...@gmail.com>.
Thanks Julian for detailed explanation.
I filed issues CALCITE-1386
<https://issues.apache.org/jira/browse/CALCITE-1386> and CALCITE-1387
<https://issues.apache.org/jira/browse/CALCITE-1387> for 1 and 2
respectively.

Btw, could you link the page you were referring to? I can't find it.

Thanks,
Jungtaek Lim (HeartSaVioR)

2016년 9월 27일 (화) 오후 12:50, Julian Hyde <jh...@apache.org>님이 작성:

> Regarding 1 and 2: these sound like bugs. Can you please log a JIRA case
> for each? If you can supply a test case, even better.
>
> Regarding 3. I checked the SQL standard, and the required behavior is to
> throw an exception. (Maybe ArrayIndexOutOfBoundsException isn’t the best
> exception to throw.)
>
> Julian
>
>
>
> From SQL:2014...
>
> 6.24 <array element reference> Function
>
> Return an element of an array.
>
> Format
>
> <array element reference> ::=
>   <array value expression>
>       <left bracket or trigraph> <numeric value expression> <right bracket
> or trigraph>
> Syntax Rules
>
> 1)  The declared type of an <array element reference> is the element type
> of the specified <array value expression>.
>
> 2)  The declared type of <numeric value expression> shall be exact numeric
> with scale 0 (zero).
>
> Access Rules
>
> None.
>
> General Rules
>
> 1)  If the value of <array value expression> or <numeric value expression>
> is the null value, then the result of <array element reference> is the null
> value.
>
> 2)  Let i be the value of <numeric value expression>. Case:
>
> a)  If i is greater than zero and less than or equal to the cardinality of
> <array value expression>, then the result of <array element reference> is
> the value of the i-th element of the value of <array value expression>.
>
> b)  Otherwise, an exception condition is raised: data exception — array
> element error.
>
>
>
>
> > On Sep 25, 2016, at 5:24 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > I'm experimenting with the ARRAY and MAP types with Calcite's
> > JaninoRexCompiler (Calcite 1.9.0).
> >
> > While testing I found some behaviors, and I'd like to know whether
> > they're intentional or bugs we want to resolve.
> >
> > 1. Even though I set the map's key/value types or the array's element
> > type explicitly, the compiler ignores the value type and creates an
> > assignment to Object.
> >
> > This makes it impossible to extract a value from a nested collection,
> > or to compare an extracted value with a numeric constant.
> > (For example, suppose there's a row where MAPFIELD['a'] has 1 as its
> > value. Writing MAPFIELD['a'] = 1 in the WHERE clause throws an
> > Exception saying there's no SqlFunctions.eq(Object, int).)
> >
> > Btw, I set a breakpoint in BinaryImplementor.implement() to inspect
> > the variables just before the compiler looks up SqlFunctions.eq().
> > The type of the expression is Object, which is not generic, but the
> > matching RexCall operand seems to have the value type information. We
> > might create a quick fix here (handling the type conversion), but I'm
> > not sure it would cleanly resolve this.
> >
> > 2. I replaced their types with ANY and wrapped the expression in a
> > CAST, e.g. MAPFIELD['a'] -> CAST(MAPFIELD['a'] AS INTEGER) = 1, and it
> > succeeded. (With the MAP type, the above error happens even when I
> > wrap with CAST.)
> >
> > But it doesn't work when MAPFIELD['a'] is null, because CAST AS
> > INTEGER calls SqlFunctions.toInt(null), which throws a
> > RuntimeException. (This happens when MAPFIELD doesn't have an item
> > with 'a' as its key; I guess the same applies to a null value. The
> > same also applies to ARRAY and to ARRAYFIELD[<non integer>].)
> >
> > CAST(COALESCE(MAPFIELD['a'], -1) AS INTEGER) also throws an Exception,
> > since MAPFIELD['a'] is Object and -1 is int.
> > (CompileException: Incompatible expression types "java.lang.Object"
> > and "int")
> >
> > 3. An ArrayIndexOutOfBoundsException is thrown when a query tries to
> > access the array with an out-of-range index. I don't know whether
> > there's a standard rule on this, but at least PostgreSQL doesn't throw
> > an Exception and just treats it as null.
> > https://www.postgresql.org/docs/9.1/static/arrays.html
> > (This means the result of the ITEM operator is nullable.)
> >
> > Any ideas on these behaviors?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
>
>

Re: Strange behavior on MAP and ARRAY type

Posted by Julian Hyde <jh...@apache.org>.
Regarding 1 and 2: these sound like bugs. Can you please log a JIRA case for each? If you can supply a test case, even better.

Regarding 3. I checked the SQL standard, and the required behavior is to throw an exception. (Maybe ArrayIndexOutOfBoundsException isn’t the best exception to throw.)

Julian



From SQL:2014...

6.24 <array element reference> Function

Return an element of an array.

Format

<array element reference> ::=
  <array value expression>
      <left bracket or trigraph> <numeric value expression> <right bracket or trigraph>
Syntax Rules

1)  The declared type of an <array element reference> is the element type of the specified <array value expression>.

2)  The declared type of <numeric value expression> shall be exact numeric with scale 0 (zero).

Access Rules

None.

General Rules

1)  If the value of <array value expression> or <numeric value expression> is the null value, then the result of <array element reference> is the null value.

2)  Let i be the value of <numeric value expression>. Case:

a)  If i is greater than zero and less than or equal to the cardinality of <array value expression>, then the result of <array element reference> is the value of the i-th element of the value of <array value expression>.

b)  Otherwise, an exception condition is raised: data exception — array element error. 
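The general rules above could be sketched in Java roughly as follows. This is an illustrative helper only, not Calcite's implementation; the class and method names are invented:

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of SQL:2014 6.24 <array element reference> general rules. */
public class ArrayElementRef {
  /** Returns the i-th (1-based) element; null if either operand is null;
   *  otherwise raises a "data exception - array element error". */
  public static Object elementAt(List<?> array, Integer i) {
    // General rule 1: a null array or null index yields null
    if (array == null || i == null) {
      return null;
    }
    // General rule 2a: 1 <= i <= cardinality yields the i-th element
    if (i > 0 && i <= array.size()) {
      return array.get(i - 1);
    }
    // General rule 2b: otherwise an exception condition is raised
    throw new RuntimeException("data exception - array element error");
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(10, 20, 30);
    System.out.println(elementAt(a, 2));    // prints 20
    System.out.println(elementAt(null, 1)); // prints null
  }
}
```

Note that PostgreSQL's null-on-out-of-range behavior would correspond to returning null in the last branch instead of throwing.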




> On Sep 25, 2016, at 5:24 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Hi devs,
> 
> I'm experimenting with the ARRAY and MAP types with Calcite's
> JaninoRexCompiler (Calcite 1.9.0).
> 
> While testing I found some behaviors, and I'd like to know whether
> they're intentional or bugs we want to resolve.
> 
> 1. Even though I set the map's key/value types or the array's element
> type explicitly, the compiler ignores the value type and creates an
> assignment to Object.
> 
> This makes it impossible to extract a value from a nested collection,
> or to compare an extracted value with a numeric constant.
> (For example, suppose there's a row where MAPFIELD['a'] has 1 as its
> value. Writing MAPFIELD['a'] = 1 in the WHERE clause throws an
> Exception saying there's no SqlFunctions.eq(Object, int).)
> 
> Btw, I set a breakpoint in BinaryImplementor.implement() to inspect
> the variables just before the compiler looks up SqlFunctions.eq().
> The type of the expression is Object, which is not generic, but the
> matching RexCall operand seems to have the value type information. We
> might create a quick fix here (handling the type conversion), but I'm
> not sure it would cleanly resolve this.
> 
> 2. I replaced their types with ANY and wrapped the expression in a
> CAST, e.g. MAPFIELD['a'] -> CAST(MAPFIELD['a'] AS INTEGER) = 1, and it
> succeeded. (With the MAP type, the above error happens even when I
> wrap with CAST.)
> 
> But it doesn't work when MAPFIELD['a'] is null, because CAST AS
> INTEGER calls SqlFunctions.toInt(null), which throws a
> RuntimeException. (This happens when MAPFIELD doesn't have an item
> with 'a' as its key; I guess the same applies to a null value. The
> same also applies to ARRAY and to ARRAYFIELD[<non integer>].)
> 
> CAST(COALESCE(MAPFIELD['a'], -1) AS INTEGER) also throws an Exception,
> since MAPFIELD['a'] is Object and -1 is int.
> (CompileException: Incompatible expression types "java.lang.Object"
> and "int")
> 
> 3. An ArrayIndexOutOfBoundsException is thrown when a query tries to
> access the array with an out-of-range index. I don't know whether
> there's a standard rule on this, but at least PostgreSQL doesn't throw
> an Exception and just treats it as null.
> https://www.postgresql.org/docs/9.1/static/arrays.html
> (This means the result of the ITEM operator is nullable.)
> 
> Any ideas on these behaviors?
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
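
For point 2 in the message above, the failure mode is a cast helper that throws on null instead of propagating it. A null-safe variant could look like the sketch below. This is hypothetical: `toIntOrNull` is invented for illustration and is not a Calcite API, whereas SqlFunctions.toInt is the real method mentioned in the thread:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a null-safe CAST(... AS INTEGER) helper. Illustrative only. */
public class NullSafeCast {
  /** Propagates null instead of throwing, matching the SQL semantics
   *  that CAST(NULL AS INTEGER) is NULL. */
  public static Integer toIntOrNull(Object o) {
    if (o == null) {
      return null; // missing map entry / SQL NULL propagates, no exception
    }
    return ((Number) o).intValue();
  }

  public static void main(String[] args) {
    Map<String, Object> mapField = new HashMap<>();
    mapField.put("a", 1);
    System.out.println(toIntOrNull(mapField.get("a"))); // prints 1
    System.out.println(toIntOrNull(mapField.get("b"))); // prints null
  }
}
```

With a helper like this, CAST(MAPFIELD['a'] AS INTEGER) on a missing key would yield SQL NULL rather than a RuntimeException.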