Posted to dev@calcite.apache.org by Milinda Pathirage <mp...@umail.iu.edu> on 2015/11/14 06:35:28 UTC

Supporting stream-to-relation joins

Hi devs,

The current SqlValidatorImpl doesn't allow queries like the following:

select stream orders.orderId, orders.productId, products.name from
orders join products on orders.productId = products.id


if 'products' is a relation. This query fails at the modality check.
But I am not sure whether fixing (or changing) the modality checking logic
is enough to solve this. Do we need to change planner rules as well? I would
really appreciate any ideas on this.

Thanks
Milinda

p.s. I am trying to get the base case working first, where every element from a
stream is joined with a relation. Stream-to-stream joins require changes
to the parser as well, to support windowing. That's my understanding; Julian may
have better ideas.

-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org

Re: Supporting stream-to-relation joins

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
Hi Julian,

Thanks for the samples with expressions in the JOIN clause. I mentioned
tumbling and hopping windows because I saw a reference to hopping windows
in one of the survey papers I was reading. But I couldn't find any real
examples in the original paper ("On Indexing Sliding Windows over On-line
Data Streams"). I'll see whether I can think of a query useful in the real
world.

Thanks

On Mon, Nov 16, 2015 at 6:38 PM, Julian Hyde <jh...@apache.org> wrote:

> Yes, I don’t believe we ever need OVER in the FROM clause. I’ve updated
> the JIRA case with some examples.
>
> Do you have any examples of hopping or tumbling windows in mind? Say,
> "orders shipped within the same business day" might be a tumbling window?
>
> Julian

Re: Supporting stream-to-relation joins

Posted by Julian Hyde <jh...@apache.org>.
Yes, I don’t believe we ever need OVER in the FROM clause. I’ve updated the JIRA case with some examples.

Do you have any examples of hopping or tumbling windows in mind? Say, "orders shipped within the same business day" might be a tumbling window?

Julian



> On Nov 16, 2015, at 2:05 PM, Milinda Pathirage <mp...@umail.iu.edu> wrote:
> 
> Hi Julian,
> 
> I came up with samples and added them to CALCITE-968. I am not exactly sure
> how we can express windows over streams in the case of stream-to-stream joins
> if OVER is not supported in the FROM clause. I only added sliding window join
> queries. Given that the current support provided by the windowing clause is
> limited to sliding windows, we have to figure out a way to express hopping and
> tumbling windows.
> 
> But I like the idea of expressing windowed joins without an OVER clause, if
> that is what you meant in the above mail.
> 
> Thanks
> Milinda


Re: Supporting stream-to-relation joins

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
Hi Julian,

I came up with samples and added them to CALCITE-968. I am not exactly sure
how we can express windows over streams in the case of stream-to-stream joins
if OVER is not supported in the FROM clause. I only added sliding window join
queries. Given that the current support provided by the windowing clause is
limited to sliding windows, we have to figure out a way to express hopping and
tumbling windows.

But I like the idea of expressing windowed joins without an OVER clause, if
that is what you meant in the above mail.
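
For readers unfamiliar with the terms, the difference between tumbling and hopping windows can be sketched in a few lines of Python. This is only an illustration and not Calcite syntax or code: the function names are hypothetical, and timestamps are assumed to be non-negative integers in some fixed unit (seconds, say). A tumbling window is treated here as the special case of a hopping window whose hop equals its size.

```python
def tumbling_window(ts, size):
    """Start of the single fixed-size, non-overlapping window containing ts."""
    return ts - ts % size


def hopping_windows(ts, size, hop):
    """Starts of every window of length `size`, opened every `hop` units,
    whose half-open interval [start, start + size) contains ts."""
    start = ts - ts % hop          # latest window start that is <= ts
    starts = []
    while start > ts - size:       # window still covers ts
        starts.append(start)
        start -= hop
    return sorted(starts)


# A row with timestamp 125 and 60-unit windows:
print(tumbling_window(125, 60))        # one window
print(hopping_windows(125, 60, 30))    # overlapping windows, opened every 30 units
```

Under these assumptions, a query such as "orders shipped within the same business day" would correspond to a tumbling window whose size is one day, while a hopping window assigns each row to several overlapping windows.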

Thanks
Milinda

On Sun, Nov 15, 2015 at 7:53 PM, Milinda Pathirage <mp...@umail.iu.edu>
wrote:

> JIRA ticket to track stream joins can be found at
> https://issues.apache.org/jira/browse/CALCITE-968.
>
> Thanks
> Milinda

Re: Supporting stream-to-relation joins

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
JIRA ticket to track stream joins can be found at
https://issues.apache.org/jira/browse/CALCITE-968.

Thanks
Milinda

On Sat, Nov 14, 2015 at 4:54 PM, Milinda Pathirage <mp...@umail.iu.edu>
wrote:

> Hi Julian,
>
> Thanks for the response. I will create a JIRA ticket and come up with some
> samples.
>
>
> Milinda

Re: Supporting stream-to-relation joins

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
Hi Julian,

Thanks for the response. I will create a JIRA ticket and come up with some
samples.


Milinda

On Sat, Nov 14, 2015 at 3:38 AM, Julian Hyde <jh...@apache.org> wrote:

> Short answer: yes, we should allow it.
>
> The design falls into 3 parts:
> * Validation. We should allow any combination: table-table, stream-table
> and stream-stream joins, as long as the query can make progress. That often
> means that where a stream is involved, the join condition should involve a
> monotonic expression. If it is a stream-table join you can make progress
> without the monotonic expression, but if there are 2 streams you will need
> it.
> * Translation to relational algebra. Inspired by differential calculus’
> product rule[1], "stream(x join y)" becomes "x join stream(y) union all
> stream(x) join y". Suppose that products is a table (i.e. we do not receive
> notifications of new products); then "stream(products)" is empty. Suppose
> that orders is both a stream and a table, i.e. a stream with history.
> Because "stream(products)" is empty, "stream(products join orders)" is simply
> "products join stream(orders)". These rewrites would happen in a
> DeltaJoinTransposeRule.
> * Updates to relations. Suppose that the products table is updated two or
> three times during each day. How quickly does the end user expect those
> updated records to appear in the output of the stream-table join? If the
> table is updated at 10am, should the new data be loaded only when
> processing transactions from 10am (which might not hit the join until, say,
> 10:07am)? There is no ‘right answer’ here; we should offer the end user a
> choice of policies. A good basic policy would be “cache for no more than T
> seconds” or “cache as long as you like”, while giving the user a manual way
> to flush the cache.
>
> Can you please log a JIRA case to track this? The next step would be to write
> some sample queries and decide whether they are valid.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Product_rule

Re: Supporting stream-to-relation joins

Posted by Julian Hyde <jh...@apache.org>.
Short answer: yes, we should allow it.

The design falls into 3 parts:
* Validation. We should allow any combination: table-table, stream-table and stream-stream joins, as long as the query can make progress. That often means that where a stream is involved, the join condition should involve a monotonic expression. If it is a stream-table join you can make progress without the monotonic expression, but if there are 2 streams you will need it.
* Translation to relational algebra. Inspired by differential calculus’ product rule[1], "stream(x join y)" becomes "x join stream(y) union all stream(x) join y". Suppose that products is a table (i.e. we do not receive notifications of new products); then "stream(products)" is empty. Suppose that orders is both a stream and a table, i.e. a stream with history. Because "stream(products)" is empty, "stream(products join orders)" is simply "products join stream(orders)". These rewrites would happen in a DeltaJoinTransposeRule.
* Updates to relations. Suppose that the products table is updated two or three times during each day. How quickly does the end user expect those updated records to appear in the output of the stream-table join? If the table is updated at 10am, should the new data be loaded only when processing transactions from 10am (which might not hit the join until, say, 10:07am)? There is no ‘right answer’ here; we should offer the end user a choice of policies. A good basic policy would be “cache for no more than T seconds” or “cache as long as you like”, while giving the user a manual way to flush the cache.
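
The product-rule rewrite in the second bullet can be checked on toy data. The following is a minimal Python sketch, not Calcite code: relations are modelled as bags (collections.Counter) of (key, value) rows, only inserts are considered, and the helper names join and delta_join are hypothetical. It also makes one assumption explicit that the informal formula leaves open: the "x" term uses the old snapshot of x, while the "y" term uses the new snapshot of y, so that rows arriving on both sides at once are counted exactly once.

```python
from collections import Counter


def join(x, y):
    """Equi-join two bags of (key, value) rows on key, keeping duplicates."""
    out = Counter()
    for (kx, vx), nx in x.items():
        for (ky, vy), ny in y.items():
            if kx == ky:
                out[(kx, vx, vy)] += nx * ny
    return out


def delta_join(x_old, dx, y_old, dy):
    """Rows newly produced by the join when dx and dy arrive (inserts only):
    (x_old join dy) UNION ALL (dx join (y_old + dy))."""
    return join(x_old, dy) + join(dx, y_old + dy)


# products is a table: it never changes, so its delta is empty.
products = Counter({(1, "paper"): 1, (2, "pencil"): 1})
no_changes = Counter()

# orders is a stream with history: rows keyed by productId.
orders_old = Counter({(1, "order-100"): 1})
orders_new = Counter({(1, "order-101"): 1, (2, "order-102"): 1})

incremental = delta_join(products, no_changes, orders_old, orders_new)
recomputed = join(products, orders_old + orders_new) - join(products, orders_old)

print(incremental == recomputed)                 # incremental result matches full recomputation
print(incremental == join(products, orders_new)) # reduces to "products join stream(orders)"
```

Because the delta of products is empty, the second term of delta_join contributes nothing, which is the simplification described above for the stream-table case.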

Can you please log a JIRA case to track this? The next step would be to write some sample queries and decide whether they are valid.
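
As a starting point for deciding which sample queries are valid, the rule from the validation bullet can be written down as a tiny predicate. This is a hedged Python sketch of the rule as stated in this mail only; it is not how SqlValidatorImpl works, and the function and parameter names are hypothetical.

```python
def join_can_make_progress(left_is_stream, right_is_stream,
                           condition_has_monotonic_expr):
    """Return True if a join of the given inputs is expected to make progress.

    Table-table and stream-table joins are accepted unconditionally;
    a stream-stream join needs a monotonic expression in its join condition.
    """
    if left_is_stream and right_is_stream:
        return condition_has_monotonic_expr
    return True


# orders (stream) join products (table), plain equi-join: accepted
print(join_can_make_progress(True, False, False))
# two streams with no monotonic expression in the condition: rejected
print(join_can_make_progress(True, True, False))
```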

Julian

[1] https://en.wikipedia.org/wiki/Product_rule
