You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Sergi Vladykin <se...@gmail.com> on 2017/06/01 09:51:49 UTC

Re: nested SQL sub-queries with non-collocated joins

The approach you are suggesting will be very complex for current
implementation. Also most probably very inefficient.

Actually I was thinking about another but similar approach: in many cases
we can rewrite a subquery in WHERE clause into JOIN subquery.

Like the following:

SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b)

 ===>

SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE x.b =
z.b

There are still problems here:

1. We will not be able to rewrite all the queries.
2. We should not rewrite queries like this by default because this will
have a noticeable performance penalty for correctly collocated subqueries.
Probably we will need some flag for that.

Sergi

2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Igniters (specifically Sergi),
>
> It has come to my attention today that nested sub-select statements, when
> used in combination with non-collocated joins do not work properly in
> Ignite.
>
> So a query like this, where A, B, and C are all stored in Partitioned
> caches and are **not** collocated at all, will not work.
>
>
> > *select * from A, B where a.id <http://a.id> = b.a_id and b.somefield in
> > (select somefield from C where c.zipcode = ?)*
>
>
> The main reason it is not supported right now is because, in the absence of
> collocation, such query may create N^N complexity and it was decided that
> it is best not supporting it at all.
>
> However, I am not sure why N^N complexity is required. Why not support it
> as follows?
>
>    1. execute the nested subquery and store the result in a temporary
>    Replicated table.
>    2. execute the original query and use the temporary Replicated table
>    instead of the sub-query.
>
> Sergi, given that you are the author of the code, can you provide some
> insight here?
>
> Thanks,
> D.
>

Re: nested SQL sub-queries with non-collocated joins

Posted by Denis Magda <dm...@apache.org>.
Here is a “known limitations” section on readme.io:
https://apacheignite.readme.io/docs/sql-queries#section-known-limitations

Feel free to update it with the limitation discussed.

—
Denis

> On Jun 1, 2017, at 1:19 PM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> 
> On Thu, Jun 1, 2017 at 1:07 PM, Sergi Vladykin <se...@gmail.com>
> wrote:
> 
>> If you don't see an exception then it must be supported. This is the whole
>> point of this exception, right?
>> 
> 
> Exception is just to enforce the constraint. We still must clearly document
> what is supported.


Re: nested SQL sub-queries with non-collocated joins

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Thu, Jun 1, 2017 at 1:07 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> If you don't see an exception then it must be supported. This is the whole
> point of this exception, right?
>

Exception is just to enforce the constraint. We still must clearly document
what is supported.

Re: nested SQL sub-queries with non-collocated joins

Posted by Sergi Vladykin <se...@gmail.com>.
If you don't see an exception then it must be supported. This is the whole
point of this exception, right?

Sergi

2017-06-01 22:50 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Thu, Jun 1, 2017 at 12:32 PM, Sergi Vladykin <se...@gmail.com>
> wrote:
>
> > I guess it must work the following way:
> >
> > If distributed joins are enabled we can try to prove that the subquery is
> > collocated, if we can't then try to rewrite it, if we can't, then throw
> an
> > exception.
> >
> > Still this can not be done 100% correct, probably we have to have some
> flag
> > which allows to disable this subquery rewriting.
> >
>
> Sergi, but how do you explain to users what is supported and what is not?
>

Re: nested SQL sub-queries with non-collocated joins

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Thu, Jun 1, 2017 at 12:32 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> I guess it must work the following way:
>
> If distributed joins are enabled we can try to prove that the subquery is
> collocated, if we can't then try to rewrite it, if we can't, then throw an
> exception.
>
> Still this can not be done 100% correct, probably we have to have some flag
> which allows to disable this subquery rewriting.
>

Sergi, but how do you explain to users what is supported and what is not?

Re: nested SQL sub-queries with non-collocated joins

Posted by Sergi Vladykin <se...@gmail.com>.
I guess it must work the following way:

If distributed joins are enabled we can try to prove that the subquery is
collocated, if we can't then try to rewrite it, if we can't, then throw an
exception.

Still this can not be done 100% correct, probably we have to have some flag
which allows to disable this subquery rewriting.

Sergi

2017-06-01 21:33 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Sergi,
>
> I am OK with any improvement here, but we need to be able to clearly state
> to a user what is supported and what is not. If we cannot clearly describe
> it, I would rather not support it at all and throw an exception.
>
> Is this going to be possible with your solution?
>
> D.
>
> On Thu, Jun 1, 2017 at 2:51 AM, Sergi Vladykin <se...@gmail.com>
> wrote:
>
> > The approach you are suggesting will be very complex for current
> > implementation. Also most probably very inefficient.
> >
> > Actually I was thinking about another but similar approach: in many cases
> > we can rewrite a subquery in WHERE clause into JOIN subquery.
> >
> > Like the following:
> >
> > SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b)
> >
> >  ===>
> >
> > SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE
> x.b =
> > z.b
> >
> > There are still problems here:
> >
> > 1. We will not be able to rewrite all the queries.
> > 2. We should not rewrite queries like this by default because this will
> > have a noticeable performance penalty for correctly collocated
> subqueries.
> > Probably we will need some flag for that.
> >
> > Sergi
> >
> > 2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > Igniters (specifically Sergi),
> > >
> > > It has come to my attention today that nested sub-select statements,
> when
> > > used in combination with non-collocated joins do not work properly in
> > > Ignite.
> > >
> > > So a query like this, where A, B, and C are all stored in Partitioned
> > > caches and are **not** collocated at all, will not work.
> > >
> > >
> > > > *select * from A, B where a.id <http://a.id> = b.a_id and
> b.somefield
> > in
> > > > (select somefield from C where c.zipcode = ?)*
> > >
> > >
> > > The main reason it is not supported right now is because, in the
> absence
> > of
> > > collocation, such query may create N^N complexity and it was decided
> that
> > > it is best not supporting it at all.
> > >
> > > However, I am not sure why N^N complexity is required. Why not support
> it
> > > as follows?
> > >
> > >    1. execute the nested subquery and store the result in a temporary
> > >    Replicated table.
> > >    2. execute the original query and use the temporary Replicated table
> > >    instead of the sub-query.
> > >
> > > Sergi, given that you are the author of the code, can you provide some
> > > insight here?
> > >
> > > Thanks,
> > > D.
> > >
> >
>

Re: nested SQL sub-queries with non-collocated joins

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Sergi,

I am OK with any improvement here, but we need to be able to clearly state
to a user what is supported and what is not. If we cannot clearly describe
it, I would rather not support it at all and throw an exception.

Is this going to be possible with your solution?

D.

On Thu, Jun 1, 2017 at 2:51 AM, Sergi Vladykin <se...@gmail.com>
wrote:

> The approach you are suggesting will be very complex for current
> implementation. Also most probably very inefficient.
>
> Actually I was thinking about another but similar approach: in many cases
> we can rewrite a subquery in WHERE clause into JOIN subquery.
>
> Like the following:
>
> SELECT x.* FROM x WHERE x.a = (SELECT MAX(y.a) FROM y WHERE y.b = x.b)
>
>  ===>
>
> SELECT x.* FROM x, (SELECT MAX(y.a), y.b FROM y GROUP BY y.b) z WHERE x.b =
> z.b
>
> There are still problems here:
>
> 1. We will not be able to rewrite all the queries.
> 2. We should not rewrite queries like this by default because this will
> have a noticeable performance penalty for correctly collocated subqueries.
> Probably we will need some flag for that.
>
> Sergi
>
> 2017-05-31 21:26 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > Igniters (specifically Sergi),
> >
> > It has come to my attention today that nested sub-select statements, when
> > used in combination with non-collocated joins do not work properly in
> > Ignite.
> >
> > So a query like this, where A, B, and C are all stored in Partitioned
> > caches and are **not** collocated at all, will not work.
> >
> >
> > > *select * from A, B where a.id <http://a.id> = b.a_id and b.somefield
> in
> > > (select somefield from C where c.zipcode = ?)*
> >
> >
> > The main reason it is not supported right now is because, in the absence
> of
> > collocation, such query may create N^N complexity and it was decided that
> > it is best not supporting it at all.
> >
> > However, I am not sure why N^N complexity is required. Why not support it
> > as follows?
> >
> >    1. execute the nested subquery and store the result in a temporary
> >    Replicated table.
> >    2. execute the original query and use the temporary Replicated table
> >    instead of the sub-query.
> >
> > Sergi, given that you are the author of the code, can you provide some
> > insight here?
> >
> > Thanks,
> > D.
> >
>