You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@phoenix.apache.org by William <yh...@163.com> on 2016/09/23 04:24:35 UTC

Skip scan optimization failed for multi pk columns

Hi all,
   This is a simple scenario, there are two tables:
   create table t1 (pk integer primary key, a integer);
   create table t2 (pk1 integer not null, pk2 integer not null, a integer constraint pk primary key (pk1, pk2));


   Do the following selects:
  1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
  results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
  
  2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 40);
  results:     FULL SCAN OVER T2 
                   SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30 AND PK1 < 40))


    Apparently, 2nd SELECT statement should use skip scan instead of full table scan. But T2 has two pk columns and then WhereOptimizer failed to optimize it. I went through the code and made a small improvement.
    In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the attached patch file for detail. The main idea is we allow slot in childSlot is null, only if all slots afterwards are null too. So the following statements are still rejected:
    select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 < 40)


Please review this. Thanks.
William.

Re: Skip scan optimization failed for multi pk columns

Posted by James Taylor <ja...@apache.org>.

William,
Yes, PHOENIX-1801 is a good one too. IMHO, we should do that work in the
calcite branch where we can leverage the cost based optimizer framework in
Calcite.
Thanks,
James

On Sat, Sep 24, 2016 at 7:36 AM, Ted Yu <yu...@gmail.com> wrote:

> William:
> I don't see attachment in your original email.
>
> Mind sharing the JIRA number once you open the JIRA (I assume the patch
> would be attached there) ?
>
> Thanks
>
> On Thu, Sep 22, 2016 at 9:24 PM, William <yh...@163.com> wrote:
>
> > Hi all,
> >    This is a simple scenario, there are two tables:
> >    create table t1 (pk integer primary key, a integer);
> >    create table t2 (pk1 integer not null, pk2 integer not null, a integer
> > constraint pk primary key (pk1, pk2));
> >
> >    Do the following selects:
> >   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> > pk < 40);
> >   results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
> >
> >   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> > and pk1 < 40);
> >   results:     FULL SCAN OVER T2
> >                    SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> > AND PK1 < 40))
> >
> >     Apparently, 2nd SELECT statement should use skip scan instead of full
> > table scan. But T2 has two pk columns and then WhereOptimizer failed to
> > optimize it. I went through the code and made a small improvement.
> >     In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> > attached patch file for detail. The main idea is we allow slot in
> childSlot
> > is null, only if all slots afterwards are null too. So the following
> > statements are still rejected:
> >     select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2
> <
> > 40)
> >
> > Please review this. Thanks.
> > William.
> >
> >
> >
> >
> >
>

Re:Re: Skip scan optimization failed for multi pk columns

Posted by William <yh...@163.com>.

Ted:
The email in my 'send box' has the attachment, but the email received didn't. I don't why.
I filed a jira, see https://issues.apache.org/jira/browse/PHOENIX-3328. 
I only add one test case for this. We already have some UT and IT for other scenarios that WHERE clause shouldn't be optimized to skip scan filter. i.e. EvaluationOfORIT.

Please take a look at it. 


Thanks.
William.

At 2016-09-24 22:36:08, "Ted Yu" <yu...@gmail.com> wrote:
>William:
>I don't see attachment in your original email.
>
>Mind sharing the JIRA number once you open the JIRA (I assume the patch
>would be attached there) ?
>
>Thanks
>
>On Thu, Sep 22, 2016 at 9:24 PM, William <yh...@163.com> wrote:
>
>> Hi all,
>>    This is a simple scenario, there are two tables:
>>    create table t1 (pk integer primary key, a integer);
>>    create table t2 (pk1 integer not null, pk2 integer not null, a integer
>> constraint pk primary key (pk1, pk2));
>>
>>    Do the following selects:
>>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
>> pk < 40);
>>   results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>>
>>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
>> and pk1 < 40);
>>   results:     FULL SCAN OVER T2
>>                    SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
>> AND PK1 < 40))
>>
>>     Apparently, 2nd SELECT statement should use skip scan instead of full
>> table scan. But T2 has two pk columns and then WhereOptimizer failed to
>> optimize it. I went through the code and made a small improvement.
>>     In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
>> attached patch file for detail. The main idea is we allow slot in childSlot
>> is null, only if all slots afterwards are null too. So the following
>> statements are still rejected:
>>     select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
>> 40)
>>
>> Please review this. Thanks.
>> William.
>>
>>
>>
>>
>>

Re: Skip scan optimization failed for multi pk columns

Posted by Ted Yu <yu...@gmail.com>.

William:
I don't see attachment in your original email.

Mind sharing the JIRA number once you open the JIRA (I assume the patch
would be attached there) ?

Thanks

On Thu, Sep 22, 2016 at 9:24 PM, William <yh...@163.com> wrote:

> Hi all,
>    This is a simple scenario, there are two tables:
>    create table t1 (pk integer primary key, a integer);
>    create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>    Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results:     FULL SCAN OVER T2
>                    SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
>     Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
>     In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
>     select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>

Re:Re: Skip scan optimization failed for multi pk columns

Posted by William <yh...@163.com>.

OK James, I'll do this tomorrow and have a look at those tickets.
Also, I am planning to do this, https://issues.apache.org/jira/browse/PHOENIX-1801. I'd like to support intersect and union scenario first, then sort union. 
Actually the QueryOptimizer doesn't 'choose' the appropriate index tables, but just generate all probable query plans then try to find the 'best'.
If we want to support complete index merge algorithm, we have to introduce cost model and this is not an easy job. Do you have any suggestions? 


Thanks.
William.


At 2016-09-23 14:46:41, "James Taylor" <ja...@apache.org> wrote:
>Great find, William! Please file a JIRA with a test case and a patch, and
>we'll get this reviewed and committed.
>
>If you're interested and have time, there are a couple more issues related
>to skip scan that you could take a look at too: PHOENIX-1439, PHOENIX-307,
>and PHOENIX-1193.
>
>Thanks,
>James
>
>On Thu, Sep 22, 2016 at 9:24 PM, William <yh...@163.com> wrote:
>
>> Hi all,
>>    This is a simple scenario, there are two tables:
>>    create table t1 (pk integer primary key, a integer);
>>    create table t2 (pk1 integer not null, pk2 integer not null, a integer
>> constraint pk primary key (pk1, pk2));
>>
>>    Do the following selects:
>>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
>> pk < 40);
>>   results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>>
>>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
>> and pk1 < 40);
>>   results:     FULL SCAN OVER T2
>>                    SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
>> AND PK1 < 40))
>>
>>     Apparently, 2nd SELECT statement should use skip scan instead of full
>> table scan. But T2 has two pk columns and then WhereOptimizer failed to
>> optimize it. I went through the code and made a small improvement.
>>     In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
>> attached patch file for detail. The main idea is we allow slot in childSlot
>> is null, only if all slots afterwards are null too. So the following
>> statements are still rejected:
>>     select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
>> 40)
>>
>> Please review this. Thanks.
>> William.
>>
>>
>>
>>
>>

Re: Skip scan optimization failed for multi pk columns

Posted by James Taylor <ja...@apache.org>.

Great find, William! Please file a JIRA with a test case and a patch, and
we'll get this reviewed and committed.

If you're interested and have time, there are a couple more issues related
to skip scan that you could take a look at too: PHOENIX-1439, PHOENIX-307,
and PHOENIX-1193.

Thanks,
James

On Thu, Sep 22, 2016 at 9:24 PM, William <yh...@163.com> wrote:

> Hi all,
>    This is a simple scenario, there are two tables:
>    create table t1 (pk integer primary key, a integer);
>    create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>    Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results:     SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results:     FULL SCAN OVER T2
>                    SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
>     Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
>     In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
>     select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>