You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2019/01/10 21:03:00 UTC

[jira] [Commented] (PHOENIX-5096) Local index region pruning is not working as expected.

    [ https://issues.apache.org/jira/browse/PHOENIX-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739769#comment-16739769 ] 

Lars Hofhansl commented on PHOENIX-5096:
----------------------------------------

I have a table loaded with 4 regions:

create table test (pk1 integer not null, pk2 integer not null, pk3 integer not null, v1 float, v2 float, v3 integer CONSTRAINT pk PRIMARY KEY (pk1, pk2, pk3));
create local index l1 on test(v1);

0: jdbc:phoenix:localhost> explain select count(*) from test;
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                         PLAN                                         | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER TEST [1]  | 314572800       | 5518821        | 1547153796030  |
|     SERVER FILTER BY FIRST KEY ONLY                                                  | 314572800       | 5518821        | 1547153796030  |
|     SERVER AGGREGATE INTO SINGLE ROW                                                 | 314572800       | 5518821        | 1547153796030  |
+--------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.087 seconds)


0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 10000;
+------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                              PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 2042680 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER TEST [*] - [10000]  | 314572800       | 2042680        | 1547153795523  |
|     SERVER FILTER BY FIRST KEY ONLY                                                            | 314572800       | 2042680        | 1547153795523  |
|     SERVER AGGREGATE INTO SINGLE ROW                                                           | 314572800       | 2042680        | 1547153795523  |
+------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.081 seconds)

Now Phoenix is only scanning 1 region, i.e. all pks starting with pk1 < 10000 are located in the first of the 4 regions.

0: jdbc:phoenix:localhost> explain select count(*) from test where pk1 < 4000000;
+--------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                               PLAN                                               | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 2-CHUNK 2042680 ROWS 314572800 BYTES PARALLEL 1-WAY RANGE SCAN OVER TEST [*] - [4000000]  | 314572800       | 2042680        | 1547153795523  |
|     SERVER FILTER BY FIRST KEY ONLY                                                              | 314572800       | 2042680        | 1547153795523  |
|     SERVER AGGREGATE INTO SINGLE ROW                                                             | 314572800       | 2042680        | 1547153795523  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.064 seconds)

pks starting with pk1 < 10000 are all located in the first 2 regions.

0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1;
+------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                           PLAN                                           | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER TEST [1,0.1]  | 314572800       | 5518821        | 1547153796030  |
|     SERVER FILTER BY FIRST KEY ONLY                                                      | 314572800       | 5518821        | 1547153796030  |
|     SERVER AGGREGATE INTO SINGLE ROW                                                     | 314572800       | 5518821        | 1547153796030  |
+------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.065 seconds)

Using index l1, needs to consult all 4 regions, as expected.

0: jdbc:phoenix:localhost> explain select count(*) from test where v1 = 0.1 and pk1 < 10000;
+------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                    PLAN                                                    | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 4-CHUNK 5518821 ROWS 314572800 BYTES PARALLEL 4-WAY RANGE SCAN OVER TEST [1,0.1,*] - [1,0.1,10000]  | 314572800       | 5518821        | 1547153796030  |
|     SERVER FILTER BY FIRST KEY ONLY                                                                        | 314572800       | 5518821        | 1547153796030  |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                       | 314572800       | 5518821        | 1547153796030  |
+------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
3 rows selected (0.063 seconds)

This is where I expect Phoenix to use the restriction of pk1 to determine that any results can only possible be located in the first region, and avoid even looking at the other regions. That's not what is happening. This query will still consult every region in the table.

> Local index region pruning is not working as expected.
> ------------------------------------------------------
>
>                 Key: PHOENIX-5096
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5096
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> The pruning of local indexes should do the following:
> * Use the local index
> * Reduce the number of regions based on filters on the table's primary (i.e. WHERE conditions including prefixes of the primary key)
> Instead it looks like in order for this to work the needed PK column need to be included in the local index itself, changing the sort order of the local index.
> I'll provide some examples in the comments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)