You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Tao Yang (JIRA)" <ji...@apache.org> on 2015/04/02 07:11:52 UTC

[jira] [Comment Edited] (PHOENIX-1015) Support joining back to data table row from local index when query condition involves leading columns in local index

    [ https://issues.apache.org/jira/browse/PHOENIX-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392163#comment-14392163 ] 

Tao Yang edited comment on PHOENIX-1015 at 4/2/15 5:11 AM:
-----------------------------------------------------------

Hi  [~rajesh23]
I have some questions about local index of phoenix in the scene below:
A table has multiple columns and I need to query from this table by random combinations of these columns.
For example: a table has three columns(c1,c2,c3) and queries should be made by some differnt combinations(one or two or three columns in where clause)

You are the committer of both hindex and phoenix. I have learned hindex by your introduction document (From here: http://www.slideshare.net/rajeshbabuchintaguntla/apache-con-hindex), and I think phoenix also can make local index work for the query above( generate range scan for every column in where clause, then merge in memory to get required results ), so I just need to create one local index for every column. But it seems to be different with what I think.

Here is the schema I used:
CREATE TABLE TEST_DATA_TABLE(PK VARCHAR PRIMARY KEY,COLUMN1 VARCHAR,COLUMN2 VARCHAR,COLUMN3 BIGINT) SALT_BUCKETS=8; 
CREATE LOCAL INDEX LI_1 ON TEST_DATA_TABLE(COLUMN1);
CREATE LOCAL INDEX LI_2 ON TEST_DATA_TABLE(COLUMN2);

I use the sql below to get the number of required data by two conditions in where clause:
SELECT COUNT(*) FROM TEST_DATA_TABLE WHERE COLUMN1='1' AND COLUMN2='1’

Here is the explain of the sql above:
| CLIENT 8-CHUNK PARALLEL 8-WAY FULL SCAN OVER TEST_DATA_TABLE |
|     SERVER FILTER BY (COLUMN1 = '1' AND COLUMN2 = '1') |
|     SERVER AGGREGATE INTO SINGLE ROW     |

According to the plan above, we know the local index does not work for this query. 

Here are my questions:
(1) Why local index of phoenix does not support this query? 
(2) Is there a plan to improve?

Thank you very much!


was (Author: tao yang):
Hi  [~rajesh23]
I have some questions about local index of phoenix in the scene below:
A table has multiple columns and I need to query from this table by random combinations of these columns.
For example: a table has three columns(c1,c2,c3) and queries should be made by some differnt combinations(one or two or three columns in where clause)

You are the committer of both hindex and phoenix. I have learned hindex by your introduction document (From here: http://www.slideshare.net/rajeshbabuchintaguntla/apache-con-hindex), and I think phoenix also can make local index work for the query above( generate range scan for every column in where clause, then merge in memory to get required results ), so I just need to create one local index for every column. But it seems to be different with what I think.

Here is the schema I used:
CREATE TABLE TEST_DATA_TABLE(PK VARCHAR PRIMARY KEY,COLUMN1 VARCHAR,COLUMN2 VARCHAR,COLUMN3 BIGINT) SALT_BUCKETS=8; 
CREATE LOCAL INDEX LI_1 ON TEST_DATA_TABLE(COLUMN1);
CREATE LOCAL INDEX LI_2 ON TEST_DATA_TABLE(COLUMN2);

I use the sql below to get the number of required data by two conditions in where clause:
SELECT COUNT(*) FROM TEST_DATA_TABLE WHERE COLUMN1='1' AND COLUMN2='1’

Here is the explain of the sql above:
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 8-CHUNK PARALLEL 8-WAY FULL SCAN OVER TEST_DATA_TABLE |
|     SERVER FILTER BY (COLUMN1 = '1' AND COLUMN2 = '1') |
|     SERVER AGGREGATE INTO SINGLE ROW     |
+------------------------------------------+   

According to the plan above, we know the local index does not work for this query. 

Here are my questions:
(1) Why local index of phoenix does not support this query? 
(2) Is there a plan to improve?

Thank you very much!

> Support joining back to data table row from local index when query condition involves leading columns in local index
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1015
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1015
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>             Fix For: 5.0.0, 4.1.0
>
>         Attachments: PHOENIX-1015.patch, PHOENIX-1015_v6.patch, PHOENIX-1015_v7.patch, PHOENIX-1015_v8.patch, PHOENIX-1015_v8.rar, PHOENIX-1015_v9.patch
>
>
> When a query involves more columns to project than columns in index and query condition involves leading columns in local index then first we can get matching rowkeys from local index table and then get the required columns from data table. In local index both data region and index region co-reside in the same RS, we can call get on data region to get the missing columns in the index, without any n/w overhead. So it's efficient. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)