You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Jacobo Coll <ja...@gmail.com> on 2015/12/15 14:48:02 UTC

Unable to filter using local indexes

Hi,

I recently started testing HBase/Phoenix as a storage solution for our
data. The problem is that I am not able to execute some "simple queries". I
am using phoenix 4.5.2 and hbase 1.0.0-cdh5.4.0. After creating a table and
making some selects (full scan), I started using local indexes to
accelerate it. As I know, global indexes only work for covered queries, but
"Unlike global indexes, local indexes *will* use an index even when all
columns referenced in the query are not contained in the index". Knowing
this, I prepared this simple test where the query will use one index and
filter the result using other columns:

-- Create table
create table test_table (mykey varchar primary key, col1 varchar, col2
varchar);
create local index idx2 on test_table (col2);
upsert into test_table (mykey, col1, col2) values('k1', 'v1-1', 'v1-2');
upsert into test_table (mykey, col1, col2) values('k2', 'v2-1', 'v2-2');

-- select using the index
select * from test_table where col2 = 'v1-2';
explain select * from test_table where col2 = 'v1-2';

-- select using the index and filtering the result, my goal
select * from test_table where col2 = 'v1-2' and col1 = 'v1-1';
explain select * from test_table where col2 = 'v1-2' and col1 = 'v1-1';


The first query is using the index, as it's seen in the explanation:
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
    SERVER FILTER BY FIRST KEY ONLY

But the second query is executing a table scan, instead of using the index
and then filtering:
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST_TABLE
    SERVER FILTER BY (COL2 = 'v1-2' AND COL1 = 'v1-1')

I was expecting something like:
CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
    SERVER FILTER BY "COL1" = 'v1-1'

Some of the workarounds that I have tried:

- Include col1 in the index, and it works, but I expect to have many
columns, and the overhead is unaffordable.
create local index idx2inc1 on test_table (col2) include (col1);

- I had read in this Huawei slices about hindex (
http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon_hindex_0.pdf)
that local indexes are supposed to work together, so I tried creating an
index on col1 to see if the query works:
create local index idx1 on test_table (col1);
Ideally this would be the best solution for my problem, so I can filter
using any column. But still not using any index.

- I also tried to force to use a hint, but throws an
ColumnNotFoundException:
select /*+ INDEX(TEST_TABLE IDX2) */ * from test_table where col2 = 'v1-2'
and col1 = 'v1-1';
Error: ERROR 504 (42703): Undefined column. columnName=COL1
(state=42703,code=504)
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
Undefined column. columnName=COL1
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.resolveColumn(WhereCompiler.java:190)
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.visit(WhereCompiler.java:169)
    at
org.apache.phoenix.compile.WhereCompiler$WhereExpressionCompiler.visit(WhereCompiler.java:156)
    at
org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)
    at
org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
    .......
I can imagine that it's attempting to filter col1 directly from the index
table.

- I also tried to make a join with itself, but I don't think that this is a
good solution:
explain select mykey,col1,col2 from test_table join ( select mykey as k
from test_table where col2 = 'v1-2') as sq1 on mykey = sq1.k where col1 =
'v1-1'
CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TEST_TABLE
    SERVER FILTER BY COL1 = 'v1-1'
    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)
        CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
[-32768,'v1-2']
            SERVER FILTER BY FIRST KEY ONLY
    DYNAMIC SERVER FILTER BY TEST_TABLE.MYKEY IN (SQ1.K)

Am I doing something wrong, or missing some step? Or it is impossible what
I am trying to do?
Is there some way to do this without include the other columns in the index?

Thanks,
Jacobo Coll

Re: Unable to filter using local indexes

Posted by Jacobo Coll <ja...@gmail.com>.
Hi Afshin,

I appreciate the answer, but I already took that into account in my first
message,
> I tried to include col1 in the index, and it works, but I expect to have
many columns, and the overhead is unaffordable.
and I also would want to know if this limitation is by design or if it is a
bug

Thanks,
Jacobo Coll

2015-12-15 15:03 GMT+00:00 Afshin Moazami <Af...@guavus.com>:

> Hi Nacho,
>
> One solution is to include the columns that you like to query over, in
> your index creation query.
> This works for me:
>
> 0: jdbc:phoenix:localhost> create local index idx2 on test_table (col2)
> include (col1);
> 2 rows affected (0.642 seconds)
> 0: jdbc:phoenix:localhost> explain select * from test_table where col2 =
> 'v1-2' and col1 = 'v1-1';
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE
> [-32767,'v1-2'] |
> |     SERVER FILTER BY "COL1" = 'v1-1'     |
> +------------------------------------------+
>
>
>
> I hope it helps.
>
> Best,
> Afshin
>
> On Dec 15, 2015, at 8:48 AM, Jacobo Coll <ja...@gmail.com> wrote:
>
> create table test_table (mykey varchar primary key, col1 varchar, col2
> varchar);
> create local index idx2 on test_table (col2);
> upsert into test_table (mykey, col1, col2) values('k1', 'v1-1', 'v1-2');
> upsert into test_table (mykey, col1, col2) values('k2', 'v2-1', 'v2-2');
>
>
>

Re: Unable to filter using local indexes

Posted by Afshin Moazami <Af...@guavus.com>.
Hi Nacho,

One solution is to include the columns that you like to query over, in your index creation query.
This works for me:

0: jdbc:phoenix:localhost> create local index idx2 on test_table (col2) include (col1);
2 rows affected (0.642 seconds)
0: jdbc:phoenix:localhost> explain select * from test_table where col2 = 'v1-2' and col1 = 'v1-1';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_TEST_TABLE [-32767,'v1-2'] |
|     SERVER FILTER BY "COL1" = 'v1-1'     |
+------------------------------------------+



I hope it helps.

Best,
Afshin
On Dec 15, 2015, at 8:48 AM, Jacobo Coll <ja...@gmail.com>> wrote:

create table test_table (mykey varchar primary key, col1 varchar, col2 varchar);
create local index idx2 on test_table (col2);
upsert into test_table (mykey, col1, col2) values('k1', 'v1-1', 'v1-2');
upsert into test_table (mykey, col1, col2) values('k2', 'v2-1', 'v2-2');