You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Marcin Januszkiewicz <ka...@gmail.com> on 2017/03/03 08:59:51 UTC

Phoenix ignoring index and index hint with some queries over mapped hbase tables.

Hi,

I have a table in hbase and created a view of it in phoenix, along
with a local index:

create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
"cf"."number" VARCHAR, "cf"."class" VARCHAR);
create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
"cf"."class", rowkey);

I need to filter rows based on a regex condition on the "number"
column, so I use queries like these:

explain select * from "traces" where regexp_substr("number", 'Q.*') = "number";

CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
    SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
1) = "number"


It's a little ugly and less efficient than using LIKE, but the
performance is still relatively acceptable thanks to the index.

However, if I want to range of rowkeys to include, Phoenix stops using
the index, which slows down the query significantly:

explain select * from "traces" where regexp_substr("number", 'Q.*') =
"number" and rowkey < 'BY';

CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
RANGE SCAN OVER traces [*] - ['BY']
    SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"

Using an index hint doesn't change anything. Is there a way to make
this work, and is this a bug?

Re: Phoenix ignoring index and index hint with some queries over mapped hbase tables.

Posted by Marcin Januszkiewicz <ka...@gmail.com>.

On Fri, Mar 3, 2017 at 7:52 PM, James Taylor <ja...@apache.org> wrote:
> Hi Marcin,
> There's a few things going on here:
> 1) An index created on a view over HBase data won't be maintained by
> Phoenix. You'd need to maintain it yourself through some other external
> means. If you create a table that maps to your HBase data, then it will be
> maintained.

UPSERTING data into the view will also work, right?

> 2) An index would only be used if you match against a constant on the
> right-hand side (while you're matching against the "number" column). For
> example, the following query would use the index and limit the scan to only
> rows in which "number" starts with 'Queen':
>
> 0: jdbc:phoenix:> explain select * from "traces" where
> regexp_substr("number", 'Q.*') = 'Queen';
> +------------------------------------------------------------------------------+
> |                                            PLAN
> |
> +------------------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces
> [1,'Queen'] |
> |     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1)
> =  |
> +------------------------------------------------------------------------------+
>
> Note that with local indexes, interpreting when the index is used is a bit
> subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The
> index may be used if it's a full table scan (since the data contained in the
> index table may be smaller than that in the data table), but that won't
> buying you very much.
> 3) The index would only be used if your REGEXP_SUBSTR has a constant string
> before any wildcard matches in the second argument. You could also
> potentially use a function index [1], but it'd only use the index if the
> REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was
> used when the functional index was created.
>
> HTH.  Thanks,
>
>     James
>
> [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes


So, if I understand correctly, my best bet would be to use some
heuristics to hopefully extract a coarse prefix range from the regex?
That way I can do a preliminary range scan on the index and refine the
filtering on the returned data. I assume I could implement this as a
UDF with an appropriate newKeyPart method?



>
> On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz
> <ka...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a table in hbase and created a view of it in phoenix, along
>> with a local index:
>>
>> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
>> "cf"."number" VARCHAR, "cf"."class" VARCHAR);
>> create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
>> "cf"."class", rowkey);
>>
>> I need to filter rows based on a regex condition on the "number"
>> column, so I use queries like these:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number";
>>
>> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
>>     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
>> 1) = "number"
>>
>>
>> It's a little ugly and less efficient than using LIKE, but the
>> performance is still relatively acceptable thanks to the index.
>>
>> However, if I want to range of rowkeys to include, Phoenix stops using
>> the index, which slows down the query significantly:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number" and rowkey < 'BY';
>>
>> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
>> RANGE SCAN OVER traces [*] - ['BY']
>>     SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"
>>
>> Using an index hint doesn't change anything. Is there a way to make
>> this work, and is this a bug?
>
>

Re: Phoenix ignoring index and index hint with some queries over mapped hbase tables.

Posted by James Taylor <ja...@apache.org>.

Hi Marcin,
There's a few things going on here:
1) An index created on a view over HBase data won't be maintained by
Phoenix. You'd need to maintain it yourself through some other external
means. If you create a table that maps to your HBase data, then it will be
maintained.
2) An index would only be used if you match against a constant on the
right-hand side (while you're matching against the "number" column). For
example, the following query would use the index and limit the scan to only
rows in which "number" starts with 'Queen':

0: jdbc:phoenix:> explain select * from "traces" where
regexp_substr("number", 'Q.*') = 'Queen';
+------------------------------------------------------------------------------+
|                                            PLAN
   |
+------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces
[1,'Queen'] |
|     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1)
=  |
+------------------------------------------------------------------------------+

Note that with local indexes, interpreting when the index is used is a bit
subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication.
The index may be used if it's a full table scan (since the data contained
in the index table may be smaller than that in the data table), but that
won't buying you very much.

3) The index would only be used if your REGEXP_SUBSTR has a constant string
before any wildcard matches in the second argument. You could also
potentially use a function index [1], but it'd only use the index if the
REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was
used when the functional index was created.

HTH.  Thanks,

    James

[1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes

On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz <
katamaran666@gmail.com> wrote:

> Hi,
>
> I have a table in hbase and created a view of it in phoenix, along
> with a local index:
>
> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
> "cf"."number" VARCHAR, "cf"."class" VARCHAR);
> create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
> "cf"."class", rowkey);
>
> I need to filter rows based on a regex condition on the "number"
> column, so I use queries like these:
>
> explain select * from "traces" where regexp_substr("number", 'Q.*') =
> "number";
>
> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
>     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
> 1) = "number"
>
>
> It's a little ugly and less efficient than using LIKE, but the
> performance is still relatively acceptable thanks to the index.
>
> However, if I want to range of rowkeys to include, Phoenix stops using
> the index, which slows down the query significantly:
>
> explain select * from "traces" where regexp_substr("number", 'Q.*') =
> "number" and rowkey < 'BY';
>
> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
> RANGE SCAN OVER traces [*] - ['BY']
>     SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"
>
> Using an index hint doesn't change anything. Is there a way to make
> this work, and is this a bug?
>