You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by James Taylor <ja...@apache.org> on 2015/07/01 03:55:44 UTC

Re: Query Hints on Functional Index

Thanks for the detail, Bryan. Looks like a bug (as you've concluded) in the
hinting mechanism wrt functional indexes. Would you mind filing a JIRA?
FWIW, I tried the following and when only columns contained in the
functional index are used, the functional index is used as desired (see
below).

Might be worth trying to include those 30+ cols in the functional index.
Your query performance will be much better. Have you tried that?

Thanks,
James

0: jdbc:phoenix:localhost> EXPLAIN SELECT /*+ INDEX(LOG
LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ)='/jquery';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) = '/jquery' |
+------------------------------------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:localhost> EXPLAIN SELECT F FROM LOG WHERE LOWER(RQ) LIKE
'/jquery%';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX
[0,'/jquery'] - [0,'/jquerz'] |
|     SERVER FILTER BY FIRST KEY ONLY      |
+------------------------------------------+


On Tue, Jun 30, 2015 at 1:54 PM, Gerber, Bryan W <Br...@pnnl.gov>
wrote:

>  I am trying to replace case-insensitive query using JOIN/Subselect with
> one using a query hint. This has worked well for our fixed-case fields. So
> far I have been unable to convince Phoenix to use a functional index on a
> SELECT * query.  Is this a bug, or is it intended for functional indexes to
> ignore hints?
>
>
>
> This is the query syntax we are trying to replace using query hints (Avg
> time 0.25s):
>
>
>
> EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE
> LOWER(RQ) LIKE '/jquery%') AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND
> LOG.R=A.R);
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     PARALLEL INNER-JOIN TABLE 0          |
>
> |         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER
> LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz'] |
>
> |             SERVER FILTER BY FIRST KEY ONLY |
>
> |     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F,
> A.R)) |
>
> +------------------------------------------+
>
>
>
> This is closer to the query we want - Hint on the non-functional index
> generates an expected, but non-optimal due to the ILIKE/function on RQ (avg
> time 0.78s)
>
> EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE
> '/jquery%';
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     SKIP-SCAN-JOIN TABLE 0               |
>
> |         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
>
> |             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
>
> |     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN
> (($707.$709, $707.$710, $707.$711)) |
>
> +------------------------------------------+
>
>
>
> This is what we REALLY want it to do, but the hint is ignored for the
> functional index (Avg 2.57s)
>
>  EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE
> LOWER(RQ) LIKE '/jquery%';
>
> +------------------------------------------+
>
> | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
>
> |     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
>
> +------------------------------------------+
>
>
>
> Test table has 2.9 million records; production table is many orders of
> magnitude larger. Table is actually much wider than sample schema below
> (30+ cols) so INCLUDE() isn't viable.  Users want all the columns, all the
> time.
>
> Here’s a simplified schema for the table:
>
> CREATE TABLE IF NOT EXISTS LOG
>
> (
>
>     TS VARCHAR NOT NULL,
>
>     f VARCHAR NOT NULL,
>
>     r INTEGER NOT NULL,
>
>     sa VARCHAR,
>
>     da VARCHAR,
>
>     rq VARCHAR
>
> CONSTRAINT pkey PRIMARY KEY (TS, f, r)
>
> )
> TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
>
>
> CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq))
> TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq)
> TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
>
> Bryan G.
>
>
>

RE: Query Hints on Functional Index

Posted by "Gerber, Bryan W" <Br...@pnnl.gov>.
PHOENIX-2094 created.

Covering the queries isn’t a viable option right now, the table is 5TB already and we have multiple indexes where we are trying to optimize case-insensitive queries. Fortunately we can brute-force the plan with the JOIN & subselect to get what we need near-term.

Bryan G.

From: James Taylor [mailto:jamestaylor@apache.org]
Sent: Tuesday, June 30, 2015 6:56 PM
To: user
Subject: Re: Query Hints on Functional Index

Thanks for the detail, Bryan. Looks like a bug (as you've concluded) in the hinting mechanism wrt functional indexes. Would you mind filing a JIRA? FWIW, I tried the following and when only columns contained in the functional index are used, the functional index is used as desired (see below).

Might be worth trying to include those 30+ cols in the functional index. Your query performance will be much better. Have you tried that?

Thanks,
James

0: jdbc:phoenix:localhost> EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ)='/jquery';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) = '/jquery' |
+------------------------------------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:localhost> EXPLAIN SELECT F FROM LOG WHERE LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz'] |
|     SERVER FILTER BY FIRST KEY ONLY      |
+------------------------------------------+


On Tue, Jun 30, 2015 at 1:54 PM, Gerber, Bryan W <Br...@pnnl.gov>> wrote:
I am trying to replace case-insensitive query using JOIN/Subselect with one using a query hint. This has worked well for our fixed-case fields. So far I have been unable to convince Phoenix to use a functional index on a SELECT * query.  Is this a bug, or is it intended for functional indexes to ignore hints?

This is the query syntax we are trying to replace using query hints (Avg time 0.25s):

EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ) LIKE '/jquery%') AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R);
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     PARALLEL INNER-JOIN TABLE 0          |
|         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz'] |
|             SERVER FILTER BY FIRST KEY ONLY |
|     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) |
+------------------------------------------+

This is closer to the query we want - Hint on the non-functional index generates an expected, but non-optimal due to the ILIKE/function on RQ (avg time 0.78s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
|             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
|     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709, $707.$710, $707.$711)) |
+------------------------------------------+

This is what we REALLY want it to do, but the hint is ignored for the functional index (Avg 2.57s)
 EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
+------------------------------------------+

Test table has 2.9 million records; production table is many orders of magnitude larger. Table is actually much wider than sample schema below (30+ cols) so INCLUDE() isn't viable.  Users want all the columns, all the time.
Here’s a simplified schema for the table:
CREATE TABLE IF NOT EXISTS LOG
(
    TS VARCHAR NOT NULL,
    f VARCHAR NOT NULL,
    r INTEGER NOT NULL,
    sa VARCHAR,
    da VARCHAR,
    rq VARCHAR
CONSTRAINT pkey PRIMARY KEY (TS, f, r)
) TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq)) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Bryan G.