You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Geoffrey Jacoby (Jira)" <ji...@apache.org> on 2020/06/11 01:29:00 UTC
[jira] [Commented] (PHOENIX-5950) View With Where Clause On A Table
With Composite Key Should Be Able To Optimize Queries
[ https://issues.apache.org/jira/browse/PHOENIX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132819#comment-17132819 ]
Geoffrey Jacoby commented on PHOENIX-5950:
------------------------------------------
Thanks for researching this [~m2je]. I've definitely also seen poor performance on IndexScrutiny runs of views and your proposed solution makes sense.
> View With Where Clause On A Table With Composite Key Should Be Able To Optimize Queries
> ----------------------------------------------------------------------------------------
>
> Key: PHOENIX-5950
> URL: https://issues.apache.org/jira/browse/PHOENIX-5950
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Mehdi Salarkia
> Assignee: Mehdi Salarkia
> Priority: Major
>
> For a table with a composite primary
> {code:java}
> CREATE TABLE MY_TABLE (K1 INTEGER NOT NULL, K2 VARCHAR NOT NULL,K3 INTEGER NOT NULL, V1 DECIMAL, CONSTRAINT pk PRIMARY KEY (K1, K2, K3))
> {code}
> when a view is created that has some (and not all) of primary key columns
> {code:java}
> CREATE VIEW MY_VIEW(v2 VARCHAR, V3 VARCHAR ) AS SELECT * FROM MY_TABLE WHERE K2 = 'A'
> {code}
> if you run a query on the view without providing all the primary columns
> {code:java}
> EXPLAIN SELECT K1, K2, K3, V1 FROM MY_VIEW WHERE (K1,K3) IN ((1,2),(3,4));
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SKIP SCAN ON 2 KEYS OVER MY_TABLE [1,'A'] - [3,'A'] | null | null | null |
> | SERVER FILTER BY (K1, K3) IN ([128,0,0,1,128,0,0,2],[128,0,0,3,128,0,0,4]) | null | null | null |
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> 2 rows selected (0.047 seconds)
> {code}
> the query generated is scan rather than a point look up
> same query on the parent table (with all the keys) looks like this
> {code:java}
> EXPLAIN SELECT K1, K2, K3, V1 FROM MY_TABLE WHERE (K1,K2,K3) IN ((1,'A',2),(3,'A',4));
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK 2 ROWS 268 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 KEYS OVER MY_TABLE | 268 | 2 | 0 |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> 1 row selected (0.025 seconds)
> {code}
> The issue is view condition is always added as `AND` to user provided where clause and in this case query optimizer is failing to optimize this query to a point look up.
>
> -------------------------------------------------------------- *[Affected Use Case]* ----------------------------------------------------------------------------
> The impact of this issue is most visible when you try to run IndexScrutiny tool on a view with an index which generates queries like:
> {code:java}
> SELECT /*+ NO_INDEX */ CAST("K1" AS INTEGER),CAST("K2" AS VARCHAR),CAST("V1" AS DECIMAL),CAST("0"."V2" AS VARCHAR),CAST("0"."V3" AS VARCHAR) FROM MY_VIEW WHERE ("K1","K3") IN ((?,?),(?,?));
> {code}
> and has very poor performance and causes performance degradation.
> -------------------------------------------------------------- *[POSSIBLE WORKAROUND]* ----------------------------------------------------------------------------
> One possible workaround is to provide all the pk (including the view pk columns)
> {code:java}
> EXPLAIN SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN ((1,'A',2),(3,'A',4));
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK 2 ROWS 632 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 KEYS OVER MY_TABLE | 632 | 2 | 0 |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> {code}
> but as you can see the projected _EST_BYTES_READ_ goes up because the underlying query that gets executed is something like:
> {code:java}
> SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN ((1,'A',2),(3,'A',4)) AND K2 = 'A';
> {code}
> and certainly the `_AND K2 = 'A'_` is redundant.
>
>
> ------------------------------------------------------------ -- -*[PROPOSED SOLUTION]*- -- ----------------------------------------------------------------------------
> we can make the view condition to be injected into any partial primary key lookup (tuple style conditions) respecting the same order for columns defined in the parent table
--
This message was sent by Atlassian Jira
(v8.3.4#803005)