You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/01/08 22:43:00 UTC
[jira] [Commented] (IMPALA-9281) Inferred predicates not assigned to scan nodes when views are involved

    [ https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011045#comment-17011045 ] 

Aman Sinha commented on IMPALA-9281:
------------------------------------

[~fangyurao] so it looks like it is unrelated to type of underlying table. Couple of places to check are: (a) in the Analyzer.createEquivConjuncts() method, see if the new inferred predicate (i.e b.table_source = 'ONE') was created or not, (b) if it was created, then check SingleNodePlanner.migrateConjunctsToInlineView() to see whether it was migrated into the view.

This should be a straightforward predicate inference, but there may be something to do with the fact the 'table_source' is not a column in the base tables but is only present in the views. For example, if you change the join condition to the base table column 'c2',  does it do the right thing ?  i.e run the following query: 
{noformat}
select * 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.c2 = b.c2 
and a.c2 = 'one'; {noformat}

> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: profile_query_1_kudu.txt, profile_query_1_parquet.txt, profile_query_2_kudu.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple tables, the inferred predicate(s) is(are) not assigned to the scan node(s). This issue is/seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon using views created based on Kudu tables is given. However, we note that this issue is not Kudu specific, it also exists when the underlying tables are of Parquet format. The respective query profiles are also attached.
> To create the (Kudu) tables in the provided example below, please replace the address(es) of the Kudu master(s) accordingly.
> {code:java}
> CREATE TABLE default.t1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t1 values (1, 'one');
> CREATE TABLE default.t2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t2 values (2, 'two');
> CREATE TABLE default.ta1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta1 values (1,'one');
> CREATE TABLE default.ta2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta2 values (2,'two');
> CREATE VIEW myview_1_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;
> CREATE VIEW myview_2_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
> {code}
> For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given as follows.
> Contents of table {{t1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{t2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{ta1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{ta2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{default.myview_2_on_2_tables}} (union of tables {{ta1}} and {{ta2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> After creating the related tables and views described above, we consider the following 2 queries.
> Query 1:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Query 2:
> {code:java}
> select * 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and b.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the same result set as the following.
> {code:java}
> +--------------+----+-----+--------------+----+-----+
> | table_source | c1 | c2  | table_source | c1 | c2  |
> +--------------+----+-----+--------------+----+-----+
> | ONE          | 1  | one | ONE          | 1  | one |
> +--------------+----+-----+--------------+----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 Kudu scans on tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 that incorporates the additional/redundant predicate "{{b.table_source = 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, respectively due to this seemingly redundant predicate on {{b.table_source}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal since a table that cannot contain any row in the result set is still scanned, i.e., table {{ta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org