You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Fang-Yu Rao (Jira)" <ji...@apache.org> on 2020/01/15 23:23:00 UTC
[jira] [Comment Edited] (IMPALA-9281) Inferred predicates not assigned to scan nodes when views are involved

    [ https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016353#comment-17016353 ] 

Fang-Yu Rao edited comment on IMPALA-9281 at 1/15/20 11:22 PM:
---------------------------------------------------------------

After some initial investigation, I found that the field of {{assignedConjunctsByTupleId}} of {{Analyzer}}, which according to the comment stores all registered inferred conjuncts is always empty throughout every query described above (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L358-L362).

Moreover, the method {{createInferredEqPred()}} at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1364-L1375 has never been called during the planning.



was (Author: fangyurao):
After some initial investigation, I found that the field of {{assignedConjunctsByTupleId}} of {{Analyzer}}, which according to the comment stores all registered inferred conjuncts is always empty throughout every query described above (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L358-L362).

Moreover, the method {{createInferredEqPred()}} at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1364-L1375 has never been called during the planning.

Since the example provided in the problem description is Kudu specific and the problem we are having here is not only specific to Kudu, I will update the description accordingly to make it easier for us to reproduce the issue.


> Inferred predicates not assigned to scan nodes when views are involved
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: profile_query_1_parquet.txt, profile_query_2_parquet.txt
>
>
> When a query involves the join of views each created based on multiple tables, the inferred predicate(s) is(are) not assigned to the scan node(s). This issue is/seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon.
> {code:java}
> CREATE TABLE default.pt1 (
>    c1 INT,
>    c2 STRING
> ) 
> STORED AS PARQUET;
> insert into pt1 values (1, 'one');
> CREATE TABLE default.pt2 (
>    c1 INT,
>    c2 STRING
> ) 
> STORED AS PARQUET;
> insert into pt2 values (2, 'two');
> CREATE TABLE default.pta1 (
>    c1a INT, 
>    c2a STRING
> )
> STORED AS PARQUET;
> insert into pta1 values (1,'one');
> CREATE TABLE default.pta2 (
>    c1a INT, 
>    c2a STRING
> )
> STORED AS PARQUET;
> insert into pta2 values (2,'two');
> CREATE VIEW myview_1_on_2_parquet_tables AS 
> SELECT 'ONE' table_source, c1, c2 FROM `default`.pt1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.pt2;
> CREATE VIEW myview_2_on_2_parquet_tables AS  
> SELECT 'ONE' table_source_a, c1a, c2a FROM `default`.pta1 
> UNION ALL 
> SELECT 'TWO' table_source_a, c1a, c2a FROM `default`.pta2;
> {code}
> For easy reference, the contents of tables {{pt1}}, {{pt2}}, {{pta1}}, {{pta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given as follows.
> Contents of table {{pt1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{pt2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{pta1}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 1   | one |
> +-----+-----+
> {code}
> Contents of table {{pta2}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 2   | two |
> +-----+-----+
> {code}
> Contents in {{myview_1_on_2_parquet_tables}} (union of tables {{t1}} and {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{myview_2_on_2_parquet_tables}} (union of tables {{ta1}} and {{ta2}}):
> {code:java}
> +----------------+-----+-----+
> | table_source_a | c1a | c2a |
> +----------------+-----+-----+
> | ONE            | 1   | one |
> | TWO            | 2   | two |
> +----------------+-----+-----+
> {code}
> After creating the related tables and views described above, we consider the following 2 queries.
> Query 1:
> {code:java}
> select * 
> from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source_a;
> {code}
> Query 2:
> {code:java}
> select * 
> from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b 
> where a.table_source = 'ONE' 
> and b.table_source_a = 'ONE' 
> and a.table_source = b.table_source_a;
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the same result set as the following.
> {code:java}
> +--------------+----+-----+----------------+-----+-----+
> | table_source | c1 | c2  | table_source_a | c1a | c2a |
> +--------------+----+-----+----------------+-----+-----+
> | ONE          | 1  | one | ONE            | 1   | one |
> +--------------+----+-----+----------------+-----+-----+
> {code}
> However, according to the query profile, Query 1 results in 3 scans on tables {{pt1}}, {{pta1}}, and {{pta2}}, respectively. On the other hand, Query 2 that incorporates the additional/redundant predicate "{{b.table_source_a = 'ONE}}'" only involves 2 scans on tables {{pt1}} and {{pta1}}, respectively due to this seemingly redundant predicate on {{b.table_source_a}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal since a table that cannot contain any row in the result set is still scanned, i.e., table {{pta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org