You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Fang-Yu Rao (Jira)" <ji...@apache.org> on 2020/01/08 18:58:00 UTC
[jira] [Updated] (IMPALA-9281) Inferred predicates not assigned to Kudu scan nodes when views are involved

     [ https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao updated IMPALA-9281:
--------------------------------
    Description: 
When a query involves the join of views each created based on multiple Kudu tables, the inferred predicate(s) is(are) not assigned to the Kudu scan node(s). This issue is/seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.

In the following a minimum example to reproduce the phenomenon is given.To create the tables in the provided example below, please replace the address(es) of the Kudu master(s) accordingly.
{code:java}
CREATE TABLE default.t1 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU 
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into t1 values (1, 'one');

CREATE TABLE default.t2 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU 
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into t2 values (2, 'two');

CREATE TABLE default.ta1 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into ta1 values (1,'one');

CREATE TABLE default.ta2 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into ta2 values (2,'two');

CREATE VIEW myview_1_on_2_tables AS  
SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
UNION ALL 
SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;

CREATE VIEW myview_2_on_2_tables AS  
SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
UNION ALL 
SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
{code}
For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given as follows.

Contents of table {{t1}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 1  | one |
+----+-----+
{code}
Contents of table {{t2}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 2  | two |
+----+-----+
{code}
Contents of table {{ta1}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 1  | one |
+----+-----+
{code}
Contents of table {{ta2}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 2  | two |
+----+-----+
{code}
Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and {{t2}}):
{code:java}
+--------------+----+-----+
| table_source | c1 | c2  |
+--------------+----+-----+
| ONE          | 1  | one |
| TWO          | 2  | two |
+--------------+----+-----+
{code}
Contents in {{default.myview_2_on_2_tables}} (union of tables{{ta1}} and {{ta2}}):
{code:java}
+--------------+----+-----+
| table_source | c1 | c2  |
+--------------+----+-----+
| ONE          | 1  | one |
| TWO          | 2  | two |
+--------------+----+-----+
{code}
After creating the related tables and views described above, we consider the following 2 queries.

Query 1:
{code:java}
select count(*) 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.table_source = 'ONE' 
and a.table_source = b.table_source 
and a.c2 = 'one';
{code}
Query 2:
{code:java}
select count(*) 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.table_source = 'ONE' 
and b.table_source = 'ONE' 
and a.table_source = b.table_source 
and a.c2 = 'one';
{code}
Both queries join those 2 views on the column {{table_source}} and filter out those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the same result set as the following.
{code:java}
+----------+
| count(*) |
+----------+
| 1        |
+----------+
{code}
However, according to the query profile, Query 1 results in 3 Kudu scans on tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 that incorporates the additional/redundant predicate "{{b.table_source = 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, respectively due to this seemingly redundant predicate on {{b.table_source}}.

Hence, it can be seen that the plan generated from Query 1 is sub-optimal since a table that cannot contain any row in the result set is still scanned, i.e., table {{ta2}}.

  was:
When a query involves the join of views each created based on multiple Kudu tables, the inferred predicate(s) is(are) not assigned to the Kudu scan node(s). This issue is/seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.

In the following a minimum example to reproduce the phenomenon is given.To create the tables in the provided example below, please replace the address(es) of the Kudu master(s) accordingly.
{code:java}
CREATE TABLE default.t1 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU 
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into t1 values (1, 'one');

CREATE TABLE default.t2 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU 
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into t2 values (2, 'two');

CREATE TABLE default.ta1 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into ta1 values (1,'one');

CREATE TABLE default.ta2 (
   c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
   PRIMARY KEY (c1)
)
PARTITION BY HASH (c1) PARTITIONS 2
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
insert into ta2 values (2,'two');

CREATE VIEW myview_1_on_2_tables AS  
SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
UNION ALL 
SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;

CREATE VIEW myview_2_on_2_tables AS  
SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
UNION ALL 
SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
{code}
For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given as follows.

Contents of table {{t1}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 1  | one |
+----+-----+
{code}
Contents of table {{t2}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 2  | two |
+----+-----+
{code}
Contents of table {{ta1}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 1  | one |
+----+-----+
{code}
Contents of table {{ta2}} afterwards:
{code:java}
+----+-----+
| c1 | c2  |
+----+-----+
| 2  | two |
+----+-----+
{code}
Contents in {{default.myview_1_on_2_tables}}:
{code:java}
+--------------+----+-----+
| table_source | c1 | c2  |
+--------------+----+-----+
| ONE          | 1  | one |
| TWO          | 2  | two |
+--------------+----+-----+
{code}
Contents in {{default.myview_2_on_2_tables}}:
{code:java}
+--------------+----+-----+
| table_source | c1 | c2  |
+--------------+----+-----+
| ONE          | 1  | one |
| TWO          | 2  | two |
+--------------+----+-----+
{code}
After creating the related tables and views described above, we consider the following 2 queries.

Query 1:
{code:java}
select count(*) 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.table_source = 'ONE' 
and a.table_source = b.table_source 
and a.c2 = 'one';
{code}
Query 2:
{code:java}
select count(*) 
from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
where a.table_source = 'ONE' 
and b.table_source = 'ONE' 
and a.table_source = b.table_source 
and a.c2 = 'one';
{code}
Both queries join those 2 views on the column {{table_source}} and filter out those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the same result set as the following.
{code:java}
+----------+
| count(*) |
+----------+
| 1        |
+----------+
{code}
However, according to the query profile, Query 1 results in 3 Kudu scans on tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 that incorporates the additional/redundant predicate "{{b.table_source = 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, respectively due to this seemingly redundant predicate on {{b.table_source}}.

Hence, it can be seen that the plan generated from Query 1 is sub-optimal since a table that cannot contain any row in the result set is still scanned, i.e., table {{ta2}}.


> Inferred predicates not assigned to Kudu scan nodes when views are involved
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-9281
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9281
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>
> When a query involves the join of views each created based on multiple Kudu tables, the inferred predicate(s) is(are) not assigned to the Kudu scan node(s). This issue is/seems related to https://issues.apache.org/jira/browse/IMPALA-4578#.
> In the following a minimum example to reproduce the phenomenon is given.To create the tables in the provided example below, please replace the address(es) of the Kudu master(s) accordingly.
> {code:java}
> CREATE TABLE default.t1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t1 values (1, 'one');
> CREATE TABLE default.t2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU 
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into t2 values (2, 'two');
> CREATE TABLE default.ta1 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta1 values (1,'one');
> CREATE TABLE default.ta2 (
>    c1 INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    c2 STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
>    PRIMARY KEY (c1)
> )
> PARTITION BY HASH (c1) PARTITIONS 2
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='10.16.0.115');
> insert into ta2 values (2,'two');
> CREATE VIEW myview_1_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.t1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.t2;
> CREATE VIEW myview_2_on_2_tables AS  
> SELECT 'ONE' table_source, c1, c2 FROM `default`.ta1 
> UNION ALL 
> SELECT 'TWO' table_source, c1, c2 FROM `default`.ta2;
> {code}
> For easy reference, the contents of tables {{t1}}, {{t2}}, {{ta1}}, {{ta2}}, and views {{myview_1_on_2_tables}}, {{myview_2_on_2_tables}} are also given as follows.
> Contents of table {{t1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{t2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{ta1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{ta2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents in {{default.myview_1_on_2_tables}} (union of tables {{t1}} and {{t2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{default.myview_2_on_2_tables}} (union of tables{{ta1}} and {{ta2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> After creating the related tables and views described above, we consider the following 2 queries.
> Query 1:
> {code:java}
> select count(*) 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Query 2:
> {code:java}
> select count(*) 
> from default.myview_1_on_2_tables a, myview_2_on_2_tables b 
> where a.table_source = 'ONE' 
> and b.table_source = 'ONE' 
> and a.table_source = b.table_source 
> and a.c2 = 'one';
> {code}
> Both queries join those 2 views on the column {{table_source}} and filter out those rows not satisfying {{table_source = 'ONE'}}. Both queries produce the same result set as the following.
> {code:java}
> +----------+
> | count(*) |
> +----------+
> | 1        |
> +----------+
> {code}
> However, according to the query profile, Query 1 results in 3 Kudu scans on tables {{t1}}, {{ta1}}, and {{ta2}}, respectively. On the other hand, Query 2 that incorporates the additional/redundant predicate "{{b.table_source = 'ONE}}'" only involves 2 Kudu scans on tables {{t1}} and {{ta1}}, respectively due to this seemingly redundant predicate on {{b.table_source}}.
> Hence, it can be seen that the plan generated from Query 1 is sub-optimal since a table that cannot contain any row in the result set is still scanned, i.e., table {{ta2}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org