You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2022/11/22 07:20:00 UTC

[jira] [Assigned] (HIVE-26737) Subquery returning wrong results when database has materialized views

     [ https://issues.apache.org/jira/browse/HIVE-26737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa reassigned HIVE-26737:
-------------------------------------

    Assignee: Steve Carlin

> Subquery returning wrong results when database has materialized views
> ---------------------------------------------------------------------
>
>                 Key: HIVE-26737
>                 URL: https://issues.apache.org/jira/browse/HIVE-26737
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Steve Carlin
>            Assignee: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When HS2 has materialized views in its registry, subqueries with correlated variables may return wrong results.
> An example of this:
>  
> {code:java}
> CREATE TABLE t_test1(
>   id int,
>   int_col int,
>   year int,
>   month int 
> );
> CREATE TABLE t_test2(
>   id int,
>   int_col int,
>   year int,
>   month int 
> );
> CREATE TABLE dummy (
>   id int 
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS
> SELECT * FROM dummy where id > 5;
> INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1); 
> INSERT INTO t_test2 VALUES (1, 1, 2009, 1); 
> select id, int_col, year, month from t_test1 s where s.int_col = (select count(*) from t_test2 t where s.id = t.id) order by id; 
> {code}
> The select statement should produce 2 rows, but it is only producing one.
> The CBO plan produced has an inner join instead of a left join.
> {code:java}
> HiveSortLimit(sort0=[$0], dir0=[ASC])
>   HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3])
>     HiveJoin(condition=[AND(=($0, $5), =($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
>       HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3], CAST=[CAST($1):BIGINT])
>         HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL(CAST($1):BIGINT))])
>           HiveTableScan(table=[[default, t_test1]], table:alias=[s])
>       HiveProject(id=[$0], $f1=[$1])
>         HiveFilter(condition=[IS NOT NULL($1)])
>           HiveAggregate(group=[{0}], agg#0=[count()])
>             HiveFilter(condition=[IS NOT NULL($0)])
>               HiveTableScan(table=[[default, t_test2]], table:alias=[t]){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)