Posted to issues@hive.apache.org by "Hankó Gergely (Jira)" <ji...@apache.org> on 2022/09/21 10:22:00 UTC

[jira] [Updated] (HIVE-26552) PartitionConditionRemover doesn't remove constant filter with structs inside

     [ https://issues.apache.org/jira/browse/HIVE-26552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hankó Gergely updated HIVE-26552:
---------------------------------
    Description: 
Repro:
{code:java}
set hive.fetch.task.conversion=none;

create table test (a string) partitioned by (y string, m string);
insert into test values ('aa', 2022, 9);

explain vectorization expression select * from test where (y=year(date_sub('2022-09-11',4)) and m=month(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10)) and m=month(date_sub('2022-09-11',10)) ); {code}
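Both `date_sub` calls in the repro land in September 2022 (2022-09-07 and 2022-09-01). Each OR branch therefore compares `(y, m)` with the same pair `(2022, 9)`, which is also the only partition inserted. Below is a minimal Java sketch of that arithmetic. It uses `java.time.LocalDate` as a stand-in for Hive's `date_sub`, `year` and `month` UDFs; this is an assumption for illustration, not Hive code.

{code:java}
import java.time.LocalDate;

// Illustrative sketch only: reproduces the repro's date arithmetic with java.time,
// not with Hive's date_sub/year/month UDF implementations.
class ReproDates {

    // Returns {year, month} of (start - days), i.e. the pair
    // (year(date_sub(start, days)), month(date_sub(start, days))).
    static int[] yearMonthAfterSub(String start, int days) {
        LocalDate d = LocalDate.parse(start).minusDays(days);
        return new int[] {d.getYear(), d.getMonthValue()};
    }

    public static void main(String[] args) {
        int[] first = yearMonthAfterSub("2022-09-11", 4);   // 2022-09-07
        int[] second = yearMonthAfterSub("2022-09-11", 10); // 2022-09-01
        System.out.println(first[0] + "," + first[1]);
        System.out.println(second[0] + "," + second[1]);
    }
}
{code}

Running it prints `2022,9` on both lines. This is consistent with the `const struct(2022.0D,9.0D)` operands in the plan that follows.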
Actual:
{code:java}
(...)
Filter Operator
  Filter Vectorization:
      className: VectorFilterOperator
      native: true
      predicateExpression: SelectColumnIsTrue(col 5:boolean)(children: VectorUDFAdaptor((const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), const struct(2022.0D,9.0D))) -> 5:boolean)
  predicate: (const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), const struct(2022.0D,9.0D)) (type: boolean)
  Statistics: Num rows: 1 Data size: 454 Basic stats: COMPLETE Column stats: COMPLETE 
(...){code}
Expected:

The predicate compares only constants and always evaluates to true, so the filter operator should be optimized out, just as it is in the following query:
{code:java}
explain vectorization expression select * from test where (y=year(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10))); {code}
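In the Actual plan, every operand of the `IN` is already a constant struct, and the left-hand side equals the candidates. The predicate can therefore be evaluated at compile time. Below is a minimal Java sketch of that folding step. It assumes structs are modeled as lists of field values whose types have already been unified. `ConstantInFolding`, `foldIn` and `filterRemovable` are hypothetical names; they are not Hive's `PartitionConditionRemover` or `ExprNodeDesc` API.

{code:java}
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Hive code: folds a constant "struct IN (struct, ...)"
// predicate and decides whether the enclosing filter can be dropped.
// Structs are modeled as lists of field values whose types are assumed to be
// already unified (both sides hold doubles, as in struct(2022.0D,9.0D)).
class ConstantInFolding {

    // True if any field of the constant struct is NULL.
    static boolean hasNull(List<Object> struct) {
        for (Object field : struct) {
            if (field == null) {
                return true;
            }
        }
        return false;
    }

    // Evaluates "lhs IN (candidates...)" at compile time.
    // Returns empty (no folding) as soon as a NULL field is encountered, because
    // SQL three-valued logic would need extra handling that this sketch leaves out.
    static Optional<Boolean> foldIn(List<Object> lhs, List<List<Object>> candidates) {
        if (hasNull(lhs)) {
            return Optional.empty();
        }
        for (List<Object> candidate : candidates) {
            if (hasNull(candidate)) {
                return Optional.empty();
            }
            if (lhs.equals(candidate)) {
                return Optional.of(true);
            }
        }
        return Optional.of(false);
    }

    // A filter whose predicate folds to true passes every row and can be removed.
    static boolean filterRemovable(List<Object> lhs, List<List<Object>> candidates) {
        return foldIn(lhs, candidates).orElse(false);
    }

    public static void main(String[] args) {
        // Operands taken from the plan: const struct(2022.0D,9.0D) on every side.
        List<Object> s = List.of(2022.0d, 9.0d);
        System.out.println(filterRemovable(s, List.of(s, s)));
    }
}
{code}

The sketch conservatively declines to fold when a NULL is involved. A real optimizer would have to apply SQL three-valued logic in that case.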
 

  was:
Repro:
{code:java}
set hive.fetch.task.conversion=none;

create table test (a string) partitioned by (y string, m string);
insert into test values ('aa', 2022, 9);

explain vectorization expression select * from test where (y=year(date_sub('2022-09-11',4)) and m=month(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10)) and m=month(date_sub('2022-09-11',10)) ); {code}
Actual:
{code:java}
Filter Operator
  Filter Vectorization:
      className: VectorFilterOperator
      native: true
      predicateExpression: SelectColumnIsTrue(col 5:boolean)(children: VectorUDFAdaptor((const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), const struct(2022.0D,9.0D))) -> 5:boolean)
  predicate: (const struct(2022.0D,9.0D)) IN (const struct(2022.0D,9.0D), const struct(2022.0D,9.0D)) (type: boolean)
  Statistics: Num rows: 1 Data size: 454 Basic stats: COMPLETE Column stats: COMPLETE {code}
Expected:

The filter operator should be optimized out, just as it is in the following query:
{code:java}
explain vectorization expression select * from test where (y=year(date_sub('2022-09-11',4))) or (y=year(date_sub('2022-09-11',10))); {code}


> PartitionConditionRemover doesn't remove constant filter with structs inside
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-26552
>                 URL: https://issues.apache.org/jira/browse/HIVE-26552
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Hankó Gergely
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)