Posted to issues-all@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2020/04/29 23:08:00 UTC

[jira] [Created] (IMPALA-9707) Parquet stat filtering issue when min/max values are cast to NULL

Csaba Ringhofer created IMPALA-9707:
---------------------------------------

             Summary: Parquet stat filtering issue when min/max values are cast to NULL
                 Key: IMPALA-9707
                 URL: https://issues.apache.org/jira/browse/IMPALA-9707
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
            Reporter: Csaba Ringhofer


This issue can occur if there is a cast during the evaluation of the min/max stats and the min or the max value is cast to NULL.

Example:
{code}
create table ts (dt string) stored as parquet;
insert into ts values ("2010-01-01"), ("non ts");

set PARQUET_READ_STATISTICS=1;
select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 0 rows

set PARQUET_READ_STATISTICS=0;
select * from ts where dt = cast("2010-01-01" as timestamp); -- returns 1 row
{code}

The issue doesn't occur if "non ts" is not added to the table.
I think the root cause is that cast(max_stat_for_dt as timestamp) >= cast("2010-01-01" as timestamp) is evaluated during stat filtering. As "non ts" is the largest STRING in the table, it becomes the max stat, and casting it to TIMESTAMP returns NULL. As >= with NULL always returns NULL, Impala will think that the row group doesn't contain values >= 2010-01-01 and will skip it.
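
The suspected mechanism can be modelled outside Impala. The sketch below is not Impala code: cast_to_timestamp, keep_row_group_naive and keep_row_group_safe are hypothetical names, None stands in for SQL NULL, and the "safe" variant is only a contrast to show the effect of NULL handling, not a proposed or actual fix.

```python
from datetime import datetime
from typing import Optional


def cast_to_timestamp(s: str) -> Optional[datetime]:
    """Toy model of CAST(string AS TIMESTAMP): None plays the role of NULL."""
    try:
        return datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return None


def sql_cmp_ge(a, b) -> Optional[bool]:
    """Three-valued >=: the result is NULL if either operand is NULL."""
    return None if a is None or b is None else a >= b


def sql_cmp_le(a, b) -> Optional[bool]:
    """Three-valued <=: the result is NULL if either operand is NULL."""
    return None if a is None or b is None else a <= b


def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE dominates, then NULL, otherwise TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def stats_predicate(min_stat: str, max_stat: str, literal: str) -> Optional[bool]:
    """Assumed shape of the check: cast(min) <= literal AND cast(max) >= literal."""
    lit = cast_to_timestamp(literal)
    return sql_and(
        sql_cmp_le(cast_to_timestamp(min_stat), lit),
        sql_cmp_ge(cast_to_timestamp(max_stat), lit),
    )


def keep_row_group_naive(min_stat: str, max_stat: str, literal: str) -> bool:
    # Treats NULL like FALSE, so a NULL result causes the row group to be skipped.
    return stats_predicate(min_stat, max_stat, literal) is True


def keep_row_group_safe(min_stat: str, max_stat: str, literal: str) -> bool:
    # Hypothetical contrast: skip only when the predicate is definitely FALSE.
    return stats_predicate(min_stat, max_stat, literal) is not False


values = ["2010-01-01", "non ts"]          # the STRING column from the example
min_stat, max_stat = min(values), max(values)  # string ordering, as in the stats
print(max_stat, cast_to_timestamp(max_stat))
print(keep_row_group_naive(min_stat, max_stat, "2010-01-01"))
print(keep_row_group_safe(min_stat, max_stat, "2010-01-01"))
```

In this model the max stat is "non ts", its cast is NULL, and the naive check drops the row group even though it contains a matching row. This matches the 0-row result seen with PARQUET_READ_STATISTICS=1.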




