Posted to issues-all@impala.apache.org by "Csaba Ringhofer (JIRA)" <ji...@apache.org> on 2018/09/11 14:21:00 UTC

[jira] [Created] (IMPALA-7559) Inconsistent parquet stat filtering of timestamps at dst change

Csaba Ringhofer created IMPALA-7559:
---------------------------------------

             Summary: Inconsistent parquet stat filtering of timestamps at dst change
                 Key: IMPALA-7559
                 URL: https://issues.apache.org/jira/browse/IMPALA-7559
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Csaba Ringhofer


If the min or max value of a timestamp column chunk falls within the repeated hour of the summer-to-winter DST change (UTC+2 -> UTC+1 in CET), stat filtering can drop row groups that contain rows matching the predicate.

To reproduce (on current master branch):
{code}
1. Assume the timezone is CET and the flag convert_legacy_hive_parquet_utc_timestamps is enabled:
( export TZ=CET; bin/start-impala-cluster.py --impalad_args="-convert_legacy_hive_parquet_utc_timestamps=true" )
2. Create a table in Hive and fill it with 3 separate inserts to create 3 files:
create table t (i int, d timestamp) stored as parquet;
insert into t values (1, "2017-10-29 02:30:00"), (2, "2018-10-28 02:30:00");
insert into t values (3, "2018-10-28 02:30:00");
insert into t values (4, "2017-10-29 02:30:00");
3. Query from Impala
set num_nodes=1;
select * from t; -- returns all 4 values (same as Hive) 
select * from t where d = "2017-10-29 02:30:00"; -- returns 1 in Impala (Hive returns 1,4)
select * from t where d = "2018-10-28 02:30:00"; -- returns 2 in Impala (Hive returns 2,3)
profile; -- NumStatsFilteredRowGroups: 2 (only one row group should have been stat filtered)
select * from t where d = "2018-10-28 02:30:00" or i = 5; -- returns 2 and 3 in Impala (same as Hive)
{code}
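
As background for why this hour is special, here is a minimal Python sketch of the UTC -> CET wall-clock mapping around the 2018 change. It is not Impala code and does not claim to reproduce the actual root cause. The helper utc_to_cet_wall_clock and the hard-coded transition constant are hypothetical, introduced only for illustration. It shows that two different UTC instants map to the same local time 02:30, and that the mapping is not monotonic, so converting only the min/max endpoints of a UTC range is not guaranteed to bound the converted values.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: CET leaves summer time at 01:00 UTC on
# 2018-10-28, when local clocks go from 03:00 (UTC+2) back to 02:00 (UTC+1).
DST_END_UTC = datetime(2018, 10, 28, 1, 0, tzinfo=timezone.utc)


def utc_to_cet_wall_clock(utc_dt):
    """Hypothetical helper: naive CET wall-clock time for a UTC instant
    on the day of the 2018 change (fixed offsets, no tz database)."""
    offset = timedelta(hours=2) if utc_dt < DST_END_UTC else timedelta(hours=1)
    return (utc_dt + offset).replace(tzinfo=None)


# Two distinct UTC instants, one hour apart ...
first = datetime(2018, 10, 28, 0, 30, tzinfo=timezone.utc)
second = datetime(2018, 10, 28, 1, 30, tzinfo=timezone.utc)

# ... map to the same local wall-clock time, 02:30.
local_first = utc_to_cet_wall_clock(first)
local_second = utc_to_cet_wall_clock(second)

# The mapping is not monotonic: a later UTC instant can yield an
# earlier local time.
before = utc_to_cet_wall_clock(datetime(2018, 10, 28, 0, 59, tzinfo=timezone.utc))
after = utc_to_cet_wall_clock(datetime(2018, 10, 28, 1, 0, tzinfo=timezone.utc))

print(local_first, local_second)
print(before, after)
```

Because the local value 02:30 corresponds to two UTC instants, any code path that converts the predicate value or the column statistics between UTC and local time has to choose consistently between them. Whether that is what goes wrong here is an assumption, not something this report confirms.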


