Posted to issues@orc.apache.org by "Prasanth Jayachandran (JIRA)" <ji...@apache.org> on 2017/01/25 21:56:27 UTC

[jira] [Updated] (ORC-135) PPD for timestamp is wrong when reader and writer timezones are different

     [ https://issues.apache.org/jira/browse/ORC-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated ORC-135:
--------------------------------------
    Description: 
When the reader and writer timezones differ, predicate pushdown (PPD) evaluation does not apply the timezone offset when reading the min and max values. This can result in wrong PPD evaluation and hence incorrect results.

Example:
Table written in US/Eastern timezone. All values in this table are "2007-08-01 00:00:00.0".
{code:title=PPD disabled}
hive> set hive.optimize.index.filter=false;
hive> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 00:00:00.0' limit 1;
2007-08-01 00:00:00.0
OK
{code}

{code:title=PPD enabled}
hive> set hive.optimize.index.filter=true;
hive> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 00:00:00.0' limit 1;
OK
{code}
No rows are returned when PPD is enabled (the reader timezone is UTC).
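
The mismatch can be illustrated with a minimal sketch. It is written in Python for brevity and is not ORC's actual code. It assumes a simplified model in which the writer records min/max statistics as epoch milliseconds derived from wall-clock time in the writer's timezone, while the reader converts the predicate literal using its own timezone. The helper names (to_epoch_millis, shift_to_reader, may_contain) are hypothetical, and a fixed UTC-4 offset stands in for US/Eastern on 2007-08-01.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed-offset stand-ins for the two timezones.
# US/Eastern observes daylight time (UTC-4) on 2007-08-01.
WRITER_TZ = timezone(timedelta(hours=-4))
READER_TZ = timezone.utc

FMT = "%Y-%m-%d %H:%M:%S"


def to_epoch_millis(wall_clock, tz):
    """Interpret a wall-clock timestamp string in tz; return epoch millis."""
    local = datetime.strptime(wall_clock, FMT).replace(tzinfo=tz)
    return int(local.timestamp() * 1000)


def shift_to_reader(stat_ms, writer_tz, reader_tz):
    """Re-express a writer-side statistic so that it denotes the same
    wall-clock time in the reader's timezone."""
    wall = datetime.fromtimestamp(stat_ms / 1000, tz=writer_tz)
    return int(wall.replace(tzinfo=reader_tz).timestamp() * 1000)


def may_contain(literal_ms, min_ms, max_ms):
    """Range check used to decide whether a row group can be skipped."""
    return min_ms <= literal_ms <= max_ms


value = "2007-08-01 00:00:00"

# Every row holds the same value, so min == max on the writer side.
stat_min = stat_max = to_epoch_millis(value, WRITER_TZ)

# The reader converts the predicate literal with its own timezone.
literal = to_epoch_millis(value, READER_TZ)

# Comparing without an offset: the literal falls outside [min, max].
naive_result = may_contain(literal, stat_min, stat_max)

# Comparing after shifting the statistics into the reader's timezone.
adjusted_result = may_contain(
    literal,
    shift_to_reader(stat_min, WRITER_TZ, READER_TZ),
    shift_to_reader(stat_max, WRITER_TZ, READER_TZ),
)

print(naive_result, adjusted_result)
```

In this model the unadjusted comparison rejects the row group because the statistic and the literal are four hours apart, which matches the empty result above. Shifting the statistics first makes the range check succeed.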

  was:
When the reader and writer timezones differ, predicate pushdown (PPD) evaluation does not apply the timezone offset when reading the min and max values. This can result in wrong PPD evaluation and hence incorrect results.

Example:
Table written in US/Eastern timezone. All values in this table are "2007-08-01 00:00:00.0".
{code:title=PPD disabled}
hive> set hive.optimize.index.filter=false;
hive> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 00:00:00.0' limit 1;
2007-08-01 00:00:00.0
OK
{code}

{code:title=PPD enabled}
hive> set hive.optimize.index.filter=true;
hive> select ORDER_DATE from ORDER_FACT_small where ORDER_DATE='2007-08-01 00:00:00.0' limit 1;
OK
{code}
No rows are returned when PPD is enabled.


> PPD for timestamp is wrong when reader and writer timezones are different
> -------------------------------------------------------------------------
>
>                 Key: ORC-135
>                 URL: https://issues.apache.org/jira/browse/ORC-135
>             Project: Orc
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical


