Posted to issues-all@impala.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2018/10/23 14:35:00 UTC

[jira] [Updated] (IMPALA-7730) Improve ORC File Format Timezone issues

     [ https://issues.apache.org/jira/browse/IMPALA-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated IMPALA-7730:
------------------------------------
    Attachment: orc.zip

> Improve ORC File Format Timezone issues
> ---------------------------------------
>
>                 Key: IMPALA-7730
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7730
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 3.0
>            Reporter: Philip Zeyliger
>            Priority: Major
>         Attachments: orc.zip
>
>
> As pointed out in https://gerrit.cloudera.org/#/c/11731 by [~csringhofer], our support for the ORC file format doesn't follow the same timezone conventions as the rest of Impala.
> {quote}
> TL;DR: ORC's timezone handling is likely to be broken in Impala, so we should patch it in the toolchain.
> The ORC library implements its own IANA timezone handling to convert stored timestamps from UTC to local time, and does something similar for min/max stats. The writer's timezone can also be stored in .orc files and used instead of the local timezone.
> Impala's timezone and the ORC library's timezone can differ for several reasons:
> * ORC's timezone is not overridden by the TZ environment variable or the timezone query option.
> * ORC uses a simpler method to detect the local timezone, which may not work on some Linux distros (see TimezoneDatabase::LocalZoneName in Impala vs. LOCAL_TIMEZONE in ORC).
> * .orc files can record any timezone as the writer's timezone, and we cannot be sure that it exists on the reader machine.
> My suggestion is to patch the ORC library in the toolchain and remove its timezone handling (e.g., by always using UTC, possibly controlled by a flag), as the current behavior is likely broken and is certainly not consistent with the rest of Impala.
> I am not sure how timezones could be handled correctly in ORC + Impala. If someone plans to work on it, I would gladly help with the integration into Impala.
> {quote}
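For illustration only, the mismatch described in the quoted comment can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not ORC's or Impala's actual code: fixed UTC offsets stand in for IANA zone names, and `orc_read`, `impala_expected`, and the `use_utc` flag are hypothetical names (the flag models the proposed "always use UTC" patch).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: fixed UTC offsets stand in for IANA zones so the sketch is
# deterministic; real readers resolve zone names through a tz database,
# where DST makes the offset vary over time.
WRITER_TZ = timezone(timedelta(hours=-7))   # zone recorded in the .orc file
PROCESS_TZ = timezone(timedelta(hours=1))   # zone the ORC library detects locally
SESSION_TZ = timezone(timedelta(hours=2))   # zone Impala uses (TZ env var / timezone query option)


def wall_clock(utc_seconds: int, tz: timezone) -> datetime:
    """Render a UTC instant as a naive wall-clock time in `tz`."""
    return datetime.fromtimestamp(utc_seconds, tz).replace(tzinfo=None)


def orc_read(utc_seconds: int, writer_tz: Optional[timezone],
             process_tz: timezone, use_utc: bool = False) -> datetime:
    """Hypothetical, simplified model of the library-side conversion.

    `use_utc` is a hypothetical flag modelling the proposed patch: skip the
    library's own conversion and return the UTC wall-clock time unchanged.
    """
    if use_utc:
        return wall_clock(utc_seconds, timezone.utc)
    # Prefer the writer's zone stored in the file, else the detected local zone.
    return wall_clock(utc_seconds, writer_tz if writer_tz is not None else process_tz)


def impala_expected(utc_seconds: int, session_tz: timezone) -> datetime:
    """Hypothetical: conversion in the zone Impala itself would use."""
    return wall_clock(utc_seconds, session_tz)


if __name__ == "__main__":
    ts = 1540305300  # 2018-10-23 14:35:00 UTC, an arbitrary instant
    print(orc_read(ts, WRITER_TZ, PROCESS_TZ))                # 2018-10-23 07:35:00 (writer zone wins)
    print(orc_read(ts, None, PROCESS_TZ))                     # 2018-10-23 15:35:00 (detected local zone)
    print(impala_expected(ts, SESSION_TZ))                    # 2018-10-23 16:35:00 (Impala's zone)
    print(orc_read(ts, WRITER_TZ, PROCESS_TZ, use_utc=True))  # 2018-10-23 14:35:00 (same for any zones)
```

The same stored instant yields three different wall-clock values depending on which zone wins. With the flag set, the result no longer depends on the zone the file or the host happens to carry, and any conversion to local time would be left to Impala's own timezone handling.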



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
