Posted to issues@orc.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2016/02/12 00:23:18 UTC

[jira] [Comment Edited] (ORC-10) Fix timestamp moving between timezones (HIVE-8746 in C++)

    [ https://issues.apache.org/jira/browse/ORC-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643487#comment-14643487 ] 

Owen O'Malley edited comment on ORC-10 at 2/11/16 11:22 PM:
------------------------------------------------------------

To implement this, I need to take a timezone like "America/Los_Angeles" and determine its UTC offset at arbitrary points in time. This information is encoded in the tzinfo files that are present on MacOS and Linux. Unfortunately, libc uses those files to implement localtime but doesn't expose the information we need. Furthermore, the public domain library https://www.iana.org/time-zones/repository/tz-link.html has the code, but doesn't present a clean API for the functionality we need either.

Boost does contain the necessary libraries, but I'd rather not pull in boost as a dependency.

At this point, I believe the best course is to implement a simple file reader that reads the tzinfo files installed on the system and uses them to do the conversions we need. I expect the code that reads the files to be roughly 150 lines. I'll add an environment variable (TZHOME) for systems that put their tzinfo files somewhere other than the standard /usr/share/zoneinfo/.
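
As a rough illustration of what such a reader could look like, here is a minimal sketch. It is not the code that went into ORC. The names TzRule, TzTable, parseTzif, utcOffsetAt, tzFilePath and loadTimezone are all hypothetical. It assumes the TZif layout described in RFC 8536 and tzfile(5) and reads only the version 1 block (32-bit transition times); a real reader would also need the 64-bit block found in version 2+ files, since newer "slim" tzinfo files may carry very little data in the first block. Times before the first transition are mapped to rule 0, which is a simplification.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of the ORC API.
struct TzRule {
  int32_t utcOffset;  // seconds east of UTC
  bool isDst;
};

struct TzTable {
  std::vector<int64_t> transitions;  // UTC seconds at which a rule starts
  std::vector<uint8_t> ruleIndex;    // rule in effect from each transition
  std::vector<TzRule> rules;
};

// Read a 4-byte big-endian value; callers check bounds first.
static uint32_t readBE32(const std::string& buf, uint64_t pos) {
  uint32_t v = 0;
  for (uint64_t i = 0; i < 4; ++i) {
    v = (v << 8) | static_cast<unsigned char>(buf[pos + i]);
  }
  return v;
}

// Parse the version 1 data block of a TZif buffer.
TzTable parseTzif(const std::string& buf) {
  const uint64_t kHeader = 44;  // magic, version, reserved, six counts
  if (buf.size() < kHeader || buf.compare(0, 4, "TZif") != 0) {
    throw std::runtime_error("not a TZif file");
  }
  const uint64_t timeCount = readBE32(buf, 32);
  const uint64_t typeCount = readBE32(buf, 36);
  const uint64_t indexStart = kHeader + timeCount * 4;
  const uint64_t ruleStart = indexStart + timeCount;
  if (typeCount == 0 || buf.size() < ruleStart + typeCount * 6) {
    throw std::runtime_error("truncated tzinfo data");
  }
  TzTable tz;
  for (uint64_t i = 0; i < timeCount; ++i) {
    tz.transitions.push_back(
        static_cast<int32_t>(readBE32(buf, kHeader + i * 4)));
    uint8_t idx = static_cast<uint8_t>(buf[indexStart + i]);
    if (idx >= typeCount) throw std::runtime_error("bad rule index");
    tz.ruleIndex.push_back(idx);
  }
  for (uint64_t i = 0; i < typeCount; ++i) {
    const uint64_t pos = ruleStart + i * 6;
    TzRule rule;
    rule.utcOffset = static_cast<int32_t>(readBE32(buf, pos));
    rule.isDst = buf[pos + 4] != 0;
    tz.rules.push_back(rule);
  }
  return tz;
}

// UTC offset in effect at the given UTC time.
int32_t utcOffsetAt(const TzTable& tz, int64_t utcSeconds) {
  if (tz.rules.empty()) throw std::runtime_error("no rules loaded");
  auto it = std::upper_bound(tz.transitions.begin(), tz.transitions.end(),
                             utcSeconds);
  if (it == tz.transitions.begin()) {
    return tz.rules[0].utcOffset;  // simplification: before first transition
  }
  const size_t last = static_cast<size_t>(it - tz.transitions.begin()) - 1;
  return tz.rules[tz.ruleIndex[last]].utcOffset;
}

// Directory lookup: TZHOME overrides the standard location.
std::string tzFilePath(const std::string& zone) {
  const char* home = std::getenv("TZHOME");
  const std::string dir = (home && *home) ? home : "/usr/share/zoneinfo";
  return dir + "/" + zone;
}

TzTable loadTimezone(const std::string& zone) {
  std::ifstream in(tzFilePath(zone), std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + tzFilePath(zone));
  const std::string buf((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
  return parseTzif(buf);
}
```

With this sketch, loadTimezone("America/Los_Angeles") followed by utcOffsetAt(table, someUtcSeconds) would give the offset needed for a conversion. Keeping the parser separate from the file access lets it be exercised on an in-memory buffer.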

On the other side, I'll add a StripeStreams method that provides the writer's timezone information and the reader's timezone information.



was (Author: owen.omalley):
To implement this, I need to take a timezone like "America/Los_Angeles" and determine its UTC offset at arbitrary points in time. This information is encoded in the tzinfo files that are present on MacOS and Linux. Unfortunately, libc uses those files to implement localtime but doesn't expose the information we need. Furthermore, the public domain library https://www.iana.org/time-zones/repository/tz-link.html has the code, but doesn't present a clean API for the functionality we need either.

Boost does contain the necessary libraries, but I'd rather not pull in boost as a dependency.

At this point, I believe the best course is to implement a simple file reader that reads the tzinfo files installed on the system and uses them to do the conversions we need. I expect the code that reads the files to be roughly 150 lines. I'll add an environment variable (ORC_TZINFO_DIR) for systems that put their tzinfo files somewhere other than the standard /usr/share/zoneinfo/.

On the other side, I'll add a StripeStreams method that provides the writer's timezone information and the reader's timezone information.


> Fix timestamp moving between timezones (HIVE-8746 in C++)
> ---------------------------------------------------------
>
>                 Key: ORC-10
>                 URL: https://issues.apache.org/jira/browse/ORC-10
>             Project: Orc
>          Issue Type: Bug
>          Components: C++, encoding
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 1.1.0
>
>
> ORC stores the difference from an epoch in the local timezone. That makes ORC mostly resilient when moving between timezones, but it fails when moving between timezones with different daylight saving rules.
> ORC currently stores the timezone information in the stripe footer, so that the reader can understand the times correctly.
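
The failure mode in the description can be made concrete with a little arithmetic. The sketch below is illustrative only and is not ORC code. The helpers encode and decode are hypothetical, times are counted in seconds from a local-midnight epoch on an assumed date of January 1, and the UTC offsets are hard-coded rather than read from tzinfo files. The writer is in America/Los_Angeles (UTC-8 in winter, UTC-7 in summer) and the reader is in America/Phoenix (UTC-7 all year, no daylight saving time).

```cpp
#include <cstdint>

const int64_t kHour = 3600;
const int64_t kDay = 24 * kHour;

// Convert between local wall-clock seconds and UTC seconds for a given
// UTC offset (seconds east of UTC).
int64_t toUtc(int64_t localSeconds, int64_t utcOffset) {
  return localSeconds - utcOffset;
}
int64_t toLocal(int64_t utcSeconds, int64_t utcOffset) {
  return utcSeconds + utcOffset;
}

// Writer side: store the elapsed seconds between the writer's local-midnight
// epoch and the value. Hypothetical helper, not the ORC encoding routine.
int64_t encode(int64_t wallClock, int64_t offsetAtValue,
               int64_t offsetAtEpoch) {
  return toUtc(wallClock, offsetAtValue) - toUtc(0, offsetAtEpoch);
}

// Reader side without any correction: add the stored value to the reader's
// own local-midnight epoch and render it in the reader's zone.
int64_t decode(int64_t stored, int64_t offsetAtValue,
               int64_t offsetAtEpoch) {
  return toLocal(toUtc(0, offsetAtEpoch) + stored, offsetAtValue);
}
```

For a winter value such as day 10 at 12:00, both zones keep the same offset at the epoch and at the value, so the Phoenix reader gets 12:00 back. For a summer value such as day 181 at 12:00 (July 1 in a non-leap year), Los Angeles has shifted by an hour and Phoenix has not, so the Phoenix reader sees 11:00. Knowing the writer's timezone lets the reader correct for that difference.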


