Posted to dev@orc.apache.org by "Fabian Groffen (JIRA)" <ji...@apache.org> on 2019/06/27 12:28:00 UTC

[jira] [Created] (ORC-528) orc-tools timestamps off by one?

Fabian Groffen created ORC-528:
----------------------------------

             Summary: orc-tools timestamps off by one?
                 Key: ORC-528
                 URL: https://issues.apache.org/jira/browse/ORC-528
             Project: ORC
          Issue Type: Bug
          Components: tools
    Affects Versions: 1.5.5, 1.6.0
            Reporter: Fabian Groffen


I'm trying to understand how to deal properly with timestamps.  I've created a CSV file with some timestamps that I believe are critical edge cases:

{{2019-01-01 00:00:00.0000}}
{{2015-01-01 00:00:00.0001}}
{{2015-01-01 00:00:00.0000}}
{{2014-12-31 23:59:59.9999}}
{{1970-01-01 00:00:00.0001}}
{{1970-01-01 00:00:00.0000}}
{{1969-12-31 23:59:59.9999}}
{{1969-12-31 23:59:59.0001}}
{{1969-12-31 23:59:59.0000}}
{{1969-12-31 23:59:58.9999}}

I've created an ORC file using hive-1.1.0-cdh5.14.2.  Hive is able to read this file back correctly.  All timestamps seem to match.  Reading the same file using orc-tools shows different results:

{{{"_col0":"2019-01-01 00:00:00.0"}}}
{{{"_col0":"2015-01-01 00:00:00.0001"}}}
{{{"_col0":"2015-01-01 00:00:00.0"}}}
{{{"_col0":"2014-12-31 23:59:59.9999"}}}
{{{"_col0":"1970-01-01 00:00:00.0001"}}}
{{{"_col0":"1970-01-01 00:00:00.0"}}}
{{{"_col0":"1969-12-31 23:59:58.9999"}}}
{{{"_col0":"1969-12-31 23:59:59.0001"}}}
{{{"_col0":"1969-12-31 23:59:59.0"}}}
{{{"_col0":"1969-12-31 23:59:57.9999"}}}

The differences are in the last row and in the 4th row from the bottom: both are one second too early (23:59:57.9999 instead of 23:59:58.9999, and 23:59:58.9999 instead of 23:59:59.9999).
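To make the comparison less error-prone than reading the rows by eye, the two lists can be diffed mechanically. This is only a minimal sketch: the values are copied from the listings above, and the helper names `parse` and `offsets` are made up for illustration, not part of orc-tools or Hive.

```python
from datetime import datetime

# Timestamps as written via Hive (the CSV input).
written = [
    "2019-01-01 00:00:00.0000", "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0000", "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001", "1970-01-01 00:00:00.0000",
    "1969-12-31 23:59:59.9999", "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0000", "1969-12-31 23:59:58.9999",
]

# The same rows as printed by orc-tools for the Hive-written file.
read_back = [
    "2019-01-01 00:00:00.0", "2015-01-01 00:00:00.0001",
    "2015-01-01 00:00:00.0", "2014-12-31 23:59:59.9999",
    "1970-01-01 00:00:00.0001", "1970-01-01 00:00:00.0",
    "1969-12-31 23:59:58.9999", "1969-12-31 23:59:59.0001",
    "1969-12-31 23:59:59.0", "1969-12-31 23:59:57.9999",
]


def parse(ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' regardless of fraction length."""
    whole, _, frac = ts.partition(".")
    # Pad (or cut) the fraction to six digits so it maps onto microseconds.
    micros = int((frac or "0").ljust(6, "0")[:6])
    return datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(microsecond=micros)


def offsets(expected, actual):
    """Return actual minus expected, in seconds, for each row."""
    return [(parse(a) - parse(e)).total_seconds() for e, a in zip(expected, actual)]


for row, delta in enumerate(offsets(written, read_back), start=1):
    if delta != 0:
        print(f"row {row}: off by {delta:+.0f} s")
```

For the data above this flags rows 7 and 10 (the 4th from the bottom and the last), each with an offset of -1 second.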

With some modifications I managed to have orc-tools itself generate a file with timestamps using convert (see ORC-526).  Reading that file back in hive-1.1.0-cdh5.14.2 results in:

{{2019-01-01 00:00:00}}
{{2015-01-01 00:00:00.0001}}
{{2015-01-01 00:00:00}}
{{2014-12-31 23:59:59.9999}}
{{1970-01-01 00:00:00.0001}}
{{1970-01-01 00:00:00}}
{{1970-01-01 00:00:00.9999}}
{{1969-12-31 23:59:59.0001}}
{{1969-12-31 23:59:59}}
{{1969-12-31 23:59:59.9999}}

which is also wrong: the 4th row from the bottom and the last row are off by one second, but this time in the other direction.  When I read the file with orc-tools itself, it shows correct output (58) for the last row, but incorrect output for the 4th row from the bottom.  I noticed that orc-tools-1.2.0 cannot read the file written by 1.6.0.  1.3.4 can, but it also produces incorrect output.

{{orc-tools-1.6.0:}}
{{{"mytime":"2019-01-01 00:00:00.0"}}}
{{{"mytime":"2015-01-01 00:00:00.0001"}}}
{{{"mytime":"2015-01-01 00:00:00.0"}}}
{{{"mytime":"2014-12-31 23:59:59.9999"}}}
{{{"mytime":"1970-01-01 00:00:00.0001"}}}
{{{"mytime":"1970-01-01 00:00:00.0"}}}
{{{"mytime":"1970-01-01 00:00:00.9999"}}}
{{{"mytime":"1969-12-31 23:59:59.0001"}}}
{{{"mytime":"1969-12-31 23:59:59.0"}}}
{{{"mytime":"1969-12-31 23:59:58.9999"}}}

{{orc-tools-1.3.4:}}
{{{"mytime":"2019-01-01 00:00:00.0"}}}
{{{"mytime":"2015-01-01 00:00:00.0001"}}}
{{{"mytime":"2015-01-01 00:00:00.0"}}}
{{{"mytime":"2014-12-31 23:59:59.9999"}}}
{{{"mytime":"1970-01-01 00:00:00.0001"}}}
{{{"mytime":"1970-01-01 00:00:00.0"}}}
{{{"mytime":"1970-01-01 00:00:00.9999"}}}
{{{"mytime":"1969-12-31 23:59:58.0001"}}}
{{{"mytime":"1969-12-31 23:59:59.0"}}}
{{{"mytime":"1969-12-31 23:59:58.9999"}}}

I'm getting a bit lost as to what's right and wrong, but I get the feeling something doesn't add up here.


