Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2019/10/09 20:15:00 UTC

[jira] [Resolved] (ORC-528) orc-tools timestamps off by one?

     [ https://issues.apache.org/jira/browse/ORC-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved ORC-528.
-------------------------------
    Fix Version/s: 1.7.0
                   1.6.1
                   1.5.7
       Resolution: Fixed

I just committed this.

> orc-tools timestamps off by one?
> --------------------------------
>
>                 Key: ORC-528
>                 URL: https://issues.apache.org/jira/browse/ORC-528
>             Project: ORC
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 1.5.5, 1.6.0
>            Reporter: Fabian Groffen
>            Assignee: Yukihiro Okada
>            Priority: Minor
>             Fix For: 1.5.7, 1.6.1, 1.7.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'm trying to understand how to deal properly with timestamps.  I've created a CSV file with some crucial timestamps (at least I believe they are crucial):
> {code:java}
> 2019-01-01 00:00:00.0000
> 2015-01-01 00:00:00.0001
> 2015-01-01 00:00:00.0000
> 2014-12-31 23:59:59.9999
> 1970-01-01 00:00:00.0001
> 1970-01-01 00:00:00.0000
> 1969-12-31 23:59:59.9999
> 1969-12-31 23:59:59.0001
> 1969-12-31 23:59:59.0000
> 1969-12-31 23:59:58.9999
> {code}
> I've created an ORC file using hive-1.1.0-cdh5.14.2.  Hive is able to read this file back correctly.  All timestamps seem to match.  Reading the same file using orc-tools shows different results:
>  
> {code:java}
> {"_col0":"2019-01-01 00:00:00.0"}
> {"_col0":"2015-01-01 00:00:00.0001"}
> {"_col0":"2015-01-01 00:00:00.0"}
> {"_col0":"2014-12-31 23:59:59.9999"}
> {"_col0":"1970-01-01 00:00:00.0001"}
> {"_col0":"1970-01-01 00:00:00.0"}
> {"_col0":"1969-12-31 23:59:58.9999"}
> {"_col0":"1969-12-31 23:59:59.0001"}
> {"_col0":"1969-12-31 23:59:59.0"}
> {"_col0":"1969-12-31 23:59:57.9999"}
> {code}
>  
> The difference is in the last row and the 4th row from the bottom, which are each one second off.
> With some modifications I managed to have orc-tools itself generate a file with timestamps using convert (see ORC-526).  Reading that file back in hive-1.1.0-cdh5.14.2 results in:
> {code:java}
> 2019-01-01 00:00:00
> 2015-01-01 00:00:00.0001
> 2015-01-01 00:00:00
> 2014-12-31 23:59:59.9999
> 1970-01-01 00:00:00.0001
> 1970-01-01 00:00:00
> 1970-01-01 00:00:00.9999
> 1969-12-31 23:59:59.0001
> 1969-12-31 23:59:59
> 1969-12-31 23:59:59.9999
> {code}
> This is also wrong: the 4th row from the bottom and the last row are off by one second, but this time in the other direction.  When I read the file with orc-tools itself, it shows correct output (58) for the last row, but incorrect output for the 4th from the bottom.  I noticed orc-tools-1.2.0 cannot read the file from 1.6.0.  1.3.4 can, but it also produces incorrect output.
> {{orc-tools-1.6.0:}}
> {code:java}
> {"mytime":"2019-01-01 00:00:00.0"}
> {"mytime":"2015-01-01 00:00:00.0001"}
> {"mytime":"2015-01-01 00:00:00.0"}
> {"mytime":"2014-12-31 23:59:59.9999"}
> {"mytime":"1970-01-01 00:00:00.0001"}
> {"mytime":"1970-01-01 00:00:00.0"}
> {"mytime":"1970-01-01 00:00:00.9999"}
> {"mytime":"1969-12-31 23:59:59.0001"}
> {"mytime":"1969-12-31 23:59:59.0"}
> {"mytime":"1969-12-31 23:59:58.9999"}
> {code}
>  
> {{orc-tools-1.3.4:}}
> {code:java}
> {"mytime":"2019-01-01 00:00:00.0"}
> {"mytime":"2015-01-01 00:00:00.0001"}
> {"mytime":"2015-01-01 00:00:00.0"}
> {"mytime":"2014-12-31 23:59:59.9999"}
> {"mytime":"1970-01-01 00:00:00.0001"}
> {"mytime":"1970-01-01 00:00:00.0"}
> {"mytime":"1970-01-01 00:00:00.9999"}
> {"mytime":"1969-12-31 23:59:58.0001"}
> {"mytime":"1969-12-31 23:59:59.0"}
> {"mytime":"1969-12-31 23:59:58.9999"}
> {code}
>  
> I'm getting a bit lost at what's right and wrong, but I'm getting the feeling something doesn't add up here.
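
For readers trying to reason about the symptom, here is a minimal sketch of one common way a pre-epoch timestamp with a fractional part ends up one second off: splitting a signed epoch offset into whole seconds and a fraction using truncating division instead of floor division. This is an illustration only. The report does not state the root cause, and the class and method names (EpochSplit, truncatedSeconds, flooredSeconds, microsOfSecond) are hypothetical and are not taken from the ORC or Hive code.

```java
import java.time.Instant;

class EpochSplit {
    static final long MICROS_PER_SECOND = 1_000_000L;

    // Truncating division rounds toward zero. For a negative offset with a
    // non-zero fraction this yields the second AFTER the one it belongs to.
    static long truncatedSeconds(long epochMicros) {
        return epochMicros / MICROS_PER_SECOND;
    }

    // Floor division rounds toward negative infinity, so the remaining
    // fraction is always non-negative, also before 1970.
    static long flooredSeconds(long epochMicros) {
        return Math.floorDiv(epochMicros, MICROS_PER_SECOND);
    }

    // Non-negative fractional part in microseconds, in the range 0..999999.
    static long microsOfSecond(long epochMicros) {
        return Math.floorMod(epochMicros, MICROS_PER_SECOND);
    }

    public static void main(String[] args) {
        // 1969-12-31 23:59:58.9999 UTC is 1.0001 seconds before the epoch.
        long micros = -1_000_100L;
        long nanos = microsOfSecond(micros) * 1_000L;

        // Pairing the non-negative fraction with floored seconds gives the
        // intended instant; pairing it with truncated seconds gives an
        // instant exactly one second later.
        Instant floored = Instant.ofEpochSecond(flooredSeconds(micros), nanos);
        Instant truncated = Instant.ofEpochSecond(truncatedSeconds(micros), nanos);

        System.out.println("floored:   " + floored);
        System.out.println("truncated: " + truncated);
    }
}
```

Under this assumption, -1.0001 s would come out as 23:59:59.9999 and -0.0001 s as 1970-01-01 00:00:00.9999, which is the same pattern as the two rows that Hive shows one second late for the orc-tools-written file. A reader or writer that compensates in the wrong place could shift rows in the opposite direction.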



--
This message was sent by Atlassian Jira
(v8.3.4#803005)