Posted to issues@drill.apache.org by "Krystal (JIRA)" <ji...@apache.org> on 2017/03/24 18:50:41 UTC

[jira] [Created] (DRILL-5381) convert_from(col, 'TIMESTAMP_IMPALA') returns incorrect timestamp if there are multiple nulls

Krystal created DRILL-5381:
------------------------------

             Summary: convert_from(col, 'TIMESTAMP_IMPALA') returns incorrect timestamp if there are multiple nulls 
                 Key: DRILL-5381
                 URL: https://issues.apache.org/jira/browse/DRILL-5381
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.9.0, 1.8.0, 1.10.0
            Reporter: Krystal


In drill-1.10, setting `store.parquet.reader.int96_as_timestamp`=true returns the expected data:

select voter_id,create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |    create_timestamp    |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | null                   |
| 9         | 2016-07-23 13:39:02.0  |
| 10        | 2017-01-28 17:27:19.0  |
| 11        | 2016-10-23 10:55:44.0  |
| 12        | 2016-06-07 22:44:03.0  |
| 13        | 2016-05-04 13:59:20.0  |
| 14        | 2016-11-08 17:20:14.0  |
| 15        | 2016-05-14 11:23:53.0  |
+-----------+------------------------+
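The option can be set for the current session before running the query. A minimal sketch, reusing the table path from this report; `ALTER SESSION SET` is Drill's statement for session-scoped options, and the same statement with `false` gives the failing configuration described in this report.

```sql
-- Read Parquet INT96 columns directly as TIMESTAMP, for this session only
ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true;

select voter_id, create_timestamp
from dfs.`/user/hive/warehouse/voter_hive_parquet`
limit 15;
```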

However, setting `store.parquet.reader.int96_as_timestamp`=false and converting the column with convert_from(..., 'TIMESTAMP_IMPALA') returns incorrect timestamps once the second null value is encountered:

select voter_id,convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from dfs.`/user/hive/warehouse/voter_hive_parquet` limit 15;
+-----------+------------------------+
| voter_id  |         EXPR$1         |
+-----------+------------------------+
| 1         | 2016-10-23 20:03:58.0  |
| 2         | null                   |
| 3         | 2016-09-09 12:01:18.0  |
| 4         | 2017-03-06 20:35:55.0  |
| 5         | 2017-01-20 22:32:43.0  |
| 6         | 2016-10-22 05:46:12.0  |
| 7         | 2016-09-19 10:21:29.0  |
| 8         | 2016-07-23 13:39:02.0  |
| 9         | 2016-10-23 10:55:44.0  |
| 10        | 2016-06-07 22:44:03.0  |
| 11        | 2016-05-04 13:59:20.0  |
| 12        | 2016-11-08 17:20:14.0  |
| 13        | 2016-05-14 11:23:53.0  |
| 14        | 2016-06-20 16:18:51.0  |
| 15        | 2016-09-09 10:02:28.0  |
+-----------+------------------------+

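Notice that the timestamp for voter_id=9 shifts to voter_id=8, which is supposed to be null. Every timestamp after voter_id=7 is incorrect: from voter_id=9 onward, each row shows the value that belongs two rows later (for example, voter_id=9 shows voter_id=11's timestamp), and voter_id=10's value (2017-01-28 17:27:19.0) does not appear in the output at all. This issue is also reproducible on drill-1.8.0 and drill-1.9.0.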
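To narrow down whether the misalignment already exists in the raw INT96 column or is introduced by `convert_from`, the null flag of the unconverted column can be returned next to the converted value. This is a hedged diagnostic sketch against the same table; the aliases `raw_is_null` and `converted_ts` are illustrative names, not part of the original report.

```sql
-- Return INT96 values as raw binary, so convert_from() is required
ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = false;

-- raw_is_null:  whether the reader reports a null for the unconverted value
-- converted_ts: the same value after TIMESTAMP_IMPALA conversion
select voter_id,
       create_timestamp is null as raw_is_null,
       convert_from(create_timestamp, 'TIMESTAMP_IMPALA') as converted_ts
from dfs.`/user/hive/warehouse/voter_hive_parquet`
limit 15;
```

Going by the output with the option set to true, `raw_is_null` should be true only for voter_id=2 and voter_id=8; a different pattern would point at the Parquet reader rather than at the conversion function.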



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)