Posted to dev@drill.apache.org by "Stéphane Trou (JIRA)" <ji...@apache.org> on 2015/12/16 00:21:46 UTC
[jira] [Created] (DRILL-4203) Parquet File : Date is stored wrongly
Stéphane Trou created DRILL-4203:
------------------------------------
Summary: Parquet File : Date is stored wrongly
Key: DRILL-4203
URL: https://issues.apache.org/jira/browse/DRILL-4203
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Stéphane Trou
Hello,
I have a problem when I try to read Parquet files produced by Drill with Spark: all dates are corrupted.
I think the problem comes from Drill :)
{code}
cat /tmp/date_parquet.csv
Epoch,1970-01-01
{code}
{code}
0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
+--------+-------------+
| name | epoch_date |
+--------+-------------+
| Epoch | 1970-01-01 |
+--------+-------------+
{code}
{code}
0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet` as select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 1 |
+-----------+----------------------------+
{code}
When I read the file with parquet-tools, I get:
{code}
java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
name = Epoch
epoch_date = 4881176
{code}
According to [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], a DATE value is stored as the number of days since the Unix epoch (1970-01-01), so epoch_date should be equal to 0.
Metadata:
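To make the expected value concrete, here is a minimal Python sketch of the arithmetic. The helper names (parquet_date_value, julian_day_number) are hypothetical and only for illustration. The comparison with the Julian day number is an observation about the numbers, not a confirmed diagnosis of what Drill does internally.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)

def parquet_date_value(d: date) -> int:
    """Days since 1970-01-01, as the Parquet DATE annotation on INT32 specifies."""
    return (d - UNIX_EPOCH).days

def julian_day_number(d: date) -> int:
    """Julian day number of a proleptic Gregorian date (hypothetical helper).

    date.toordinal() counts days from 0001-01-01 = 1; adding 1721425
    converts that ordinal to the Julian day number.
    """
    return d.toordinal() + 1721425

expected = parquet_date_value(date(1970, 1, 1))  # value the spec calls for
observed = 4881176                               # value reported by parquet-tools

print("expected:", expected)
print("observed:", observed)
print("Julian day number of the epoch:", julian_day_number(UNIX_EPOCH))
# Assumption to verify against the Drill writer: the offset looks like
# twice the Julian day number of 1970-01-01.
print("observed == 2 * JDN(epoch):", observed == 2 * julian_day_number(UNIX_EPOCH))
```

If that relationship holds for other dates too, the stored values would be off by a constant, which would be easy to confirm by writing a second known date and comparing.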
{code}
java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
file: file:/tmp/buggy_parquet/0_0_0.parquet
creator: parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8)
extra: drill.version = 1.4.0
file schema: root
--------------------------------------------------------------------------------
name: OPTIONAL BINARY O:UTF8 R:0 D:1
epoch_date: OPTIONAL INT32 O:DATE R:0 D:1
row group 1: RC:1 TS:93 OFFSET:4
--------------------------------------------------------------------------------
name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
epoch_date: INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)