Posted to dev@drill.apache.org by "Stéphane Trou (JIRA)" <ji...@apache.org> on 2015/12/16 00:21:46 UTC

[jira] [Created] (DRILL-4203) Parquet File : Date is stored wrongly

Stéphane Trou created DRILL-4203:
------------------------------------

             Summary: Parquet File : Date is stored wrongly
                 Key: DRILL-4203
                 URL: https://issues.apache.org/jira/browse/DRILL-4203
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.4.0
            Reporter: Stéphane Trou


Hello,

I have a problem when I try to read Parquet files produced by Drill with Spark: all dates are corrupted.

I think the problem comes from Drill :)

{code}
cat /tmp/date_parquet.csv 
Epoch,1970-01-01
{code}

{code}
0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
+--------+-------------+
|  name  | epoch_date  |
+--------+-------------+
| Epoch  | 1970-01-01  |
+--------+-------------+
{code}

{code}
0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet`as select columns[0] as name, cast(columns[1] as date) as epoch_date from dfs.tmp.`date_parquet.csv`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
{code}

When I read the file with parquet-tools, I get:
{code}
java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/
name = Epoch
epoch_date = 4881176
{code}

According to [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], the DATE logical type stores the number of days since the Unix epoch (1970-01-01), so epoch_date should be equal to 0.
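
As a rough sanity check of the expected value, here is a minimal Python sketch of the DATE encoding (days since 1970-01-01). The helper names `to_parquet_date` and `julian_day_number` are made up for illustration and are not part of Drill or Parquet. The sketch also records one arithmetic observation: the stored value 4881176 is exactly twice the Julian Day Number of 1970-01-01. That is only an observation about the number; this report does not establish the cause.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)

# Offset between Python's proleptic Gregorian ordinal and the Julian Day Number.
# Check: date(2000, 1, 1).toordinal() + 1721425 == 2451545, the JDN of 2000-01-01.
ORDINAL_TO_JDN = 1721425


def to_parquet_date(d: date) -> int:
    """Hypothetical helper: encode a date as days since the Unix epoch."""
    return (d - UNIX_EPOCH).days


def julian_day_number(d: date) -> int:
    """Hypothetical helper: Julian Day Number of a Gregorian calendar date."""
    return d.toordinal() + ORDINAL_TO_JDN


expected = to_parquet_date(date(1970, 1, 1))   # value the spec calls for
observed = 4881176                             # value reported by parquet-tools
epoch_jdn = julian_day_number(UNIX_EPOCH)

print("expected:", expected)
print("observed:", observed)
print("JDN of 1970-01-01:", epoch_jdn)
print("observed == 2 * JDN:", observed == 2 * epoch_jdn)
```

If the observation holds for other dates too, subtracting 4881176 from the stored value would give the spec-compliant day count, but that is an untested assumption.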

Metadata:
{code}
java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/
file:        file:/tmp/buggy_parquet/0_0_0.parquet 
creator:     parquet-mr version 1.8.1-drill-r0 (build 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) 
extra:       drill.version = 1.4.0 

file schema: root 
--------------------------------------------------------------------------------
name:        OPTIONAL BINARY O:UTF8 R:0 D:1
epoch_date:  OPTIONAL INT32 O:DATE R:0 D:1

row group 1: RC:1 TS:93 OFFSET:4 
--------------------------------------------------------------------------------
name:         BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
epoch_date:   INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 ENC:RLE,BIT_PACKED,PLAIN
{code}




