Posted to dev@parquet.apache.org by Srinivas M <sm...@gmail.com> on 2018/10/21 08:49:57 UTC

Writing Parquet Timestamp and reading from Hive table

Hi

We have a Java application that writes Parquet files. We are using the
Parquet 1.9.0 API to write timestamp data. Since the Parquet and Hive
representations of timestamps are incompatible, we have tried to work around
this by writing each Parquet timestamp as a 12-byte array, converting the
timestamp fields into the layout Hive expects. However, when setting the
field type in the schema, the Avro schema types have no enumeration for the
INT96 type, so we set the type to bytes, on the assumption that Hive would
still be able to read the data since we wrote it in the byte layout Hive
expects. However, when we try to read the data from the Hive table, we run
into the following exception.
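For reference, the conversion we apply looks roughly like the following (a
simplified sketch; the class and method names are just illustrative, and it
assumes the usual Hive/Impala INT96 convention of 8 little-endian bytes of
nanoseconds within the day followed by 4 little-endian bytes of the Julian
day number):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.sql.Timestamp;
import java.util.concurrent.TimeUnit;

class Int96Timestamp {
    // Julian day number of the Unix epoch (1970-01-01).
    private static final long JULIAN_EPOCH_DAY = 2440588L;

    // Pack a timestamp into the 12-byte INT96 layout Hive/Impala expect:
    // 8 little-endian bytes of nanoseconds within the day, followed by
    // 4 little-endian bytes of the Julian day number.
    static byte[] toInt96(Timestamp ts) {
        long millis = ts.getTime();                       // epoch millis (UTC)
        long subMillisNanos = ts.getNanos() % 1_000_000L; // nanos below the millisecond
        long epochDay = Math.floorDiv(millis, TimeUnit.DAYS.toMillis(1));
        long millisOfDay = Math.floorMod(millis, TimeUnit.DAYS.toMillis(1));
        long nanosOfDay = TimeUnit.MILLISECONDS.toNanos(millisOfDay) + subMillisNanos;
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt((int) (JULIAN_EPOCH_DAY + epochDay))
                .array();
    }
}
```

These 12 bytes are what we then write into the BINARY columns shown in the
file schema below.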


Questions:
----------
1. Is there any way to work around this issue, i.e. to make Hive read the
data when the timestamp field is declared as bytes?
2. Is there any way to set the data type to INT96 in the Parquet schema?
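In other words, for question 2, the file schema we would like to end up with
would look something like this (an illustrative sketch of the Parquet message
type, with INT96 declared directly for the timestamp columns):

```
message parquet_file {
  required int32 C1;
  required binary C2 (UTF8);
  required binary C3 (UTF8);
  required int96 C4;  // timestamp column
  required int96 C5;  // timestamp column
}
```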

Exception :
========
Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be
cast to org.apache.hadoop.hive.serde2.io.TimestampWritable
========

Schema of the file
=============
file schema: parquet.filecc
--------------------------------------------------------------------------------
C1:          REQUIRED INT32 R:0 D:0
C2:          REQUIRED BINARY O:UTF8 R:0 D:0
C3:          REQUIRED BINARY O:UTF8 R:0 D:0
C4:          REQUIRED BINARY R:0 D:0                  ----> Timestamp Column
C5:          REQUIRED BINARY R:0 D:0                  ----> Timestamp Column

-----------------------------------------------------------------------------------------------------------

hive> show create table HiveParquetTimestamp;
OK
CREATE EXTERNAL TABLE `HiveParquetTimestamp`(
  `c1` int,
  `c2` char(4),
  `c3` varchar(8),
  `c4` timestamp,
  `c5` timestamp)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://cdhkrb123.fyre.com:8020/tmp/HiveParquetTimestamp'

-- 
Srinivas
(*-*)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
You have to grow from the inside out. None can teach you, none can make you
spiritual.
                      -Narendra Nath Dutta(Swamy Vivekananda)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Re: Writing Parquet Timestamp and reading from Hive table

Posted by Srinivas M <sm...@gmail.com>.
Does anybody have any thoughts or suggestions on the issue mentioned in my
earlier post?



-- 
Srinivas
(*-*)