Posted to dev@parquet.apache.org by Ravi Tatapudi <ra...@in.ibm.com> on 2016/03/09 14:09:06 UTC

How to write "date, timestamp, decimal" data to Parquet-files

Hello,

I am Ravi Tatapudi, from IBM-India. I am working on a simple test-tool
that writes data to Parquet-files which can be imported into hive-tables.
Please find attached a sample-program, which writes a simple
parquet-data-file:



Using the above program, I could create "parquet-files" with data-types
INT, LONG, STRING, Boolean, etc. (i.e., basically all data-types supported
by "org.apache.avro.Schema.Type") & load them into "hive" tables
successfully.

Now, I am trying to figure out how to write "date, timestamp, decimal"
data into parquet-files. In this context, I request you to provide the
possible options (and/or a sample-program, if any) in this regard.

Thanks,
 Ravi


Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Many thanks for the inputs & confirmation.

Do you have any idea when "parquet-avro-1.9.0" would be released (any
tentative release-date / month, or Q2/Q3 2016)? Could you please let me
know, so that I can plan accordingly.

Thanks,
 Ravi



From:   Ryan Blue <rb...@netflix.com.INVALID>
To:     Parquet Dev <de...@parquet.apache.org>
Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas 
Mudigonda/India/IBM@IBMIN
Date:   03/14/2016 09:56 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files



Ravi,

Support for those types in parquet-avro hasn't been committed yet. It's
implemented in the branch I pointed you to. If you want to use released
versions, it should be out in 1.9.0.

rb

On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Thanks for the inputs.
>
> I am building & running the test-application, primarily using the
> following JAR-files (for Avro, Parquet-Avro & Hive APIs):
>
> 1) avro-1.8.0.jar
> 2) parquet-avro-1.6.0.jar (This is the latest one, found in the
> maven-repository-URL:
> http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> 3) hive-exec-1.2.1.jar
>
> Am I supposed to build/run the test using a different version of the
> JAR-files? Could you please let me know.
>
> Thanks,
>  Ravi
>
>
>
>
> From:   Ryan Blue <rb...@netflix.com.INVALID>
> To:     Parquet Dev <de...@parquet.apache.org>
> Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> Mudigonda/India/IBM@IBMIN
> Date:   03/11/2016 10:54 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> Yes, it is supported in 1.2.1. It went in here:
>
> https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b
>
> Are you using a version of Parquet with that pull request in it? Also, if
> you're using CDH this may not work.
>
> rb
>
> On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi <ra...@in.ibm.com>
> wrote:
>
> > Hello Ryan:
> >
> > I am using hive-version: 1.2.1, as indicated below:
> >
> > --------------------------------------
> > $ hive --version
> > Hive 1.2.1
> > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> > 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > From source with checksum ab480aca41b24a9c3751b8c023338231
> > $
> > --------------------------------------
> >
> > As I understand, this version of "hive" supports the "date" datatype,
> > right? Do you want me to re-test using any other higher version of Hive?
> > Please let me know your thoughts.
> >
> > Thanks,
> >  Ravi
> >
> >
> >
> > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > To:     Parquet Dev <de...@parquet.apache.org>
> > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date:   03/11/2016 06:18 AM
> > Subject:        Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > What version of Hive are you using? You should make sure date is
> > supported there.
> >
> > rb
> >
> > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi <ra...@in.ibm.com>
> > wrote:
> >
> > > Hello Ryan:
> > >
> > > Many thanks for the reply. I see that the text-attachment containing
> > > my test-program was not sent to the mailing list, but got filtered
> > > out. Hence, I am copying the program-code below:
> > >
> > > =================================================================
> > > import java.io.IOException;
> > > import java.util.*;
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.fs.FileSystem;
> > > import org.apache.hadoop.fs.Path;
> > > import org.apache.avro.Schema;
> > > import org.apache.avro.Schema.Type;
> > > import org.apache.avro.Schema.Field;
> > > import org.apache.avro.generic.*;
> > > import org.apache.avro.LogicalTypes;
> > > import org.apache.avro.LogicalTypes.*;
> > > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > > import parquet.avro.*;
> > >
> > > public class pqtw {
> > >
> > > public static Schema makeSchema() {
> > >      List<Field> fields = new ArrayList<Field>();
> > >      fields.add(new Field("name", Schema.create(Type.STRING), null, null));
> > >      fields.add(new Field("age", Schema.create(Type.INT), null, null));
> > >
> > >      Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));
> > >      fields.add(new Field("doj", date, null, null));
> > >
> > >      Schema schema = Schema.createRecord("filecc", null, "parquet", false);
> > >      schema.setFields(fields);
> > >
> > >      return(schema);
> > > }
> > >
> > > public static GenericData.Record makeRecord(Schema schema, String name, int age, int doj) {
> > >      GenericData.Record record = new GenericData.Record(schema);
> > >      record.put("name", name);
> > >      record.put("age", age);
> > >      record.put("doj", doj);
> > >      return(record);
> > > }
> > >
> > > public static void main(String[] args) throws IOException,
> > >
> > >     InterruptedException, ClassNotFoundException {
> > >
> > >         String pqfile = "/tmp/pqtfile1";
> > >
> > >         try {
> > >
> > >         Configuration conf = new Configuration();
> > >         FileSystem fs = FileSystem.getLocal(conf);
> > >
> > >         Schema schema = makeSchema();
> > >         GenericData.Record rec = makeRecord(schema, "abcd", 21, 15000);
> > >         AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile), schema);
> > >         writer.write(rec);
> > >         writer.close();
> > >         }
> > >         catch (Exception e)
> > >         {
> > >                 e.printStackTrace();
> > >         }
> > >     }
> > > }
> > > =================================================================
> > >
> > > With the above logic, I could write the data to the parquet-file.
> > > However, when I load the same into a hive-table & select columns, I
> > > could select the columns "name" & "age" (i.e., the VARCHAR and INT
> > > columns) successfully, but a select of the "date" column failed with
> > > the error given below:
> > >
> > > --------------------------------------------------------------------------------
> > > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED AS PARQUET;
> > > OK
> > > Time taken: 0.369 seconds
> > > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > > hive> SELECT name,age from PT1;
> > > OK
> > > abcd    21
> > > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > > hive> SELECT doj from PT1;
> > > OK
> > > Failed with exception
> > > java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot
> > > be cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > > Time taken: 0.167 seconds
> > > hive>
> > >
> > > --------------------------------------------------------------------------------
> > >
> > > Basically, for the "date" datatype, I am passing an integer value (the
> > > number of days since the Unix epoch, 1 January 1970, chosen so that
> > > the date falls somewhere around 2011). Is this the correct approach to
> > > process date data (or is there any other approach / API to do it)?
> > > Could you please let me know your inputs in this regard?
> > >
> > > Thanks,
> > >  Ravi
> > >
> > >
> > >
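For reference, the integer day count described above is exactly what the
Avro `date` logical type expects (days since 1970-01-01); on Java 8 it can
be computed with java.time instead of by hand (a minimal sketch, not part
of the original test-program):

=================================================================
import java.time.LocalDate;

// Days since the Unix epoch (1970-01-01), as the Avro `date`
// logical type expects; 2011-01-26 gives 15000, the value used
// in the program above.
int doj = (int) LocalDate.of(2011, 1, 26).toEpochDay();
=================================================================
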
> > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > To:     Parquet Dev <de...@parquet.apache.org>
> > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > Mudigonda/India/IBM@IBMIN
> > > Date:   03/09/2016 10:48 PM
> > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > Parquet-files
> > >
> > >
> > >
> > > Hi Ravi,
> > >
> > > Not all of the types are fully implemented yet. I think Hive only has
> > > partial support. If I remember correctly:
> > > * Decimal is supported if the backing primitive type is fixed-length binary
> > > * Date and Timestamp are supported, but Time has not been implemented yet
> > >
> > > For object models you can build applications on (instead of those
> > > embedded in SQL), only Avro objects can support those types through its
> > > LogicalTypes API. That API has been implemented in parquet-avro, but not
> > > yet committed. I would like for this feature to make it into 1.9.0. If
> > > you want to test in the mean time, check out the pull request:
> > >
> > >   https://github.com/apache/parquet-mr/pull/318
> > >
> > > rb
> > >
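As an illustration of the fixed-length-binary decimal and the LogicalTypes
API mentioned above, a field declaration could look like the following (a
sketch assuming Avro 1.8; the "amount" field and fixed-type names are
invented for the example, and `fields` is the field list from the program
earlier in the thread):

=================================================================
// decimal(9, 2) backed by a 4-byte fixed (4 bytes hold up to 9 digits).
Schema fixed = Schema.createFixed("amount_fixed", null, "parquet", 4);
Schema amount = LogicalTypes.decimal(9, 2).addToSchema(fixed);
fields.add(new Field("amount", amount, null, null));

// Values are then supplied as GenericData.Fixed containing the
// two's-complement big-endian bytes of the unscaled value.
=================================================================
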


-- 
Ryan Blue
Software Engineer
Netflix




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Many thanks for the inputs. I will try to build it today & see how it
goes.

Could you please let me know an approximate date (or month) when
"parquet-avro-1.9.0" (or any other parquet-avro-1.8.x that would include
this fix) would be officially released (for example, by "June 2016" or
"Dec 2016" or later)? It would be very helpful for my planning.

Thanks,
 Ravi



From:   Ryan Blue <rb...@netflix.com.INVALID>
To:     Parquet Dev <de...@parquet.apache.org>
Date:   04/04/2016 10:05 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files



I don't think you can get the artifacts produced by our CI builds, but you
can check out the branch and build it using instructions in the 
repository.

On Mon, Apr 4, 2016 at 5:39 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Regarding the support for "date, timestamp, decimal" data types for
> Parquet-files:
>
> In your earlier mail, you mentioned that the pull-request
> https://github.com/apache/parquet-mr/pull/318 has the necessary support
> for these data-types (and that it would be released as part of the
> parquet-avro release 1.9.0).
>
> I see that this fix is included in build# 1247 (& above?). How can I get
> this build (or the latest build) that includes the "parquet-avro"
> JAR-file with the support for "date, timestamp", etc.? Could you please
> let me know.
>
> Thanks,
>  Ravi
>


-- 
Ryan Blue
Software Engineer
Netflix




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

I have downloaded the source via the pull-request URL
https://github.com/apache/parquet-mr/pull/318 (did a "fork" & downloaded
the source ZIP-file) & built it using Maven. The build completed
successfully & I got the file "parquet-avro-1.8.2-SNAPSHOT.jar". When I
tried to verify the "date" data type using this JAR-file, I realized that
the existing test-programs fail to build with this new JAR.

So far, I have built (and run) my test-programs using
"parquet-avro-1.6.0.jar". Now, when I try to re-build the test-programs
using "parquet-avro-1.8.2-SNAPSHOT.jar", the builds fail. After going
through the source-code, I realized that there are many changes in the
API between "1.6.0" & "1.8.2", because of which the sample-programs that
built with "1.6.0" are not building now. (It looks like the
"AvroParquetWriter" no longer has the methods "write", "close", etc., but
uses some other approach. Do you know why these methods were removed
completely & made incompatible with parquet-avro-1.6.0?)

Please find below a sample parquet-write program, which now fails with
"parquet-avro-1.8.2-SNAPSHOT.jar". Do you have any sample
parquet-write-program that works with "parquet-avro-1.8.2.jar" (to write
primitive data types such as "int", "char", etc. to a parquet-file, as
shown in the example below)? If yes, could you please point me to the
same.

=================================================================================================
public static Schema makeSchema() {
     List<Field> fields = new ArrayList<Field>();
     fields.add(new Field("name", Schema.create(Type.STRING), null, null));
     fields.add(new Field("age", Schema.create(Type.INT), null, null));
     fields.add(new Field("dept", Schema.create(Type.STRING), null, null));

     Schema schema = Schema.createRecord("filecc", null, "parquet", false);
     schema.setFields(fields);
     return(schema);
}

public static GenericData.Record makeRecord(Schema schema, String name, int age, String dept) {
     GenericData.Record record = new GenericData.Record(schema);
     record.put("name", name);
     record.put("age", age);
     record.put("dept", dept);
     return(record);
}

public static void main(String[] args) throws IOException,
InterruptedException, ClassNotFoundException {

        String pqfile = "/tmp/pqtfile1";
        try {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        Schema schema = makeSchema();
        GenericData.Record rec = makeRecord(schema, "Person A", 21, "ED2");
        AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile), schema);
        writer.write(rec);
        writer.close();

        } catch (Exception e) { e.printStackTrace(); }
}
=================================================================================================

Thanks,
 Ravi
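
For what it's worth, write() and close() are inherited from ParquetWriter
and still exist in 1.8.x; the direct constructors were deprecated in favor
of a builder. A minimal sketch of the builder-style replacement for the
writer lines above, assuming parquet-avro 1.8.2 with the renamed
org.apache.parquet packages:

=================================================================================================
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

// Replaces: AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile), schema);
ParquetWriter<GenericRecord> writer = AvroParquetWriter
        .<GenericRecord>builder(new Path(pqfile))
        .withSchema(schema)
        .build();
writer.write(rec);    // write() and close() come from ParquetWriter
writer.close();
=================================================================================================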







Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Thanks Ryan, for the info.

Regards,
Ravi



From:   Ryan Blue <rb...@netflix.com.INVALID>
To:     Parquet Dev <de...@parquet.apache.org>
Date:   04/05/2016 09:07 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files



Ravi,

The only breaking API changes were the renamed packages between 1.6.0 and
1.7.0. Other changes are binary compatible and we have no plans to
deprecate the API you're using. For the release date, I don't know yet. We
haven't closed out all of the 1.9.0 issues yet.

rb
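
Concretely, the 1.6.0 -> 1.7.0 package rename mentioned above shows up in
application code only as changed imports (the class names themselves are
unchanged), e.g.:

-----------------------------------------------------------------------
// parquet-avro 1.6.x (com.twitter artifacts):
import parquet.avro.AvroParquetWriter;

// parquet-avro 1.7.0 and later (org.apache.parquet artifacts):
import org.apache.parquet.avro.AvroParquetWriter;
-----------------------------------------------------------------------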

On Tue, Apr 5, 2016 at 5:35 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Regarding my question on compatibility between versions 1.6.0 & 1.8.2:
>
> My apologies for the confusion caused. After investigating further, I
> realized that the functionality is now split across different JARs. With
> version 1.6.0, I only included the JAR-file "parquet-avro-1.6.0.jar"
> during build & execution of the programs.
>
> Now I see that I should include the JARs parquet-avro-1.8.2.jar and
> parquet-hadoop-1.8.2.jar at build-time, & include the JARs
> parquet-format-2.3.1.jar, parquet-column-1.8.2.jar,
> parquet-common-1.8.2.jar & parquet-encoding-1.8.2.jar for running the
> programs. After doing that, I could build my old applications
> successfully (of course, I had to change some of the import-statements
> from "import parquet.avro" to "import org.apache.parquet.avro", etc.) &
> run the tests successfully.
>
> So, my outstanding queries are:
>
> 1) I believe all my tests are now using the deprecated API for
> AvroParquetWriter. If you have a sample-program using the latest
> approach, I request you to point me to the same.
> 2) If you are aware of any approximate date (or month) when
> "parquet-avro-1.9.0" (or any other parquet-avro-1.8.x that would include
> this fix) would be officially released (for example, by "June 2016" or
> "Dec 2016" or later), then I request you to please let me know. It would
> be very helpful for my planning.
>
> Many thanks for your support & help, in this regard.
>
> Thanks,
>  Ravi
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Using "parquet-avro-1.8.2" (that I have built from the pull-request# 
https://github.com/apache/parquet-mr/pull/318), I have tried creating a 
"logical type for date" to write to parquet-file, using the code-block 
below:

-----------------------------------------------------------------------
 34      LogicalType AvroDate = new LogicalType("AvroDate") ;
 35      Schema Pdate = AvroDate.addToSchema(Schema.create(Type.INT)) ;
 36      fields.add(new Field("doj", Pdate, null, null));
-----------------------------------------------------------------------

The program builds successfully. But when I try to run it, I get the
exception below:

----------------------------------------------------------------------------------------------------------------------
Exception in thread "main" java.lang.NoSuchMethodError: 
org/apache/avro/Schema.setLogicalType(Lorg/apache/avro/LogicalType;)V
        at org.apache.avro.LogicalType.addToSchema(LogicalType.java:72)
        at pqtw.makeSchema(pqtw.java:35)
        at pqtw.main(pqtw.java:63)
----------------------------------------------------------------------------------------------------------------------

I am using "parquet-avro-1.8.2.jar" & "avro-1.8.0.jar". The error is 
indicating that the method: "org/apache/avro/Schema.setLogicalType" is NOT 
found. From the trace, it looks like "addToSchema" function is in turn 
calling "setLogicalType" in schema class, which is where it is failing 
with "NoMethodFound" exception. 

Hence, I am trying to understand whether this is the correct way to 
create a "LogicalType", whether there is some other approach, or whether 
I should use a higher version of "avro.jar", if any. (A sketch of the 
alternative I am considering appears after the full program listing 
below.)

Could you please let me know your inputs in this regard (or do you 
suppose this question should go to the Avro mailing-list)? Pl. let me 
know your thoughts.

Thanks,
 Ravi

NOTE:
Pl. find below the full code of the test-program, if you wish to have a 
look. FYI only.

======================================================================================
  1 import java.io.IOException;
  2 import java.util.*;
  3
  4 import org.apache.hadoop.conf.Configuration;
  5 import org.apache.hadoop.fs.FileSystem;
  6 import org.apache.hadoop.fs.Path;
  7 import org.apache.hadoop.io.Text;
  8
  9 import org.apache.parquet.avro.*;
 10
 11 import org.apache.avro.Schema;
 12 import org.apache.avro.Schema.Type;
 13 import org.apache.avro.Schema.Field;
 14 import org.apache.avro.LogicalType;
 15 import org.apache.avro.LogicalTypes;
 16
 17 import org.apache.parquet.column.ParquetProperties.WriterVersion;
 18
 19 import org.apache.parquet.hadoop.api.WriteSupport;
 20 import org.apache.parquet.hadoop.ParquetWriter;
 21 import org.apache.parquet.hadoop.ParquetWriter.*;
 22 import org.apache.parquet.hadoop.metadata.CompressionCodecName;
 23
 24 import org.apache.avro.generic.* ;
 25
 26 public class pqtw {
 27
 28 public static Schema makeSchema() {
 29      List<Field> fields = new ArrayList<Field>();
 30      fields.add(new Field("name", Schema.create(Type.STRING), null, 
null));
 31      fields.add(new Field("age", Schema.create(Type.INT), null, 
null));
 32      //fields.add(new Field("doj", Schema.create(Type.INT), null, 
null));
 33
 34      LogicalType AvroDate = new LogicalType("AvroDate") ;
 35      Schema Pdate = AvroDate.addToSchema(Schema.create(Type.INT)) ;
 36      fields.add(new Field("doj", Pdate, null, null));
 37
 38      Schema schema = Schema.createRecord("filecc", null, "parquet", 
false);
 39      schema.setFields(fields);
 40
 41      return(schema);
 42 }
 43
 44 public static GenericData.Record makeRecord (Schema schema, String 
name, int age, int doj) {
 45      GenericData.Record record = new GenericData.Record(schema);
 46      record.put("name", name);
 47      record.put("age", age);
 48      record.put("doj", doj);
 49      return(record);
 50 }
 51
 52 public static void main(String[] args) throws IOException,
 53
 54     InterruptedException, ClassNotFoundException {
 55
 56         String pqfile = "/tmp/pqtfile2";
 57
 58         try {
 59
 60         Configuration conf = new Configuration();
 61         FileSystem fs = FileSystem.getLocal(conf);
 62
 63         Schema schema = makeSchema() ;
 64         GenericData.Record rec = makeRecord(schema,"abcd", 5,15000) ;
 65         AvroParquetWriter writer = new AvroParquetWriter(new 
Path(pqfile), schema) ;
 66         writer.write(rec);
 67         writer.close();
 68         }
 69         catch (Exception e)
 70         {
 71                 e.printStackTrace();
 72         }
 73     }
 74 }
======================================================================================
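
FYI, the alternative I am considering (as mentioned above): use the 
built-in "LogicalTypes.date()" factory instead of constructing a raw 
"LogicalType", and derive the day-count instead of hard-coding it. This 
is a minimal sketch only; it assumes avro-1.8.0 actually resolves at 
runtime and Java 8 for "java.time", and the names "DateSketch" / 
"toEpochDays" are mine, for illustration:

-----------------------------------------------------------------------
import java.time.LocalDate;

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.Schema.Type;

public class DateSketch {

    public static Schema makeDateFieldSchema() {
        // The built-in "date" logical type annotates an INT holding
        // the number of days from the Unix epoch (1 January 1970).
        return LogicalTypes.date().addToSchema(Schema.create(Type.INT));
    }

    public static int toEpochDays(LocalDate d) {
        // Example: toEpochDays(LocalDate.of(2011, 1, 26)) == 15000
        return (int) d.toEpochDay();
    }
}
-----------------------------------------------------------------------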


Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Ravi,

The only breaking API changes were the renamed packages between 1.6.0 and
1.7.0. Other changes are binary compatible and we have no plans to
deprecate the API you're using. For the release date, I don't know yet. We
haven't closed out all of the 1.9.0 issues yet.
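
For your question about a sample using the newer entry point, here is a 
rough sketch of the builder-style API (a sketch only, assuming 
parquet-avro 1.8.x on the classpath; "BuilderSketch" and "writeOne" are 
illustrative names, not part of the API):

-----------------------------------------------------------------------
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class BuilderSketch {
    public static void writeOne(Schema schema, GenericData.Record rec,
                                String pqfile) throws Exception {
        // The builder replaces the deprecated
        // AvroParquetWriter(Path, Schema) constructors; write() and
        // close() are on the returned ParquetWriter.
        ParquetWriter<GenericData.Record> writer =
            AvroParquetWriter.<GenericData.Record>builder(new Path(pqfile))
                .withSchema(schema)
                .build();
        try {
            writer.write(rec);
        } finally {
            writer.close();
        }
    }
}
-----------------------------------------------------------------------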

rb

On Tue, Apr 5, 2016 at 5:35 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Regarding my question on compatibility between versions: 1.6.0 & 1.8.2":
>
> My apologies for the confusion caused. After investigating further, I
> realized that, the functionality is now in different JARs. With the
> version: 1.6.0, I only included the JAR-file: "parquet-avro-1.6.0.jar"
> during build & execution of the programs.
>
> Now, I see that, I should include the JARs: parquet-avro-1.8.2.jar,
> parquet-hadoop-1.8.2.jar at build-time & include the JARs:
> parquet-format-2.3.1.jar, parquet-column-1.8.2.jar,
> parquet-common-1.8.2.jar, parquet-encoding-1.8.2.jar, for running the
> programs). After doing that, I could build my old applications
> successfully (of course, I had to change some of the import-statements
> from "import parquet.avro" to "import org.apache.parquet.avro"...etc) &
> run the tests successfully.
>
> So, my outstanding queries are:
>
> 1) I believe, now all my tests are using the "depricatedAPI" for
> AvroParquetWriter. If you have a sample-program using the latest-approach,
> I request you to point me to the same.
> 2) If you are aware of any approximate date (or month) as to, when
> "parquet-avro-1.9.0 (or any other parquet-avro-1.8.x, that would include
> this fix)" would be officially released (for example: by "june 2016" or
> "dec 2016" or later), then I request you to please let me know. It would
> be very helpful, for my planning.
>
> Many thanks for your support & help, in this regard.
>
> Thanks,
>  Ravi
>
>
>
> From:   Ravi Tatapudi/India/IBM
> To:     dev@parquet.apache.org
> Date:   04/05/2016 04:29 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
> Hello Ryan:
>
> I have downloaded the source via the "pull-request-URL:
> https://github.com/apache/parquet-mr/pull/318" (did a "fork" & downloaded
> the source-ZIP-file) & built it using maven. The build completed
> successfully & I got the file: "parquet-avro-1.8.2-SNAPSHOT.jar". When I
> tried to verify "date" data type using this JAR-file, I realized that, the
> existing test-programs are failing with build with this new JAR.
>
> So far, I have my test-programs built (and run) using
> "parquet-avro-1.6.0.jar". Now, when I try to re-build the test-programs
> using "parquet-avro-1.8.2-SNAPSHOT.jar", I see that, the builds failed.
> After going thro' the source-code, I realized that, there are many changes
> in the API, between "1.6.0" & "1.8.2", because of which the
> sample-programs that built with "1.6.0" are not building now. (It looks
> like, now the "AvroParquetWriter" doesn't have the methods: "write",
> "close"...etc, but using some other approach. Do you know, why these
> methods are removed completely & made incompatible with parquet-avro-1.6.0
> ?)
>
> Pl. find below a sample parquet-write program, which is now failing with
> "parquet-avro-1.8.2-snapshot.jar". Do you have any sample
> parquet-write-program that works with "parquet-avro-1.8.2.jar" (to write
> primitive data types such as: "int", "char"..etc, to a parquet-file, as
> shown in the below example) ? If yes, could you please point me to the
> same.
>
>
> =================================================================================================
> public static Schema makeSchema() {
>      List<Field> fields = new ArrayList<Field>();
>      fields.add(new Field("name", Schema.create(Type.STRING), null,
> null));
>      fields.add(new Field("age", Schema.create(Type.INT), null, null));
>      fields.add(new Field("dept", Schema.create(Type.STRING), null,
> null));
>
>      Schema schema = Schema.createRecord("filecc", null, "parquet",
> false);
>      schema.setFields(fields);
>      return(schema);
> }
>
> public static GenericData.Record makeRecord (Schema schema, String name,
> int age, String dept) {
>      GenericData.Record record = new GenericData.Record(schema);
>      record.put("name", name);
>      record.put("age", age);
>      record.put("dept", dept);
>      return(record);
> }
>
> public static void main(String[] args) throws IOException,
> InterruptedException, ClassNotFoundException {
>
>         String pqfile = "/tmp/pqtfile1";
>         try {
>         conf = new Configuration();
>         FileSystem fs = FileSystem.getLocal(conf);
>
>         Schema schema = makeSchema() ;
>         GenericData.Record rec = makeRecord(schema,"Person A", 21,"ED2") ;
>         AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile),
> schema);
>         writer.write(rec);
>         writer.close() ;
>
> } catch (Exception e) { e.printStackTrace(); }
>
> =================================================================================================
>
> Thanks,
>  Ravi
>
>
>
>
> From:   Ravi Tatapudi/India/IBM
> To:     dev@parquet.apache.org
> Date:   04/05/2016 10:53 AM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
> Hello Ryan:
>
> Many thanks for the inputs. I will try to build it today & see how it
> goes.
>
> Could you please let me know, any approximate date (or month) as to, when
> "parquet-avro-1.9.0 (or any other parquet-avro-1.8.x, that would include
> this fix)" would be officially released (for example: by "june 2016" or
> "dec 2016" or later) ? It would be very helpful, for my planning.
>
> Thanks,
>  Ravi
>
>
>
>
> From:   Ryan Blue <rb...@netflix.com.INVALID>
> To:     Parquet Dev <de...@parquet.apache.org>
> Date:   04/04/2016 10:05 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> I don't think you can get the artifacts produced by our CI builds, but you
> can check out the branch and build it using instructions in the
> repository.
>
> On Mon, Apr 4, 2016 at 5:39 AM, Ravi Tatapudi <ra...@in.ibm.com>
> wrote:
>
> > Hello Ryan:
> >
> > Regarding the support for "date, timestamp, decimal" data types for
> > Parquet-files:
> >
> > In your earlier mail, you have mentioned the pull-request-URL:
> > https://github.com/apache/parquet-mr/pull/318 has the necessary support
> > for these data-types (and that it would be released as part of
> > parquet-avro-release:1.9.0).
> >
> > I see that, this fix is included in build# 1247 (& above?). How to get
> > this build (or the latest-build), that includes the JAR-file:
> > "parquet-avro" including the support for "date,timestamp"..etc. ? Could
> > you please let me know.
> >
> > Thanks,
> >  Ravi
> >
> >
> >
> > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > To:     Parquet Dev <de...@parquet.apache.org>
> > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date:   03/14/2016 09:56 PM
> > Subject:        Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > Ravi,
> >
> > Support for those types in parquet-avro hasn't been committed yet. It's
> > implemented in the branch I pointed you to. If you want to use released
> > versions, it should be out in 1.9.0.
> >
> > rb
> >
> > On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi
> <ra...@in.ibm.com>
> > wrote:
> >
> > > Hello Ryan:
> > >
> > > Thanks for the inputs.
> > >
> > > I am building & running the test-application, primarily using the
> > > following JAR-files (for Avro, Parquet-Avro & Hive APIs):
> > >
> > > 1) avro-1.8.0.jar
> > > 2) parquet-avro-1.6.0.jar (This is the latest one, found in the
> > > maven-repository-URL:
> > > http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> > > 3) hive-exec-1.2.1.jar
> > >
> > > Am I supposed to build/run the test, using a different version of the
> > > JAR-files ? Could you please let me know.
> > >
> > > Thanks,
> > >  Ravi
> > >
> > >
> > >
> > >
> > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > To:     Parquet Dev <de...@parquet.apache.org>
> > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > Mudigonda/India/IBM@IBMIN
> > > Date:   03/11/2016 10:54 PM
> > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > Parquet-files
> > >
> > >
> > >
> > > Yes, it is supported in 1.2.1. It went in here:
> > >
> > >
> > >
> > >
> >
> >
>
> https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b
>
> >
> > >
> > >
> > > Are you using a version of Parquet with that pull request in it? Also,
> > if
> > > you're using CDH this may not work.
> > >
> > > rb
> > >
> > > On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi
> > <ra...@in.ibm.com>
> > > wrote:
> > >
> > > > Hello Ryan:
> > > >
> > > > I am using hive-version: 1.2.1, as indicated below:
> > > >
> > > > --------------------------------------
> > > > $ hive --version
> > > > Hive 1.2.1
> > > > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> > > > 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > > > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > > > From source with checksum ab480aca41b24a9c3751b8c023338231
> > > > $
> > > > --------------------------------------
> > > >
> > > > As I understand, this version of "hive" supports "date" datatype.
> > right
> > > ?.
> > > > Do you want me to re-test using any other higher-version of hive ?
> Pl.
> > > let
> > > > me know your thoughts.
> > > >
> > > > Thanks,
> > > >  Ravi
> > > >
> > > >
> > > >
> > > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > > To:     Parquet Dev <de...@parquet.apache.org>
> > > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > > Mudigonda/India/IBM@IBMIN
> > > > Date:   03/11/2016 06:18 AM
> > > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > > Parquet-files
> > > >
> > > >
> > > >
> > > > What version of Hive are you using? You should make sure date is
> > > supported
> > > > there.
> > > >
> > > > rb
> > > >
> > > > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi
> > > <ra...@in.ibm.com>
> > > > wrote:
> > > >
> > > > > Hello Ryan:
> > > > >
> > > > > Many thanks for the reply. I see that, the text-attachment
> > containing
> > > my
> > > > > test-program is not sent to the mail-group, but got filtered out.
> > > Hence,
> > > > > copying the program-code below:
> > > > >
> > > > > =================================================================
> > > > > import java.io.IOException;
> > > > > import java.util.*;
> > > > > import org.apache.hadoop.conf.Configuration;
> > > > > import org.apache.hadoop.fs.FileSystem;
> > > > > import org.apache.hadoop.fs.Path;
> > > > > import org.apache.avro.Schema;
> > > > > import org.apache.avro.Schema.Type;
> > > > > import org.apache.avro.Schema.Field;
> > > > > import org.apache.avro.generic.* ;
> > > > > import org.apache.avro.LogicalTypes;
> > > > > import org.apache.avro.LogicalTypes.*;
> > > > > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > > > > import parquet.avro.*;
> > > > >
> > > > > public class pqtw {
> > > > >
> > > > > public static Schema makeSchema() {
> > > > >      List<Field> fields = new ArrayList<Field>();
> > > > >      fields.add(new Field("name", Schema.create(Type.STRING),
> null,
> > > > > null));
> > > > >      fields.add(new Field("age", Schema.create(Type.INT), null,
> > > null));
> > > > >
> > > > >      Schema date =
> > > > > LogicalTypes.date().addToSchema(Schema.create(Type.INT)) ;
> > > > >      fields.add(new Field("doj", date, null, null));
> > > > >
> > > > >      Schema schema = Schema.createRecord("filecc", null,
> "parquet",
> > > > > false);
> > > > >      schema.setFields(fields);
> > > > >
> > > > >      return(schema);
> > > > > }
> > > > >
> > > > > public static GenericData.Record makeRecord (Schema schema, String
> > > name,
> > > > > int age, int doj) {
> > > > >      GenericData.Record record = new GenericData.Record(schema);
> > > > >      record.put("name", name);
> > > > >      record.put("age", age);
> > > > >      record.put("doj", doj);
> > > > >      return(record);
> > > > > }
> > > > >
> > > > > public static void main(String[] args) throws IOException,
> > > > >
> > > > >     InterruptedException, ClassNotFoundException {
> > > > >
> > > > >         String pqfile = "/tmp/pqtfile1";
> > > > >
> > > > >         try {
> > > > >
> > > > >         Configuration conf = new Configuration();
> > > > >         FileSystem fs = FileSystem.getLocal(conf);
> > > > >
> > > > >         Schema schema = makeSchema() ;
> > > > >         GenericData.Record rec = makeRecord(schema,"abcd",
> 21,15000)
> > ;
> > > > >         AvroParquetWriter writer = new AvroParquetWriter(new
> > > > Path(pqfile),
> > > > > schema);
> > > > >         writer.write(rec);
> > > > >         writer.close();
> > > > >         }
> > > > >         catch (Exception e)
> > > > >         {
> > > > >                 e.printStackTrace();
> > > > >         }
> > > > >     }
> > > > > }
> > > > > =================================================================
> > > > >
> > > > > With the above logic, I could write the data to parquet-file.
> > However,
> > > > > when I load the same into a hive-table & select columns, I could
> > > select
> > > > > the columns: "name", "age" (i.e., VARCHAR, INT columns)
> > successfully,
> > > > but
> > > > > select of "date" column failed with the error given below:
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
> --------------------------------------------------------------------------------
> > > > > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date)
> STORED
> > AS
> > > > > PARQUET ;
> > > > > OK
> > > > > Time taken: 0.369 seconds
> > > > > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > > > > hive> SELECT name,age from PT1;
> > > > > OK
> > > > > abcd    21
> > > > > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > > > > hive> SELECT doj from PT1;
> > > > > OK
> > > > > Failed with exception
> > > > >
> > java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > > > > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable
> > cannot
> > > be
> > > > > cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > > > > Time taken: 0.167 seconds
> > > > > hive>
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
> --------------------------------------------------------------------------------
> > > > >
> > > > > Basically, for "date datatype", I am trying to pass an
> integer-value
> > > > (for
> > > > > the # of days from Unix epoch, 1 January 1970, so that the date
> > falls
> > > > > somewhere around 2011..etc). Is this the correct approach to
> process
> > > > date
> > > > > data (or is there any other approach / API to do it) ? Could you
> > > please
> > > > > let me know your inputs, in this regard ?
> > > > >
> > > > > Thanks,
> > > > >  Ravi
> > > > >
> > > > >
> > > > >
> > > > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > > > To:     Parquet Dev <de...@parquet.apache.org>
> > > > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > > > Mudigonda/India/IBM@IBMIN
> > > > > Date:   03/09/2016 10:48 PM
> > > > > Subject:        Re: How to write "date, timestamp, decimal" data
> to
> > > > > Parquet-files
> > > > >
> > > > >
> > > > >
> > > > > Hi Ravi,
> > > > >
> > > > > Not all of the types are fully-implemented yet. I think Hive only
> > has
> > > > > partial support. If I remember correctly:
> > > > > * Decimal is supported if the backing primitive type is
> fixed-length
> > > > > binary
> > > > > * Date and Timestamp are supported, but Time has not been
> > implemented
> > > > yet
> > > > >
> > > > > For object models you can build applications on (instead of those
> > > > embedded
> > > > > in SQL), only Avro objects can support those types through its
> > > > > LogicalTypes
> > > > > API. That API has been implemented in parquet-avro, but not yet
> > > > committed.
> > > > > I would like for this feature to make it into 1.9.0. If you want
> to
> > > test
> > > > > in
> > > > > the mean time, check out the pull request:
> > > > >
> > > > >   https://github.com/apache/parquet-mr/pull/318
> > > > >
> > > > > rb
> > > > >
> > > > > On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi
> > > <ra...@in.ibm.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I am Ravi Tatapudi, from IBM-India. I am working on a simple
> > > > test-tool,
> > > > > > that writes data to Parquet-files, which can be imported into
> > > > > hive-tables.
> > > > > > Pl. find attached sample-program, which writes simple
> > > > parquet-data-file:
> > > > > >
> > > > > >
> > > > > >
> > > > > > Using the above program, I could create "parquet-files" with
> > > > data-types:
> > > > > > INT, LONG, STRING, Boolean...etc (i.e., basically all data-types
> > > > > supported
> > > > > > by "org.apache.avro.Schema.Type) & load it into "hive" tables
> > > > > > successfully.
> > > > > >
> > > > > > Now, I am trying to figure out, how to write "date, timestamp,
> > > decimal
> > > > > > data" into parquet-files.  In this context, I request you
> provide
> > > the
> > > > > > possible options (and/or sample-program, if any..), in this
> > regard.
> > > > > >
> > > > > > Thanks,
> > > > > >  Ravi
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Ryan Blue
> > > > > Software Engineer
> > > > > Netflix
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
> >
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Regarding my question on compatibility between versions 1.6.0 & 1.8.2: 

My apologies for the confusion caused. After investigating further, I 
realized that the functionality is now split across different JARs. With 
version 1.6.0, I only included the JAR-file "parquet-avro-1.6.0.jar" 
during build & execution of the programs.

Now I see that I should include the JARs parquet-avro-1.8.2.jar and 
parquet-hadoop-1.8.2.jar at build-time, and additionally the JARs 
parquet-format-2.3.1.jar, parquet-column-1.8.2.jar, 
parquet-common-1.8.2.jar and parquet-encoding-1.8.2.jar for running the 
programs. After doing that, I could build my old applications 
successfully (of course, I had to change some of the import-statements 
from "import parquet.avro" to "import org.apache.parquet.avro"...etc) & 
run the tests successfully.
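
For reference, the import change amounts to, for example:

-----------------------------------------------------------------------
// before (parquet-avro 1.6.0)
// import parquet.avro.AvroParquetWriter;

// after (parquet-avro 1.8.x, packages moved under org.apache)
import org.apache.parquet.avro.AvroParquetWriter;
-----------------------------------------------------------------------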

So, my outstanding queries are:

1) I believe all my tests are now using the deprecated API for 
AvroParquetWriter. If you have a sample-program using the latest 
approach, I request you to point me to the same.
2) If you are aware of an approximate date (or month) when 
"parquet-avro-1.9.0" (or any other parquet-avro-1.8.x release that would 
include this fix) would be officially released (for example, by "June 
2016" or "Dec 2016" or later), then I request you to please let me know. 
It would be very helpful for my planning.

Many thanks for your support & help in this regard.

Thanks,
 Ravi



From:   Ravi Tatapudi/India/IBM
To:     dev@parquet.apache.org
Date:   04/05/2016 04:29 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files


Hello Ryan:

I have downloaded the source via the pull-request URL 
https://github.com/apache/parquet-mr/pull/318 (did a "fork" & downloaded 
the source-ZIP-file) & built it using Maven. The build completed 
successfully & I got the file "parquet-avro-1.8.2-SNAPSHOT.jar". When I 
tried to verify the "date" data type using this JAR-file, I realized 
that the existing test-programs fail to build with this new JAR.

So far, I have built (and run) my test-programs using 
"parquet-avro-1.6.0.jar". Now, when I try to re-build the test-programs 
using "parquet-avro-1.8.2-SNAPSHOT.jar", the builds fail. After going 
through the source-code, I realized that there are many changes in the 
API between "1.6.0" & "1.8.2", because of which the sample-programs that 
built with "1.6.0" no longer build. (It looks like "AvroParquetWriter" 
no longer exposes the methods "write", "close"...etc directly, but uses 
some other approach. Do you know why these methods were removed 
completely & made incompatible with parquet-avro-1.6.0?)

Pl. find below a sample parquet-write program, which now fails to build 
with "parquet-avro-1.8.2-SNAPSHOT.jar". Do you have any sample 
parquet-write program that works with "parquet-avro-1.8.2.jar" (to write 
primitive data types such as "int", "char"...etc to a parquet-file, as 
shown in the example below)? If yes, could you please point me to the 
same.

=================================================================================================
public static Schema makeSchema() {
     List<Field> fields = new ArrayList<Field>();
     fields.add(new Field("name", Schema.create(Type.STRING), null, 
null));
     fields.add(new Field("age", Schema.create(Type.INT), null, null));
     fields.add(new Field("dept", Schema.create(Type.STRING), null, 
null));

     Schema schema = Schema.createRecord("filecc", null, "parquet", 
false);
     schema.setFields(fields);
     return(schema);
}

public static GenericData.Record makeRecord (Schema schema, String name, 
int age, String dept) {
     GenericData.Record record = new GenericData.Record(schema);
     record.put("name", name);
     record.put("age", age);
     record.put("dept", dept);
     return(record);
}

public static void main(String[] args) throws IOException, 
InterruptedException, ClassNotFoundException {

        String pqfile = "/tmp/pqtfile1";
        try {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        Schema schema = makeSchema() ;
        GenericData.Record rec = makeRecord(schema,"Person A", 21,"ED2") ;
        AvroParquetWriter writer = new AvroParquetWriter(new Path(pqfile), 
schema);
        writer.write(rec);
        writer.close() ;

        } catch (Exception e) { e.printStackTrace(); }
    }
=================================================================================================

Thanks,
 Ravi




From:   Ravi Tatapudi/India/IBM
To:     dev@parquet.apache.org
Date:   04/05/2016 10:53 AM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files


Hello Ryan:

Many thanks for the inputs. I will try to build it today & see how it 
goes. 

Could you please let me know an approximate date (or month) when 
"parquet-avro-1.9.0" (or any other parquet-avro-1.8.x release that would 
include this fix) would be officially released (for example, by "June 
2016" or "Dec 2016" or later)? It would be very helpful for my planning.

Thanks,
 Ravi




From:   Ryan Blue <rb...@netflix.com.INVALID>
To:     Parquet Dev <de...@parquet.apache.org>
Date:   04/04/2016 10:05 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files



I don't think you can get the artifacts produced by our CI builds, but you
can check out the branch and build it using instructions in the 
repository.

On Mon, Apr 4, 2016 at 5:39 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Regarding the support for "date, timestamp, decimal" data types for
> Parquet-files:
>
> In your earlier mail, you have mentioned the pull-request-URL:
> https://github.com/apache/parquet-mr/pull/318 has the necessary support
> for these data-types (and that it would be released as part of
> parquet-avro-release:1.9.0).
>
> I see that, this fix is included in build# 1247 (& above?). How to get
> this build (or the latest-build), that includes the JAR-file:
> "parquet-avro" including the support for "date,timestamp"..etc. ? Could
> you please let me know.
>
> Thanks,
>  Ravi
>
>
>
> From:   Ryan Blue <rb...@netflix.com.INVALID>
> To:     Parquet Dev <de...@parquet.apache.org>
> Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> Mudigonda/India/IBM@IBMIN
> Date:   03/14/2016 09:56 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> Ravi,
>
> Support for those types in parquet-avro hasn't been committed yet. It's
> implemented in the branch I pointed you to. If you want to use released
> versions, it should be out in 1.9.0.
>
> rb
>
> On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi 
<ra...@in.ibm.com>
> wrote:
>
> > Hello Ryan:
> >
> > Thanks for the inputs.
> >
> > I am building & running the test-application, primarily using the
> > following JAR-files (for Avro, Parquet-Avro & Hive APIs):
> >
> > 1) avro-1.8.0.jar
> > 2) parquet-avro-1.6.0.jar (This is the latest one, found in the
> > maven-repository-URL:
> > http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> > 3) hive-exec-1.2.1.jar
> >
> > Am I supposed to build/run the test, using a different version of the
> > JAR-files ? Could you please let me know.
> >
> > Thanks,
> >  Ravi
> >
> >
> >
> >
> > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > To:     Parquet Dev <de...@parquet.apache.org>
> > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date:   03/11/2016 10:54 PM
> > Subject:        Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > Yes, it is supported in 1.2.1. It went in here:
> >
> >
> >
> >
>
> 
https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b

>
> >
> >
> > Are you using a version of Parquet with that pull request in it? Also,
> if
> > you're using CDH this may not work.
> >
> > rb
> >
> > On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi
> <ra...@in.ibm.com>
> > wrote:
> >
> > > Hello Ryan:
> > >
> > > I am using hive-version: 1.2.1, as indicated below:
> > >
> > > --------------------------------------
> > > $ hive --version
> > > Hive 1.2.1
> > > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> > > 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > > From source with checksum ab480aca41b24a9c3751b8c023338231
> > > $
> > > --------------------------------------
> > >
> > > As I understand, this version of "hive" supports "date" datatype.
> right
> > ?.
> > > Do you want me to re-test using any other higher-version of hive ? 
Pl.
> > let
> > > me know your thoughts.
> > >
> > > Thanks,
> > >  Ravi
> > >
> > >
> > >
> > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > To:     Parquet Dev <de...@parquet.apache.org>
> > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > Mudigonda/India/IBM@IBMIN
> > > Date:   03/11/2016 06:18 AM
> > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > Parquet-files
> > >
> > >
> > >
> > > What version of Hive are you using? You should make sure date is
> > supported
> > > there.
> > >
> > > rb
> > >
> > > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi
> > <ra...@in.ibm.com>
> > > wrote:
> > >
> > > > Hello Ryan:
> > > >
> > > > Many thanks for the reply. I see that, the text-attachment
> containing
> > my
> > > > test-program is not sent to the mail-group, but got filtered out.
> > Hence,
> > > > copying the program-code below:
> > > >
> > > > =================================================================
> > > > import java.io.IOException;
> > > > import java.util.*;
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.fs.FileSystem;
> > > > import org.apache.hadoop.fs.Path;
> > > > import org.apache.avro.Schema;
> > > > import org.apache.avro.Schema.Type;
> > > > import org.apache.avro.Schema.Field;
> > > > import org.apache.avro.generic.* ;
> > > > import org.apache.avro.LogicalTypes;
> > > > import org.apache.avro.LogicalTypes.*;
> > > > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > > > import parquet.avro.*;
> > > >
> > > > public class pqtw {
> > > >
> > > > public static Schema makeSchema() {
> > > >      List<Field> fields = new ArrayList<Field>();
> > > >      fields.add(new Field("name", Schema.create(Type.STRING), 
null,
> > > > null));
> > > >      fields.add(new Field("age", Schema.create(Type.INT), null,
> > null));
> > > >
> > > >      Schema date =
> > > > LogicalTypes.date().addToSchema(Schema.create(Type.INT)) ;
> > > >      fields.add(new Field("doj", date, null, null));
> > > >
> > > >      Schema schema = Schema.createRecord("filecc", null, 
"parquet",
> > > > false);
> > > >      schema.setFields(fields);
> > > >
> > > >      return(schema);
> > > > }
> > > >
> > > > public static GenericData.Record makeRecord (Schema schema, String
> > name,
> > > > int age, int doj) {
> > > >      GenericData.Record record = new GenericData.Record(schema);
> > > >      record.put("name", name);
> > > >      record.put("age", age);
> > > >      record.put("doj", doj);
> > > >      return(record);
> > > > }
> > > >
> > > > public static void main(String[] args) throws IOException,
> > > >
> > > >     InterruptedException, ClassNotFoundException {
> > > >
> > > >         String pqfile = "/tmp/pqtfile1";
> > > >
> > > >         try {
> > > >
> > > >         Configuration conf = new Configuration();
> > > >         FileSystem fs = FileSystem.getLocal(conf);
> > > >
> > > >         Schema schema = makeSchema() ;
> > > >         GenericData.Record rec = makeRecord(schema,"abcd", 
21,15000)
> ;
> > > >         AvroParquetWriter writer = new AvroParquetWriter(new
> > > Path(pqfile),
> > > > schema);
> > > >         writer.write(rec);
> > > >         writer.close();
> > > >         }
> > > >         catch (Exception e)
> > > >         {
> > > >                 e.printStackTrace();
> > > >         }
> > > >     }
> > > > }
> > > > =================================================================
> > > >
> > > > With the above logic, I could write the data to parquet-file.
> However,
> > > > when I load the same into a hive-table & select columns, I could
> > select
> > > > the columns: "name", "age" (i.e., VARCHAR, INT columns)
> successfully,
> > > but
> > > > select of "date" column failed with the error given below:
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
> 
--------------------------------------------------------------------------------
> > > > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) 
STORED
> AS
> > > > PARQUET ;
> > > > OK
> > > > Time taken: 0.369 seconds
> > > > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > > > hive> SELECT name,age from PT1;
> > > > OK
> > > > abcd    21
> > > > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > > > hive> SELECT doj from PT1;
> > > > OK
> > > > Failed with exception
> > > >
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > > > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable
> cannot
> > be
> > > > cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > > > Time taken: 0.167 seconds
> > > > hive>
> > > >
> > > >
> > >
> > >
> >
> >
>
> 
--------------------------------------------------------------------------------
> > > >
> > > > Basically, for "date datatype", I am trying to pass an 
integer-value
> > > (for
> > > > the # of days from Unix epoch, 1 January 1970, so that the date
> falls
> > > > somewhere around 2011..etc). Is this the correct approach to 
process
> > > date
> > > > data (or is there any other approach / API to do it) ? Could you
> > please
> > > > let me know your inputs, in this regard ?
> > > >
> > > > Thanks,
> > > >  Ravi
> > > >
> > > >
> > > >
> > > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > > To:     Parquet Dev <de...@parquet.apache.org>
> > > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > > Mudigonda/India/IBM@IBMIN
> > > > Date:   03/09/2016 10:48 PM
> > > > Subject:        Re: How to write "date, timestamp, decimal" data 
to
> > > > Parquet-files
> > > >
> > > >
> > > >
> > > > Hi Ravi,
> > > >
> > > > Not all of the types are fully-implemented yet. I think Hive only
> has
> > > > partial support. If I remember correctly:
> > > > * Decimal is supported if the backing primitive type is 
fixed-length
> > > > binary
> > > > * Date and Timestamp are supported, but Time has not been
> implemented
> > > yet
> > > >
> > > > For object models you can build applications on (instead of those
> > > embedded
> > > > in SQL), only Avro objects can support those types through its
> > > > LogicalTypes
> > > > API. That API has been implemented in parquet-avro, but not yet
> > > committed.
> > > > I would like for this feature to make it into 1.9.0. If you want 
to
> > test
> > > > in
> > > > the mean time, check out the pull request:
> > > >
> > > >   https://github.com/apache/parquet-mr/pull/318
> > > >
> > > > rb
> > > >
> > > > On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi
> > <ra...@in.ibm.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am Ravi Tatapudi, from IBM-India. I am working on a simple
> > > test-tool,
> > > > > that writes data to Parquet-files, which can be imported into
> > > > hive-tables.
> > > > > Pl. find attached sample-program, which writes simple
> > > parquet-data-file:
> > > > >
> > > > >
> > > > >
> > > > > Using the above program, I could create "parquet-files" with
> > > data-types:
> > > > > INT, LONG, STRING, Boolean...etc (i.e., basically all data-types
> > > > supported
> > > > > by "org.apache.avro.Schema.Type) & load it into "hive" tables
> > > > > successfully.
> > > > >
> > > > > Now, I am trying to figure out, how to write "date, timestamp,
> > decimal
> > > > > data" into parquet-files.  In this context, I request you 
provide
> > the
> > > > > possible options (and/or sample-program, if any..), in this
> regard.
> > > > >
> > > > > Thanks,
> > > > >  Ravi
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
> >
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I don't think you can get the artifacts produced by our CI builds, but you
can check out the branch and build it using instructions in the repository.

On Mon, Apr 4, 2016 at 5:39 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Regarding the support for "date, timestamp, decimal" data types for
> Parquet-files:
>
> In your earlier mail, you have mentioned the pull-request-URL:
> https://github.com/apache/parquet-mr/pull/318 has the necessary support
> for these data-types (and that it would be released as part of
> parquet-avro-release:1.9.0).
>
> I see that, this fix is included in build# 1247 (& above?). How to get
> this build (or the latest-build), that includes the JAR-file:
> "parquet-avro" including the support for "date,timestamp"..etc. ? Could
> you please let me know.
>
> Thanks,
>  Ravi
>
>
>
> From:   Ryan Blue <rb...@netflix.com.INVALID>
> To:     Parquet Dev <de...@parquet.apache.org>
> Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> Mudigonda/India/IBM@IBMIN
> Date:   03/14/2016 09:56 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> Ravi,
>
> Support for those types in parquet-avro hasn't been committed yet. It's
> implemented in the branch I pointed you to. If you want to use released
> versions, it should be out in 1.9.0.
>
> rb
>
> On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi <ra...@in.ibm.com>
> wrote:
>
> > Hello Ryan:
> >
> > Thanks for the inputs.
> >
> > I am building & running the test-application, primarily using the
> > following JAR-files (for Avro, Parquet-Avro & Hive APIs):
> >
> > 1) avro-1.8.0.jar
> > 2) parquet-avro-1.6.0.jar (This is the latest one, found in the
> > maven-repository-URL:
> > http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> > 3) hive-exec-1.2.1.jar
> >
> > Am I supposed to build/run the test, using a different version of the
> > JAR-files ? Could you please let me know.
> >
> > Thanks,
> >  Ravi
> >
> >
> >
> >
> > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > To:     Parquet Dev <de...@parquet.apache.org>
> > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date:   03/11/2016 10:54 PM
> > Subject:        Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > Yes, it is supported in 1.2.1. It went in here:
> >
> >
> >
> >
>
> https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b
>
> >
> >
> > Are you using a version of Parquet with that pull request in it? Also,
> if
> > you're using CDH this may not work.
> >
> > rb
> >
> > On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi
> <ra...@in.ibm.com>
> > wrote:
> >
> > > Hello Ryan:
> > >
> > > I am using hive-version: 1.2.1, as indicated below:
> > >
> > > --------------------------------------
> > > $ hive --version
> > > Hive 1.2.1
> > > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> > > 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > > From source with checksum ab480aca41b24a9c3751b8c023338231
> > > $
> > > --------------------------------------
> > >
> > > As I understand, this version of "hive" supports "date" datatype.
> right
> > ?.
> > > Do you want me to re-test using any other higher-version of hive ? Pl.
> > let
> > > me know your thoughts.
> > >
> > > Thanks,
> > >  Ravi
> > >
> > >
> > >
> > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > To:     Parquet Dev <de...@parquet.apache.org>
> > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > Mudigonda/India/IBM@IBMIN
> > > Date:   03/11/2016 06:18 AM
> > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > Parquet-files
> > >
> > >
> > >
> > > What version of Hive are you using? You should make sure date is
> > supported
> > > there.
> > >
> > > rb
> > >
> > > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi
> > <ra...@in.ibm.com>
> > > wrote:
> > >
> > > > Hello Ryan:
> > > >
> > > > Many thanks for the reply. I see that, the text-attachment
> containing
> > my
> > > > test-program is not sent to the mail-group, but got filtered out.
> > Hence,
> > > > copying the program-code below:
> > > >
> > > > =================================================================
> > > > import java.io.IOException;
> > > > import java.util.*;
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.fs.FileSystem;
> > > > import org.apache.hadoop.fs.Path;
> > > > import org.apache.avro.Schema;
> > > > import org.apache.avro.Schema.Type;
> > > > import org.apache.avro.Schema.Field;
> > > > import org.apache.avro.generic.* ;
> > > > import org.apache.avro.LogicalTypes;
> > > > import org.apache.avro.LogicalTypes.*;
> > > > import org.apache.hadoop.hive.common.type.HiveDecimal;
> > > > import parquet.avro.*;
> > > >
> > > > public class pqtw {
> > > >
> > > > public static Schema makeSchema() {
> > > >      List<Field> fields = new ArrayList<Field>();
> > > >      fields.add(new Field("name", Schema.create(Type.STRING), null,
> > > > null));
> > > >      fields.add(new Field("age", Schema.create(Type.INT), null,
> > null));
> > > >
> > > >      Schema date =
> > > > LogicalTypes.date().addToSchema(Schema.create(Type.INT)) ;
> > > >      fields.add(new Field("doj", date, null, null));
> > > >
> > > >      Schema schema = Schema.createRecord("filecc", null, "parquet",
> > > > false);
> > > >      schema.setFields(fields);
> > > >
> > > >      return(schema);
> > > > }
> > > >
> > > > public static GenericData.Record makeRecord (Schema schema, String
> > name,
> > > > int age, int doj) {
> > > >      GenericData.Record record = new GenericData.Record(schema);
> > > >      record.put("name", name);
> > > >      record.put("age", age);
> > > >      record.put("doj", doj);
> > > >      return(record);
> > > > }
> > > >
> > > > public static void main(String[] args) throws IOException,
> > > >
> > > >     InterruptedException, ClassNotFoundException {
> > > >
> > > >         String pqfile = "/tmp/pqtfile1";
> > > >
> > > >         try {
> > > >
> > > >         Configuration conf = new Configuration();
> > > >         FileSystem fs = FileSystem.getLocal(conf);
> > > >
> > > >         Schema schema = makeSchema() ;
> > > >         GenericData.Record rec = makeRecord(schema,"abcd", 21,15000)
> ;
> > > >         AvroParquetWriter writer = new AvroParquetWriter(new
> > > Path(pqfile),
> > > > schema);
> > > >         writer.write(rec);
> > > >         writer.close();
> > > >         }
> > > >         catch (Exception e)
> > > >         {
> > > >                 e.printStackTrace();
> > > >         }
> > > >     }
> > > > }
> > > > =================================================================
> > > >
> > > > With the above logic, I could write the data to parquet-file.
> However,
> > > > when I load the same into a hive-table & select columns, I could
> > select
> > > > the columns: "name", "age" (i.e., VARCHAR, INT columns)
> successfully,
> > > but
> > > > select of "date" column failed with the error given below:
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
> --------------------------------------------------------------------------------
> > > > hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED
> AS
> > > > PARQUET ;
> > > > OK
> > > > Time taken: 0.369 seconds
> > > > hive> load data local inpath '/tmp/pqtfile1' into table PT1;
> > > > hive> SELECT name,age from PT1;
> > > > OK
> > > > abcd    21
> > > > Time taken: 0.311 seconds, Fetched: 1 row(s)
> > > > hive> SELECT doj from PT1;
> > > > OK
> > > > Failed with exception
> > > >
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > > > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable
> cannot
> > be
> > > > cast to org.apache.hadoop.hive.serde2.io.DateWritable
> > > > Time taken: 0.167 seconds
> > > > hive>
> > > >
> > > >
> > >
> > >
> >
> >
>
> --------------------------------------------------------------------------------
> > > >
> > > > Basically, for "date datatype", I am trying to pass an integer-value
> > > (for
> > > > the # of days from Unix epoch, 1 January 1970, so that the date
> falls
> > > > somewhere around 2011..etc). Is this the correct approach to process
> > > date
> > > > data (or is there any other approach / API to do it) ? Could you
> > please
> > > > let me know your inputs, in this regard ?
> > > >
> > > > Thanks,
> > > >  Ravi
> > > >
> > > >
> > > >
> > > > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > > > To:     Parquet Dev <de...@parquet.apache.org>
> > > > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > > > Mudigonda/India/IBM@IBMIN
> > > > Date:   03/09/2016 10:48 PM
> > > > Subject:        Re: How to write "date, timestamp, decimal" data to
> > > > Parquet-files
> > > >
> > > >
> > > >
> > > > Hi Ravi,
> > > >
> > > > Not all of the types are fully-implemented yet. I think Hive only
> has
> > > > partial support. If I remember correctly:
> > > > * Decimal is supported if the backing primitive type is fixed-length
> > > > binary
> > > > * Date and Timestamp are supported, but Time has not been
> implemented
> > > yet
> > > >
> > > > For object models you can build applications on (instead of those
> > > embedded
> > > > in SQL), only Avro objects can support those types through its
> > > > LogicalTypes
> > > > API. That API has been implemented in parquet-avro, but not yet
> > > committed.
> > > > I would like for this feature to make it into 1.9.0. If you want to
> > test
> > > > in
> > > > the mean time, check out the pull request:
> > > >
> > > >   https://github.com/apache/parquet-mr/pull/318
> > > >
> > > > rb
> > > >
> > > > On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi
> > <ra...@in.ibm.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am Ravi Tatapudi, from IBM-India. I am working on a simple
> > > test-tool,
> > > > > that writes data to Parquet-files, which can be imported into
> > > > hive-tables.
> > > > > Pl. find attached sample-program, which writes simple
> > > parquet-data-file:
> > > > >
> > > > >
> > > > >
> > > > > Using the above program, I could create "parquet-files" with
> > > data-types:
> > > > > INT, LONG, STRING, Boolean...etc (i.e., basically all data-types
> > > > supported
> > > > > by "org.apache.avro.Schema.Type) & load it into "hive" tables
> > > > > successfully.
> > > > >
> > > > > Now, I am trying to figure out, how to write "date, timestamp,
> > decimal
> > > > > data" into parquet-files.  In this context, I request you provide
> > the
> > > > > possible options (and/or sample-program, if any..), in this
> regard.
> > > > >
> > > > > Thanks,
> > > > >  Ravi
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
> >
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Regarding the support for "date, timestamp, decimal" data types for 
Parquet-files:

In your earlier mail, you mentioned that the pull-request URL 
https://github.com/apache/parquet-mr/pull/318 has the necessary support 
for these data-types (and that it would be released as part of the 
parquet-avro 1.9.0 release). 

I see that this fix is included in build# 1247 (& above?). How do I get 
this build (or the latest build) that includes the "parquet-avro" 
JAR-file with the support for "date, timestamp"...etc.? Could you please 
let me know.

Thanks,
 Ravi



From:   Ryan Blue <rb...@netflix.com.INVALID>
To:     Parquet Dev <de...@parquet.apache.org>
Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas 
Mudigonda/India/IBM@IBMIN
Date:   03/14/2016 09:56 PM
Subject:        Re: How to write "date, timestamp, decimal" data to 
Parquet-files



Ravi,

Support for those types in parquet-avro hasn't been committed yet. It's
implemented in the branch I pointed you to. If you want to use released
versions, it should be out in 1.9.0.

rb

On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> Hello Ryan:
>
> Thanks for the inputs.
>
> I am building & running the test-application, primarily using the
> following JAR-files (for Avro, Parquet-Avro & Hive APIs):
>
> 1) avro-1.8.0.jar
> 2) parquet-avro-1.6.0.jar (This is the latest one, found in the
> maven-repository-URL:
> http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0)
> 3) hive-exec-1.2.1.jar
>
> Am I supposed to build/run the test, using a different version of the
> JAR-files ? Could you please let me know.
>
> Thanks,
>  Ravi
>
>
>
>
> From:   Ryan Blue <rb...@netflix.com.INVALID>
> To:     Parquet Dev <de...@parquet.apache.org>
> Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> Mudigonda/India/IBM@IBMIN
> Date:   03/11/2016 10:54 PM
> Subject:        Re: How to write "date, timestamp, decimal" data to
> Parquet-files
>
>
>
> Yes, it is supported in 1.2.1. It went in here:
>
>
>
> 
https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b

>
>
> Are you using a version of Parquet with that pull request in it? Also, 
if
> you're using CDH this may not work.
>
> rb
>
> On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi 
<ra...@in.ibm.com>
> wrote:
>
> > Hello Ryan:
> >
> > I am using hive-version: 1.2.1, as indicated below:
> >
> > --------------------------------------
> > $ hive --version
> > Hive 1.2.1
> > Subversion git://localhost.localdomain/home/sush/dev/hive.git -r
> > 243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
> > Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
> > From source with checksum ab480aca41b24a9c3751b8c023338231
> > $
> > --------------------------------------
> >
> > As I understand, this version of "hive" supports "date" datatype. 
right
> ?.
> > Do you want me to re-test using any other higher-version of hive ? Pl.
> let
> > me know your thoughts.
> >
> > Thanks,
> >  Ravi
> >
> >
> >
> > From:   Ryan Blue <rb...@netflix.com.INVALID>
> > To:     Parquet Dev <de...@parquet.apache.org>
> > Cc:     Nagesh R Charka/India/IBM@IBMIN, Srinivas
> > Mudigonda/India/IBM@IBMIN
> > Date:   03/11/2016 06:18 AM
> > Subject:        Re: How to write "date, timestamp, decimal" data to
> > Parquet-files
> >
> >
> >
> > What version of Hive are you using? You should make sure date is
> supported
> > there.
> >
> > rb
> >
> > On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi
> <ra...@in.ibm.com>
> > wrote:
> >
> > > Hello Ryan:
> > >
> > > Many thanks for the reply. I see that, the text-attachment 
containing
> my
> > > test-program is not sent to the mail-group, but got filtered out.
> Hence,
> > > copying the program-code below:
> > >
> > > [...]




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Ravi,

Support for those types in parquet-avro hasn't been committed yet. It's
implemented in the branch I pointed you to. If you want to use released
versions, it should be out in 1.9.0.
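
For anyone reading this in the archive: once that lands, the write side
could look roughly like the sketch below. The org.apache.parquet
coordinates and the builder API are assumptions about the then-unreleased
1.9.0, not a tested recipe.

import java.util.Arrays;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class Pqtw19Sketch {
    public static void main(String[] args) throws Exception {
        // int annotated as days-since-epoch; parquet-avro with the
        // logical-type support should carry this through to an
        // int32 (DATE) Parquet column.
        Schema date = LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
        Schema schema = Schema.createRecord("filecc", null, "parquet", false);
        schema.setFields(Arrays.asList(new Schema.Field("doj", date, null, (Object) null)));

        GenericData.Record rec = new GenericData.Record(schema);
        rec.put("doj", 15000);  // plain int; the logical type is schema metadata

        ParquetWriter<GenericData.Record> writer =
            AvroParquetWriter.<GenericData.Record>builder(new Path("/tmp/pqtfile1"))
                .withSchema(schema)
                .build();
        writer.write(rec);
        writer.close();
    }
}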

rb

On Sun, Mar 13, 2016 at 9:52 PM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> [...]


-- 
Ryan Blue
Software Engineer
Netflix

Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Thanks for the inputs. 

I am building & running the test-application, primarily using the 
following JAR-files (for Avro, Parquet-Avro & Hive APIs):

1) avro-1.8.0.jar
2) parquet-avro-1.6.0.jar (This is the latest one, found in the 
maven-repository-URL: 
http://mvnrepository.com/artifact/com.twitter/parquet-avro/1.6.0) 
3) hive-exec-1.2.1.jar

Am I supposed to build/run the test using a different version of the
JAR files? Could you please let me know.
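
(A quick sanity check on that mix of jars: the LogicalTypes class in the
minimal sketch below exists only from avro-1.8 on, while
parquet-avro-1.6.0 predates the logical-type write path, so the
annotation is dropped at write time even when avro-1.8.0 is present.)

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class CheckAvroLogicalTypes {
    public static void main(String[] args) {
        // Compiles and runs only with avro >= 1.8 on the classpath.
        Schema date = LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
        System.out.println(date);  // {"type":"int","logicalType":"date"}
    }
}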

Thanks,
 Ravi




[...]




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Yes, it is supported in 1.2.1. It went in here:


https://github.com/apache/hive/commit/912b4897ed457cfc447995b124ae84078287530b

Are you using a version of Parquet with that pull request in it? Also, if
you're using CDH this may not work.
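
The ClassCastException in the earlier message is consistent with the
annotation being the missing piece: Hive's date support keys off the
Parquet schema, so the doj column has to arrive as an int32 annotated
with DATE rather than a plain int32. A sketch of the expected file
schema, written with parquet-mr's MessageTypeParser (assuming a
parquet-mr version that knows the DATE original type):

import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ExpectedSchema {
    public static void main(String[] args) {
        // The footer schema Hive's date support expects to find:
        MessageType expected = MessageTypeParser.parseMessageType(
            "message filecc {\n"
            + "  required binary name (UTF8);\n"
            + "  required int32 age;\n"
            + "  required int32 doj (DATE);\n"
            + "}");
        System.out.println(expected);
    }
}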

rb

On Fri, Mar 11, 2016 at 12:40 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> [...]


-- 
Ryan Blue
Software Engineer
Netflix

Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

I am using hive-version: 1.2.1, as indicated below:

--------------------------------------
$ hive --version
Hive 1.2.1
Subversion git://localhost.localdomain/home/sush/dev/hive.git -r 
243e7c1ac39cb7ac8b65c5bc6988f5cc3162f558
Compiled by sush on Fri Jun 19 02:03:48 PDT 2015
From source with checksum ab480aca41b24a9c3751b8c023338231
$
--------------------------------------

As I understand, this version of Hive supports the "date" datatype, right?
Do you want me to re-test using a higher version of Hive? Please let
me know your thoughts.
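
(One way to narrow this down before re-testing is to dump what the
writer actually put in the file footer. A minimal sketch using the
parquet-mr footer reader; with the 1.6.0 jars the package is
parquet.hadoop:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import parquet.hadoop.ParquetFileReader;
import parquet.hadoop.metadata.ParquetMetadata;

public class DumpFooter {
    public static void main(String[] args) throws Exception {
        ParquetMetadata meta = ParquetFileReader.readFooter(
            new Configuration(), new Path("/tmp/pqtfile1"));
        // If doj prints as a plain int32 with no DATE annotation,
        // Hive will hand back an IntWritable and the cast will fail.
        System.out.println(meta.getFileMetaData().getSchema());
    }
}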

Thanks,
 Ravi



[...]




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
What version of Hive are you using? You should make sure date is supported
there.

rb

On Thu, Mar 10, 2016 at 3:11 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> [...]


-- 
Ryan Blue
Software Engineer
Netflix

Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ravi Tatapudi <ra...@in.ibm.com>.
Hello Ryan:

Many thanks for the reply. I see that the text-attachment containing my
test-program was not sent to the mail-group but got filtered out. Hence, I am
copying the program code below:

=================================================================
import java.util.*;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;
import org.apache.avro.Schema.Type;
import org.apache.avro.generic.*;
import org.apache.hadoop.fs.Path;
import parquet.avro.*;

public class pqtw {

    // Build the Avro schema: two plain fields plus an int field carrying
    // the "date" logical type (days since the Unix epoch).
    public static Schema makeSchema() {
        List<Field> fields = new ArrayList<Field>();
        fields.add(new Field("name", Schema.create(Type.STRING), null, null));
        fields.add(new Field("age", Schema.create(Type.INT), null, null));

        Schema date = LogicalTypes.date().addToSchema(Schema.create(Type.INT));
        fields.add(new Field("doj", date, null, null));

        Schema schema = Schema.createRecord("filecc", null, "parquet", false);
        schema.setFields(fields);
        return schema;
    }

    // Populate one record; doj is a plain int, the logical type lives
    // only in the schema.
    public static GenericData.Record makeRecord(Schema schema, String name,
            int age, int doj) {
        GenericData.Record record = new GenericData.Record(schema);
        record.put("name", name);
        record.put("age", age);
        record.put("doj", doj);
        return record;
    }

    public static void main(String[] args) {
        String pqfile = "/tmp/pqtfile1";
        try {
            Schema schema = makeSchema();
            GenericData.Record rec = makeRecord(schema, "abcd", 21, 15000);
            AvroParquetWriter<GenericData.Record> writer =
                    new AvroParquetWriter<GenericData.Record>(new Path(pqfile), schema);
            writer.write(rec);
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
=================================================================

With the above logic, I could write the data to the parquet-file. However,
when I load it into a hive-table and select columns, the "name" and "age"
columns (i.e., VARCHAR, INT) come back successfully, but selecting the
"date" column fails with the error given below:

--------------------------------------------------------------------------------
hive> CREATE TABLE PT1 (name varchar(10), age int, doj date) STORED AS 
PARQUET ;
OK
Time taken: 0.369 seconds
hive> load data local inpath '/tmp/pqtfile1' into table PT1;
hive> SELECT name,age from PT1;
OK
abcd    21
Time taken: 0.311 seconds, Fetched: 1 row(s)
hive> SELECT doj from PT1;
OK
Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be 
cast to org.apache.hadoop.hive.serde2.io.DateWritable
Time taken: 0.167 seconds
hive>
--------------------------------------------------------------------------------

Basically, for the "date" datatype I am passing an integer value (the number
of days since the Unix epoch, 1 January 1970, chosen so that the date falls
somewhere in 2011). Is this the correct approach for processing date data,
or is there another approach / API for it? Could you please let me know your
inputs in this regard?
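
(The arithmetic does land in 2011. A quick check with java.time,
available from Java 8:)

import java.time.LocalDate;

public class EpochDays {
    public static void main(String[] args) {
        // days-since-epoch encoding used by the "date" logical type
        System.out.println(LocalDate.ofEpochDay(15000));            // 2011-01-26
        System.out.println(LocalDate.of(2011, 1, 26).toEpochDay()); // 15000
    }
}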

Thanks,
 Ravi



[...]




Re: How to write "date, timestamp, decimal" data to Parquet-files

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Hi Ravi,

Not all of the types are fully-implemented yet. I think Hive only has
partial support. If I remember correctly:
* Decimal is supported if the backing primitive type is fixed-length binary
* Date and Timestamp are supported, but Time has not been implemented yet

For object models you can build applications on (instead of those embedded
in SQL), only Avro objects can support those types through its LogicalTypes
API. That API has been implemented in parquet-avro, but not yet committed.
I would like this feature to make it into 1.9.0. If you want to test in
the meantime, check out the pull request:

  https://github.com/apache/parquet-mr/pull/318
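
For the three types in the subject line, the Avro-side schemas would
look like the sketch below. Whether the annotations reach the Parquet
file depends on the parquet-avro support above, so treat this as a
sketch against that branch rather than a tested recipe. The decimal
uses a 16-byte fixed backing type, matching the fixed-length-binary
case Hive reads:

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class LogicalTypeSchemas {
    public static void main(String[] args) {
        // date: int = days since 1970-01-01
        Schema date = LogicalTypes.date()
                .addToSchema(Schema.create(Schema.Type.INT));

        // timestamp-millis: long = millis since 1970-01-01T00:00:00Z
        Schema ts = LogicalTypes.timestampMillis()
                .addToSchema(Schema.create(Schema.Type.LONG));

        // decimal(20, 2): 16-byte fixed holding the two's-complement
        // unscaled value
        Schema dec = LogicalTypes.decimal(20, 2)
                .addToSchema(Schema.createFixed("dec16", null, "parquet", 16));

        System.out.println(date);
        System.out.println(ts);
        System.out.println(dec);
    }
}

Populating the decimal field then means supplying a GenericData.Fixed
holding the two's-complement unscaled value, or letting Avro's
Conversions.DecimalConversion do the encoding.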

rb

On Wed, Mar 9, 2016 at 5:09 AM, Ravi Tatapudi <ra...@in.ibm.com>
wrote:

> [...]



-- 
Ryan Blue
Software Engineer
Netflix