You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by cheng xu <ch...@intel.com> on 2014/11/18 02:58:13 UTC

Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test


Diffs
-----

  data/files/parquet_types.txt d342062 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
  ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
  ql/src/test/results/clientpositive/parquet_types.q.out 275897c 

Diff: https://reviews.apache.org/r/28147/diff/


Testing
-------

related UT and QL tests passed


Thanks,

cheng xu


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.

> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > data/files/parquet_types.txt, lines 1-3
> > <https://reviews.apache.org/r/28147/diff/3/?file=772138#file772138line1>
> >
> >     I think this is bit confusing, since the 0b prefix gives the impression that data is read in binary format, whereas it is actually getting read as a string.
> >     
> >     I think we can either write (preferably non-ascii) binary data instead (for example, see: data/files/string.txt) OR alternatively, we could write it legibly in hex, like 68656c6c6f ("hello") and convert it to binary using unhex() in the INSERT OVERWRITE query. What do you think ?

I encode some Chinese words(non-ascii) and use hex function to convert into string like B4F3CAFDBEDD(some Chinese words).


> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > ql/src/test/queries/clientpositive/parquet_types.q, line 48
> > <https://reviews.apache.org/r/28147/diff/3/?file=772143#file772143line48>
> >
> >     No need to unhex here...
> >     
> >     Can just be:
> >     
> >      SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM parquet_types
> >      
> >     Or you can pass it through hex() if original data has unprintable characters:
> >     
> >      SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary) FROM parquet_types

I think the statement of "SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, hex(cbinary), m1, l1, st1 FROM parquet_types;" will cover the case of binary. There is no need anymore for checking cbinary again.


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------


On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2014, 8:53 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------



data/files/parquet_types.txt
<https://reviews.apache.org/r/28147/#comment104827>

    I think this is bit confusing, since the 0b prefix gives the impression that data is read in binary format, whereas it is actually getting read as a string.
    
    I think we can either write (preferably non-ascii) binary data instead (for example, see: data/files/string.txt) OR alternatively, we could write it legibly in hex, like 68656c6c6f ("hello") and convert it to binary using unhex() in the INSERT OVERWRITE query. What do you think ?



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104828>

    If we write hex format (like 68656c6c6f) in parquet_types.q, we can just use unhex() to convert it to binary:
    
    INSERT OVERWRITE TABLE parquet_types
    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, unhex(cbinary), m1, l1, st1 FROM parquet_types_staging;



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104830>

    Instead of "select * from parquet_types"... since cbinary column may have unprintable characters, you can pass it through hex() to make it legible:
    
    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, hex(cbinary), m1, l1, st1 FROM parquet_types;



ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104829>

    No need to unhex here...
    
    Can just be:
    
     SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM parquet_types
     
    Or you can pass it through hex() if original data has unprintable characters:
    
     SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary) FROM parquet_types


- Mohit Sabharwal


On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2014, 8:53 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62754
-----------------------------------------------------------

Ship it!


Thanks for the changes!

- Mohit Sabharwal


On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2014, 4:16 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.

> On Nov. 24, 2014, 6:17 p.m., Brock Noland wrote:
> > LGTM! One comment below.

Sorry for my late response. Update the patch by removing unnecessary import statements and simpfying the type convert methods.


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62822
-----------------------------------------------------------


On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2014, 4:16 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62822
-----------------------------------------------------------


LGTM! One comment below.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
<https://reviews.apache.org/r/28147/#comment104950>

    What's the purpose of Binary.fromByteArray(binaryBytes).getBytes() in this line? Doesn't it just end up returning the byte array?


- Brock Noland


On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 24, 2014, 4:16 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------

(Updated Nov. 28, 2014, 2:19 a.m.)


Review request for hive.


Repository: hive-git


Description
-------

This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test


Diffs (updated)
-----

  data/files/parquet_types.txt d342062 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
  ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
  ql/src/test/results/clientpositive/parquet_types.q.out 275897c 

Diff: https://reviews.apache.org/r/28147/diff/


Testing
-------

related UT and QL tests passed


Thanks,

cheng xu


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------

(Updated Nov. 24, 2014, 4:16 a.m.)


Review request for hive.


Changes
-------

summary:
1. add some non-ascii data into the test cases
2. regenerate the output


Repository: hive-git


Description
-------

This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test


Diffs (updated)
-----

  data/files/parquet_types.txt d342062 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
  ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
  ql/src/test/results/clientpositive/parquet_types.q.out 275897c 

Diff: https://reviews.apache.org/r/28147/diff/


Testing
-------

related UT and QL tests passed


Thanks,

cheng xu


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------

(Updated Nov. 21, 2014, 8:53 a.m.)


Review request for hive.


Changes
-------

remove needless cast function from the hql statement


Repository: hive-git


Description
-------

This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test


Diffs (updated)
-----

  data/files/parquet_types.txt d342062 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
  ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
  ql/src/test/results/clientpositive/parquet_types.q.out 275897c 

Diff: https://reviews.apache.org/r/28147/diff/


Testing
-------

related UT and QL tests passed


Thanks,

cheng xu


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62133
-----------------------------------------------------------

Ship it!


Ship It!

- Mohit Sabharwal


On Nov. 19, 2014, 2:01 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 2:01 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------

(Updated Nov. 19, 2014, 2:01 a.m.)


Review request for hive.


Changes
-------

accept Mickaël Lacour's comments and add null test for binary type in ParquetHiveSerDe


Repository: hive-git


Description
-------

This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test


Diffs (updated)
-----

  data/files/parquet_types.txt d342062 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
  ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
  ql/src/test/results/clientpositive/parquet_types.q.out 275897c 

Diff: https://reviews.apache.org/r/28147/diff/


Testing
-------

related UT and QL tests passed


Thanks,

cheng xu


Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review61946
-----------------------------------------------------------


LGTM. Mickaël Lacour's suggestion of adding a null value test is a great one.

- Mohit Sabharwal


On Nov. 18, 2014, 1:58 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 1:58 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
> 
> 
> Diffs
> -----
> 
>   data/files/parquet_types.txt d342062 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864 
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3 
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c 
> 
> Diff: https://reviews.apache.org/r/28147/diff/
> 
> 
> Testing
> -------
> 
> related UT and QL tests passed
> 
> 
> Thanks,
> 
> cheng xu
> 
>