You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by cheng xu <ch...@intel.com> on 2014/11/18 02:58:13 UTC
Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------
Review request for hive.
Repository: hive-git
Description
-------
This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test
Diffs
-----
data/files/parquet_types.txt d342062
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
ql/src/test/queries/clientpositive/parquet_types.q 22585c3
ql/src/test/results/clientpositive/parquet_types.q.out 275897c
Diff: https://reviews.apache.org/r/28147/diff/
Testing
-------
related UT and QL tests passed
Thanks,
cheng xu
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > data/files/parquet_types.txt, lines 1-3
> > <https://reviews.apache.org/r/28147/diff/3/?file=772138#file772138line1>
> >
> > I think this is bit confusing, since the 0b prefix gives the impression that data is read in binary format, whereas it is actually getting read as a string.
> >
> > I think we can either write (preferably non-ascii) binary data instead (for example, see: data/files/string.txt) OR alternatively, we could write it legibly in hex, like 68656c6c6f ("hello") and convert it to binary using unhex() in the INSERT OVERWRITE query. What do you think ?
I encode some Chinese words(non-ascii) and use hex function to convert into string like B4F3CAFDBEDD(some Chinese words).
> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > ql/src/test/queries/clientpositive/parquet_types.q, line 48
> > <https://reviews.apache.org/r/28147/diff/3/?file=772143#file772143line48>
> >
> > No need to unhex here...
> >
> > Can just be:
> >
> > SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM parquet_types
> >
> > Or you can pass it through hex() if original data has unprintable characters:
> >
> > SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary) FROM parquet_types
I think the statement of "SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, hex(cbinary), m1, l1, st1 FROM parquet_types;" will cover the case of binary. There is no need anymore for checking cbinary again.
- cheng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------
On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 21, 2014, 8:53 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------
data/files/parquet_types.txt
<https://reviews.apache.org/r/28147/#comment104827>
I think this is bit confusing, since the 0b prefix gives the impression that data is read in binary format, whereas it is actually getting read as a string.
I think we can either write (preferably non-ascii) binary data instead (for example, see: data/files/string.txt) OR alternatively, we could write it legibly in hex, like 68656c6c6f ("hello") and convert it to binary using unhex() in the INSERT OVERWRITE query. What do you think ?
ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104828>
If we write hex format (like 68656c6c6f) in parquet_types.q, we can just use unhex() to convert it to binary:
INSERT OVERWRITE TABLE parquet_types
SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, unhex(cbinary), m1, l1, st1 FROM parquet_types_staging;
ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104830>
Instead of "select * from parquet_types"... since cbinary column may have unprintable characters, you can pass it through hex() to make it legible:
SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, hex(cbinary), m1, l1, st1 FROM parquet_types;
ql/src/test/queries/clientpositive/parquet_types.q
<https://reviews.apache.org/r/28147/#comment104829>
No need to unhex here...
Can just be:
SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM parquet_types
Or you can pass it through hex() if original data has unprintable characters:
SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary) FROM parquet_types
- Mohit Sabharwal
On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 21, 2014, 8:53 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62754
-----------------------------------------------------------
Ship it!
Thanks for the changes!
- Mohit Sabharwal
On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 24, 2014, 4:16 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
> On Nov. 24, 2014, 6:17 p.m., Brock Noland wrote:
> > LGTM! One comment below.
Sorry for my late response. Update the patch by removing unnecessary import statements and simpfying the type convert methods.
- cheng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62822
-----------------------------------------------------------
On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 24, 2014, 4:16 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62822
-----------------------------------------------------------
LGTM! One comment below.
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
<https://reviews.apache.org/r/28147/#comment104950>
What's the purpose of Binary.fromByteArray(binaryBytes).getBytes() in this line? Doesn't it just end up returning the byte array?
- Brock Noland
On Nov. 24, 2014, 4:16 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 24, 2014, 4:16 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------
(Updated Nov. 28, 2014, 2:19 a.m.)
Review request for hive.
Repository: hive-git
Description
-------
This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test
Diffs (updated)
-----
data/files/parquet_types.txt d342062
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
ql/src/test/queries/clientpositive/parquet_types.q 22585c3
ql/src/test/results/clientpositive/parquet_types.q.out 275897c
Diff: https://reviews.apache.org/r/28147/diff/
Testing
-------
related UT and QL tests passed
Thanks,
cheng xu
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------
(Updated Nov. 24, 2014, 4:16 a.m.)
Review request for hive.
Changes
-------
summary:
1. add some non-ascii data into the test cases
2. regenerate the output
Repository: hive-git
Description
-------
This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test
Diffs (updated)
-----
data/files/parquet_types.txt d342062
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
ql/src/test/queries/clientpositive/parquet_types.q 22585c3
ql/src/test/results/clientpositive/parquet_types.q.out 275897c
Diff: https://reviews.apache.org/r/28147/diff/
Testing
-------
related UT and QL tests passed
Thanks,
cheng xu
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------
(Updated Nov. 21, 2014, 8:53 a.m.)
Review request for hive.
Changes
-------
remove needless cast function from the hql statement
Repository: hive-git
Description
-------
This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test
Diffs (updated)
-----
data/files/parquet_types.txt d342062
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
ql/src/test/queries/clientpositive/parquet_types.q 22585c3
ql/src/test/results/clientpositive/parquet_types.q.out 275897c
Diff: https://reviews.apache.org/r/28147/diff/
Testing
-------
related UT and QL tests passed
Thanks,
cheng xu
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62133
-----------------------------------------------------------
Ship it!
Ship It!
- Mohit Sabharwal
On Nov. 19, 2014, 2:01 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 19, 2014, 2:01 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/
-----------------------------------------------------------
(Updated Nov. 19, 2014, 2:01 a.m.)
Review request for hive.
Changes
-------
accept Mickaël Lacour's comments and add null test for binary type in ParquetHiveSerDe
Repository: hive-git
Description
-------
This patch includes:
1. binary support for ParquetHiveSerde
2. related test cases both in unit and ql test
Diffs (updated)
-----
data/files/parquet_types.txt d342062
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
ql/src/test/queries/clientpositive/parquet_types.q 22585c3
ql/src/test/results/clientpositive/parquet_types.q.out 275897c
Diff: https://reviews.apache.org/r/28147/diff/
Testing
-------
related UT and QL tests passed
Thanks,
cheng xu
Re: Review Request 28147: HIVE-7073:Implement Binary in ParquetSerDe
Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review61946
-----------------------------------------------------------
LGTM. Mickaël Lacour's suggestion of adding a null value test is a great one.
- Mohit Sabharwal
On Nov. 18, 2014, 1:58 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 18, 2014, 1:58 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerde
> 2. related test cases both in unit and ql test
>
>
> Diffs
> -----
>
> data/files/parquet_types.txt d342062
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java c57dd99
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
> ql/src/test/queries/clientpositive/parquet_types.q 22585c3
> ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>