Posted to dev@hive.apache.org by cheng xu <ch...@intel.com> on 2016/06/15 03:34:53 UTC
Re: Review Request 48716: HIVE-13873 Column pruning for nested fields
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------
(Updated June 15, 2016, 11:34 a.m.)
Review request for hive and Xuefu Zhang.
Summary (updated)
-----------------
HIVE-13873 Column pruning for nested fields
Bugs: HIVE-13873
https://issues.apache.org/jira/browse/HIVE-13873
Repository: hive-git
Description
-------
Add group projection support for Parquet. This is the initial patch, shared to outline my approach.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 23abec3
ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 24bf506
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cfedf35
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 74a1a82
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
ql/src/test/queries/clientpositive/parquet_struct.q PRE-CREATION
ql/src/test/results/clientpositive/parquet_struct.q.out PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30
Diff: https://reviews.apache.org/r/48716/diff/
Testing
-------
Newly added qtest passed.
Thanks,
cheng xu
Re: Review Request 48716: HIVE-13873 Column pruning for nested fields
Posted by cheng xu <ch...@intel.com>.
> On July 6, 2016, 10:48 p.m., Aihua Xu wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java, line 122
> > <https://reviews.apache.org/r/48716/diff/1/?file=1419370#file1419370line122>
> >
> > Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet library already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? In general, how is struct data stored in Parquet and pruned using the group path?
Parquet itself doesn't support this configuration. We reconstruct the requested schema on the Hive side by pruning the unneeded columns, as the other projections do.
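To make the idea concrete, here is a minimal sketch of pruning a nested schema by dotted group paths. It is not the code from the patch and does not use Parquet's schema classes: the Field type, the prune and sampleSchema methods, and the sample paths are all hypothetical, chosen only to illustrate how a requested schema can be rebuilt from the full one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedPruningSketch {

  /** Hypothetical schema node: a leaf has no children, a struct has named children. */
  public static final class Field {
    final String name;
    final Map<String, Field> children = new LinkedHashMap<>();

    public Field(String name, Field... kids) {
      this.name = name;
      for (Field k : kids) {
        children.put(k.name, k);
      }
    }

    @Override
    public String toString() {
      if (children.isEmpty()) {
        return name;
      }
      List<String> parts = new ArrayList<>();
      for (Field c : children.values()) {
        parts.add(c.toString());
      }
      return name + "{" + String.join(",", parts) + "}";
    }
  }

  /** Returns a new schema that keeps only the fields named by the dotted paths. */
  public static Field prune(Field full, List<String> paths) {
    Field result = new Field(full.name);
    for (String path : paths) {
      String[] parts = path.split("\\.");
      if (exists(full, parts)) {   // silently skip paths that are not in the schema
        keep(full, result, parts, 0);
      }
    }
    return result;
  }

  private static boolean exists(Field root, String[] parts) {
    Field cur = root;
    for (String p : parts) {
      cur = cur.children.get(p);
      if (cur == null) {
        return false;
      }
    }
    return true;
  }

  private static void keep(Field src, Field dst, String[] parts, int i) {
    Field child = src.children.get(parts[i]);
    if (i == parts.length - 1) {
      dst.children.put(child.name, child);   // path ends here: keep the whole subtree
      return;
    }
    Field existing = dst.children.get(child.name);
    if (existing == child) {
      return;                                // whole subtree was already selected
    }
    if (existing == null) {
      existing = new Field(child.name);      // partial copy of the intermediate struct
      dst.children.put(child.name, existing);
    }
    keep(child, existing, parts, i + 1);
  }

  /** Hypothetical table: id plus a struct s with a leaf a and a nested struct b. */
  public static Field sampleSchema() {
    return new Field("root",
        new Field("id"),
        new Field("s", new Field("a"), new Field("b", new Field("c"), new Field("d"))));
  }

  public static void main(String[] args) {
    // Something like "select s.b.c, id" would only need these two paths.
    System.out.println(prune(sampleSchema(), Arrays.asList("s.b.c", "id")));
  }
}
```

In this sketch a path that ends on a struct keeps the whole struct, and a path that goes deeper keeps only the named branch, so multiple levels of nesting are handled by the same recursion.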
- cheng
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------
Re: Review Request 48716: HIVE-13873 Column pruning for nested fields
Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 282)
<https://reviews.apache.org/r/48716/#comment206371>
Is this only used for debugging purposes?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java (line 305)
<https://reviews.apache.org/r/48716/#comment206372>
Is this to handle the a.col case? What about multiple levels of nested structure?
ql/src/test/queries/clientpositive/parquet_struct.q (line 4)
<https://reviews.apache.org/r/48716/#comment206373>
It would be better to add "explain select..." to the test case.
Also add a case with multiple levels of nested structure.
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java (line 122)
<https://reviews.apache.org/r/48716/#comment206375>
Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet library already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? In general, how is struct data stored in Parquet and pruned using the group path?
- Aihua Xu
Re: Review Request 48716: HIVE-13873 Column pruning for nested fields
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------
(Updated July 8, 2016, 9:40 a.m.)
Review request for hive and Xuefu Zhang.
Changes
-------
Remove unnecessary changes
Bugs: HIVE-13873
https://issues.apache.org/jira/browse/HIVE-13873
Repository: hive-git
Description
-------
Add group projection support for Parquet. This is the initial patch, shared to outline my approach.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
ql/src/test/queries/clientpositive/parquet_nested_field_pruning.q PRE-CREATION
ql/src/test/results/clientpositive/parquet_nested_field_pruning.q.out PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30
Diff: https://reviews.apache.org/r/48716/diff/
Testing
-------
Newly added qtest passed.
Thanks,
cheng xu
Re: Review Request 48716: HIVE-13873 Column pruning for nested fields
Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------
(Updated July 8, 2016, 9:35 a.m.)
Review request for hive and Xuefu Zhang.
Changes
-------
Changes include:
1. Skip the isList cases and add a TODO for the next step
2. Add more tests
3. Address some comments from Aihua
Bugs: HIVE-13873
https://issues.apache.org/jira/browse/HIVE-13873
Repository: hive-git
Description
-------
Add group projection support for Parquet. This is the initial patch, shared to outline my approach.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67
ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d
ql/src/test/queries/clientpositive/parquet_nested_field_pruning.q PRE-CREATION
ql/src/test/results/clientpositive/parquet_nested_field_pruning.q.out PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30
Diff: https://reviews.apache.org/r/48716/diff/
Testing
-------
Newly added qtest passed.
Thanks,
cheng xu