Posted to dev@hive.apache.org by Aihua Xu <ax...@cloudera.com> on 2016/07/06 14:48:10 UTC

Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 282)
<https://reviews.apache.org/r/48716/#comment206371>

    Is this only used for debugging?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java (line 305)
<https://reviews.apache.org/r/48716/#comment206372>

    Is this to handle the a.col case? What about multiple levels of nested structure?



ql/src/test/queries/clientpositive/parquet_struct.q (line 4)
<https://reviews.apache.org/r/48716/#comment206373>

    It would be better to add an "explain select..." to the test case.
    
    Also add a case with multiple levels of nested structure.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java (line 122)
<https://reviews.apache.org/r/48716/#comment206375>

    Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet library already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? In general, how is struct data stored in Parquet, and how is it pruned with the group path?


- Aihua Xu


On June 15, 2016, 3:34 a.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48716/
> -----------------------------------------------------------
> 
> (Updated June 15, 2016, 3:34 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-13873
>     https://issues.apache.org/jira/browse/HIVE-13873
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Add group projection support for Parquet. This is an initial patch to share my thoughts.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 23abec3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 24bf506 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cfedf35 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 74a1a82 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/test/queries/clientpositive/parquet_struct.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_struct.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30 
> 
> Diff: https://reviews.apache.org/r/48716/diff/
> 
> 
> Testing
> -------
> 
> Newly added qtest passed.
> 
> 
> Thanks,
> 
> cheng xu
> 
>


Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

Posted by cheng xu <ch...@intel.com>.

> On July 6, 2016, 10:48 p.m., Aihua Xu wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java, line 122
> > <https://reviews.apache.org/r/48716/diff/1/?file=1419370#file1419370line122>
> >
> >     Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet library already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? In general, how is struct data stored in Parquet, and how is it pruned with the group path?

Parquet doesn't support this configuration. We reconstruct the requested schema on the Hive side by pruning unneeded columns, as other projections do.
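
To illustrate the idea of rebuilding a requested schema from group paths, here is a toy sketch. It is not the code in the patch and does not use Hive or Parquet classes: `Field`, `prune`, `sampleSchema`, and the dotted-path format ("a.b.x") are all assumptions made for illustration. A requested path keeps the whole subtree under it, and a struct is kept only if at least one of its descendants is requested, so multiple levels of nesting are handled by the same recursion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy sketch only: every name here is hypothetical, not a Hive or Parquet API.
final class NestedPruneSketch {

    // A schema node: a primitive leaf when it has no children, otherwise a struct (group).
    static final class Field {
        final String name;
        final Map<String, Field> children = new LinkedHashMap<>();

        Field(String name, Field... kids) {
            this.name = name;
            for (Field k : kids) {
                children.put(k.name, k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        @Override
        public String toString() {
            if (isLeaf()) {
                return name;
            }
            List<String> parts = new ArrayList<>();
            for (Field c : children.values()) {
                parts.add(c.toString());
            }
            return name + "{" + String.join(",", parts) + "}";
        }
    }

    // Returns a pruned copy of 'node' that keeps only fields named by the dotted
    // paths in 'wanted'. 'prefix' is the dotted path of 'node' itself ("" for the root).
    static Field prune(Field node, Set<String> wanted, String prefix) {
        Field kept = new Field(node.name);
        for (Field child : node.children.values()) {
            String path = prefix.isEmpty() ? child.name : prefix + "." + child.name;
            if (wanted.contains(path)) {
                // Requested directly: keep the entire subtree unchanged.
                kept.children.put(child.name, child);
            } else if (!child.isLeaf()) {
                // Not requested itself, but a nested field below it may be.
                Field sub = prune(child, wanted, path);
                if (!sub.isLeaf()) {
                    kept.children.put(child.name, sub);
                }
            }
        }
        return kept;
    }

    // Hypothetical table schema: id, a struct<col1, col2, b struct<x, y>>, c struct<z>.
    static Field sampleSchema() {
        return new Field("root",
            new Field("id"),
            new Field("a",
                new Field("col1"),
                new Field("col2"),
                new Field("b", new Field("x"), new Field("y"))),
            new Field("c", new Field("z")));
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("a.col1", "a.b.x"));
        // prints root{a{col1,b{x}}}
        System.out.println(prune(sampleSchema(), wanted, ""));
    }
}
```

In this sketch the pruned copy preserves the field order of the original schema, and structs with no requested descendants (such as `c` above) are dropped entirely.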


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------

