Posted to dev@hive.apache.org by cheng xu <ch...@intel.com> on 2016/06/15 03:34:53 UTC

Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------

(Updated June 15, 2016, 11:34 a.m.)


Review request for hive and Xuefu Zhang.


Summary (updated)
-----------------

HIVE-13873 Column pruning for nested fields


Bugs: HIVE-13873
    https://issues.apache.org/jira/browse/HIVE-13873


Repository: hive-git


Description
-------

Add group projection support for Parquet. This is an initial patch to share my thoughts.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 23abec3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 24bf506 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cfedf35 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 74a1a82 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/test/queries/clientpositive/parquet_struct.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_struct.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30 

Diff: https://reviews.apache.org/r/48716/diff/


Testing
-------

Newly added qtest passed.


Thanks,

cheng xu


Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

Posted by cheng xu <ch...@intel.com>.

> On July 6, 2016, 10:48 p.m., Aihua Xu wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java, line 122
> > <https://reviews.apache.org/r/48716/diff/1/?file=1419370#file1419370line122>
> >
> >     Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? How is struct data stored in Parquet and pruned with the group path in general?

Parquet itself doesn't support this configuration. We reconstruct the requested schema on the Hive side by pruning unneeded columns, as the other projections do.
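
To illustrate the idea, here is a minimal sketch of rebuilding a requested schema from a list of group paths. It is not the code in this patch: the class NestedPruneSketch, the Field type, and the dotted-path format ("s1.f2.f3") are all assumptions made up for illustration, and a toy tree stands in for the Parquet schema.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive or Parquet.
final class NestedPruneSketch {

  /** A schema node: a primitive leaf (no children) or a group with named children. */
  public static final class Field {
    final String name;
    final Map<String, Field> children = new LinkedHashMap<>();

    public Field(String name, Field... kids) {
      this.name = name;
      for (Field kid : kids) {
        children.put(kid.name, kid);
      }
    }

    /** Deep copy, so the pruned schema never shares nodes with the full one. */
    Field copy() {
      Field c = new Field(name);
      for (Field kid : children.values()) {
        c.children.put(kid.name, kid.copy());
      }
      return c;
    }

    @Override
    public String toString() {
      return children.isEmpty() ? name : name + children.values();
    }
  }

  /** Builds the requested schema: only the fields named by dotted paths are kept. */
  public static Field prune(Field full, List<String> paths) {
    Field requested = new Field(full.name);
    for (String path : paths) {
      keep(full, requested, path.split("\\."), 0);
    }
    return requested;
  }

  /** Copies one path from src into dst; returns false if the path does not exist. */
  private static boolean keep(Field src, Field dst, String[] parts, int i) {
    Field srcChild = src.children.get(parts[i]);
    if (srcChild == null) {
      return false; // unknown path segment: nothing is added for this path
    }
    if (i == parts.length - 1) {
      dst.children.put(srcChild.name, srcChild.copy()); // keep the whole subtree
      return true;
    }
    Field dstChild = dst.children.get(srcChild.name);
    boolean fresh = dstChild == null;
    if (fresh) {
      dstChild = new Field(srcChild.name);
    }
    boolean kept = keep(srcChild, dstChild, parts, i + 1);
    if (kept && fresh) {
      dst.children.put(dstChild.name, dstChild); // attach only once a leaf was kept
    }
    return kept;
  }

  /** Sample layout: id, s1<f1, f2<f3, f4>>. */
  public static Field sample() {
    return new Field("root",
        new Field("id"),
        new Field("s1", new Field("f1"),
            new Field("f2", new Field("f3"), new Field("f4"))));
  }

  public static void main(String[] args) {
    System.out.println(prune(sample(), Arrays.asList("s1.f2.f3"))); // root[s1[f2[f3]]]
    System.out.println(prune(sample(), Arrays.asList("id", "s1.f2")));
  }
}
```

In this sketch a path that ends on a group keeps the whole group, while a path that ends on a leaf keeps only that leaf and its ancestors. Multiple levels of nesting are handled by the same recursion.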


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------




Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/#review140991
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java (line 282)
<https://reviews.apache.org/r/48716/#comment206371>

    Is this only used for debugging?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java (line 305)
<https://reviews.apache.org/r/48716/#comment206372>

    Is this to handle the a.col case? What about multiple levels of nested structure?



ql/src/test/queries/clientpositive/parquet_struct.q (line 4)
<https://reviews.apache.org/r/48716/#comment206373>

    It would be better to add "explain select..." to the test case.
    
    Also add a case with multiple levels of nested structure.



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java (line 122)
<https://reviews.apache.org/r/48716/#comment206375>

    Just trying to understand the logic (I'm not too familiar with Parquet). Does the underlying Parquet already support "hive.io.file.readgroup.paths", or is this handled entirely within Hive? How is struct data stored in Parquet and pruned with the group path in general?


- Aihua Xu




Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------

(Updated July 8, 2016, 9:40 a.m.)


Review request for hive and Xuefu Zhang.


Changes
-------

Remove unnecessary changes


Bugs: HIVE-13873
    https://issues.apache.org/jira/browse/HIVE-13873


Repository: hive-git


Description
-------

Add group projection support for Parquet. This is an initial patch to share my thoughts.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/test/queries/clientpositive/parquet_nested_field_pruning.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_nested_field_pruning.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30 

Diff: https://reviews.apache.org/r/48716/diff/


Testing
-------

Newly added qtest passed.


Thanks,

cheng xu


Re: Review Request 48716: HIVE-13873 Column pruning for nested fields

Posted by cheng xu <ch...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48716/
-----------------------------------------------------------

(Updated July 8, 2016, 9:35 a.m.)


Review request for hive and Xuefu Zhang.


Changes
-------

Changes include:
1. skip isList cases and add TODO for the next step
2. add more tests
3. address some comments from Aihua


Bugs: HIVE-13873
    https://issues.apache.org/jira/browse/HIVE-13873


Repository: hive-git


Description
-------

Add group projection support for Parquet. This is an initial patch to share my thoughts.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java dff1815 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 6afe957 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 227a051 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java db923fa 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java a89aa4d 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 3e38cc7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcCtx.java 611a6b7 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java a2a7f00 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
  ql/src/test/queries/clientpositive/parquet_nested_field_pruning.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_nested_field_pruning.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 0c7ac30 

Diff: https://reviews.apache.org/r/48716/diff/


Testing
-------

Newly added qtest passed.


Thanks,

cheng xu