Posted to dev@hive.apache.org by Franklin Hu <fr...@fb.com> on 2011/06/17 22:45:46 UTC

Review Request: HIVE-2035 Use block level merge on rcfile if intermediate merge is needed

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/
-----------------------------------------------------------

Review request for hive.


Summary
-------

For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can do a block-level merge that does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.
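
As a rough illustration of why a block-level merge avoids deserialization, the sketch below merges files in a toy container format by copying each block's bytes verbatim. The format (a 4-byte magic header followed by length-prefixed blocks) and every name in it (BlockMergeSketch, fileOf, mergeBlocks, countBlocks) are assumptions made for this example; this is not the RCFile layout and not the code in this patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;

// Toy container format, NOT the RCFile layout: a 4-byte magic header
// followed by blocks, each stored as [int payloadLength][payload bytes].
final class BlockMergeSketch {
    static final int MAGIC = 0x544F5931; // arbitrary marker for the toy format

    // Builds a toy file from already-encoded block payloads.
    static byte[] fileOf(byte[]... payloads) {
        int size = Integer.BYTES;
        for (byte[] p : payloads) {
            size += Integer.BYTES + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(MAGIC);
        for (byte[] p : payloads) {
            buf.putInt(p.length);
            buf.put(p);
        }
        return buf.array();
    }

    // Merges files by copying whole blocks; payloads are never decoded.
    static byte[] mergeBlocks(List<byte[]> files) {
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        // The merged file gets exactly one header.
        merged.write(ByteBuffer.allocate(Integer.BYTES).putInt(MAGIC).array(), 0, Integer.BYTES);
        for (byte[] file : files) {
            ByteBuffer in = ByteBuffer.wrap(file);
            if (in.getInt() != MAGIC) {
                throw new IllegalArgumentException("not a toy block file");
            }
            while (in.hasRemaining()) {
                int start = in.position();
                int payloadLength = in.getInt();
                int blockSize = Integer.BYTES + payloadLength;
                merged.write(file, start, blockSize); // length prefix + payload, verbatim
                in.position(start + blockSize);
            }
        }
        return merged.toByteArray();
    }

    // Counts blocks by walking the length prefixes only.
    static int countBlocks(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        in.getInt(); // skip the header
        int blocks = 0;
        while (in.hasRemaining()) {
            int payloadLength = in.getInt();
            in.position(in.position() + payloadLength);
            blocks++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] a = fileOf("b1".getBytes(), "b2".getBytes());
        byte[] b = fileOf("b3".getBytes());
        byte[] merged = mergeBlocks(List.of(a, b));
        System.out.println("blocks after merge: " + countBlocks(merged));
    }
}
```

The merge only reads block boundaries, so its cost depends on the number of bytes copied rather than on decoding rows or columns.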


This addresses bug HIVE-2035.
    https://issues.apache.org/jira/browse/HIVE-2035


Diffs
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1134415 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1134415 
  trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_insert.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_insert.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/935/diff


Testing
-------


Thanks,

Franklin


Re: Review Request: HIVE-2035 Use block level merge on rcfile if intermediate merge is needed

Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/#review864
-----------------------------------------------------------



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
<https://reviews.apache.org/r/935/#comment1889>

    It doesn't seem to be a RuntimeException



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
<https://reviews.apache.org/r/935/#comment1890>

    why not "inputDepth--"?



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
<https://reviews.apache.org/r/935/#comment1891>

    Should we just throw an exception instead of returning a magic null?
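
    To make the trade-off concrete, here is a minimal sketch of the two styles. The class, the methods, and the lookup table are hypothetical and are not taken from GenMRFileSink1; they only contrast a null return with a descriptive exception.

```java
import java.util.Map;

// Hypothetical lookup used only to contrast the two styles.
final class MagicNullSketch {
    static final Map<String, String> MERGE_STRATEGIES = Map.of("rcfile", "block-level merge");

    // Null-returning style: every caller must remember what null means.
    static String findStrategyOrNull(String format) {
        return MERGE_STRATEGIES.get(format);
    }

    // Exception style: the failure is explicit and carries a message.
    static String requireStrategy(String format) {
        String strategy = MERGE_STRATEGIES.get(format);
        if (strategy == null) {
            throw new IllegalArgumentException("no block-level merge strategy for format: " + format);
        }
        return strategy;
    }

    public static void main(String[] args) {
        System.out.println(requireStrategy("rcfile"));
        System.out.println(findStrategyOrNull("text")); // caller has to interpret the null
    }
}
```

    If null is a legitimate "not applicable" answer, returning java.util.Optional is another option; the exception fits better when the caller cannot sensibly continue.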



trunk/ql/src/test/queries/clientpositive/rcfile_insert.q
<https://reviews.apache.org/r/935/#comment1893>

    Will this launch a merge job? If it does, that seems to be a bug in Hive: CombineHiveInputFormat doesn't span multiple partitions when it needs to.



trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q
<https://reviews.apache.org/r/935/#comment1892>

    This doesn't seem to launch merge jobs. If it does, that seems to be a bug.


- Siying




Re: Review Request: HIVE-2035 Use block level merge on rcfile if intermediate merge is needed

Posted by Franklin Hu <fr...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/
-----------------------------------------------------------

(Updated 2011-06-23 18:56:14.903379)


Review request for hive.


Changes
-------

Add max and min split size configs to unit tests


Summary
-------

For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can do a block-level merge that does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.


This addresses bug HIVE-2035.
    https://issues.apache.org/jira/browse/HIVE-2035


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1139014 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1139014 
  trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/935/diff


Testing
-------


Thanks,

Franklin


Re: Review Request: HIVE-2035 Use block level merge on rcfile if intermediate merge is needed

Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/#review875
-----------------------------------------------------------


Can you make sure that the queries in the test cases actually need the merge step?
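
One way to reason about whether a test query reaches the merge step is to write the trigger condition down. The sketch below assumes a merge is scheduled only when there is more than one output file and the average file size is below a configured threshold. That rule, and the names MergeDecisionSketch and needsMerge, are illustrative assumptions rather than the actual logic in ConditionalResolverMergeFiles.

```java
// Assumed trigger rule for illustration only; not Hive's actual resolver.
final class MergeDecisionSketch {

    // Returns true when the files look small enough, on average, to be worth merging.
    static boolean needsMerge(long[] fileSizesInBytes, long avgSizeThresholdInBytes) {
        if (fileSizesInBytes.length <= 1) {
            return false; // nothing to merge with a single file (assumption)
        }
        long total = 0;
        for (long size : fileSizesInBytes) {
            total += size;
        }
        long average = total / fileSizesInBytes.length;
        return average < avgSizeThresholdInBytes;
    }

    public static void main(String[] args) {
        long threshold = 16L * 1024 * 1024; // hypothetical 16 MB threshold
        System.out.println(needsMerge(new long[] {1_000, 2_000, 3_000}, threshold));
        System.out.println(needsMerge(new long[] {64L * 1024 * 1024, 64L * 1024 * 1024}, threshold));
    }
}
```

Under this assumed rule, a test only exercises the merge path if its query writes several small files, which is why split-size and threshold settings in the test matter.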

- Siying




Re: Review Request: HIVE-2035 Use block level merge on rcfile if intermediate merge is needed

Posted by Franklin Hu <fr...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/
-----------------------------------------------------------

(Updated 2011-06-20 19:20:53.263299)


Review request for hive.


Changes
-------

Throw an error at compile time, rather than at runtime, for a bad RCFile merge input format class; remove a bad test; stylistic fixes.
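
The idea of failing at compile time rather than at runtime can be sketched as follows: resolve and type-check the configured class name while the plan is being built, so a misconfiguration surfaces before any job is launched. The interface BlockMergeInput, the class GoodInput, and the method resolveInputFormat are hypothetical stand-ins, not the classes touched by this patch.

```java
// Hypothetical types for illustration; not the classes changed in this patch.
final class PlanValidationSketch {

    // Stand-in for the interface a merge input format is expected to implement.
    interface BlockMergeInput {}

    static final class GoodInput implements BlockMergeInput {}

    // Resolves the configured class name up front and rejects it if it is
    // missing or of the wrong type, instead of failing later inside a task.
    static Class<? extends BlockMergeInput> resolveInputFormat(String className) {
        Class<?> candidate;
        try {
            candidate = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("merge input format class not found: " + className, e);
        }
        if (!BlockMergeInput.class.isAssignableFrom(candidate)) {
            throw new IllegalArgumentException(className + " is not a BlockMergeInput");
        }
        return candidate.asSubclass(BlockMergeInput.class);
    }

    public static void main(String[] args) {
        System.out.println(resolveInputFormat(GoodInput.class.getName()).getSimpleName());
        try {
            resolveInputFormat("java.lang.String");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at plan time: " + e.getMessage());
        }
    }
}
```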


Summary
-------

For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can do a block-level merge that does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.


This addresses bug HIVE-2035.
    https://issues.apache.org/jira/browse/HIVE-2035


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1136090 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1136090 
  trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/935/diff


Testing
-------


Thanks,

Franklin