Posted to dev@hive.apache.org by Franklin Hu <fr...@fb.com> on 2011/06/17 22:45:46 UTC
Review Request: HIVE-2035 Use block level merge on rcfile if intermediate
merge is needed
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/
-----------------------------------------------------------
Review request for hive.
Summary
-------
For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can instead do a block-level merge, which does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.
This addresses bug HIVE-2035.
https://issues.apache.org/jira/browse/HIVE-2035
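To make the summary's distinction concrete, here is a toy Java sketch contrasting a row-level merge with a block-level merge. It is not the patch's code and uses none of Hive's RCFile classes: the Block type, the one-row-per-byte decode/encode pair, and the decodeCalls counter are all assumptions made up for illustration. The only point it shows is that a block-level merge can append already-encoded blocks without decoding them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Hive's RCFile classes. */
final class BlockMergeSketch {

    /** Hypothetical stand-in for one encoded block of a columnar file. */
    static final class Block {
        final byte[] encoded;
        Block(byte[] encoded) { this.encoded = encoded; }
    }

    /** Counts decode calls so the difference between the two merges is observable. */
    static int decodeCalls = 0;

    /** Hypothetical decoder: treats each byte of a block as one row. */
    static List<Byte> decode(Block block) {
        decodeCalls++;
        List<Byte> rows = new ArrayList<>();
        for (byte b : block.encoded) {
            rows.add(b);
        }
        return rows;
    }

    /** Hypothetical encoder: packs rows back into a single block. */
    static Block encode(List<Byte> rows) {
        byte[] out = new byte[rows.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = rows.get(i);
        }
        return new Block(out);
    }

    /** Row-level merge: decode every block, then re-encode all rows together. */
    static List<Block> mergeRows(List<List<Block>> files) {
        List<Byte> allRows = new ArrayList<>();
        for (List<Block> file : files) {
            for (Block block : file) {
                allRows.addAll(decode(block));
            }
        }
        List<Block> merged = new ArrayList<>();
        merged.add(encode(allRows));
        return merged;
    }

    /** Block-level merge: append the encoded blocks as they are, decoding nothing. */
    static List<Block> mergeBlocks(List<List<Block>> files) {
        List<Block> merged = new ArrayList<>();
        for (List<Block> file : files) {
            merged.addAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Block> fileA = Arrays.asList(new Block(new byte[] {1, 2}), new Block(new byte[] {3}));
        List<Block> fileB = Arrays.asList(new Block(new byte[] {4, 5, 6}));
        List<List<Block>> files = Arrays.asList(fileA, fileB);

        decodeCalls = 0;
        mergeRows(files);
        System.out.println("row-level merge decode calls: " + decodeCalls);

        decodeCalls = 0;
        mergeBlocks(files);
        System.out.println("block-level merge decode calls: " + decodeCalls);
    }
}
```

In this toy model the block-level path never touches the decoder, which is the efficiency argument the summary makes; how the real patch reads and writes RCFile blocks is in the diff, not in this sketch.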
Diffs
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1134415
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1134415
trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_insert.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_insert.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/935/diff
Testing
-------
Thanks,
Franklin
Re: Review Request: HIVE-2035 Use block level merge on rcfile if
intermediate merge is needed
Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/#review864
-----------------------------------------------------------
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
<https://reviews.apache.org/r/935/#comment1889>
It doesn't seem to be a RuntimeException
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
<https://reviews.apache.org/r/935/#comment1890>
Why not "inputDepth--"?
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
<https://reviews.apache.org/r/935/#comment1891>
Should we just throw an exception instead of returning a magic null?
trunk/ql/src/test/queries/clientpositive/rcfile_insert.q
<https://reviews.apache.org/r/935/#comment1893>
Will it launch a merge job? If it does, it seems to be a bug in Hive that CombineHiveInputFormat doesn't span multiple partitions when it needs to.
trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q
<https://reviews.apache.org/r/935/#comment1892>
It doesn't seem to launch merge jobs. If it does, that seems to be a bug.
- Siying
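For readers without the diff open, the GenMRFileSink1.java comment above (throw an exception rather than return a "magic null") can be illustrated with a small Java sketch. The method under review is not shown in this thread, so everything here is hypothetical: the PlanException type, both lookup methods, and the use of plain strings for the input format and task names are assumptions chosen only to show the two styles side by side.

```java
/** Illustration only: the methods and exception type below are hypothetical. */
final class MergeTaskLookupSketch {

    /** Hypothetical checked exception standing in for whatever the planner would throw. */
    static final class PlanException extends Exception {
        PlanException(String message) { super(message); }
    }

    /** Null-returning style: null silently means "no merge task", and callers must remember to check. */
    static String findMergeTaskOrNull(String inputFormat) {
        return "RCFileInputFormat".equals(inputFormat) ? "BlockMergeTask" : null;
    }

    /** Throwing style: an unsupported input format fails loudly at the point of detection. */
    static String findMergeTask(String inputFormat) throws PlanException {
        if (!"RCFileInputFormat".equals(inputFormat)) {
            throw new PlanException("No block-level merge task for input format: " + inputFormat);
        }
        return "BlockMergeTask";
    }

    public static void main(String[] args) {
        // With the null style, a forgotten check would surface later as a NullPointerException.
        System.out.println("null style: " + findMergeTaskOrNull("TextInputFormat"));

        try {
            findMergeTask("TextInputFormat");
        } catch (PlanException e) {
            System.out.println("throwing style: " + e.getMessage());
        }
    }
}
```

The trade-off is the usual one: the throwing version forces the caller to handle the failure and carries a message, while the null version pushes the check onto every call site.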
On 2011-06-17 20:45:46, Franklin Hu wrote:
> (Updated 2011-06-17 20:45:46)
> Review request for hive.
Re: Review Request: HIVE-2035 Use block level merge on rcfile if
intermediate merge is needed
Posted by Franklin Hu <fr...@fb.com>.
(Updated 2011-06-23 18:56:14.903379)
Review request for hive.
Changes
-------
Add max and min split size configs to unit tests
Summary
-------
For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can instead do a block-level merge, which does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.
This addresses bug HIVE-2035.
https://issues.apache.org/jira/browse/HIVE-2035
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1139014
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1139014
trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/935/diff
Testing
-------
Thanks,
Franklin
Re: Review Request: HIVE-2035 Use block level merge on rcfile if
intermediate merge is needed
Posted by Siying Dong <si...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/935/#review875
-----------------------------------------------------------
Can you make sure that the queries in the test cases actually need the merge step?
- Siying
On 2011-06-20 19:20:53, Franklin Hu wrote:
> (Updated 2011-06-20 19:20:53)
> Review request for hive.
Re: Review Request: HIVE-2035 Use block level merge on rcfile if
intermediate merge is needed
Posted by Franklin Hu <fr...@fb.com>.
(Updated 2011-06-20 19:20:53.263299)
Review request for hive.
Changes
-------
Throw an error at compile time, rather than at runtime, for a bad RCFile merge input format class; remove the bad test; stylistic fixes.
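As a rough illustration of the first change listed above, the following Java sketch validates a configured class name while the plan is being built instead of when the job runs. It is not the code from the diff: the BlockMergeInputFormat marker interface, the CompileException type, and the resolveAtCompileTime helper are all hypothetical names, and the real patch may validate the class differently.

```java
/** Illustration only: every type below is hypothetical, not a Hive class. */
final class MergeFormatCheckSketch {

    /** Hypothetical marker for input formats that support block-level merge. */
    interface BlockMergeInputFormat {}

    static final class GoodFormat implements BlockMergeInputFormat {}

    static final class UnrelatedClass {}

    /** Hypothetical stand-in for the exception a query compiler would raise. */
    static final class CompileException extends Exception {
        CompileException(String message, Throwable cause) { super(message, cause); }
    }

    /** Resolves and type-checks the configured class before any job is launched. */
    static Class<? extends BlockMergeInputFormat> resolveAtCompileTime(String className)
            throws CompileException {
        Class<?> candidate;
        try {
            candidate = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new CompileException("Merge input format class not found: " + className, e);
        }
        if (!BlockMergeInputFormat.class.isAssignableFrom(candidate)) {
            throw new CompileException("Not a block merge input format: " + className, null);
        }
        return candidate.asSubclass(BlockMergeInputFormat.class);
    }

    public static void main(String[] args) {
        String[] configured = {
            GoodFormat.class.getName(), UnrelatedClass.class.getName(), "no.such.Format"
        };
        for (String name : configured) {
            try {
                System.out.println("accepted: " + resolveAtCompileTime(name).getName());
            } catch (CompileException e) {
                System.out.println("rejected while planning: " + e.getMessage());
            }
        }
    }
}
```

Failing at this stage means a misconfigured class name is reported before any cluster resources are used, rather than partway through a running job.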
Summary
-------
For a table stored as RCFile, intermediate result files are sometimes merged when they fall below a size threshold. For RCFiles, we can instead do a block-level merge, which does not deserialize the blocks and is therefore more efficient. This patch reuses the existing merge code behind ALTER TABLE ... CONCATENATE.
This addresses bug HIVE-2035.
https://issues.apache.org/jira/browse/HIVE-2035
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 1136090
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1136090
trunk/ql/src/test/queries/clientpositive/rcfile_createas1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge1.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge2.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge3.q PRE-CREATION
trunk/ql/src/test/queries/clientpositive/rcfile_merge4.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge1.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge2.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge3.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/rcfile_merge4.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/935/diff
Testing
-------
Thanks,
Franklin