Posted to dev@hive.apache.org by Kevin Wilfong <ke...@fb.com> on 2011/08/16 02:24:28 UTC
Review Request: Make compression used between map reduce tasks configurable.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/
-----------------------------------------------------------
Review request for hive and Ning Zhang.
Summary
-------
I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate. By intermediate, I mean that if the query is an insert, there is at least one other map reduce task that is guaranteed to run before the move. If the query is not an insert, every map reduce task counts as intermediate. I determine this by defaulting the flag to true and setting it to false when the tasks that move the data into a table or file are generated.
If the work for a map reduce task (local or otherwise) is intermediate, we set the compression used on the reduce output to a configured value; the default is LZO.
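The flag logic described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: the Work class stands in for MapredWork/MapredLocalWork, and the configuration keys, the "NONE" fallback for final output, and the method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IntermediateCompressionSketch {

    // Hypothetical stand-in for MapredWork / MapredLocalWork; not Hive's real class.
    static class Work {
        // Defaults to true; cleared when the move task for this work is generated.
        private boolean intermediate = true;

        boolean isIntermediate() {
            return intermediate;
        }

        void setIntermediate(boolean intermediate) {
            this.intermediate = intermediate;
        }
    }

    // Hypothetical configuration keys, used only in this sketch.
    static final String INTERMEDIATE_CODEC_KEY = "sketch.intermediate.codec";
    static final String FINAL_CODEC_KEY = "sketch.final.codec";

    // Picks the codec name for the reduce output of one map reduce task.
    static String chooseOutputCodec(Work work, Map<String, String> conf) {
        if (work.isIntermediate()) {
            // Intermediate output: use the configured codec, LZO if none is set.
            return conf.getOrDefault(INTERMEDIATE_CODEC_KEY, "LZO");
        }
        // Final output: assumed here to follow a separate setting.
        return conf.getOrDefault(FINAL_CODEC_KEY, "NONE");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        Work first = new Work();      // feeds another map reduce task
        Work last = new Work();       // its output is moved into the table
        last.setIntermediate(false);  // what generating the move task would do

        System.out.println(chooseOutputCodec(first, conf));
        System.out.println(chooseOutputCodec(last, conf));
    }
}
```

Defaulting the flag to true means only the code that generates move tasks has to know about it; every other task is treated as intermediate without further changes.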
This addresses bug HIVE-2374.
https://issues.apache.org/jira/browse/HIVE-2374
Diffs
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1157918
trunk/conf/hive-default.xml 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1157918
trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1157918
trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION
trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1516/diff
Testing
-------
I added a test query and hook to verify that the is-intermediate flag is set properly in the MapredWork/MapredLocalWork.
I also added a test to TestExecDriver which checks that the correct compression is used on the reduce output for each value of the is-intermediate flag.
Thanks,
Kevin
Re: Review Request: Make compression used between map reduce tasks configurable.
Posted by Kevin Wilfong <ke...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/
-----------------------------------------------------------
(Updated 2011-09-07 01:34:05.392697)
Review request for hive, Yongqiang He and Ning Zhang.
Summary
-------
I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate. By intermediate, I mean that if the query is an insert, there is at least one other map reduce task that is guaranteed to run before the move. If the query is not an insert, every map reduce task counts as intermediate. I determine this by defaulting the flag to true and setting it to false when the tasks that move the data into a table or file are generated.
If the work for a map reduce task (local or otherwise) is intermediate, we set the compression used on the reduce output to a configured value; the default is LZO.
This addresses bug HIVE-2374.
https://issues.apache.org/jira/browse/HIVE-2374
Diffs
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1164667
trunk/conf/hive-default.xml 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1164667
trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1164667
trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION
trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/auto_join0.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join10.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join11.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join13.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join15.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join16.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join18.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join20.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join21.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join22.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join23.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join24.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join26.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join27.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join28.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join29.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join30.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join31.q.out 1164667
trunk/ql/src/test/results/clientpositive/cluster.q.out 1164667
trunk/ql/src/test/results/clientpositive/ctas.q.out 1164667
trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby10.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby11.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby3.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby3_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby4.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby5.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby6.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby6_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_map.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_noskew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby9.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_bitmap3.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_bitmap_auto.q.out 1164667
trunk/ql/src/test/results/clientpositive/innerjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/input14_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input1_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input25.q.out 1164667
trunk/ql/src/test/results/clientpositive/input26.q.out 1164667
trunk/ql/src/test/results/clientpositive/input39.q.out 1164667
trunk/ql/src/test/results/clientpositive/input3_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input4_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/insert_into3.q.out 1164667
trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/join0.q.out 1164667
trunk/ql/src/test/results/clientpositive/join13.q.out 1164667
trunk/ql/src/test/results/clientpositive/join15.q.out 1164667
trunk/ql/src/test/results/clientpositive/join18.q.out 1164667
trunk/ql/src/test/results/clientpositive/join18_multi_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/join19.q.out 1164667
trunk/ql/src/test/results/clientpositive/join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/join20.q.out 1164667
trunk/ql/src/test/results/clientpositive/join21.q.out 1164667
trunk/ql/src/test/results/clientpositive/join22.q.out 1164667
trunk/ql/src/test/results/clientpositive/join23.q.out 1164667
trunk/ql/src/test/results/clientpositive/join29.q.out 1164667
trunk/ql/src/test/results/clientpositive/join30.q.out 1164667
trunk/ql/src/test/results/clientpositive/join31.q.out 1164667
trunk/ql/src/test/results/clientpositive/join32.q.out 1164667
trunk/ql/src/test/results/clientpositive/join33.q.out 1164667
trunk/ql/src/test/results/clientpositive/join35.q.out 1164667
trunk/ql/src/test/results/clientpositive/join38.q.out 1164667
trunk/ql/src/test/results/clientpositive/join40.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_hive_626.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder2.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder3.q.out 1164667
trunk/ql/src/test/results/clientpositive/lateral_view.q.out 1164667
trunk/ql/src/test/results/clientpositive/lineage1.q.out 1164667
trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_subquery.q.out 1164667
trunk/ql/src/test/results/clientpositive/merge4.q.out 1164667
trunk/ql/src/test/results/clientpositive/multi_insert.q.out 1164667
trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out 1164667
trunk/ql/src/test/results/clientpositive/no_hooks.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup2.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup4.q.out 1164667
trunk/ql/src/test/results/clientpositive/parallel.q.out 1164667
trunk/ql/src/test/results/clientpositive/pcr.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_gby2.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_gby_join.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_repeated_alias.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_udf_case.q.out 1164667
trunk/ql/src/test/results/clientpositive/regex_col.q.out 1164667
trunk/ql/src/test/results/clientpositive/sample8.q.out 1164667
trunk/ql/src/test/results/clientpositive/semijoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/skewjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/stats1.q.out 1164667
trunk/ql/src/test/results/clientpositive/udf_case_column_pruning.q.out 1164667
trunk/ql/src/test/results/clientpositive/udf_explode.q.out 1164667
trunk/ql/src/test/results/clientpositive/udtf_explode.q.out 1164667
trunk/ql/src/test/results/clientpositive/udtf_json_tuple.q.out 1164667
trunk/ql/src/test/results/clientpositive/union10.q.out 1164667
trunk/ql/src/test/results/clientpositive/union11.q.out 1164667
trunk/ql/src/test/results/clientpositive/union12.q.out 1164667
trunk/ql/src/test/results/clientpositive/union14.q.out 1164667
trunk/ql/src/test/results/clientpositive/union15.q.out 1164667
trunk/ql/src/test/results/clientpositive/union17.q.out 1164667
trunk/ql/src/test/results/clientpositive/union18.q.out 1164667
trunk/ql/src/test/results/clientpositive/union19.q.out 1164667
trunk/ql/src/test/results/clientpositive/union20.q.out 1164667
trunk/ql/src/test/results/clientpositive/union22.q.out 1164667
trunk/ql/src/test/results/clientpositive/union3.q.out 1164667
trunk/ql/src/test/results/clientpositive/union4.q.out 1164667
trunk/ql/src/test/results/clientpositive/union5.q.out 1164667
trunk/ql/src/test/results/clientpositive/union6.q.out 1164667
trunk/ql/src/test/results/clientpositive/union7.q.out 1164667
Diff: https://reviews.apache.org/r/1516/diff
Testing
-------
I added a test query and hook to verify that the is-intermediate flag is set properly in the MapredWork/MapredLocalWork.
I also added a test to TestExecDriver which checks that the correct compression is used on the reduce output for each value of the is-intermediate flag.
Thanks,
Kevin
Re: Review Request: Make compression used between map reduce tasks configurable.
Posted by Kevin Wilfong <ke...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/
-----------------------------------------------------------
(Updated 2011-09-02 18:56:14.079373)
Review request for hive and Ning Zhang.
Changes
-------
Made changes as suggested by nzhang.
I made the description of hive.exec.inter.mapred.compression.codec much more detailed, and added a simple example.
I also set hive.exec.compress.intermediate to default to true, but I let hive.exec.inter.mapred.compression.codec default to the Hadoop default value, so that the existing unit tests hit my new code path. Note that my new unit tests check that if hive.exec.inter.mapred.compression.codec is set to something other than the Hadoop default value, it is used as intended.
This change required that I update the output of any tests that are affected by the change to hive.exec.compress.intermediate.
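A rough sketch of how the two settings might interact, for illustration only. The property names are the ones discussed above; the class and method names are hypothetical, the sketch assumes the codec setting only matters when hive.exec.compress.intermediate is true, and it uses Hadoop's DefaultCodec class name as a stand-in for "the Hadoop default value". The real lookup lives in HiveConf and ExecDriver and may differ.

```java
import java.util.Optional;
import java.util.Properties;

class IntermediateCodecDefaults {

    // Property names as given in this thread.
    static final String COMPRESS_INTERMEDIATE = "hive.exec.compress.intermediate";
    static final String INTER_MAPRED_CODEC = "hive.exec.inter.mapred.compression.codec";

    // Assumed stand-in for "the Hadoop default value".
    static final String HADOOP_DEFAULT_CODEC = "org.apache.hadoop.io.compress.DefaultCodec";

    // Returns the codec class name for an intermediate task,
    // or an empty Optional when intermediate compression is switched off.
    static Optional<String> resolveIntermediateCodec(Properties conf) {
        boolean compress =
            Boolean.parseBoolean(conf.getProperty(COMPRESS_INTERMEDIATE, "true"));
        if (!compress) {
            return Optional.empty();
        }
        return Optional.of(conf.getProperty(INTER_MAPRED_CODEC, HADOOP_DEFAULT_CODEC));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Nothing set: compression on, codec falls back to the assumed Hadoop default.
        System.out.println(resolveIntermediateCodec(conf));

        // Codec set explicitly: the configured value is used.
        conf.setProperty(INTER_MAPRED_CODEC, "org.apache.hadoop.io.compress.GzipCodec");
        System.out.println(resolveIntermediateCodec(conf));

        // Intermediate compression disabled: no codec applies.
        conf.setProperty(COMPRESS_INTERMEDIATE, "false");
        System.out.println(resolveIntermediateCodec(conf));
    }
}
```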
Summary
-------
I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate. By intermediate, I mean that if the query is an insert, there is at least one other map reduce task that is guaranteed to run before the move. If the query is not an insert, every map reduce task counts as intermediate. I determine this by defaulting the flag to true and setting it to false when the tasks that move the data into a table or file are generated.
If the work for a map reduce task (local or otherwise) is intermediate, we set the compression used on the reduce output to a configured value; the default is LZO.
This addresses bug HIVE-2374.
https://issues.apache.org/jira/browse/HIVE-2374
Diffs (updated)
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1164667
trunk/conf/hive-default.xml 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1164667
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1164667
trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1164667
trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION
trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/auto_join0.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join10.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join11.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join12.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join13.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join15.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join16.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join18.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join20.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join21.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join22.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join23.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join24.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join26.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join27.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join28.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join29.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join30.q.out 1164667
trunk/ql/src/test/results/clientpositive/auto_join31.q.out 1164667
trunk/ql/src/test/results/clientpositive/cluster.q.out 1164667
trunk/ql/src/test/results/clientpositive/ctas.q.out 1164667
trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby10.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby11.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby1_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby2_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby3.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby3_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby4.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby5.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby6.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby6_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_map.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_map_skew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby8_noskew.q.out 1164667
trunk/ql/src/test/results/clientpositive/groupby9.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_bitmap3.q.out 1164667
trunk/ql/src/test/results/clientpositive/index_bitmap_auto.q.out 1164667
trunk/ql/src/test/results/clientpositive/innerjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/input14_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input1_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input25.q.out 1164667
trunk/ql/src/test/results/clientpositive/input26.q.out 1164667
trunk/ql/src/test/results/clientpositive/input39.q.out 1164667
trunk/ql/src/test/results/clientpositive/input3_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/input4_limit.q.out 1164667
trunk/ql/src/test/results/clientpositive/insert_into3.q.out 1164667
trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION
trunk/ql/src/test/results/clientpositive/join0.q.out 1164667
trunk/ql/src/test/results/clientpositive/join13.q.out 1164667
trunk/ql/src/test/results/clientpositive/join15.q.out 1164667
trunk/ql/src/test/results/clientpositive/join18.q.out 1164667
trunk/ql/src/test/results/clientpositive/join18_multi_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/join19.q.out 1164667
trunk/ql/src/test/results/clientpositive/join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/join20.q.out 1164667
trunk/ql/src/test/results/clientpositive/join21.q.out 1164667
trunk/ql/src/test/results/clientpositive/join22.q.out 1164667
trunk/ql/src/test/results/clientpositive/join23.q.out 1164667
trunk/ql/src/test/results/clientpositive/join29.q.out 1164667
trunk/ql/src/test/results/clientpositive/join30.q.out 1164667
trunk/ql/src/test/results/clientpositive/join31.q.out 1164667
trunk/ql/src/test/results/clientpositive/join32.q.out 1164667
trunk/ql/src/test/results/clientpositive/join33.q.out 1164667
trunk/ql/src/test/results/clientpositive/join35.q.out 1164667
trunk/ql/src/test/results/clientpositive/join38.q.out 1164667
trunk/ql/src/test/results/clientpositive/join40.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_hive_626.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder2.q.out 1164667
trunk/ql/src/test/results/clientpositive/join_reorder3.q.out 1164667
trunk/ql/src/test/results/clientpositive/lateral_view.q.out 1164667
trunk/ql/src/test/results/clientpositive/lineage1.q.out 1164667
trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_distinct.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/mapjoin_subquery.q.out 1164667
trunk/ql/src/test/results/clientpositive/merge4.q.out 1164667
trunk/ql/src/test/results/clientpositive/multi_insert.q.out 1164667
trunk/ql/src/test/results/clientpositive/multigroupby_singlemr.q.out 1164667
trunk/ql/src/test/results/clientpositive/no_hooks.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup2.q.out 1164667
trunk/ql/src/test/results/clientpositive/nullgroup4.q.out 1164667
trunk/ql/src/test/results/clientpositive/parallel.q.out 1164667
trunk/ql/src/test/results/clientpositive/pcr.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_clusterby.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_gby2.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_gby_join.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_join2.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_repeated_alias.q.out 1164667
trunk/ql/src/test/results/clientpositive/ppd_udf_case.q.out 1164667
trunk/ql/src/test/results/clientpositive/regex_col.q.out 1164667
trunk/ql/src/test/results/clientpositive/sample8.q.out 1164667
trunk/ql/src/test/results/clientpositive/semijoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/skewjoin.q.out 1164667
trunk/ql/src/test/results/clientpositive/stats1.q.out 1164667
trunk/ql/src/test/results/clientpositive/udf_case_column_pruning.q.out 1164667
trunk/ql/src/test/results/clientpositive/udf_explode.q.out 1164667
trunk/ql/src/test/results/clientpositive/udtf_explode.q.out 1164667
trunk/ql/src/test/results/clientpositive/udtf_json_tuple.q.out 1164667
trunk/ql/src/test/results/clientpositive/union10.q.out 1164667
trunk/ql/src/test/results/clientpositive/union11.q.out 1164667
trunk/ql/src/test/results/clientpositive/union12.q.out 1164667
trunk/ql/src/test/results/clientpositive/union14.q.out 1164667
trunk/ql/src/test/results/clientpositive/union15.q.out 1164667
trunk/ql/src/test/results/clientpositive/union17.q.out 1164667
trunk/ql/src/test/results/clientpositive/union18.q.out 1164667
trunk/ql/src/test/results/clientpositive/union19.q.out 1164667
trunk/ql/src/test/results/clientpositive/union20.q.out 1164667
trunk/ql/src/test/results/clientpositive/union22.q.out 1164667
trunk/ql/src/test/results/clientpositive/union3.q.out 1164667
trunk/ql/src/test/results/clientpositive/union4.q.out 1164667
trunk/ql/src/test/results/clientpositive/union5.q.out 1164667
trunk/ql/src/test/results/clientpositive/union6.q.out 1164667
trunk/ql/src/test/results/clientpositive/union7.q.out 1164667
Diff: https://reviews.apache.org/r/1516/diff
Testing
-------
I added a test query and hook to verify that the is-intermediate flag is set properly in the MapredWork/MapredLocalWork.
I also added a test to TestExecDriver which checks that the correct compression is used on the reduce output for each value of the is-intermediate flag.
Thanks,
Kevin
Re: Review Request: Make compression used between map reduce tasks configurable.
Posted by Ning Zhang <nz...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/#review1658
-----------------------------------------------------------
Still looking at the tests. Here are my partial comments.
trunk/conf/hive-default.xml
<https://reviews.apache.org/r/1516/#comment3701>
There is another parameter, hive.exec.compress.intermediate, which controls whether to compress the intermediate data between MR jobs. Can you check whether it is turned on by default? I think we should turn it on as part of this change so that the unit tests actually cover your new code path.
trunk/conf/hive-default.xml
<https://reviews.apache.org/r/1516/#comment3700>
Please be more specific (an example would help) about when this codec is going to be used.
- Ning
On 2011-08-16 00:24:28, Kevin Wilfong wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1516/
> -----------------------------------------------------------
>
> (Updated 2011-08-16 00:24:28)
>
>
> Review request for hive and Ning Zhang.
>
>
> Summary
> -------
>
> I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate. By intermediate, I mean that if the query is an insert, there is at least one other map reduce task that is guaranteed to run before the move. If the query is not an insert, every map reduce task counts as intermediate. I determine this by defaulting the flag to true and setting it to false when the tasks that move the data into a table or file are generated.
>
> If the work for a map reduce task (local or otherwise) is intermediate, we set the compression used on the reduce output to a configured value; the default is LZO.
>
>
> This addresses bug HIVE-2374.
> https://issues.apache.org/jira/browse/HIVE-2374
>
>
> Diffs
> -----
>
> trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1157918
> trunk/conf/hive-default.xml 1157918
> trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1157918
> trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1157918
> trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1157918
> trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1157918
> trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1157918
> trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1157918
> trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION
> trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION
> trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/1516/diff
>
>
> Testing
> -------
>
> I added a test query and hook to verify that the is-intermediate flag is set properly in the MapredWork/MapredLocalWork.
>
> I also added a test to TestExecDriver which checks that the correct compression is used on the reduce output for each value of the is-intermediate flag.
>
>
> Thanks,
>
> Kevin
>
>