Posted to dev@hive.apache.org by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org> on 2011/08/16 02:25:29 UTC
[jira] [Commented] (HIVE-2374) Make compression used between map reduce tasks configurable.
[ https://issues.apache.org/jira/browse/HIVE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085452#comment-13085452 ]
jiraposter@reviews.apache.org commented on HIVE-2374:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1516/
-----------------------------------------------------------
Review request for hive and Ning Zhang.
Summary
-------
I added a field to MapredWork and MapredLocalWork that indicates whether the work is intermediate. By intermediate, I mean that if the query is an insert, at least one other map reduce task is guaranteed to run before the move. If the query is not an insert, every task counts as intermediate. I determine this by defaulting the flag to true and setting it to false when the tasks that move the data into a table or file are generated.
If the work for a map reduce task (local or otherwise) is intermediate, then we set the compression used on the output of the reduce to a configured value; the default is LZO.
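For readers without the diff at hand, the flag logic described above could be sketched roughly as follows. This is a simplified illustration, not the patch itself: the class Work, the methods attachMove and reduceOutputCodec, and the codec strings are hypothetical stand-ins, and the assumption that non-intermediate work falls back to a separately configured final codec is mine.

```java
public class IntermediateFlagSketch {

    // Hypothetical stand-in for MapredWork / MapredLocalWork.
    static class Work {
        final String name;
        // The flag defaults to true, as described in the summary.
        boolean intermediate = true;

        Work(String name) {
            this.name = name;
        }
    }

    // Assumed default for intermediate output when nothing is configured.
    static final String DEFAULT_INTERMEDIATE_CODEC = "lzo";

    // Called when the task that moves data into a table or file is
    // generated for this work: the work is then no longer intermediate.
    static void attachMove(Work work) {
        work.intermediate = false;
    }

    // Picks the codec for the reduce output. Intermediate work uses the
    // configured intermediate codec (or the default); everything else
    // uses the final output codec (an assumption of this sketch).
    static String reduceOutputCodec(Work work, String configuredCodec, String finalCodec) {
        if (work.intermediate) {
            return configuredCodec != null ? configuredCodec : DEFAULT_INTERMEDIATE_CODEC;
        }
        return finalCodec;
    }

    public static void main(String[] args) {
        Work stage1 = new Work("stage-1"); // feeds another map reduce task
        Work stage2 = new Work("stage-2"); // feeds the move
        attachMove(stage2);

        System.out.println(stage1.name + ": " + reduceOutputCodec(stage1, null, "gzip"));
        System.out.println(stage2.name + ": " + reduceOutputCodec(stage2, null, "gzip"));
    }
}
```

In this sketch only the work whose output goes directly to the move picks up the final codec; every earlier stage keeps the intermediate one.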
This addresses bug HIVE-2374.
https://issues.apache.org/jira/browse/HIVE-2374
Diffs
-----
trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1157918
trunk/conf/hive-default.xml 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 1157918
trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1157918
trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1157918
trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyIsIntermediateHook.java PRE-CREATION
trunk/ql/src/test/queries/clientpositive/intermediate_compression.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/intermediate_compression.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1516/diff
Testing
-------
I added a test query and hook to verify that the is-intermediate flag is set properly in the MapredWork/MapredLocalWork.
I also added a test to TestExecDriver which checks that the correct compression is used on the output of the reduce for each value of the is-intermediate flag.
Thanks,
Kevin
> Make compression used between map reduce tasks configurable.
> ------------------------------------------------------------
>
> Key: HIVE-2374
> URL: https://issues.apache.org/jira/browse/HIVE-2374
> Project: Hive
> Issue Type: Improvement
> Reporter: Kevin Wilfong
> Assignee: Kevin Wilfong
> Attachments: HIVE-2374.1.patch.txt
>
>
> We want to allow the compression between map reduce tasks to be configurable, similar to the way compression between the map and reduce phases is configurable.