Posted to common-dev@hadoop.apache.org by "Jothi Padmanabhan (JIRA)" <ji...@apache.org> on 2009/05/15 09:18:45 UTC
[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files
[ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------
Status: Patch Available (was: Open)
> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
> Key: HADOOP-5539
> URL: https://issues.apache.org/jira/browse/HADOOP-5539
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.1
> Environment: 0.19.2-dev, r753365
> Reporter: Billy Pearson
> Priority: Blocker
> Fix For: 0.19.2
>
> Attachments: 5539.patch, hadoop-5539.patch
>
>
> hadoop-site.xml:
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes
> on the reduce side, the on-disk merger runs to reduce the number of input files to <= io.sort.factor if needed.
> When this happens it writes files called intermediate.x. These files
> do not maintain the compression setting: the writer (o.a.h.mapred.Merger.class line 432)
> is passed the codec, but I added some logging and it is always null, whether map output compression is set to true or false.
> This causes tasks to fail if they cannot hold the uncompressed data for the reduce they are running.
> I think this is just an oversight: the codec is not getting set correctly for the on-disk merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec: null
> {code}
> I added
> {code}
> // added by me
> if (codec != null) {
>   LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
> } else {
>   LOG.info("intermediate." + passNo + " used codec: null");
> }
> // end added by me
> {code}
> just before the creation of the writer (o.a.h.mapred.Merger.class line 432),
> and it outputs the second line above.
> I have confirmed this with the logging, and I have also looked at the files on the tasktracker's disk. I can read the data in
> the intermediate files clearly, which tells me they are not compressed, but I cannot read the map.out files taken directly from the map output,
> which tells me compression is working on the map side but not in the on-disk merge that produces the intermediate files.
> I can see no benefit in these files not maintaining the compression setting, and it looks as though they were intended to maintain it.
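
The effect described in the report can be illustrated with a minimal, self-contained sketch. It is not Hadoop code: the class IntermediateWriteSketch and its methods writeSegment and sampleRecords are hypothetical names, and java.util.zip's DeflaterOutputStream merely stands in for whatever codec the job is configured with. The sketch only shows that a writer which compresses when it is given a codec, and writes raw bytes when it is not, will produce a larger, human-readable segment in the second case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class IntermediateWriteSketch {

    // Hypothetical stand-in for a merge writer. With useCodec == false it
    // behaves like a writer that received a null codec: bytes go out raw.
    public static byte[] writeSegment(byte[] records, boolean useCodec) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = useCodec ? new DeflaterOutputStream(sink) : sink) {
            out.write(records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above flushes any pending compressed data.
        return sink.toByteArray();
    }

    // Builds repetitive tab-separated key/value lines (made-up sample data).
    public static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("key-").append(i % 10)
              .append('\t')
              .append("value-").append(i % 10)
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] records = sampleRecords(1000);
        byte[] raw = writeSegment(records, false);    // "codec is null" case
        byte[] packed = writeSegment(records, true);  // codec supplied
        System.out.println("no codec:   " + raw.length + " bytes");
        System.out.println("with codec: " + packed.length + " bytes");
    }
}
```

In the no-codec case the output is byte-for-byte the input, which matches the observation that the intermediate files can be read directly while the map.out files cannot.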
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.