Posted to common-dev@hadoop.apache.org by "Billy Pearson (JIRA)" <ji...@apache.org> on 2009/03/20 07:52:50 UTC

[jira] Created: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

o.a.h.mapred.Merger not maintaining map out compression on intermediate files
-----------------------------------------------------------------------------

                 Key: HADOOP-5539
                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.19.1
         Environment: 0.19.2-dev, r753365 
            Reporter: Billy Pearson
             Fix For: 0.19.2, 0.20.0


hadoop-site.xml:
mapred.compress.map.output = true

Map output files are compressed, but when the in-memory merger closes
on the reduce side, the on-disk merger runs to reduce the number of input files to <= io.sort.factor if needed.

When this happens it writes files called intermediate.x. These files
do not maintain the compression setting. The writer (o.a.h.mapred.Merger.class line 432)
is passed the codec, but I added some logging and it is always null, whether map output compression is set to true or false.

This causes tasks to fail if they cannot hold the uncompressed size of the reduce data they are holding.
I think this is just an oversight: the codec is not being set correctly for the on-disk merges.

{code}
2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec: null
{code}

I added 
{code}
// added by me
if (codec != null) {
  LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
} else {
  LOG.info("intermediate." + passNo + " used codec: Null");
}
// end added by me
{code}
just before the creation of the writer at o.a.h.mapred.Merger.class line 432,
and it produces the second log line above.
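
As an aside, and only as a sketch: Java string concatenation renders a null reference as the text "null", so the null check around the log call could be collapsed into a single expression. The class and method names below are hypothetical and are not part of Merger; the codec is typed as Object so the example needs no Hadoop classes.

```java
// Hypothetical helper, for illustration only; not code from Merger or the patch.
final class CodecLogSketch {
    // Builds the same message as the branchy version. Concatenating a null
    // reference yields the text "null", so no explicit null check is needed.
    static String describe(int passNo, Object codec) {
        return "intermediate." + passNo + " used codec: " + codec;
    }

    public static void main(String[] args) {
        System.out.println(describe(1, null));
    }
}
```

A non-null codec would be rendered through its own toString(), as in the original snippet.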

I have confirmed this with the logging, and I have also looked at the files on the tasktracker's disk. I can read the data in
the intermediate files in clear text, which tells me they are not compressed, but I cannot read the map.out files taken directly from the map output,
which tells me compression is working on the map side but not in the on-disk merge that produces the intermediate files.

I can see no benefit in these files not maintaining the compression setting, and it looks as if they were intended to maintain it.
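
The effect described above can be sketched outside Hadoop. This is a minimal illustration under stated assumptions, not the Merger code or the attached patch: SketchCodec, IntermediateMergeSketch and writeIntermediate are hypothetical names, and java.util.zip's GZIP stream stands in for whatever codec the job is configured with. The only point is that a writer handed a null codec writes raw bytes, while one handed the codec writes compressed bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a compression codec; NOT Hadoop's CompressionCodec.
interface SketchCodec {
    OutputStream wrap(OutputStream raw) throws IOException;
}

final class IntermediateMergeSketch {
    // Stand-in codec backed by java.util.zip (an assumption for this sketch).
    static final SketchCodec GZIP = raw -> new GZIPOutputStream(raw);

    // Writes one "intermediate" file into memory and returns its bytes.
    // A null codec means the records are written uncompressed.
    static byte[] writeIntermediate(byte[] records, SketchCodec codec) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (OutputStream out = (codec == null) ? file : codec.wrap(file)) {
            out.write(records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    public static void main(String[] args) {
        byte[] records = new byte[64 * 1024]; // all zero bytes, so highly compressible
        System.out.println("codec dropped:   "
            + writeIntermediate(records, null).length + " bytes");
        System.out.println("codec forwarded: "
            + writeIntermediate(records, GZIP).length + " bytes");
    }
}
```

With the codec dropped, the intermediate output is as large as the raw records, which is the disk-space problem reported here.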


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Status: Patch Available  (was: Open)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch

[jira] Issue Comment Edited: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688106#action_12688106 ] 

Billy Pearson edited comment on HADOOP-5539 at 3/22/09 2:00 AM:
----------------------------------------------------------------

This should fix the problem. I had to add a few new constructors. I left in the old constructors that these files were using, because I am not sure whether any other tasks use them. This patch applies to the 0.19 branch. I have not worked on trunk, so you might need to try a dry run before applying it to trunk. Tested on my end, and it works correctly with this patch.


      was (Author: viper799):
    This should fix the problem. I had to add a few new constructors. I left in the old constructors that these files were using, because I am not sure whether any other tasks use them. This patch applies to 0.19.0. I have not worked on trunk, so you might need to try a dry run before applying it to trunk. Tested on my end, and it works correctly with this patch.

  
> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708368#action_12708368 ] 

Jothi Padmanabhan commented on HADOOP-5539:
-------------------------------------------

The patch looks good. A few minor points:   

# The new MergeQueue constructor could call the existing constructor and then set the codec later.
{code}
public MergeQueue(Configuration conf, FileSystem fs,
                  List<Segment<K, V>> segments, RawComparator<K> comparator,
                  Progressable reporter, boolean sortSegments,
                  CompressionCodec codec) {
  this(conf, fs, segments, comparator, reporter, sortSegments);
  this.codec = codec;
}
{code}

# For the new merge methods, should we place the Codec argument after the valueClass argument (instead of being the last argument) to maintain consistency with the other method that does take the codec argument?

Would you be able to provide patches for trunk and 20-branch as well?

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Fix Version/s: 0.20.0
                   0.19.2

Added back to the road map.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: 5539.patch

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Status: Patch Available  (was: Open)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Stefan Will (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683973#action_12683973 ] 

Stefan Will commented on HADOOP-5539:
-------------------------------------

I'd like to see this fixed as well since one reason I've enabled map output compression is to reduce disk space usage by the mapreduce framework. It appears that currently the map outputs are simply decompressed as soon as they have been downloaded by the reducer.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2, 0.20.0

[jira] Assigned: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson reassigned HADOOP-5539:
-------------------------------------

    Assignee:     (was: Billy Pearson)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714297#action_12714297 ] 

Billy Pearson commented on HADOOP-5539:
---------------------------------------

No, I do not need it; my version is patched with my original patch for 0.19. Others might, though, since there are still a lot of older versions in production that will update to the 0.19 branch now that it has had a few minor releases.


> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Status: Open  (was: Patch Available)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5539:
----------------------------------

    Fix Version/s:     (was: 0.19.2)
                       (was: 0.20.0)
           Status: Patch Available  (was: Open)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Attachment: hadoop-5539-branch20.patch

Patch for the 20 branch

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688559#action_12688559 ] 

Chris Douglas commented on HADOOP-5539:
---------------------------------------

bq. added back to road map

It's a blocker; it will be resolved and backported to 0.20 at least. The road map isn't; the PA queue defines the set of patches that can be committed. The fix version is usually set when it's actually resolved, so where it was committed is documented.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Fix Version/s:     (was: 0.20.0)
           Status: Patch Available  (was: Open)

This should fix the problem. I had to make a few new constructors. I left the old constructors that these files were using because I am not sure whether any other tasks use them. This patch will apply to 0.19.0. I have not done any work on trunk, so you might need to try a dry run before applying it to trunk. Tested on my end, and it is working correctly now with this patch.
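
The "new constructors alongside the old ones" approach described above can be sketched roughly as follows. This is only an illustration: `Segment`, its fields, and the string codec name are hypothetical stand-ins, not the real o.a.h.mapred classes, and the thread does not show whether the actual patch delegates between constructors this way.

```java
/** Hypothetical sketch; Segment here is not the real o.a.h.mapred class. */
public class ConstructorOverloadSketch {

  /** Describes one on-disk segment to merge; codecName == null means uncompressed. */
  public static final class Segment {
    public final String path;
    public final String codecName;

    /** Old constructor, kept so existing callers compile and behave as before. */
    public Segment(String path) {
      this(path, null);
    }

    /** New constructor that also carries the codec chosen for map output. */
    public Segment(String path, String codecName) {
      this.path = path;
      this.codecName = codecName;
    }

    public boolean isCompressed() {
      return codecName != null;
    }
  }

  public static void main(String[] args) {
    Segment legacy = new Segment("intermediate.1");
    Segment updated = new Segment("intermediate.1", "gzip");
    System.out.println("legacy caller compressed?  " + legacy.isCompressed());
    System.out.println("updated caller compressed? " + updated.isCompressed());
  }
}
```

Keeping the old signature means callers that were not touched by the patch still see a null codec, so only the call sites that were switched to the new constructor change behavior.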


> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710881#action_12710881 ] 

Hadoop QA commented on HADOOP-5539:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408232/hadoop-5539.patch
  against trunk revision 776352.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Attachment: 5539.patch

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718616#action_12718616 ] 

Hudson commented on HADOOP-5539:
--------------------------------

Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])
    

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688098#action_12688098 ] 

Billy Pearson commented on HADOOP-5539:
---------------------------------------

I verified that the same thing happens on the map task: no intermediate.x files from o.a.h.mapred.Merger are getting compressed.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2, 0.20.0
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Attachment: hadoop-5539.patch

Updated the patch to trunk

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-5539:
--------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.19.2)
                   0.20.1
         Assignee: Jothi Padmanabhan
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Jothi and Billy!

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Issue Comment Edited: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688544#action_12688544 ] 

Billy Pearson edited comment on HADOOP-5539 at 3/23/09 8:14 PM:
----------------------------------------------------------------

Someone can use my patch as a starting point.

The problem call in ReduceTask.java is at line 2145.
The problem call in MapTask.java is at line 1269.

I use streaming without a combiner, so that case should also be looked at to see whether it uses o.a.h.mapred.Merger.

The basic problem is that the codec is not passed from these functions to the merger, so it is always null. The call to
o.a.h.mapred.Merger should pass the codec in both ReduceTask and MapTask; if compression is not used, the codec is simply null.

I think this is a major bug that affects disk bandwidth for all MR jobs that use compression.
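
To make the hand-off described above concrete, here is a minimal, self-contained sketch. It is not the Hadoop code and not the attached patch: Codec, JobSettings, resolveCodec, writeIntermediate and both merge methods are hypothetical stand-ins, assumed only for illustration. It shows a caller that resolves the codec from the job settings but never passes it on, next to one that does.

```java
/** Stand-in model of the codec hand-off; none of these types are Hadoop classes. */
public class CodecHandOffSketch {

    /** Hypothetical stand-in for a compression codec. */
    static final class Codec {
        final String name;
        Codec(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the job settings a task can see. */
    static final class JobSettings {
        final boolean compressMapOutput;
        JobSettings(boolean compressMapOutput) { this.compressMapOutput = compressMapOutput; }
    }

    /** Resolve the codec from the settings; null means "no compression". */
    static Codec resolveCodec(JobSettings settings) {
        return settings.compressMapOutput ? new Codec("example-codec") : null;
    }

    /** Stand-in for the merge step: reports which codec an intermediate file would get. */
    static String writeIntermediate(int passNo, Codec codec) {
        String used = (codec == null) ? "null" : codec.name;
        return "intermediate." + passNo + " used codec: " + used;
    }

    /** Models the reported behaviour: the codec is never handed on, so the merge sees null. */
    static String mergeWithoutCodec(JobSettings settings, int passNo) {
        return writeIntermediate(passNo, null);
    }

    /** Models the suggested fix: resolve the codec and pass it through to the merge. */
    static String mergeWithCodec(JobSettings settings, int passNo) {
        return writeIntermediate(passNo, resolveCodec(settings));
    }

    public static void main(String[] args) {
        JobSettings compressed = new JobSettings(true);
        System.out.println(mergeWithoutCodec(compressed, 1));
        System.out.println(mergeWithCodec(compressed, 1));
    }
}
```

With compression switched off, both callers in this sketch report a null codec, which matches the point above that a null codec is the correct value when compression is not in use.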


      was (Author: viper799):
    someone can use my patch as a starting point.

the ReduceTask.java call that is the problem is line 2145
the maptask.java call that is the problem is line 1269

I use streaming without a combiner so that should be looked at also to see if it uses o.a.h.mapred.Merger

  
> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5539:
--------------------------------------

    Attachment: hadoop-5539-v1.patch

Patch updated to trunk

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710817#action_12710817 ] 

Ravi Gummadi commented on HADOOP-5539:
--------------------------------------

Patch looks good.

This patch clashes with HADOOP-5572, so it will need to be updated once HADOOP-5572 is committed.

HADOOP-5572 changes mergeParts() to call merge() with a boolean sortSegments argument, which avoids one of the new merge() signatures in your patch.


> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714037#action_12714037 ] 

Billy Pearson commented on HADOOP-5539:
---------------------------------------

No commit for the 0.19 branch?

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688108#action_12688108 ] 

Hadoop QA commented on HADOOP-5539:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12403378/5539.patch
  against trunk revision 756858.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/118/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5539:
----------------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s:     (was: 0.19.2)
         Assignee: Billy Pearson

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5539:
----------------------------------

    Fix Version/s: 0.19.2
           Status: Open  (was: Patch Available)

Oh, I see; the patch is for 0.19. My mistake. 

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688107#action_12688107 ] 

Hadoop QA commented on HADOOP-5539:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12403378/5539.patch
  against trunk revision 756858.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/117/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>

[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711568#action_12711568 ] 

Ravi Gummadi commented on HADOOP-5539:
--------------------------------------

Patch looks good.
+1

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Status: Open  (was: Patch Available)

Someone can use my patch as a starting point.

The problem call in ReduceTask.java is at line 2145.
The problem call in MapTask.java is at line 1269.

I use streaming without a combiner, so that case should also be looked at to see whether it uses o.a.h.mapred.Merger.


> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0
>
>         Attachments: 5539.patch
>
>

[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Status: Open  (was: Patch Available)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>


[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708570#action_12708570 ] 

Billy Pearson commented on HADOOP-5539:
---------------------------------------

I have too many things going on right now to make a new patch. Feel free to modify my patch to work the way you want and use it to build a patch for trunk. I would like to see this fixed in 0.20.1 if at all possible; it will be the one thing holding me up from upgrading to HBase 0.20 when it becomes ready.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>


[jira] Updated: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Billy Pearson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Pearson updated HADOOP-5539:
----------------------------------

    Status: Patch Available  (was: Open)

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch
>
>


[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712400#action_12712400 ] 

Hadoop QA commented on HADOOP-5539:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408652/hadoop-5539-v1.patch
  against trunk revision 777761.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>


[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714001#action_12714001 ] 

Nigel Daley commented on HADOOP-5539:
-------------------------------------

Why no unit test?  Why no javadoc for new methods?

If you tested this manually, what steps did you perform?

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>


[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710333#action_12710333 ] 

Jothi Padmanabhan commented on HADOOP-5539:
-------------------------------------------

Could somebody review this patch? Thanks.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch
>
>


[jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714282#action_12714282 ] 

Jothi Padmanabhan commented on HADOOP-5539:
-------------------------------------------

bq. Why no unit test? If you tested this manually, what steps did you perform?

It is pretty difficult to write a unit test for this patch, as it just enables compression during intermediate merges. The files created during the intermediate merges are consumed soon after they are created, and the final merged file was compressed even without this patch. I did the same test that Billy had done: add print statements in the framework code (Merger.java) to verify whether compression was turned on during intermediate merges.
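
For readers following the thread, here is a minimal JDK-only sketch of the failure mode being discussed. It is an analogy, not the Merger code or the patch: the `Codec` interface, `writeSegment`, and both `mergePass...` methods are hypothetical names invented for illustration, and `java.util.zip.DeflaterOutputStream` stands in for whatever map-output codec a job is configured with. It only shows that a writer handed a null codec produces plain, readable bytes, while one that keeps the codec produces a smaller compressed segment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class IntermediateMergeSketch {

  /** Hypothetical stand-in for a compression codec; NOT Hadoop's CompressionCodec. */
  interface Codec {
    OutputStream wrap(OutputStream raw) throws IOException;
  }

  /** Deflate-based codec built only on the JDK (an assumption for this sketch). */
  static final Codec DEFLATE = raw -> new DeflaterOutputStream(raw);

  /** Writes one segment; a null codec means the bytes go out uncompressed. */
  static byte[] writeSegment(byte[] records, Codec codec) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    OutputStream out = (codec == null) ? sink : codec.wrap(sink);
    out.write(records);
    out.close(); // also finishes the deflate stream when a codec is in use
    return sink.toByteArray();
  }

  /** Models the reported behaviour: the configured codec never reaches the writer. */
  static byte[] mergePassDroppingCodec(byte[] records, Codec configured) throws IOException {
    return writeSegment(records, null);
  }

  /** Models the intended behaviour: the configured codec is passed through. */
  static byte[] mergePassKeepingCodec(byte[] records, Codec configured) throws IOException {
    return writeSegment(records, configured);
  }

  /** Builds 1000 repetitive ten-byte records so compression has something to do. */
  static byte[] sampleRecords() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("key\tvalue\n");
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] records = sampleRecords();
    byte[] dropped = mergePassDroppingCodec(records, DEFLATE);
    byte[] kept = mergePassKeepingCodec(records, DEFLATE);
    System.out.println("input bytes:          " + records.length);
    System.out.println("codec dropped (null): " + dropped.length);
    System.out.println("codec kept:           " + kept.length);
  }
}
```

In the sketch the dropped-codec segment is byte-for-byte identical to the input, which mirrors the observation that the intermediate files could be read directly on disk while the map output files could not.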

bq. Why no javadoc for new methods?

The newly added methods are in Merger, which is a package-private class in mapred.

bq. no commit for 0.19 branch?
Billy, from this comment https://issues.apache.org/jira/browse/HADOOP-5539?focusedCommentId=12708570&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708570, we thought you needed this only for 0.20. If you need it for the 0.19 branch as well, I can generate a patch for that too.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>
