Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2007/04/03 01:53:32 UTC

[jira] Created: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Map/reduce job gets OutOfMemoryException when set map out to be compressed
--------------------------------------------------------------------------

                 Key: HADOOP-1193
                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.2
            Reporter: Hairong Kuang
         Assigned To: Arun C Murthy
             Fix For: 0.13.0


One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Attachment: HADOOP-1193_3_20070611.patch

Thanks for the review, Devaraj; here is a new patch incorporating the comments...

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491475 ] 

Hairong Kuang commented on HADOOP-1193:
---------------------------------------

More details about the failed job:

1. It uses record-level compression.
2. mapred.child.java.opts is set to the default value: -Xmx512m.
3. For the map output, each key is a Text and very small, but each value is a Jute record with an average size of approximately 25K; some may be as large as several megabytes.
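
For illustration, a job with this setup would be configured roughly as follows. This is a sketch against the JobConf API of that era, not code from the failing job; the method names (setCompressMapOutput, setMapOutputCompressionType) are assumed from that API.

    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.mapred.JobConf;

    public class CompressedMapOutputConf {
        // Sketch only: reproduces the reported setup, not the actual job.
        public static JobConf configure(JobConf conf) {
            // Compress intermediate map output -- the setting that triggers the OOM.
            conf.setCompressMapOutput(true);
            // Record-level compression, as in point 1 above.
            conf.setMapOutputCompressionType(SequenceFile.CompressionType.RECORD);
            // Default child JVM heap, as in point 2 above.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            return conf;
        }
    }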

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486182 ] 

Hairong Kuang commented on HADOOP-1193:
---------------------------------------

Forgot to mention that the cluster was running with the native compression library.

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Status: Open  (was: Patch Available)

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Nicosia updated HADOOP-1193:
----------------------------------

    Priority: Blocker  (was: Major)

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Attachment: HADOOP-1193_2_20070524.patch

Here is an updated version of the patch, with the changes I made to BigMapOutput to help test it (basically, I made it extend ToolBase and added an option to create the large map input too)...

I have tested this with large map inputs (>2 GB) and it seems to hold up well; i.e., the codec pool ensures we create only 1 compressor and a very small number of decompressors (fewer than 10), even for extremely large inputs.
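
For context, the "create the large map input" option presumably does something like the following hypothetical sketch (BigMapOutput's real code is in the attached patch; the key/value types and sizes here are assumptions based on the report above):

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;

    public class CreateBigInput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            long targetBytes = 2L * 1024 * 1024 * 1024; // >2 GB, as tested above
            SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path(args[0]),
                BytesWritable.class, BytesWritable.class);
            try {
                Random rand = new Random();
                byte[] key = new byte[10];          // small keys
                byte[] value = new byte[25 * 1024]; // ~25K values, per the report
                for (long written = 0; written < targetBytes;
                     written += key.length + value.length) {
                    rand.nextBytes(key);
                    rand.nextBytes(value);
                    writer.append(new BytesWritable(key), new BytesWritable(value));
                }
            } finally {
                writer.close();
            }
        }
    }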

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504876 ] 

Hadoop QA commented on HADOOP-1193:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12359765/HADOOP-1193_4_20070614.patch applied and successfully tested against trunk revision r547159.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/284/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/284/console

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1193:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Arun!

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Attachment: HADOOP-1193_4_20070614.patch

Updated patch to reflect changes to trunk...

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498994 ] 

Devaraj Das commented on HADOOP-1193:
-------------------------------------

One comment: it would be nice to have the 'tmpReader' logic triggered when such a flag is passed to the Reader constructor. Apart from that, there are some whitespace changes that can be removed.
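
In other words, something along these lines (an illustrative sketch of the suggested API shape only; the real SequenceFile.Reader constructor takes more arguments, and the init method names here are made up):

    public class Reader {
        private final boolean tmpReader;

        // An explicit flag makes the lightweight "tmp reader" path a
        // deliberate caller choice instead of an internal special case.
        public Reader(boolean tmpReader) {
            this.tmpReader = tmpReader;
            if (tmpReader) {
                initLightweight(); // e.g. skip per-reader codec allocation
            } else {
                initFull();
            }
        }

        private void initLightweight() { /* minimal setup for temporary use */ }
        private void initFull()        { /* normal reader initialization */ }
    }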



> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Status: Patch Available  (was: Open)

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506043 ] 

Hadoop QA commented on HADOOP-1193:
-----------------------------------

Integrated in Hadoop-Nightly #127 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/127/])

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch, HADOOP-1193_4_20070614.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Attachment: HADOOP-1193_1_20070517.patch

Here is a patch while I continue further testing... Hairong, could you try it and see if it works for you? Thanks!

Basically, I went ahead and implemented a 'codec pool' to reuse the direct-buffer-based codecs, so as not to create too many of them...
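
The pooling pattern is roughly the following (a minimal generic sketch, not the patch's actual code; the class and method names are invented for illustration):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch: reuse codec instances (and hence their direct buffers)
    // instead of constructing a fresh one per SequenceFile reader/writer.
    public class SimpleCodecPool<T> {
        private final Map<Class<?>, Deque<T>> pool =
            new HashMap<Class<?>, Deque<T>>();

        // Borrow a cached instance of the given type, or null if none is
        // available (the caller then constructs one and returns it later).
        public synchronized T borrow(Class<? extends T> type) {
            Deque<T> q = pool.get(type);
            return (q == null || q.isEmpty()) ? null : q.pop();
        }

        // Hand an instance back for later reuse.
        public synchronized void giveBack(T instance) {
            Deque<T> q = pool.get(instance.getClass());
            if (q == null) {
                q = new ArrayDeque<T>();
                pool.put(instance.getClass(), q);
            }
            q.push(instance);
        }
    }

Callers borrow in a try/finally and give the instance back when done, so the number of live codecs is bounded by concurrency rather than by the number of readers opened.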

Results while sorting 1 million records via TestSequenceFile with RECORD compression:

                 trunk    H-1193
Compressors       1382         3
Decompressors     1520        12
--------------------------------
Total             2902        15

Results are even more dramatic for BLOCK compression (with BLOCK compression we need 4 codecs per Reader: key, keyLen, val & valLen)... In fact, I have gone ahead and bumped the default direct buffer size for zlib from 1K to 64K, which should lead to improved performance too, on the back of this patch.
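
For reference, direct buffers are allocated outside the Java heap and are reclaimed only when their owning codec is garbage-collected, which is why thousands of codecs (the trunk numbers above) can exhaust native memory long before the heap fills; the size bump itself is just:

    import java.nio.ByteBuffer;

    public class DirectBufferSizes {
        public static void main(String[] args) {
            // Old zlib default: 1K per direct buffer.
            ByteBuffer before = ByteBuffer.allocateDirect(1024);
            // New default: 64K -- affordable once the pool keeps codec counts
            // in the single digits, and better for compression throughput.
            ByteBuffer after = ByteBuffer.allocateDirect(64 * 1024);
            System.out.println(before.capacity() + " -> " + after.capacity());
        }
    }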

Appreciate any review/feedback.

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-1193_1_20070517.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Updated: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1193:
----------------------------------

    Fix Version/s: 0.14.0
           Status: Patch Available  (was: Open)

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.



[jira] Commented: (HADOOP-1193) Map/reduce job gets OutOfMemoryException when set map out to be compressed

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503875 ] 

Hadoop QA commented on HADOOP-1193:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12359395/HADOOP-1193_3_20070611.patch against trunk revision r546310.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/269/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/269/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-1193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1193
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Hairong Kuang
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1193_1_20070517.patch, HADOOP-1193_2_20070524.patch, HADOOP-1193_3_20070611.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map output to be compressed, but it worked fine with release 0.10.
