Posted to common-dev@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2008/09/03 22:53:44 UTC

[jira] Created: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Separate spill thresholds for serialization/accounting in MapTask
-----------------------------------------------------------------

                 Key: HADOOP-4063
                 URL: https://issues.apache.org/jira/browse/HADOOP-4063
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Chris Douglas
            Priority: Minor
             Fix For: 0.19.0
         Attachments: 4063-0.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Resolution: Won't Fix
        Status: Resolved  (was: Patch Available)

While I was explaining how to configure MapTask to a few users, many asked why the spill thresholds weren't separately configurable. It's easy enough to accommodate the request, but there aren't many good use cases for it.

Auto-tuning these parameters is awkward, but we'll probably get there eventually.

> Separate spill thresholds for serialization/accounting in MapTask
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4063
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: 4063-0.patch, 4063-1.patch
>
>
> In MapTask, there is a single parameter controlling the threshold for starting a spill thread concurrently with collection. However, some users may want to set different thresholds for the serialization buffer (holding record bytes) and the accounting buffer (holding record metadata).
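
To make the distinction in the description concrete, here is a minimal sketch of how one spill fraction can yield two soft limits, and what separate fractions would change. It assumes a fixed memory budget split between a serialization buffer and an accounting buffer, with a fixed number of accounting bytes per record. The class name, method names, and the 16-byte figure are illustrative assumptions; this is neither the MapTask code nor the attached patch.

```java
// Illustrative sketch only: not the MapTask implementation and not the attached patch.
public class SpillThresholds {
    // Assumed bytes of accounting metadata kept per record (hypothetical figure).
    static final int ACCT_BYTES_PER_RECORD = 16;

    /**
     * Returns {softBufferLimitBytes, softRecordLimit}: the fill levels at which
     * a background spill would start for each of the two buffers.
     */
    static long[] limits(long totalBytes, double recordFraction,
                         double bufferSpillFraction, double recordSpillFraction) {
        // Carve the accounting space out of the budget, rounded down to whole records.
        long acctBytes = (long) (totalBytes * recordFraction);
        acctBytes -= acctBytes % ACCT_BYTES_PER_RECORD;

        long serializationBytes = totalBytes - acctBytes;
        long recordCapacity = acctBytes / ACCT_BYTES_PER_RECORD;

        long softBufferLimit = (long) (serializationBytes * bufferSpillFraction);
        long softRecordLimit = (long) (recordCapacity * recordSpillFraction);
        return new long[] { softBufferLimit, softRecordLimit };
    }

    /** Single-parameter case: the same fraction applies to both buffers. */
    static long[] limits(long totalBytes, double recordFraction, double spillFraction) {
        return limits(totalBytes, recordFraction, spillFraction, spillFraction);
    }

    public static void main(String[] args) {
        long total = 1600L; // tiny budget so the arithmetic is easy to follow
        long[] single = limits(total, 0.25, 0.5);
        long[] split = limits(total, 0.25, 0.75, 0.5);
        System.out.println("single: bytes=" + single[0] + " records=" + single[1]);
        System.out.println("split:  bytes=" + split[0] + " records=" + split[1]);
    }
}
```

With one fraction, a job with many small records and a job with few large records both spill at the same relative fill level. With two fractions, the two cases could be tuned independently, which is the flexibility this issue proposes.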



[jira] Commented: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628668#action_12628668 ] 

Doug Cutting commented on HADOOP-4063:
--------------------------------------

It would certainly be better if folks never had to worry about parameters like this, if we could correctly optimize things automatically rather than adding more finicky configuration options for folks to set to strange values that cause bizarre failures.




[jira] Commented: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628524#action_12628524 ] 

Hadoop QA commented on HADOOP-4063:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389457/4063-1.patch
  against trunk revision 692271.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3176/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3176/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3176/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3176/console

This message is automatically generated.




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Attachment: 4063-0.patch





[jira] Commented: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628547#action_12628547 ] 

Owen O'Malley commented on HADOOP-4063:
---------------------------------------

I don't understand the motivation for this patch. It is an incompatible change that I wouldn't expect anyone to need. What is the use case?




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Attachment: 4063-1.patch

Forgot to update the test case




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Status: Open  (was: Patch Available)




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Description: In MapTask, there is a single parameter controlling the threshold for starting a spill thread concurrently with collection. However, some users may want to set different thresholds for the serialization buffer (holding record bytes) and the accounting buffer (holding record metadata).




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Status: Patch Available  (was: Open)

Submitting to hudson




[jira] Updated: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4063:
----------------------------------

    Assignee: Chris Douglas
      Status: Patch Available  (was: Open)





[jira] Commented: (HADOOP-4063) Separate spill thresholds for serialization/accounting in MapTask

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628583#action_12628583 ] 

Hadoop QA commented on HADOOP-4063:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389457/4063-1.patch
  against trunk revision 692335.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3183/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3183/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3183/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3183/console

This message is automatically generated.

