Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2012/05/05 01:47:48 UTC

[jira] [Created] (MAPREDUCE-4229) Intern counter names in the JT

Todd Lipcon created MAPREDUCE-4229:
--------------------------------------

             Summary: Intern counter names in the JT
                 Key: MAPREDUCE-4229
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker
    Affects Versions: 1.0.2
            Reporter: Todd Lipcon


In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.
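
A minimal sketch of the per-job hash set idea described above, assuming one pool object per job. The class and method names are hypothetical and are not taken from any patch on this issue. A map from each name to itself is used because a plain HashSet cannot hand back the element it already stores.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-job pool: hands out one canonical String per distinct counter name. */
final class CounterNamePool {
    // Maps each name to its canonical instance (the first equal String that was seen).
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to name, storing name if it is the first one seen. */
    synchronized String intern(String name) {
        if (name == null) {
            return null;
        }
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    /** Number of distinct names currently held by this pool. */
    synchronized int size() {
        return pool.size();
    }
}
```

Every task that reports a counter would pass the name through intern() before storing it, so all tasks of a job end up referring to the same String object. Dropping the pool together with the job releases the names.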

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483179#comment-13483179 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #414 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/414/])
    Updating credits for MAPREDUCE-4229. (Revision 1401487)
svn merge -c 1401483 FIXES: MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401485)

     Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401487
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401485
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestStringInterner.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/CountersStrings.java

                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482719#comment-13482719 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-trunk-Commit #2918 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2918/])
    Updating credits for MAPREDUCE-4229. (Revision 1401493)

     Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401493
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269428#comment-13269428 ] 

Harsh J commented on MAPREDUCE-4229:
------------------------------------

Sounds similar in approach to HDFS-1110, which was done for filenames on DFS.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Daryn Sharp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482494#comment-13482494 ] 

Daryn Sharp commented on MAPREDUCE-4229:
----------------------------------------

I'd suggest maybe using a hard intern for some of the well-defined {{TaskAttemptImpl}} fields, but that could be a future jira.

Only nit is "substirng" is misspelled in many places.  Otherwise, +1.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269754#comment-13269754 ] 

Steve Loughran commented on MAPREDUCE-4229:
-------------------------------------------

If it doesn't go near PermGen then it would be useful & not another source of pain, which makes it more appealing to me
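
For context, the built-in alternative is String.intern(), which canonicalizes through a JVM-wide string table; on older HotSpot JVMs that table lived in PermGen, which is presumably the concern here (an assumption about the comment, not something stated in it). The small illustration below uses a hypothetical class name and shows only the identity behaviour. A pool kept in an ordinary map holds ordinary heap references instead.

```java
/** Illustration of the JVM's built-in interning; the class name is hypothetical. */
final class BuiltInInternDemo {
    /** True when both strings resolve to the same canonical instance in the JVM-wide table. */
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with equal contents.
        String first = new String("FILE_BYTES_READ");
        String second = new String("FILE_BYTES_READ");
        System.out.println(first == second);                // compares object identity
        System.out.println(sameAfterIntern(first, second)); // compares the canonical instances
    }
}
```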
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469404#comment-13469404 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

Hi Robert, thanks for the feedback. I will implement another patch (against branch-0.23) and try to address the issues you brought up.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462259#comment-13462259 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

The patch saves memory by using Guava's Interner implementation, and it includes tests
that verify only one copy of each distinct counter name string is stored.
Please review.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483195#comment-13483195 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1205 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1205/])
    Updating credits for MAPREDUCE-4229. (Revision 1401493)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401473)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401467)

     Result = FAILURE
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401493
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401473
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestStringInterner.java

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401467
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/CountersStrings.java

                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417751#comment-13417751 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

What is the current status of this issue?
If nobody else is already working on it, I would be very willing to start.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482467#comment-13482467 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

The patch looks good to me and I am +1 for it, but because I wrote some of it myself it would be nice if someone else could take a look at it too.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482525#comment-13482525 ] 

Hadoop QA commented on MAPREDUCE-4229:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12550502/mr-4229.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2962//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2962//console

This message is automatically generated.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479420#comment-13479420 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

I ran some benchmarks looking at the Job History server using a jhist file for a job that had 9416 maps and 500 reducers.  I then used a combination of YourKit and jhat to look at the heap savings.

For jhat I ran the OQL query {noformat}select sum(map(heap.objects("java.lang.String"),"sizeof(it)")){noformat} to get the size of all of the strings currently reachable on the heap.

I saw that nothing changed between the base and the first patch.  Both of them had 22MB of strings in the heap.  Looking at the code that was changed to do interning, the only code that uses it is Rumen.  It is still a good change, but it did not have the impact I was looking for.  So I implemented the patch I just attached, which adds interning of Strings that are parsed out of the jhist file.  This reduced the 22MB of strings to 3MB of strings.

I want to do something similar for the AM, but it is more difficult to look at, and I don't think I will have time in the near future. So if someone else could review this, we can check it in and file a follow-up JIRA for looking at the AM.
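
To illustrate why interning at parse time matters, here is a rough sketch that splits repeated records and counts how many distinct String objects result, with and without a pool. It does not read real jhist files; the comma-separated record format, the class name, and the method names are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo: intern fields as they are parsed, then count distinct instances. */
final class InternOnParseDemo {
    /** Splits each comma-separated record; if pool is non-null, each field is canonicalized. */
    static List<String> parse(List<String> records, Map<String, String> pool) {
        List<String> fields = new ArrayList<>();
        for (String record : records) {
            for (String raw : record.split(",")) {
                String field = raw;
                if (pool != null) {
                    // Reuse the first equal String seen so far instead of the fresh substring.
                    String prior = pool.putIfAbsent(raw, raw);
                    if (prior != null) {
                        field = prior;
                    }
                }
                fields.add(field);
            }
        }
        return fields;
    }

    /** Counts distinct String objects by identity (==), not by equals(). */
    static int distinctInstances(List<String> strings) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(strings);
        return seen.size();
    }
}
```

Without a pool, every parsed field is a fresh object even when its contents repeat. With a pool, the count drops to the number of distinct values, which is the effect the heap comparison above is measuring.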
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Daryn Sharp (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated MAPREDUCE-4229:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.5
                   2.0.3-alpha
                   3.0.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I've committed to trunk, branch-2, and branch-23.  Thanks Miomir and Bobby!
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4229:
-------------------------------------------

    Attachment: mr-4229.txt

substring fixed
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Daryn Sharp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482667#comment-13482667 ] 

Daryn Sharp commented on MAPREDUCE-4229:
----------------------------------------

+1 Will commit shortly.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miomir Boljanovic updated MAPREDUCE-4229:
-----------------------------------------

    Status: Patch Available  (was: Open)
    
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 2.0.2-alpha, 1.0.2, 3.0.0
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417795#comment-13417795 ] 

Todd Lipcon commented on MAPREDUCE-4229:
----------------------------------------

Hey Miomir. I don't think anyone's working on it. Go for it! I'd suggest looking at Guava's [Interner|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Interners.html#newWeakInterner()] implementation.
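
For readers unfamiliar with the idea, a weak interner returns one canonical instance per distinct value but lets that instance be garbage collected once nothing else refers to it. The sketch below is a JDK-only approximation of that behaviour, not Guava's implementation; the class name is hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical JDK-only weak interner; entries vanish once the canonical String is unreachable. */
final class WeakStringInterner {
    // Keys are held weakly by WeakHashMap. Values are WeakReferences to the same String,
    // so the map itself never keeps a canonical instance alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the canonical instance equal to s, registering s if none is currently alive. */
    synchronized String intern(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            canonical = s;
            pool.put(s, new WeakReference<>(s));
        }
        return canonical;
    }
}
```

With Guava on the classpath, Interners.newWeakInterner() would be the ready-made equivalent.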
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478314#comment-13478314 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

Sorry it has taken me so long to respond. I have been a bit swamped lately.  The new patch looks really good.  It is simple and looks like it could help a lot with the memory usage.  Do you have any actual heap comparisons that you can show us?  I have a difficult time checking in a "performance" fix without some test, manual or otherwise, to show the impact it is having and whether there is still more that could be done in a follow-up JIRA.  I know that the YourKit profiler has some nice heap dump analysis to look for duplicate strings. If you have some numbers ready, that would be great; otherwise I will try to find some time this week to see if I can come up with anything.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4229:
-------------------------------------------

     Target Version/s: 0.23.5  (was: 1.1.0)
    Affects Version/s: 2.0.2-alpha
                       3.0.0
               Status: Open  (was: Patch Available)

Canceling patch until my concerns are addressed, and updating target/affects fields appropriately.  I targeted this for 0.23.5 because I would love to see this on branch-0.23.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4229:
-----------------------------------------------

    Summary: Counter names' memory usage can be decreased by interning  (was: Intern counter names in the JT)

Changing description to reflect the problem and the corresponding code changes.
                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421842#comment-13421842 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

Hi Todd, thanks for suggesting Guava's Interner.

With the Interner, we can canonicalize counter names without filling up the PermGen store, so we don't need to use GSet<String>.

Judging from the issue description, there are a few counters where string instances are created every time the counter name or display name is queried. But so far, I have managed to identify only the following one:

  @InterfaceAudience.Private
  public static class FSCounter extends AbstractCounter {
    final String scheme;
    final FileSystemCounter key;
    private long value;

    public FSCounter(String scheme, FileSystemCounter ref) {
      this.scheme = scheme;
      key = ref;
    }

    @Override
    public String getName() {
      return NAME_JOINER.join(scheme, key.name());
    }

    @Override
    public String getDisplayName() {
      return DISP_JOINER.join(scheme, localizeCounterName(key.name()));
    }


Could you perhaps point me to some of the remaining ones?
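
As a rough illustration of the pattern described above, the sketch below uses a hypothetical, simplified class (SchemeCounter, not the Hadoop FSCounter) whose getter joins two fields on every call. The '_' separator and the method names are assumptions made for the example.

```java
/** Hypothetical, simplified counter; not the Hadoop FSCounter class. */
final class SchemeCounter {
    private final String scheme;
    private final String key;
    private String cachedName; // built lazily, then reused

    SchemeCounter(String scheme, String key) {
        this.scheme = scheme;
        this.key = key;
    }

    /** Builds a fresh String on every call, as a join inside a getter does. */
    String getNameUncached() {
        return new StringBuilder(scheme).append('_').append(key).toString();
    }

    /** Builds the name once and hands out the same instance afterwards. */
    synchronized String getNameCached() {
        if (cachedName == null) {
            cachedName = getNameUncached();
        }
        return cachedName;
    }
}
```

Every call to getNameUncached() allocates a new, equal String, while getNameCached() keeps returning one instance. Whether the per-call strings actually stay on the heap depends on what the caller does with them, so this only shows where the duplicates can originate.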
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480854#comment-13480854 ] 

Hadoop QA commented on MAPREDUCE-4229:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12550156/MAPREDUCE-4229-branch-0.23.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2951//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2951//console

This message is automatically generated.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Assigned] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-4229:
--------------------------------------------------

    Assignee: Miomir Boljanovic
    
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467138#comment-13467138 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

I have a couple of questions/suggestions for the patch.

Why are we using a strong interner and not a weak interner?  By using the strong version I think we will have a memory leak in the history server.  When someone declares a custom counter name it will never go away, even after that job's jhist file has been deleted out of HDFS and the counters are no longer accessible.  I think this is even worse for some of the strings that we are interning that contain the value of the counter, not just the name of the counter.  They will probably be different every time the function is called, causing some potentially very large memory leaks.  But I am not really all that sure they are called from a long running process.
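
The leak concern can be sketched with a hypothetical strong interner (again JDK-only, not the patch's code). The class and method names and the "BYTES_READ=" string format are assumptions for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical strong interner: every distinct string stays reachable while the pool lives. */
final class StrongStringInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    String intern(String s) {
        String previous = pool.putIfAbsent(s, s);
        return (previous == null) ? s : previous;
    }

    int size() {
        return pool.size();
    }
}

final class StrongInternerGrowthDemo {
    /** Interns count strings that embed a changing counter value; returns the pool size. */
    static int internValueStrings(StrongStringInterner interner, int count) {
        for (long value = 0; value < count; value++) {
            interner.intern("BYTES_READ=" + value); // a different string on every call
        }
        return interner.size();
    }

    /** Interns the same bare counter name count times; returns the pool size. */
    static int internNameOnly(StrongStringInterner interner, int count) {
        for (int i = 0; i < count; i++) {
            interner.intern(new String("BYTES_READ"));
        }
        return interner.size();
    }
}
```

Interning only the name keeps the pool at a single entry no matter how many times it is called, whereas interning strings that contain the value adds one entry per distinct value, and a strong pool never releases them.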

It seems kind of odd where the interning calls are happening.  I doubt it is going to save any memory at all. For example, we save the name of a counter inside a counter class instance and then only intern it when we return the name from the getter method.  Even if the intern call did anything, it would not allow the original name to be garbage collected, because it is still pointed to by the counter instance.  The only time we really need to intern a string is when that string is the result of reading data from a stream.  So this would be for all RPC calls and anything that parses a string out of a file.  In most other cases, such as strings that come directly from an enum or a quoted string in the code, they will already be interned by the runtime environment, and adding this extra layer will only slow things down and actually use more memory.
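
The distinction between strings the runtime already shares and strings decoded from a stream can be sketched as follows. The enum and class names are hypothetical, and DataInputStream.readUTF() merely stands in for whatever stream decoding the real code performs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class InternOriginDemo {
    /** Hypothetical enum; its constant names are compiled into the class as literals. */
    enum TaskCounter { MAP_INPUT_RECORDS }

    /** Writes name twice to an in-memory buffer and reads both copies back. */
    static String[] readTwice(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(name);
                out.writeUTF(name);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                // Each readUTF() call decodes the bytes into a newly allocated String.
                return new String[] { in.readUTF(), in.readUTF() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With javac-compiled classes, TaskCounter.MAP_INPUT_RECORDS.name() and the literal "MAP_INPUT_RECORDS" refer to the same pooled object, so interning them again gains nothing. The two strings returned by readTwice() are equal but separate objects until something canonicalizes them.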

I think it would be preferable to start out just interning the names of the counters and counter groups as they are read from a stream, like in the case of parsing the job history files.  Once that happens we can go back and evaluate if there are other places, like through RPC, that are using a lot of memory.  I would hold off on the RPC, because I am not really sure how clean it is to insert this into the protocol buffer bridge code that we use.  I think PB plays games with lazy parsing of the data and if we are not careful it could slow things down, or cause more memory usage.
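
A minimal sketch of interning at read time, under stated assumptions: the "group,name,value" line format, the CounterLineParser class, and its Record type are all invented for this example and are not the job history format or the EventReader/JobHistoryParser code. The interner is passed in as a function so that String::intern, a weak interner, or a no-op can be swapped in.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class CounterLineParser {
    /** Parsed record; the field layout is an assumption for this sketch. */
    static final class Record {
        final String group;
        final String name;
        final long value;

        Record(String group, String name, long value) {
            this.group = group;
            this.name = name;
            this.value = value;
        }
    }

    /** Parses "group,name,value" lines, passing group and name through the given interner. */
    static List<Record> parse(List<String> lines, UnaryOperator<String> interner) {
        List<Record> records = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            // Only the repeated identifiers are interned; the value stays a primitive.
            String group = interner.apply(parts[0]);
            String name = interner.apply(parts[1]);
            records.add(new Record(group, name, Long.parseLong(parts[2])));
        }
        return records;
    }

    /** Counts distinct String objects (by identity, not equality) used as counter names. */
    static int distinctNameInstances(List<Record> records) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (Record r : records) {
            seen.put(r.name, Boolean.TRUE);
        }
        return seen.size();
    }
}
```

Parsing the same line many times with an identity function leaves one name object per record, while parsing with String::intern leaves a single shared object, which is the kind of difference a heap dump comparison would be expected to show.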
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482699#comment-13482699 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

Daryn, could you update CHANGES.txt to include Miomir Boljanovic too?  He did much of the core work for this patch and deserves a lot of the credit for it.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482801#comment-13482801 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4229:
----------------------------------------------------

Bobby/Daryn, can one of you file tickets for any pending items on this issue? Thanks.
                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miomir Boljanovic updated MAPREDUCE-4229:
-----------------------------------------

    Attachment: MAPREDUCE-4229-branch-0.23.patch
    
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482688#comment-13482688 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-trunk-Commit #2916 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2916/])
    MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401473)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401467)

     Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401473
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestStringInterner.java

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401467
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/CountersStrings.java

                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461297#comment-13461297 ] 

Hadoop QA commented on MAPREDUCE-4229:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546184/MAPREDUCE-4229.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2869//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2869//console

This message is automatically generated.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479396#comment-13479396 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

Sorry, this has been a hectic week for me too. Unfortunately, I don't have any figures to share yet but will try to capture some heap dumps over the weekend.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miomir Boljanovic updated MAPREDUCE-4229:
-----------------------------------------

    Status: Patch Available  (was: Open)
    
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477435#comment-13477435 ] 

Hadoop QA commented on MAPREDUCE-4229:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12549396/MAPREDUCE-4229-branch-0.23.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2932//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2932//console

This message is automatically generated.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479436#comment-13479436 ] 

Hadoop QA commented on MAPREDUCE-4229:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12549767/MR-4229.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2941//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2941//console

This message is automatically generated.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467139#comment-13467139 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

Also, the patch is against trunk, not 1.X.  It would be good to update the Affects Version and Target Version fields accordingly.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483240#comment-13483240 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1235 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1235/])
    Updating credits for MAPREDUCE-4229. (Revision 1401493)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401473)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401467)

     Result = SUCCESS
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401493
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401473
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestStringInterner.java

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401467
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/CountersStrings.java

                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4229:
-------------------------------------------

    Attachment: MR-4229.txt

Patch that reduces memory consumption on History Server.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miomir Boljanovic updated MAPREDUCE-4229:
-----------------------------------------

    Attachment: MAPREDUCE-4229.patch
    
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229.patch
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Updated] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miomir Boljanovic updated MAPREDUCE-4229:
-----------------------------------------

    Attachment: MAPREDUCE-4229-branch-0.23.patch
    
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269341#comment-13269341 ] 

Todd Lipcon commented on MAPREDUCE-4229:
----------------------------------------

Hi Steve. I am suggesting using a GettableSet<String> per JobInProgress, rather than actually calling String.intern. That way the strings might get promoted to old-gen, but they will never go to PermGen.
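
To illustrate the idea, here is a minimal sketch of a per-job name pool, not the code from any attached patch. GettableSet is not a JDK type, so the sketch substitutes a HashMap that maps each string to itself; the class name CounterNamePool and its methods are hypothetical. Because the pool is an ordinary heap object owned by the job, its strings become collectable once the job is discarded.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-job pool of counter names (illustration only).
 * One instance would be held by each job, so the pooled strings live
 * on the regular heap and can be collected together with the job.
 */
final class CounterNamePool {
    // Maps each distinct name to its canonical instance.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to {@code name}. */
    synchronized String canonicalize(String name) {
        if (name == null) {
            return null;
        }
        // putIfAbsent returns the previously stored instance, or null
        // if this is the first time the name has been seen.
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }

    /** Number of distinct names currently pooled. */
    synchronized int size() {
        return pool.size();
    }
}
```

Each task's counter names would be passed through canonicalize() as they are deserialized, so that all tasks of a job end up referring to one String object per distinct name.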
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


        

[jira] [Commented] (MAPREDUCE-4229) Counter names' memory usage can be decreased by interning

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482812#comment-13482812 ] 

Hudson commented on MAPREDUCE-4229:
-----------------------------------

Integrated in Hadoop-Yarn-trunk #13 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/13/])
    Updating credits for MAPREDUCE-4229. (Revision 1401493)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401473)
MAPREDUCE-4229. Intern counter names in the JT (bobby via daryn) (Revision 1401467)

     Result = FAILURE
daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401493
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401473
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringInterner.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestStringInterner.java

daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401467
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/CountersStrings.java

                
> Counter names' memory usage can be decreased by interning
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Miomir Boljanovic
>             Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, mr-4229.txt, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480168#comment-13480168 ] 

Robert Joseph Evans commented on MAPREDUCE-4229:
------------------------------------------------

I just thought of one more thing we should do.  We should mark StringInterner as @Public and @Stable.  The API is simple enough that I don't see much of a problem with locking it down.
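
For readers unfamiliar with the utility, the following is a rough sketch of what a small interning helper with a strong and a weak variant might look like. It is an assumption-laden illustration: the class name SimpleInterner, the method names, and the JDK-only implementation are all hypothetical and are not taken from the committed StringInterner.java.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interning helper (illustration only). */
final class SimpleInterner {
    // Strong pool: entries are never released.
    private static final ConcurrentHashMap<String, String> STRONG =
        new ConcurrentHashMap<>();
    // Weak pool: an entry disappears once its string is unreachable.
    private static final Map<String, WeakReference<String>> WEAK =
        new WeakHashMap<>();

    private SimpleInterner() {}

    /** Returns a canonical instance that is retained forever. */
    static String strongIntern(String s) {
        if (s == null) {
            return null;
        }
        String previous = STRONG.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    /** Returns a canonical instance that may be garbage collected. */
    static String weakIntern(String s) {
        if (s == null) {
            return null;
        }
        synchronized (WEAK) {
            WeakReference<String> ref = WEAK.get(s);
            String canonical = (ref != null) ? ref.get() : null;
            if (canonical == null) {
                // First sighting: this instance becomes canonical.
                WEAK.put(s, new WeakReference<>(s));
                canonical = s;
            }
            return canonical;
        }
    }
}
```

With only two static methods that take and return a String, a surface like this is small enough that committing to it as a stable public API carries little risk, which is the point made above.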
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Miomir Boljanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480937#comment-13480937 ] 

Miomir Boljanovic commented on MAPREDUCE-4229:
----------------------------------------------

Attached a patch that annotates StringInterner with @Public and @Stable.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
>
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.


[jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269335#comment-13269335 ] 

Steve Loughran commented on MAPREDUCE-4229:
-------------------------------------------

This could be good on large systems, but because interned strings go into the PermGen space, people will have to tune the JVM options to make that space large enough, so it is not without consequences.
                
> Intern counter names in the JT
> ------------------------------
>
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Todd Lipcon
>
> In our experience, most of the memory in production JTs goes to storing counter names (String objects and character arrays). Since most counter names are reused again and again, it would be a big memory savings to keep a hash set of already-used counter names within a job, and refer to the same object from all tasks.
