Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org> on 2012/01/07 21:15:39 UTC

[jira] [Commented] (MAPREDUCE-3511) Counters occupy a good part of AM heap

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182081#comment-13182081 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3511:
----------------------------------------------------

Sid and I ran some more experiments with the AMScalability benchmark (100K one-second maps) and found that counters are one of the biggest culprits in the job slowdown. Counters occupy a large share of the heap, which triggers very frequent full GCs and slows down the AM.

I have a raw patch, which I am cleaning up now, that moves the counter storage to the MRv1 mapreduce.Counters.
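
To illustrate the general idea of keeping counters in a compact form and converting them only when needed (as the issue description quoted below suggests for RPC transfers), here is a minimal sketch. It is not the patch itself and does not use the Hadoop API: CompactCounters, WireCounter and toWire() are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a verbose per-counter record sent over RPC.
final class WireCounter {
    final String name;
    final String displayName;
    final long value;

    WireCounter(String name, String displayName, long value) {
        this.name = name;
        this.displayName = displayName;
        this.value = value;
    }
}

// Hypothetical compact store: one primitive slot per counter name.
final class CompactCounters {
    private final Map<String, long[]> totals = new LinkedHashMap<>();

    void increment(String name, long delta) {
        totals.computeIfAbsent(name, k -> new long[1])[0] += delta;
    }

    long get(String name) {
        long[] slot = totals.get(name);
        return slot == null ? 0L : slot[0];
    }

    // Build the verbose records only when a caller (for example an RPC
    // handler) asks for them; nothing verbose is retained afterwards.
    List<WireCounter> toWire() {
        List<WireCounter> out = new ArrayList<>(totals.size());
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            out.add(new WireCounter(e.getKey(), e.getKey(), e.getValue()[0]));
        }
        return out;
    }
}
```

In this sketch the long-lived state is only the map of names to primitive totals; the heavier WireCounter objects exist just for the duration of a request.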

Regarding Robert's concern above about the heap doubling on an incoming RPC for all counters: I checked, and only one API tries to obtain all TaskReports, {{getTaskReports()}}. This call is *supposed to* be rare, and the heap blowup on that rare event is unavoidable if we want to optimize for the general case. The best we can do is lock that call so that only one such call can go through to the AM at any time.
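
As a rough sketch of what "lock that call" could look like, the example below serializes an expensive fetch-everything method behind a single lock, so at most one caller builds the large result at a time. The class SerializedReportService and its methods are hypothetical and only stand in for the real AM code; the in-flight bookkeeping is there purely to make the behaviour observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical service; not the actual AM or its RPC handler.
final class SerializedReportService {
    private final ReentrantLock reportLock = new ReentrantLock();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final List<String> taskIds;

    SerializedReportService(List<String> taskIds) {
        this.taskIds = new ArrayList<>(taskIds);
    }

    // Stand-in for an expensive "return all reports" call.
    List<String> getAllTaskReports() {
        reportLock.lock(); // later callers wait here until the lock is free
        try {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            List<String> reports = new ArrayList<>(taskIds.size());
            for (String id : taskIds) {
                reports.add("report:" + id);
            }
            return reports;
        } finally {
            inFlight.decrementAndGet();
            reportLock.unlock();
        }
    }

    // Highest number of callers ever observed inside the locked section.
    int maxConcurrentCalls() {
        return maxInFlight.get();
    }
}
```

The trade-off is that concurrent callers queue up behind the lock, which is acceptable only if the call really is rare.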

As for my *supposed to* be rare comment: unfortunately, this API sneaked into the "job -list" path via MAPREDUCE-2789. For performance reasons, commands like "job -list" should not go to each AM anyway. I will fix this as part of MAPREDUCE-3476.
                
> Counters occupy a good part of AM heap
> --------------------------------------
>
>                 Key: MAPREDUCE-3511
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3511
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Siddharth Seth
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>
> Per-task counters seem to be occupying a good part of an AM's heap. It looks like they account for more than 50% of what's used by a TaskAttemptImpl object.
> This could be optimized by interning strings or possibly by using the MRv1 counters, which are already optimized. Currently, counters are converted from the MRv1 to the MRv2 format for in-memory storage. The conversion could be delayed until it is actually required for RPC transfers.
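
For readers unfamiliar with the string-interning suggestion in the description above, here is a small sketch of the effect. It assumes that counter names arrive as separately allocated but equal strings for each task; the class CounterNameInterning and its helper methods are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper showing how String.intern() collapses equal strings
// into one shared instance.
final class CounterNameInterning {
    // Simulates a counter name deserialized per task: a fresh String each time.
    static String freshCopy(String name) {
        return new String(name.toCharArray());
    }

    // True only when both references point to the very same object.
    static boolean sameInstance(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = freshCopy("FILE_BYTES_READ");
        String b = freshCopy("FILE_BYTES_READ");
        // Two equal strings, two separate objects on the heap.
        System.out.println(sameInstance(a, b));
        // After interning, both callers hold the same canonical object.
        System.out.println(sameInstance(a.intern(), b.intern()));
    }
}
```

With many tasks each carrying the same set of counter names, storing the interned reference lets the duplicate copies be garbage collected.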
