Posted to mapreduce-dev@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2012/06/01 21:20:22 UTC

[jira] [Created] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings

Robert Joseph Evans created MAPREDUCE-4303:
----------------------------------------------

             Summary: Look at using String.intern to dedupe some Strings
                 Key: MAPREDUCE-4303
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: applicationmaster
    Affects Versions: 2.0.0-alpha, 0.23.3
            Reporter: Robert Joseph Evans


MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places where the duplicates are not as simple to remove.  In these cases the strings come from an incoming RPC call or from parsing a file that is read in.  The only real ways to dedupe them are to use String.intern(), which could fill up the permgen space if not used properly, or to maintain our own cache that does the same sort of thing as String.intern() but in the heap.

The following are some of the places where I saw lots of duplicate strings and that we should look at doing something about.

TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
MapTaskAttemptImpl.diagnostics
The keys to Counters.groups
GenericGroup.displayName
The keys to GenericGroup.counters
and GenericCounter.displayName
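
As a rough illustration of the "own cache in the heap" option described above, here is a minimal sketch. The class name HeapInterner and its methods are hypothetical and are not part of Hadoop; the sketch only assumes that equal strings arriving from RPC or file parsing can be swapped for one canonical instance held in an ordinary map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not a Hadoop class: a heap-based alternative to
// String.intern() that keeps one canonical instance per distinct value.
final class HeapInterner {
    private final ConcurrentMap<String, String> cache =
        new ConcurrentHashMap<String, String>();

    // Returns the canonical instance equal to s, registering s if it is new.
    String intern(String s) {
        if (s == null) {
            return null;
        }
        String previous = cache.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    // Number of distinct strings currently held by the cache.
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        HeapInterner interner = new HeapInterner();
        // Two equal but distinct String objects, as would arrive from RPC.
        String a = new String("SUCCEEDED");
        String b = new String("SUCCEEDED");
        // After interning, both callers share the same instance.
        System.out.println(interner.intern(a) == interner.intern(b)); // prints "true"
    }
}
```

This cache holds strong references, so it never shrinks; for unbounded inputs such as diagnostics messages, a bounded or weak-reference variant (for example Guava's Interners.newWeakInterner()) would likely be a safer choice.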

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved MAPREDUCE-4303.
--------------------------------------------

    Resolution: Duplicate
    
> Look at using String.intern to dedupe some Strings
> --------------------------------------------------
>
>                 Key: MAPREDUCE-4303
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>    Affects Versions: 0.23.3, 2.0.0-alpha
>            Reporter: Robert Joseph Evans
>
> MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places where the duplicates are not as simple to remove.  In these cases the strings come from an incoming RPC call or from parsing a file that is read in.  The only real ways to dedupe them are to use String.intern(), which could fill up the permgen space if not used properly, or to maintain our own cache that does the same sort of thing as String.intern() but in the heap.
> The following are some of the places where I saw lots of duplicate strings and that we should look at doing something about.
> TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
> MapTaskAttemptImpl.diagnostics
> The keys to Counters.groups
> GenericGroup.displayName
> The keys to GenericGroup.counters
> and GenericCounter.displayName
