Posted to mapreduce-dev@hadoop.apache.org by Bhallamudi Venkata Siva Kamesh <ka...@imaginea.com> on 2012/07/12 15:02:50 UTC

Can we use String.intern inside WritableUtils#readString()?

Hi All,
  I noticed that WritableUtils.readString() creates a new String object
every time it deserializes a string. Some applications, however,
serialize a small number of distinct strings a huge number of times, so
deserializing them can sometimes lead to OOMs.

I think using intern() would reduce the number of String objects
created. Please correct me if my understanding is wrong.
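
To illustrate what I mean, here is a minimal sketch (not the Hadoop code). The decode() helper is hypothetical and only stands in for the step where readString() builds a String from the bytes it has read. Decoding the same bytes twice gives two distinct objects, while intern() maps both to a single instance.

```java
import java.nio.charset.StandardCharsets;

class InternDemo {
    // Hypothetical stand-in for the decoding step: always allocates a new String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "part-00000".getBytes(StandardCharsets.UTF_8);

        String a = decode(bytes);
        String b = decode(bytes);

        // Equal contents, but two separate objects on the heap.
        System.out.println(a == b);                   // prints "false"

        // intern() returns one canonical instance per distinct value.
        System.out.println(a.intern() == b.intern()); // prints "true"
    }
}
```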

-- 
Thanks&Regards,
Bh.V.S.Kamesh,
+91-9652725948

Re: Can we use String.intern inside WritableUtils#readString()?

Posted by Robert Evans <ev...@yahoo-inc.com>.
Yes, I filed a JIRA for something like this a while ago: MAPREDUCE-4303. I
have not done anything with it for this very reason. There are some
potential fixes for this. We could keep a fairly small weak-reference
cache of these strings, so that a string read multiple times is
deduplicated, a string that is no longer used can still be collected
instead of being forced to stay around, and nothing is placed in the
permgen space. But that is not a small change. If you want to take over
that JIRA, feel free; otherwise I will get around to it eventually.
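
A rough sketch of the weak-reference idea, under my own assumptions and not taken from MAPREDUCE-4303: the class name WeakStringCache and its dedup() method are made up for illustration, and the size bound mentioned above is left out. The key is held weakly by WeakHashMap and the value is a WeakReference to that same key, so an entry disappears once nothing else references the string.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper: a canonicalizing cache that lives on the normal heap.
class WeakStringCache {
    // Keys are weakly held; values must not strongly reference the key,
    // otherwise entries would never be collected.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    // Returns the cached instance equal to s, or caches and returns s itself.
    synchronized String dedup(String s) {
        WeakReference<String> ref = cache.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            cache.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }
}
```

The caller would pass each freshly deserialized string through dedup() and keep only the returned reference.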

--Bobby Evans

On 7/12/12 1:27 PM, "Ramkumar Vadali" <ra...@gmail.com> wrote:

>String.intern() should be used with caution. Interned strings go to
>the "perm gen" space of the Java process, which is limited. You could
>easily run out of that space and get OOM errors even when the total usage
>is well below the Xmx value. A better way would be to have a Map<String,
>String> that de-duplicates string objects.
>
>Ramkumar


Re: Can we use String.intern inside WritableUtils#readString()?

Posted by Ramkumar Vadali <ra...@gmail.com>.
String.intern() should be used with caution. Interned strings go to the
"perm gen" space of the Java process, which is limited. You could easily
run out of that space and get OOM errors even when the total usage is
well below the Xmx value. A better way would be to have a Map<String,
String> that de-duplicates string objects.
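
Something along these lines, as a sketch only: the StringDeduplicator class and its dedup() method are hypothetical names, not existing Hadoop code. The map lives on the regular heap, so it is bounded by Xmx rather than by the perm gen size. Note that a plain HashMap holds strong references and grows with the number of distinct strings, and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps each distinct string value to one shared instance.
class StringDeduplicator {
    private final Map<String, String> seen = new HashMap<>();

    // Returns the first instance seen with this value; stores s if it is new.
    String dedup(String s) {
        String previous = seen.putIfAbsent(s, s);
        return (previous != null) ? previous : s;
    }

    // Number of distinct values currently retained.
    int size() {
        return seen.size();
    }
}
```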

Ramkumar
