Posted to common-dev@hadoop.apache.org by Kay Kay <ka...@gmail.com> on 2009/12/30 09:36:18 UTC

clearing o.a.h.io.Text

In o.a.h.io.Text, the clear method currently just resets length to 0
and does nothing about the internal bytes.

Curious to know the reasoning behind this decision (letting the internal
bytes be reused for future appends vs. risking memory leaks by not
clearing them)? Thanks.

$ svn diff
Index: src/java/org/apache/hadoop/io/Text.java
===================================================================
--- src/java/org/apache/hadoop/io/Text.java    (revision 894545)
+++ src/java/org/apache/hadoop/io/Text.java    (working copy)
@@ -224,6 +224,7 @@
    */
   public void clear() {
     length = 0;
+    bytes = EMPTY_BYTES;
   }
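
To make the two sides of the diff above concrete, here is a minimal sketch. TextSketch is a hypothetical stand-in, not the actual org.apache.hadoop.io.Text source: it only models a byte buffer plus a length field, and the method name clearAndRelease() is invented here to stand for the patched clear().

```java
// Hypothetical stand-in for a Text-like buffer; NOT the real Hadoop class.
class TextSketch {
    private static final byte[] EMPTY_BYTES = new byte[0];

    private byte[] bytes = EMPTY_BYTES;
    private int length = 0;

    // Append data, growing the backing array only when it is too small.
    void append(byte[] src) {
        int needed = length + src.length;
        if (needed > bytes.length) {
            byte[] grown = new byte[Math.max(needed, bytes.length * 2)];
            System.arraycopy(bytes, 0, grown, 0, length);
            bytes = grown;
        }
        System.arraycopy(src, 0, bytes, length, src.length);
        length = needed;
    }

    // Behavior described in the thread: only the length is reset,
    // so the backing array stays allocated and can be reused.
    void clear() {
        length = 0;
    }

    // Behavior of the proposed patch (name is an assumption of this sketch):
    // the reference to the backing array is dropped as well.
    void clearAndRelease() {
        length = 0;
        bytes = EMPTY_BYTES;
    }

    int getLength() {
        return length;
    }

    int capacity() {
        return bytes.length;
    }
}
```

In this sketch, capacity() still reports the old size after clear(), so a later append reuses the array. After clearAndRelease(), the old array becomes eligible for garbage collection once nothing else references it, and the next append has to allocate again.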


Re: clearing o.a.h.io.Text

Posted by Owen O'Malley <ow...@gmail.com>.
On Jan 1, 2010, at 2:39 AM, Kay Kay <ka...@gmail.com> wrote:

>
> I believe that behavior would be surprising to the user if they were
> expecting the object resources to be released entirely, by calling the
> clear() method.

I disagree. Clear only promises to reset to the empty string. It  
doesn't imply freeing resources.

> Maybe clear() could reset the internal byte buffer, and another method,
> called reset() or rewind(), could be provided to reuse the existing internal
> buffer while resetting only the length variable.

Changing the semantics of Text methods is very difficult. Clear is exactly
the right verb for what it does. A patch that makes the Javadoc clear
would be appreciated.

Once we have setCapacity, a lot of these issues go away.

txt.setCapacity(0)

makes your intent very clear.

-- Owen
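
As an illustration of the setCapacity idea discussed above, here is a rough sketch of what such a method could look like. ShrinkableBuffer and its fields are hypothetical and are not taken from the Hadoop source; truncating the contents when the new capacity is smaller than the current length is an assumption of this sketch.

```java
// Hypothetical sketch of a setCapacity-style method; NOT the real Hadoop code.
class ShrinkableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Replace the contents, growing the backing array only if it is too small.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Resize the backing array to exactly newCapacity bytes.
    // Assumption: contents beyond newCapacity are truncated.
    void setCapacity(int newCapacity) {
        if (newCapacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        if (newCapacity == bytes.length) {
            return; // nothing to do
        }
        byte[] resized = new byte[newCapacity];
        length = Math.min(length, newCapacity);
        System.arraycopy(bytes, 0, resized, 0, length);
        bytes = resized;
    }

    int getLength() {
        return length;
    }

    int capacity() {
        return bytes.length;
    }
}
```

With a method along these lines, a caller who wants the memory back says so explicitly with setCapacity(0), while clear() can keep its existing meaning of resetting to the empty string.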

Re: clearing o.a.h.io.Text

Posted by Kay Kay <ka...@gmail.com>.
On Thu, Dec 31, 2009 at 11:03 PM, Owen O'Malley <om...@apache.org> wrote:

>
> On Dec 30, 2009, at 12:36 AM, Kay Kay wrote:
>
>> In o.a.h.io.Text - the clear method currently just resets length to 0,
>> while not doing anything about the bytes internally.
>>
>> Curious to know the thoughts behind the decision (to let the internal
>> bytes to be reused for future appends  vs. memory leaks due to not clearing
>> them ) ?
>>
>
> The byte array that backs the Text object is always reused.


I believe that behavior would be surprising to the user if they were
expecting the object resources to be released entirely, by calling the
clear() method.

Maybe clear() could reset the internal byte buffer, and another method,
called reset() or rewind(), could be provided to reuse the existing internal
buffer while resetting only the length variable.




> It might make sense to have a setCapacity method on Text that is similar to
> BytesWritable's. With such a method, it would be possible to shrink the size
> of the backing array.
>
>
HADOOP-6476 is in place for this.



> -- Owen
>

Re: clearing o.a.h.io.Text

Posted by Owen O'Malley <om...@apache.org>.
On Dec 30, 2009, at 12:36 AM, Kay Kay wrote:

> In o.a.h.io.Text - the clear method currently just resets length to  
> 0, while not doing anything about the bytes internally.
>
> Curious to know the thoughts behind the decision (to let the  
> internal bytes to be reused for future appends  vs. memory leaks due  
> to not clearing them ) ?

The byte array that backs the Text object is always reused. It
might make sense to have a setCapacity method on Text that is similar
to BytesWritable's. With such a method, it would be possible to shrink
the size of the backing array.

-- Owen
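
The trade-off behind "always reused" can be sketched as follows. ReusedValue and ReuseDemo are hypothetical illustrations, not Hadoop classes: one object is fed many records, and an allocation counter (added purely for this sketch) shows how rarely the backing array is replaced, and also that it stays as large as the largest record seen.

```java
// Hypothetical illustration of buffer reuse; NOT the real Hadoop Text class.
class ReusedValue {
    private byte[] bytes = new byte[0];
    private int length = 0;
    private int allocations = 0; // how often the backing array was replaced

    // Overwrite the contents, allocating only when the record does not fit.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
            allocations++;
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    int getLength() {
        return length;
    }

    int capacity() {
        return bytes.length;
    }

    int allocations() {
        return allocations;
    }
}

class ReuseDemo {
    // Feed records of the given sizes through a single reused object.
    static ReusedValue run(int[] recordSizes) {
        ReusedValue value = new ReusedValue();
        for (int size : recordSizes) {
            value.set(new byte[size]);
        }
        return value;
    }
}
```

For record sizes 10, 1000, 20, 30, the sketch allocates only twice, which is the benefit of reuse. The cost is that the capacity stays at 1000 even while the object holds 30 bytes, which is the situation a setCapacity-style method would let a caller correct.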