
Posted to dev@lucene.apache.org by Adriano Crestani <ad...@apache.org> on 2010/07/13 09:59:43 UTC

Cloning TermAttribute objects

Hi,

Why does TermAttributeImpl.clone() use buff.clone() instead of
System.arraycopy() to copy its internal buffer? Is it for performance reasons?

I have the following scenario:

...
@Override
public boolean incrementToken() throws IOException {
  ...
  String twoHundredKCharsString = "abc...."; // ~200,000 chars
  String smallString = "test";

  termAttribute.setTermBuffer(twoHundredKCharsString);
  State largeStringState = captureState();

  termAttribute.setTermBuffer(smallString);
  State smallStringState = captureState();

  ...
}
...

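And guess what? smallStringState holds a TermAttribute whose internal
buffer is 200k chars long! setTermBuffer() only ever grows the buffer,
so it is still 200k chars after the small term is set, and clone()
copies the whole array rather than just the termLength() chars in use.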
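The effect can be reproduced without Lucene. The sketch below uses a hypothetical MiniTermAttribute class (not a Lucene type) that mimics the behaviour described above under two assumptions: setTermBuffer() only ever grows the internal array, and clone() deep-copies it with buffer.clone(). Cloning plays the role of captureState() here.

```java
// Hypothetical stand-in for TermAttributeImpl; NOT the Lucene class.
// Assumptions: the buffer only grows, and clone() copies the whole array.
class MiniTermAttribute implements Cloneable {
    private char[] buffer = new char[10];
    private int length = 0;

    void setTermBuffer(String s) {
        // Grow-only: a shorter term reuses the existing (possibly huge) array.
        if (buffer.length < s.length()) {
            buffer = new char[s.length()];
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    int termLength() { return length; }
    int capacity() { return buffer.length; }
    String term() { return new String(buffer, 0, length); }

    @Override
    public MiniTermAttribute clone() {
        try {
            MiniTermAttribute t = (MiniTermAttribute) super.clone();
            t.buffer = buffer.clone(); // copies ALL capacity, not just `length` chars
            return t;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        MiniTermAttribute term = new MiniTermAttribute();
        term.setTermBuffer("a".repeat(200_000));
        MiniTermAttribute largeState = term.clone(); // like captureState() after the large term

        term.setTermBuffer("test");
        MiniTermAttribute smallState = term.clone(); // like captureState() after the small term

        // prints "test 4 200000": a 4-char term backed by a 200k-char copy
        System.out.println(smallState.term() + " " + smallState.termLength() + " " + smallState.capacity());
    }
}
```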

I did some googling and found that clone() and System.arraycopy()
perform about the same on small arrays, while clone() performs better
on large ones.

So, if large inputs are not a realistic scenario, why not use
System.arraycopy() and copy only the chars actually in use? And if they
are realistic, Lucene should definitely not copy the entire buffer when
the term is small.
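Switching to System.arraycopy() only helps if the destination array is sized to the term length: a full-length arraycopy retains the 200k buffer exactly as clone() does. A minimal sketch contrasting the two, using hypothetical helper names (fullCopy, usedCopy) that are not part of Lucene:

```java
import java.util.Arrays;

// Hypothetical helpers (not Lucene API) contrasting a full-capacity copy
// with a copy sized to the characters actually in use.
class BufferCopies {
    // What buffer.clone() does: copy the entire array, used or not.
    static char[] fullCopy(char[] buffer) {
        return buffer.clone();
    }

    // Copy only the first `length` chars into an exactly sized array.
    static char[] usedCopy(char[] buffer, int length) {
        char[] copy = new char[length];
        System.arraycopy(buffer, 0, copy, 0, length);
        return copy; // same result as Arrays.copyOf(buffer, length)
    }

    public static void main(String[] args) {
        char[] buffer = new char[200_000]; // capacity left over from the large term
        "test".getChars(0, 4, buffer, 0);  // small term written at the front
        int length = 4;

        System.out.println(fullCopy(buffer).length);         // 200000
        System.out.println(usedCopy(buffer, length).length); // 4
        System.out.println(Arrays.equals(usedCopy(buffer, length), Arrays.copyOf(buffer, length))); // true
    }
}
```

The trade-off of a right-sized copy is that a restored attribute has no spare capacity, so setting a longer term on it later forces a fresh allocation.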

Maybe the TermAttribute interface could expose a method like
shrinkBuffer() that users can call when they need it.
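shrinkBuffer() is only a proposal here; it is not an existing TermAttribute method. A minimal sketch of what it could do, assuming a grow-only buffer like the one described above and an arbitrary MIN_BUFFER_SIZE chosen purely for illustration (the class name ShrinkableTermBuffer is also hypothetical):

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed shrinkBuffer(); not part of Lucene's
// TermAttribute API. MIN_BUFFER_SIZE is an assumed value for illustration.
class ShrinkableTermBuffer {
    private static final int MIN_BUFFER_SIZE = 10;
    private char[] buffer = new char[MIN_BUFFER_SIZE];
    private int length = 0;

    void setTermBuffer(String s) {
        if (buffer.length < s.length()) {
            buffer = new char[s.length()]; // grow-only
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    // Reallocate to max(length, MIN_BUFFER_SIZE) so that a later clone
    // (e.g. via captureState()) does not copy unused capacity.
    void shrinkBuffer() {
        int target = Math.max(length, MIN_BUFFER_SIZE);
        if (buffer.length > target) {
            buffer = Arrays.copyOf(buffer, target); // keeps the first `target` chars
        }
    }

    int capacity() { return buffer.length; }
    String term() { return new String(buffer, 0, length); }

    public static void main(String[] args) {
        ShrinkableTermBuffer t = new ShrinkableTermBuffer();
        t.setTermBuffer("a".repeat(200_000));
        t.setTermBuffer("test");
        System.out.println(t.capacity());                   // 200000
        t.shrinkBuffer();
        System.out.println(t.capacity() + " " + t.term());  // 10 test
    }
}
```

Leaving the call to the user keeps the common path (reusing a large buffer across tokens) allocation-free and pays for the shrink only right before capturing state.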

Thoughts?

Best Regards,
Adriano Crestani

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Cloning TermAttribute objects

Posted by Adriano Crestani <ad...@gmail.com>.

Keeping this thread alive.

I would appreciate a response from the community about this issue.

Thanks in advance,
Adriano Crestani
