You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2014/11/08 05:25:49 UTC

Why is PayloadAttribute.clone() shallow?

Hi

Someone ran into this in the context of working with TeeSinkTokenFilter -
when the state is cloned for each of the sinks, if you have a
PayloadAttribute on the stream, it's shallow cloned and thus all sinks
share the same byte[].

At first I thought this is just a bug in PA.clone(), and that it should
have been implemented using deepCopyOf, but then I noticed that it's
actually documented that if you require deep cloning, you should implement
your own PA (or at least override .clone()).

But then, CharTermAttribute.clone() documents that it does a shallow clone,
but actually implements a *deep* clone.

So now I think PA.clone() should indeed be implemented as a deep clone...

What do you think?

Shai

Re: Why is PayloadAttribute.clone() shallow?

Posted by Shai Erera <se...@gmail.com>.
Thanks Uwe. I'll open an issue and fix in trunk and 5x.

Shai

On Sat Nov 08 2014 at 12:17:15 PM Uwe Schindler <uw...@thetaphi.de> wrote:

> +1
>
>
>
> This is actually a bug in my opinion. The reason for this reach back to
> the change from pure byte[] to BytesRef. Before it was also shallow, but
> the general use-case was to not change the byte[] contents (so see it as
> final), but assign new byte[] to it – but with the change to BytesRef there
> is one more indirection, leading to confusion. In fact
> AttributeImpl.clone() should always be a deep clone, otherwise saving state
> and cloning attributes is incorrect.
>
>
>
> And we should fix the documentation for CTA and PA. Lucene 5 is the ideal
> place to fixup this behavior, so we dont break code unexpected.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: uwe@thetaphi.de
>
>
>
> *From:* Shai Erera [mailto:serera@gmail.com]
> *Sent:* Saturday, November 08, 2014 5:26 AM
> *To:* dev@lucene.apache.org
> *Subject:* Why is PayloadAttribute.clone() shallow?
>
>
>
> Hi
>
> Someone ran into this in the context of working with TeeSinkTokenFilter -
> when the state is cloned for each of the sinks, if you have a
> PayloadAttribute on the stream, it's shallow cloned and thus all sinks
> share the same byte[].
>
> At first I thought this is just a bug in PA.clone(), and that it should
> have been implemented using deepCopyOf, but then I noticed that it's
> actually documented that if you require deep cloning, you should implement
> your own PA (or at least override .clone()).
>
> But then, CharTermAttribute.clone() documents that it does a shallow
> clone, but actually implements a *deep* clone.
>
> So now I think PA.clone() should indeed be implemented as a deep clone...
>
> What do you think?
>
> Shai
>

RE: Why is PayloadAttribute.clone() shallow?

Posted by Uwe Schindler <uw...@thetaphi.de>.
+1

 

This is actually a bug in my opinion. The reason for this reach back to the change from pure byte[] to BytesRef. Before it was also shallow, but the general use-case was to not change the byte[] contents (so see it as final), but assign new byte[] to it – but with the change to BytesRef there is one more indirection, leading to confusion. In fact AttributeImpl.clone() should always be a deep clone, otherwise saving state and cloning attributes is incorrect.

 

And we should fix the documentation for CTA and PA. Lucene 5 is the ideal place to fixup this behavior, so we dont break code unexpected.

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 <http://www.thetaphi.de/> http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Shai Erera [mailto:serera@gmail.com] 
Sent: Saturday, November 08, 2014 5:26 AM
To: dev@lucene.apache.org
Subject: Why is PayloadAttribute.clone() shallow?

 

Hi

Someone ran into this in the context of working with TeeSinkTokenFilter - when the state is cloned for each of the sinks, if you have a PayloadAttribute on the stream, it's shallow cloned and thus all sinks share the same byte[].

At first I thought this is just a bug in PA.clone(), and that it should have been implemented using deepCopyOf, but then I noticed that it's actually documented that if you require deep cloning, you should implement your own PA (or at least override .clone()).

But then, CharTermAttribute.clone() documents that it does a shallow clone, but actually implements a *deep* clone.

So now I think PA.clone() should indeed be implemented as a deep clone...

What do you think?

Shai