You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by sebb <se...@gmail.com> on 2008/09/16 13:49:06 UTC

[LANG] - suggestions for new Identity Hash code classes

I think I've found a fix to LANG-459 - which is to borrow some code
from Apache Axis.

The required code is a single short class - IDKey (e.g.
http://www.jdocs.com/axis/1.4/org/apache/axis/utils/IDKey.html)

It could be added as a private class of HashCodeBuilder, but it has
wider use, so should probably be added as a public class.

Seems to me it would also be useful to add IdentityHashMap and
IdentityHashSet classes as well, given that LANG still targets Java
1.3.

WDYT?

And do these classes need a new package, or would the top-level package be OK?

S///

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [LANG] - suggestions for new Identity Hash code classes

Posted by Jörg Schaible <Jo...@scalaris.com>.
sebb wrote:
> I think I've found a fix to LANG-459 - which is to borrow some code
> from Apache Axis. 
> 
> The required code is a single short class - IDKey (e.g.
> http://www.jdocs.com/axis/1.4/org/apache/axis/utils/IDKey.html)
> 
> It could be added as a private class of HashCodeBuilder, but it has
> wider use, so should probably be added as a public class.
> 
> Seems to me it would also be useful to add IdentityHashMap and
> IdentityHashSet classes as well, given that LANG still targets Java
> 1.3.
> 
> WDYT?
> 
> And do these classes need a new package, or would the
> top-level package be OK?

Top leve is fine with me.

- Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, sebb <se...@gmail.com> wrote:
> I think I've found a fix to LANG-459 - which is to borrow some code
>  from Apache Axis.
>
>  The required code is a single short class - IDKey (e.g.
>  http://www.jdocs.com/axis/1.4/org/apache/axis/utils/IDKey.html)
>
>  It could be added as a private class of HashCodeBuilder, but it has
>  wider use, so should probably be added as a public class.
>
>  Seems to me it would also be useful to add IdentityHashMap and
>  IdentityHashSet classes as well, given that LANG still targets Java
>  1.3.

Just noticed that Commons Collection already has IdentityMap, so no
point adding it to Lang.

>  WDYT?
>
>  And do these classes need a new package, or would the top-level package be OK?
>
>
>  S///
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Jörg Schaible <Jo...@scalaris.com> wrote:
> Hi Simon,
>
>  Simon Kitching wrote:
>  [snip]
>
> > One other concern is regarding garbage collection. From a
>  > brief look at
>  > the code, this thread-local registry object contains a set of Integer
>  > hashcodes. With this change, the set will now contain real hard
>  > references to objects, preventing them from being garbage-collected
>  > while they are in the set. Does this matter?
>
>
> Good catch. I am quite sure this will matter, unfortunately. HashCodeBuilder.reflectionHashCode(Object) is widely used and if a new lang version suddenly keeps those values, we might prevent redeployments that worked so far and causing memory leaks. Using a WeakReference instead? We're getting slow :-/
>
>  - Jörg
>
>  BTW: The current implementation of HashCodeBuilder.reflectionHashCode(Object) is not wrong, its the class implementation that uses EqualsBuilder.reflectionEquals(o, this) together with it.

I disagree: the current implementation assumes that it can uniquely
map objects to Integers. This assumption does not depend on the
definition of equals.

> We might therefore also create a new method HashCodeBuilder.reflectionHashCodeRespectingEquals(Object) ...
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [LANG] - suggestions for new Identity Hash code classes

Posted by Jörg Schaible <Jo...@scalaris.com>.
Hi Simon,

Simon Kitching wrote:
[snip]
> One other concern is regarding garbage collection. From a
> brief look at
> the code, this thread-local registry object contains a set of Integer
> hashcodes. With this change, the set will now contain real hard
> references to objects, preventing them from being garbage-collected
> while they are in the set. Does this matter?

Good catch. I am quite sure this will matter, unfortunately. HashCodeBuilder.reflectionHashCode(Object) is widely used and if a new lang version suddenly keeps those values, we might prevent redeployments that worked so far and causing memory leaks. Using a WeakReference instead? We're getting slow :-/

- Jörg

BTW: The current implementation of HashCodeBuilder.reflectionHashCode(Object) is not wrong, its the class implementation that uses EqualsBuilder.reflectionEquals(o, this) together with it. We might therefore also create a new method HashCodeBuilder.reflectionHashCodeRespectingEquals(Object) ...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [LANG] - suggestions for new Identity Hash code classes

Posted by Jörg Schaible <Jo...@scalaris.com>.
sebb wrote:
> On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
[snip]
>>  One other concern is regarding garbage collection. From a brief
>> look at the code, this thread-local registry object contains a set
>> of Integer hashcodes. With this change, the set will now contain
>> real hard references to objects, preventing them from being
>> garbage-collected while they are in the set. Does this matter?
> 
> The entries are only retained in the registry during the hashcode
> calculation - if any were left behind they could mess up subsequent
> calls in the same thread. 
> 
> As far as I can see, the code always removes any items that are added,
> but it would be worth double-checking that.

In that case forget my other comment ;-)

- Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
> sebb schrieb:
>
> > On 16/09/2008, Jörg Schaible <Jo...@scalaris.com> wrote:
> >
> >
> > > sebb wrote:
> > >  > On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
> > >
> > > [snip]
> > >
> > >
> > >
> > > >
> > > > >  It's not quite clear to me why a threadlocal is being used here.
> I'm
> > > > >
> > > > >
> > > >
> > >  >> guessing it is for performance to avoid recreating the HashSet
> > >  >> object.
> > >  >
> > >  > I don't know for sure, but that is probably the case.
> > >  >
> > >  >> Or is
> > >  >> it to "tunnel" the registry set between cooperating classes?
> > >  >
> > >  > Not at present, although the registy access methods are
> > >  > package protected.
> > >  >
> > >  > That may be a mistake...
> > >
> > >
> > > My guess: It's used in case of a circular object graph ...
> > >
> > >
> > >
> >
> > Yes, they are used in the enclosing class for just that purpose.
> >
> > But as far as I can see there is no need for the methods to be
> > anything other than private - and the contents of the HashMap only
> > contain anything during the calculation of the hashcode, so it's not
> > obvious how the methods could be called anyway.
> >
> > I meant it was probably a mistake for the methods to be package protected.
> >
> >
>  In that case, I wonder why it bothers to use a thread-local at all, rather
> than just passing the registry object around as a parameter...
>
>  Is the performance benefit of avoiding creation of a HashSet object really
> measurable given that this class is doing significant amounts of reflection?
>
>  Because this class never calls ThreadLocal.remove(), the created HashSet
> object will remain attached to every thread that ever was active when a
> HashCodeBuilder call was made. That's not *too* bad, ie it seems to me that
> it would be acceptable as long as there was a corresponding benefit
> somewhere. But I can't see the benefit ATM..

Ditto.

I think it would make also sense to privatise the add/remove etc methods.

However, this would change the API.
But as that part of the API seems to be useless, I would not object to
it being removed.

>
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by Simon Kitching <sk...@apache.org>.
sebb schrieb:
> On 16/09/2008, Jörg Schaible <Jo...@scalaris.com> wrote:
>   
>> sebb wrote:
>>  > On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
>>
>> [snip]
>>
>>     
>>>>  It's not quite clear to me why a threadlocal is being used here. I'm
>>>>         
>>  >> guessing it is for performance to avoid recreating the HashSet
>>  >> object.
>>  >
>>  > I don't know for sure, but that is probably the case.
>>  >
>>  >> Or is
>>  >> it to "tunnel" the registry set between cooperating classes?
>>  >
>>  > Not at present, although the registy access methods are
>>  > package protected.
>>  >
>>  > That may be a mistake...
>>
>>
>> My guess: It's used in case of a circular object graph ...
>>
>>     
>
> Yes, they are used in the enclosing class for just that purpose.
>
> But as far as I can see there is no need for the methods to be
> anything other than private - and the contents of the HashMap only
> contain anything during the calculation of the hashcode, so it's not
> obvious how the methods could be called anyway.
>
> I meant it was probably a mistake for the methods to be package protected.
>   
In that case, I wonder why it bothers to use a thread-local at all, 
rather than just passing the registry object around as a parameter...

Is the performance benefit of avoiding creation of a HashSet object 
really measurable given that this class is doing significant amounts of 
reflection?

Because this class never calls ThreadLocal.remove(), the created HashSet 
object will remain attached to every thread that ever was active when a 
HashCodeBuilder call was made. That's not *too* bad, ie it seems to me 
that it would be acceptable as long as there was a corresponding benefit 
somewhere. But I can't see the benefit ATM..


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Jörg Schaible <Jo...@scalaris.com> wrote:
> sebb wrote:
>  > On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
>
> [snip]
>
> >>  It's not quite clear to me why a threadlocal is being used here. I'm
>  >> guessing it is for performance to avoid recreating the HashSet
>  >> object.
>  >
>  > I don't know for sure, but that is probably the case.
>  >
>  >> Or is
>  >> it to "tunnel" the registry set between cooperating classes?
>  >
>  > Not at present, although the registy access methods are
>  > package protected.
>  >
>  > That may be a mistake...
>
>
> My guess: It's used in case of a circular object graph ...
>

Yes, they are used in the enclosing class for just that purpose.

But as far as I can see there is no need for the methods to be
anything other than private - and the contents of the HashMap only
contain anything during the calculation of the hashcode, so it's not
obvious how the methods could be called anyway.

I meant it was probably a mistake for the methods to be package protected.

>  - Jörg
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


RE: [LANG] - suggestions for new Identity Hash code classes

Posted by Jörg Schaible <Jo...@scalaris.com>.
sebb wrote:
> On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
[snip]
>>  It's not quite clear to me why a threadlocal is being used here. I'm
>> guessing it is for performance to avoid recreating the HashSet
>> object. 
> 
> I don't know for sure, but that is probably the case.
> 
>> Or is
>> it to "tunnel" the registry set between cooperating classes?
> 
> Not at present, although the registy access methods are
> package protected.
> 
> That may be a mistake...

My guess: It's used in case of a circular object graph ...

- Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
> sebb schrieb:
>
> > On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
> >
> >
> > >  One other concern is regarding garbage collection. From a brief look at
> the
> > > code, this thread-local registry object contains a set of Integer
> hashcodes.
> > > With this change, the set will now contain real hard references to
> objects,
> > > preventing them from being garbage-collected while they are in the set.
> Does
> > > this matter?
> > >
> > >
> >
> > The entries are only retained in the registry during the hashcode
> > calculation - if any were left behind they could mess up subsequent
> > calls in the same thread.
> >
> > As far as I can see, the code always removes any items that are added,
> > but it would be worth double-checking that.
> >
> >
>
>  I did suspect that the objects are removed from the registry before normal
> return. So then the IDKey approach should be fine.
>
>  But it would seem to me to be safer to have a call to registry.remove() to
> ensure that the threadlocal object is completely cleared, rather than just
> relying on the code to correctly balance adds/removes on the HashSet.

The add and remove are in the same try block; remove is in the finally clause.

> Or a call to getRegistry().clear().

Could do, but seems like overkill.

> Or at least a comment on the registry stating
> that it is always empty on return from the reflectionHashCode method etc.

OK.

>  It's not quite clear to me why a threadlocal is being used here. I'm
> guessing it is for performance to avoid recreating the HashSet object.

I don't know for sure, but that is probably the case.

> Or is
> it to "tunnel" the registry set between cooperating classes?

Not at present, although the registy access methods are package protected.

That may be a mistake...

> If the latter,
> then registry.remove() seems a good idea when it is no longer used, to avoid
> leaking an object per thread.
>
>  Sorry for the vague comments; I don't have time to really get to grips with
> the code ATM..

NP, it's always useful to thrash such things out at the design stage
rather than after deployment ...

>  Regards,
>
>  Simon
>
>
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by Simon Kitching <sk...@apache.org>.
sebb schrieb:
> On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
>   
>
>>  One other concern is regarding garbage collection. From a brief look at the
>> code, this thread-local registry object contains a set of Integer hashcodes.
>> With this change, the set will now contain real hard references to objects,
>> preventing them from being garbage-collected while they are in the set. Does
>> this matter?
>>     
>
> The entries are only retained in the registry during the hashcode
> calculation - if any were left behind they could mess up subsequent
> calls in the same thread.
>
> As far as I can see, the code always removes any items that are added,
> but it would be worth double-checking that.
>   

I did suspect that the objects are removed from the registry before 
normal return. So then the IDKey approach should be fine.

But it would seem to me to be safer to have a call to registry.remove() 
to ensure that the threadlocal object is completely cleared, rather than 
just relying on the code to correctly balance adds/removes on the 
HashSet. Or a call to getRegistry().clear(). Or at least a comment on 
the registry stating that it is always empty on return from the 
reflectionHashCode method etc.

It's not quite clear to me why a threadlocal is being used here. I'm 
guessing it is for performance to avoid recreating the HashSet object. 
Or is it to "tunnel" the registry set between cooperating classes? If 
the latter, then registry.remove() seems a good idea when it is no 
longer used, to avoid leaking an object per thread.

Sorry for the vague comments; I don't have time to really get to grips 
with the code ATM..

Regards,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Simon Kitching <sk...@apache.org> wrote:
> sebb schrieb:
>
> > I think I've found a fix to LANG-459 - which is to borrow some code
> > from Apache Axis.
> >
> > The required code is a single short class - IDKey (e.g.
> >
> http://www.jdocs.com/axis/1.4/org/apache/axis/utils/IDKey.html)
> >
> > It could be added as a private class of HashCodeBuilder, but it has
> > wider use, so should probably be added as a public class.
> >
> >
>  Seems reasonable to me.  It does look like it will solve LANG-459. And as
> you say, this problem may be reasonably common.
>
>  What about making the IDKey class final? One thing that does concern me is
> that HashSet lookup using an Integer object as the key should be pretty
> fast. Using an IDKey object as the key may not perform as well. Making IDKey
> final would at least help. And I cannot see any reason why someone would
> want to subclass IDKey.
>

Good idea. I already made the fields final.

>  One other concern is regarding garbage collection. From a brief look at the
> code, this thread-local registry object contains a set of Integer hashcodes.
> With this change, the set will now contain real hard references to objects,
> preventing them from being garbage-collected while they are in the set. Does
> this matter?

The entries are only retained in the registry during the hashcode
calculation - if any were left behind they could mess up subsequent
calls in the same thread.

As far as I can see, the code always removes any items that are added,
but it would be worth double-checking that.

> > Seems to me it would also be useful to add IdentityHashMap and
> > IdentityHashSet classes as well, given that LANG still targets Java
> > 1.3.
> >
> > WDYT?
> >
> > And do these classes need a new package, or would the top-level package be
> OK?
> >
> >
>
>  No opinion.
>
>  Cheers,
>  Simon
>
>
>
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>  For additional commands, e-mail: dev-help@commons.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [LANG] - suggestions for new Identity Hash code classes

Posted by Simon Kitching <sk...@apache.org>.
sebb schrieb:
> I think I've found a fix to LANG-459 - which is to borrow some code
> from Apache Axis.
>
> The required code is a single short class - IDKey (e.g.
> http://www.jdocs.com/axis/1.4/org/apache/axis/utils/IDKey.html)
>
> It could be added as a private class of HashCodeBuilder, but it has
> wider use, so should probably be added as a public class.
>   
Seems reasonable to me.  It does look like it will solve LANG-459. And 
as you say, this problem may be reasonably common.

What about making the IDKey class final? One thing that does concern me 
is that HashSet lookup using an Integer object as the key should be 
pretty fast. Using an IDKey object as the key may not perform as well. 
Making IDKey final would at least help. And I cannot see any reason why 
someone would want to subclass IDKey.

One other concern is regarding garbage collection. From a brief look at 
the code, this thread-local registry object contains a set of Integer 
hashcodes. With this change, the set will now contain real hard 
references to objects, preventing them from being garbage-collected 
while they are in the set. Does this matter?
> Seems to me it would also be useful to add IdentityHashMap and
> IdentityHashSet classes as well, given that LANG still targets Java
> 1.3.
>
> WDYT?
>
> And do these classes need a new package, or would the top-level package be OK?
>   

No opinion.

Cheers,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org