You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Henry Story <he...@bblfish.net> on 2004/02/09 22:50:24 UTC

An improvement to AbstractHashedMap, and a new HashedSet class

Hi,

A little background to this contribution is in order. I was looking at 
an interesting framework for Value Objects written by Dirk Riehle, 
available under the lgpl at http://www.jvalue.org/, when I found a 
small bug in his code. It was obvious that what was needed was a 
WeakHashedSet - a HashSet that would wrap its contents in 
WeakReference's. Given the difficulty of extending java.util classes I 
opted to look at the apache collections package instead.

I first decided to create a HashedSet class, an equivalent of 
java.net.HashSet. The best way to do this I realised, would be to 
extend the AbstractHashedMap. The JavaDoc for this class says:

 > This class implements all the features necessary for a subclass 
hash-based map.
 > Key-value entries are stored in instances of the HashEntry class,  
which can be
 > overridden and replaced. The iterators can similarly be replaced,  
without the
 > need to replace the KeySet, EntrySet and Values view classes.

But in fact it is not quite as extensible as it could be because the 
HashEntry is currently a class, not an interface. Let me explain by 
taking the simpler example of rewriting the java.util.HashSet class.

HashSet
     map -> HashMap
              entries -> HashEntries[]
	                      key -> Object
                           value -> Object
	                      next -> HashEntry

This picture show that HashSet contains a pointer to a HashMap, which 
contains a pointer to an Array of HashEntries. Each HashEntry contains 
a key pointer, a value pointer and a pointer to the next HashEntry.

A HashSet uses the HashMap because the keys in a HashMap are unique. As 
a result the value pointer in the HashEntry has really no role to play. 
The way AbstractHashedMap is written in the apache collection means 
that to simply follow the above pattern would waste one pointer for 
every HashEntry, that is 4 bytes wasted for every entry.

The solution is simple.
  1. I turned the HashEntry inner class into an inner interface of 
AbstractHashedMap.
  2. I added a few methods to get/set the key and the value. These 
methods should be inlined by all modern compilers, so I don't think 
there is any efficiency problem here.
  3. I then wrote a HashedSet which has an inner class that extends 
AbstractHashedMap, but which takes a modified HashEntry class where the 
key and the value are the same object. The resulting 
net.bblfish.util.HashedSet is thus more memory efficient than 
java.util.HashSet.

HashedSet
     map -> SpecialisedHashMap
              entries -> HashEntries[]
	                      key -> Object
                           next -> HashEntry


Perhaps it would be worth adding these changes to the library.

If the change to 
org.apache.commons.collections.map.AbstractHashedMap.HashEntry is 
accepted then it will make it relatively easy to write a WeakHashedSet. 
It will be just a matter of extending WeakReference by implementing the 
HashEntry methods.

WeakHashedSet
     map -> SpeacialWeakHashMap
              entries -> WeakHashEntries[] extends WeakReference
                                           implements HashEntry
	                      key -> Object
                           next -> HashEntry


Henry

I have placed the changes in a tar file at:
http://bblfish.net/tmp/changed.tar


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: new HashedSet + WeakHashedSet classes

Posted by Henry Story <he...@bblfish.net>.
The source is now available online under the src directory as well as 
the original
changes.tar file.

http://bblfish.net/java/collections/src

Henry

On 13 Feb 2004, at 10:06, Henry Story wrote:

> I have now finished the two new classes HashedSet and WeakHashedSet. I 
> have added some tests for them, though I should add a lot more.
>
> To make this work I needed to make a few changes to AbstractHashedMap. 
> These changes had a few repercussions on other classes. But they now 
> run all the tests successfully.
>
> They classes with UML diagrams and some explanations are available on 
> my web site at:
> http://bblfish.net/java/collections/
>
>
> Please let me know if this is the wrong place to post this.
>
> Thanks,
>
> Henry Story


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: new HashedSet + WeakHashedSet classes

Posted by Henry Story <he...@bblfish.net>.
The source is now available online under the src directory as well as 
the original
changes.tar file.

http://bblfish.net/java/collections/src

Henry

On 13 Feb 2004, at 10:06, Henry Story wrote:

> I have now finished the two new classes HashedSet and WeakHashedSet. I 
> have added some tests for them, though I should add a lot more.
>
> To make this work I needed to make a few changes to AbstractHashedMap. 
> These changes had a few repercussions on other classes. But they now 
> run all the tests successfully.
>
> They classes with UML diagrams and some explanations are available on 
> my web site at:
> http://bblfish.net/java/collections/
>
>
> Please let me know if this is the wrong place to post this.
>
> Thanks,
>
> Henry Story


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


new HashedSet + WeakHashedSet classes

Posted by Henry Story <he...@bblfish.net>.
I have now finished the two new classes HashedSet and WeakHashedSet. I 
have added some tests for them, though I should add a lot more.

To make this work I needed to make a few changes to AbstractHashedMap. 
These changes had a few repercussions on other classes. But they now 
run all the tests successfully.

They classes with UML diagrams and some explanations are available on 
my web site at:
http://bblfish.net/java/collections/


Please let me know if this is the wrong place to post this.

Thanks,

Henry Story


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


new HashedSet + WeakHashedSet classes

Posted by Henry Story <he...@bblfish.net>.
I have now finished the two new classes HashedSet and WeakHashedSet. I 
have added some tests for them, though I should add a lot more.

To make this work I needed to make a few changes to AbstractHashedMap. 
These changes had a few repercussions on other classes. But they now 
run all the tests successfully.

They classes with UML diagrams and some explanations are available on 
my web site at:
http://bblfish.net/java/collections/


Please let me know if this is the wrong place to post this.

Thanks,

Henry Story


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: new HashedSet + WeakHashedSet classes

Posted by "matthew.hawthorne" <ma...@apache.org>.
Henry Story wrote:
> I have now finished the two new classes HashedSet and WeakHashedSet. I 
> have added some tests for them, though I should add a lot more.
> 
> To make this work I needed to make a few changes to AbstractHashedMap. 
> These changes had a few repercussions on other classes. But they now run 
> all the tests successfully.
> 
> They classes with UML diagrams and some explanations are available on my 
> web site at:
> http://bblfish.net/java/collections/


You don't need to cross post.  Since your talking about new classes and 
submitting patches, this discussion probably belongs on the developer 
list, not the user list.  And Stephen is subscribed to the list, so you 
shouldn't need to email him directly.


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: An improvement to AbstractHashedMap, and a new HashedSet class

Posted by Henry Story <he...@bblfish.net>.
I am using the collections library from CVS and did not have a problem 
with the minor re-factoring. After a few quick cosmetic changes, 
everything works fine as far as I can tell.

I have placed a couple of UML diagrams on my web site with the code to 
help explain what I am doing. You can find the files at: 
http://bblfish.net/java/collections/

The collections.jpg picture is a quick sketch of the current 
collections.map class hierarchy.  As you can see AbstractHashedMap 
contains a collection of HashEntry objects. Each of these have a key 
and a value. Various subclasses of AbstractHashedMap implement 
different specialisations of hashes, some of which require 
specialisations of HashedEntry (eg IdentityMap requires an 
IdentityEntry).
This diagram does not show the inner class relationships. For example 
HashedEntry is an inner class of AbstractHashedMap.

What I am proposing is really only a very small change. (Even if the 
next diagram looks very different)

If you looks at collections-altered.jpg you will see that the only 
novelty is that HashEntry is now an interface, with most of the code 
that it used to contain now in DefaultHashEntry. I have shown the 
packages nested inside one another to make clear the static inner 
class/interface relationships. As you see DefaultHashEntry has three 
fields, but ReflexiveHashEntry only has two, since it is a data 
structure that is used for a set where they map.key = map.value.

The only change needed is the interface change - making HashEntry an 
interface.
This allows us to bypass the default implementation of HashEntry.

I just added the HashedSet as an example of how it could come in useful.

On 10 Feb 2004, at 18:57, Stephen Colebourne wrote:

> Unfortunately, as version 3 of [collections] has now been released with
> HashEntry as a class, that cannot be changed to an interface.
>
> I would however, support the addition of a similar AbstractHashedSet 
> class
> to [collections], preferably with a default HashedSet implementation.
>
> Stephen
>
>
> ----- Original Message -----
> From: "Henry Story" <he...@bblfish.net>
>
>
>> Hi,
>>
>> A little background to this contribution is in order. I was looking at
>> an interesting framework for Value Objects written by Dirk Riehle,
>> available under the lgpl at http://www.jvalue.org/, when I found a
>> small bug in his code. It was obvious that what was needed was a
>> WeakHashedSet - a HashSet that would wrap its contents in
>> WeakReference's. Given the difficulty of extending java.util classes I
>> opted to look at the apache collections package instead.
>>
>> I first decided to create a HashedSet class, an equivalent of
>> java.net.HashSet. The best way to do this I realised, would be to
>> extend the AbstractHashedMap. The JavaDoc for this class says:
>>
>>> This class implements all the features necessary for a subclass
>> hash-based map.
>>> Key-value entries are stored in instances of the HashEntry class,
>> which can be
>>> overridden and replaced. The iterators can similarly be replaced,
>> without the
>>> need to replace the KeySet, EntrySet and Values view classes.
>>
>> But in fact it is not quite as extensible as it could be because the
>> HashEntry is currently a class, not an interface. Let me explain by
>> taking the simpler example of rewriting the java.util.HashSet class.
>>
>> HashSet
>>      map -> HashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            value -> Object
>>                       next -> HashEntry
>>
>> This picture show that HashSet contains a pointer to a HashMap, which
>> contains a pointer to an Array of HashEntries. Each HashEntry contains
>> a key pointer, a value pointer and a pointer to the next HashEntry.
>>
>> A HashSet uses the HashMap because the keys in a HashMap are unique. 
>> As
>> a result the value pointer in the HashEntry has really no role to 
>> play.
>> The way AbstractHashedMap is written in the apache collection means
>> that to simply follow the above pattern would waste one pointer for
>> every HashEntry, that is 4 bytes wasted for every entry.
>>
>> The solution is simple.
>>   1. I turned the HashEntry inner class into an inner interface of
>> AbstractHashedMap.
>>   2. I added a few methods to get/set the key and the value. These
>> methods should be inlined by all modern compilers, so I don't think
>> there is any efficiency problem here.
>>   3. I then wrote a HashedSet which has an inner class that extends
>> AbstractHashedMap, but which takes a modified HashEntry class where 
>> the
>> key and the value are the same object. The resulting
>> net.bblfish.util.HashedSet is thus more memory efficient than
>> java.util.HashSet.
>>
>> HashedSet
>>      map -> SpecialisedHashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Perhaps it would be worth adding these changes to the library.
>>
>> If the change to
>> org.apache.commons.collections.map.AbstractHashedMap.HashEntry is
>> accepted then it will make it relatively easy to write a 
>> WeakHashedSet.
>> It will be just a matter of extending WeakReference by implementing 
>> the
>> HashEntry methods.
>>
>> WeakHashedSet
>>      map -> SpeacialWeakHashMap
>>               entries -> WeakHashEntries[] extends WeakReference
>>                                            implements HashEntry
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Henry


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: An improvement to AbstractHashedMap, and a new HashedSet class

Posted by Henry Story <he...@bblfish.net>.
I am using the collections library from CVS and did not have a problem 
with the minor re-factoring. After a few quick cosmetic changes, 
everything works fine as far as I can tell.

I have placed a couple of UML diagrams on my web site with the code to 
help explain what I am doing. You can find the files at: 
http://bblfish.net/java/collections/

The collections.jpg picture is a quick sketch of the current 
collections.map class hierarchy.  As you can see AbstractHashedMap 
contains a collection of HashEntry objects. Each of these have a key 
and a value. Various subclasses of AbstractHashedMap implement 
different specialisations of hashes, some of which require 
specialisations of HashedEntry (eg IdentityMap requires an 
IdentityEntry).
This diagram does not show the inner class relationships. For example 
HashedEntry is an inner class of AbstractHashedMap.

What I am proposing is really only a very small change. (Even if the 
next diagram looks very different)

If you looks at collections-altered.jpg you will see that the only 
novelty is that HashEntry is now an interface, with most of the code 
that it used to contain now in DefaultHashEntry. I have shown the 
packages nested inside one another to make clear the static inner 
class/interface relationships. As you see DefaultHashEntry has three 
fields, but ReflexiveHashEntry only has two, since it is a data 
structure that is used for a set where they map.key = map.value.

The only change needed is the interface change - making HashEntry an 
interface.
This allows us to bypass the default implementation of HashEntry.

I just added the HashedSet as an example of how it could come in useful.

On 10 Feb 2004, at 18:57, Stephen Colebourne wrote:

> Unfortunately, as version 3 of [collections] has now been released with
> HashEntry as a class, that cannot be changed to an interface.
>
> I would however, support the addition of a similar AbstractHashedSet 
> class
> to [collections], preferably with a default HashedSet implementation.
>
> Stephen
>
>
> ----- Original Message -----
> From: "Henry Story" <he...@bblfish.net>
>
>
>> Hi,
>>
>> A little background to this contribution is in order. I was looking at
>> an interesting framework for Value Objects written by Dirk Riehle,
>> available under the lgpl at http://www.jvalue.org/, when I found a
>> small bug in his code. It was obvious that what was needed was a
>> WeakHashedSet - a HashSet that would wrap its contents in
>> WeakReference's. Given the difficulty of extending java.util classes I
>> opted to look at the apache collections package instead.
>>
>> I first decided to create a HashedSet class, an equivalent of
>> java.net.HashSet. The best way to do this I realised, would be to
>> extend the AbstractHashedMap. The JavaDoc for this class says:
>>
>>> This class implements all the features necessary for a subclass
>> hash-based map.
>>> Key-value entries are stored in instances of the HashEntry class,
>> which can be
>>> overridden and replaced. The iterators can similarly be replaced,
>> without the
>>> need to replace the KeySet, EntrySet and Values view classes.
>>
>> But in fact it is not quite as extensible as it could be because the
>> HashEntry is currently a class, not an interface. Let me explain by
>> taking the simpler example of rewriting the java.util.HashSet class.
>>
>> HashSet
>>      map -> HashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            value -> Object
>>                       next -> HashEntry
>>
>> This picture show that HashSet contains a pointer to a HashMap, which
>> contains a pointer to an Array of HashEntries. Each HashEntry contains
>> a key pointer, a value pointer and a pointer to the next HashEntry.
>>
>> A HashSet uses the HashMap because the keys in a HashMap are unique. 
>> As
>> a result the value pointer in the HashEntry has really no role to 
>> play.
>> The way AbstractHashedMap is written in the apache collection means
>> that to simply follow the above pattern would waste one pointer for
>> every HashEntry, that is 4 bytes wasted for every entry.
>>
>> The solution is simple.
>>   1. I turned the HashEntry inner class into an inner interface of
>> AbstractHashedMap.
>>   2. I added a few methods to get/set the key and the value. These
>> methods should be inlined by all modern compilers, so I don't think
>> there is any efficiency problem here.
>>   3. I then wrote a HashedSet which has an inner class that extends
>> AbstractHashedMap, but which takes a modified HashEntry class where 
>> the
>> key and the value are the same object. The resulting
>> net.bblfish.util.HashedSet is thus more memory efficient than
>> java.util.HashSet.
>>
>> HashedSet
>>      map -> SpecialisedHashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Perhaps it would be worth adding these changes to the library.
>>
>> If the change to
>> org.apache.commons.collections.map.AbstractHashedMap.HashEntry is
>> accepted then it will make it relatively easy to write a 
>> WeakHashedSet.
>> It will be just a matter of extending WeakReference by implementing 
>> the
>> HashEntry methods.
>>
>> WeakHashedSet
>>      map -> SpeacialWeakHashMap
>>               entries -> WeakHashEntries[] extends WeakReference
>>                                            implements HashEntry
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Henry


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: An improvement to AbstractHashedMap, and a new HashedSet class

Posted by Stephen Colebourne <sc...@btopenworld.com>.
Unfortunately, as version 3 of [collections] has now been released with
HashEntry as a class, that cannot be changed to an interface.

I would however, support the addition of a similar AbstractHashedSet class
to [collections], preferably with a default HashedSet implementation.

Stephen


----- Original Message -----
From: "Henry Story" <he...@bblfish.net>
To: <co...@jakarta.apache.org>
Sent: Monday, February 09, 2004 9:50 PM
Subject: An improvement to AbstractHashedMap, and a new HashedSet class


> Hi,
>
> A little background to this contribution is in order. I was looking at
> an interesting framework for Value Objects written by Dirk Riehle,
> available under the lgpl at http://www.jvalue.org/, when I found a
> small bug in his code. It was obvious that what was needed was a
> WeakHashedSet - a HashSet that would wrap its contents in
> WeakReference's. Given the difficulty of extending java.util classes I
> opted to look at the apache collections package instead.
>
> I first decided to create a HashedSet class, an equivalent of
> java.net.HashSet. The best way to do this I realised, would be to
> extend the AbstractHashedMap. The JavaDoc for this class says:
>
>  > This class implements all the features necessary for a subclass
> hash-based map.
>  > Key-value entries are stored in instances of the HashEntry class,
> which can be
>  > overridden and replaced. The iterators can similarly be replaced,
> without the
>  > need to replace the KeySet, EntrySet and Values view classes.
>
> But in fact it is not quite as extensible as it could be because the
> HashEntry is currently a class, not an interface. Let me explain by
> taking the simpler example of rewriting the java.util.HashSet class.
>
> HashSet
>      map -> HashMap
>               entries -> HashEntries[]
>                       key -> Object
>                            value -> Object
>                       next -> HashEntry
>
> This picture show that HashSet contains a pointer to a HashMap, which
> contains a pointer to an Array of HashEntries. Each HashEntry contains
> a key pointer, a value pointer and a pointer to the next HashEntry.
>
> A HashSet uses the HashMap because the keys in a HashMap are unique. As
> a result the value pointer in the HashEntry has really no role to play.
> The way AbstractHashedMap is written in the apache collection means
> that to simply follow the above pattern would waste one pointer for
> every HashEntry, that is 4 bytes wasted for every entry.
>
> The solution is simple.
>   1. I turned the HashEntry inner class into an inner interface of
> AbstractHashedMap.
>   2. I added a few methods to get/set the key and the value. These
> methods should be inlined by all modern compilers, so I don't think
> there is any efficiency problem here.
>   3. I then wrote a HashedSet which has an inner class that extends
> AbstractHashedMap, but which takes a modified HashEntry class where the
> key and the value are the same object. The resulting
> net.bblfish.util.HashedSet is thus more memory efficient than
> java.util.HashSet.
>
> HashedSet
>      map -> SpecialisedHashMap
>               entries -> HashEntries[]
>                       key -> Object
>                            next -> HashEntry
>
>
> Perhaps it would be worth adding these changes to the library.
>
> If the change to
> org.apache.commons.collections.map.AbstractHashedMap.HashEntry is
> accepted then it will make it relatively easy to write a WeakHashedSet.
> It will be just a matter of extending WeakReference by implementing the
> HashEntry methods.
>
> WeakHashedSet
>      map -> SpeacialWeakHashMap
>               entries -> WeakHashEntries[] extends WeakReference
>                                            implements HashEntry
>                       key -> Object
>                            next -> HashEntry
>
>
> Henry
>
> I have placed the changes in a tar file at:
> http://bblfish.net/tmp/changed.tar
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org