You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@commons.apache.org by Knut Ploder <kn...@at.ibm.com> on 2003/12/19 20:17:17 UTC
[PrimitiveCollections] implementations of e.g. org.apache.commons.collections.primitives.LongCollection
- I have one to offer
Are there any implementations of the primitive collections?
I mean some class that implements e.g.
org.apache.commons.collections.primitives.LongCollection (other than the
abstract org.apache.commons.collections.primitives.AbstractLongCollection).
I just cannot find one; I tried
* commons-primitives-1.0
* collections-2.1
but no luck.
I found a IntHashMap in [lang], which uses
The reason for asking is, I have written a nice class LongSet, which
happens to implement
org.apache.commons.collections.primitives.LongCollection (all I had to do
is add 'implements
org.apache.commons.collections.primitives.LongCollection' :-)
It is quite fast, and needs only little memory.
just in case my code is not a duplication (but, as I said in the prolog, I
couldn't find anything on jakarta) I would be happy to contribute my code.
let me explain my approach:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
I'm using some hashtable (nothing new).
but for the collisions-chains, I do NOT instantiate Entry-objects; rather,
I allocate 2 arrays of length capacity, one for the keys, one for the
next-pointer, like:
long[] keys = new long[capacity];
int[] next = new int[capacity];
this way I save quite a lot of memory (I have the memory-overhead of 2
array-objects, rather than the overhead for k objects of the form
class Entry {
long key;
Entry next;
}
which was my initial form of implementing the collision chain (well, I
started with java.util.LinkedList, which requires Long-objects rather than
long-primitives, requiring even more memory).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
the idea can easily be adapted to Maps (in fact I have done a
LongObjectMap).
and - of course - it can be used for any other primitive type (I chose long
because that's what i needed).
and now the actual code:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
package com.knuti.util.nativecollections;
//import java.util.Iterator;
import java.util.NoSuchElementException;
import org.apache.commons.collections.primitives.AbstractLongCollection;
import org.apache.commons.collections.primitives.LongIterator;
/**
* A Set with elements of type <code>long</code>.
*
* It uses a hashtable, but not a standard version from
<code>java.util</code>.
* My hashtable differs in the implementation for the collision chains:
* <ul>
* <li>I use an array of long, plus an array of int,
* <li>rather than an ordinary LinkedList.
* </ul>
* This saves lots of memory (overhead of many small objects is replaced by
overhead of 2 array-objects),
* and causes little work for GC.
*
* @author Knut Ploder
*
* @version 1.1 adapt to jakarta collection primitives (replacing my -
conceptionally identical - LongIterator)
*
*/
public class LongSet extends AbstractLongCollection {
/**
* The load above which rehashing occurs.
* <br>
* Here, load means the ratio <tt>buckets/capacity</tt>.
* In other words, the (maximum) average length of a collision-chain,
* or the average number of items in a bucket of the hashtable.
* <br>note that the loadfactor may exceed 1.0
*/
protected static final float DEFAULT_LOAD_FACTOR = 0.8f;
/**
* The default initial capacity for the hash table.
* <br>
* Here, capacity means the number of items the Set can hold.
* Rehashing occurs when the are more than capacity items in the Set,
* which is in contrast to many other implementaions,
* where rehashing occurs already at size
<tt>capacity*loadfactor</tt> items.
*/
protected static final int DEFAULT_INITIAL_CAPACITY = 11;
/**
* @see {link #DEFAULT_LOAD_FACTOR}
*/
protected float loadFactor;
/**
* The current number of items in the set.
* <br>Do not mix up with capacity.
*/
int _itemCtr;
/**
* the buckets of the hashtable.
* they hold an index into {@link #entries}, or -1 to indicate null
*/
private int[] buckets;
/**
* this is a replacement for the linked-lists
* @see #Entries
*/
private Entries entries;
/**
* Return the number of elements in the map.
*/
public final int size() {
return _itemCtr;
}
//
// constructors
//
/**
* Creates a new <code>LongSet</code> instance,
* with the default capacity
* and load factor.
*/
public LongSet() {
this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
/**
* Creates a new <code>LongSet</code> instance,
* with the default load factor,
* that can hold up to <tt>initialCapacity</tt> elements
* without triggering a rehash.
*
* @param initialCapacity an <code>int</code> value
*/
public LongSet(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
/**
* Creates a new <code>LongSet</code> instance,
* with load factor <tt>loadFactor</tt>,
* that can hold <tt>initialCapacity</tt> elements
* without triggering a rehash.
*
* @param initialCapacity an <code>int</code> value
* @param loadFactor a <code>float</code> value
*/
public LongSet(int initialCapacity, float loadFactor) {
super();
this.loadFactor = loadFactor;
setUp(initialCapacity);
}
//
// internal utility stuff
//
/**
* Initializes the Set so that it can hold <tt>desiredCapacity</tt>
elements.
* <p>
* the number of buckets is <tt>desiredCapacity / loadfactor</tt>.
* <p>for example,
* a desiredCapacity = 1000 means you want a Set that can hold (up
to) 1000 entries without resize()ing.
* if the loadfactor is 80%, you need 1000 / 0.8 = 1250 buckets.
*
* @param desiredCapacity an <code>int</code> value
*
*/
protected void setUp(int desiredCapacity) {
int numbuckets;
int numitems;
// the most senseful version. do what we are asked to do
numitems = desiredCapacity;
// compute number of buckets
numbuckets = Math.round( desiredCapacity / loadFactor);
// allocate buckets
buckets = new int[numbuckets];
for( int i=0; i<buckets.length; i++) {
buckets[i]=-1; // the indexing-counterpart for null
}
// pre-allocate memory for the maximum number of entries
entries = new Entries( numitems);
// initialise counter, how many elements there are in the map.
this._itemCtr = 0;
if(true)
System.out.println("" + getClass().getName() + ": alloc
desired for "
+ desiredCapacity + " items. "
+ "actually allocated "
+ buckets.length + " buckets / "
+ entries.entriesDotNext.length + " entries "
);
}
/**
* previously, this was called the hash-function.
* but it is not; it just computes the bucket-index.
* <p>
* the hash-function would be a method of the key; for keys of type
long,
* which we do have here, the hashfunction simply is the long-value
itself.
*/
protected int bucketindex(long key) {
int h = (int)(key % buckets.length);
if(h < 0) {
h = -h;
}
return h;
}
/**
* resize to a new size.
* the new size is taken as is, without checking for primeness or
whatever.
* in case of bad input, some java runtimeexception may occur,
* e.g. NegativeArraySizeException
*/
protected void resize( int newcapacity) {
// save old stuff
Entries oldentries = this.entries;
int[] oldbuckets = buckets;
System.out.println("resize: newcap = " + newcapacity);
// allocate for new size
setUp(newcapacity);
// now, re-fill
for( int b=0; b<oldbuckets.length; b++) {
for(int entry = oldbuckets[b]; entry != -1; entry =
oldentries.getNext(entry) ) {
//trace("check index " + entry + ", in bucket " +
h);
long xkey = oldentries.getKey(entry);
add( xkey);
}
}
}
/**
* @see
org.apache.commons.collections.primitives.LongCollection#contains(long)
*/
//
// public interface
//
/**
* Check if the key is in the Set.
*/
public boolean contains(long key) {
int h = bucketindex(key);
for (int entry = buckets[h]; entry != -1; entry =
entries.getNext(entry) ) {
if (entries.getKey(entry) == key) {
return true;
}
}
return false;
}
/**
* @see
org.apache.commons.collections.primitives.LongCollection#add(long)
*/
public boolean add(long key) {
// check if key is already in use
int h = bucketindex(key);
for (int entry = buckets[h]; entry != -1; entry =
entries.getNext(entry)) {
if (entries.getKey(entry) == key) { //if (entry.key
== key) {
return false; // not changed
}
}
// resize if we're full
if(this.entries.freelist == -1) {
//assert: number of items in map == arraysize in
entries-arrays
if( entries.entriesDotKey.length != _itemCtr) {
throw new IllegalStateException("assertion
violated: "
+ "number of items in map (" + _itemCtr
+ ")"
+ "!="
+ "arraysize in entries-arrays: (" +
entries.entriesDotKey.length + ")"
);
}
int oldcapacity = this._itemCtr;
int newcapacity = 2*oldcapacity;
resize( newcapacity);
// h depends on hashsize, so we must re-compute it!!!
h = bucketindex(key);
}
// add the new entry
this._itemCtr++;
int entry = entries.makeEntry(key);
entries.setNext( entry, buckets[h]);
buckets[h] = entry;
return true; // Set has changed
}
/**
* Remove the specified key from the Set.
*
* @see
org.apache.commons.collections.primitives.LongCollection#removeElement(long)
*/
public boolean removeElement(long key) {
int h = bucketindex(key);
int entry = buckets[h];
int prev = -1;
while(entry != -1) {
if(entries.getKey(entry) != key) {
// not the key we want
prev = entry;
entry = entries.getNext(entry);
} else {
// found key
if (prev == -1) {
//first in list
buckets[h]=entries.getNext(entry);
} else {
entries.setNext(prev,
entries.getNext(entry));
}
// put item in free-list
entries.delete(entry);
_itemCtr--;
return true; // set has changed
}
}
// key not found -> set not changed
return false;
}
/**
* provide an iterator over the set.
*/
public LongIterator iterator() {
LongIterator iter = new LocalIterator();
return iter;
}
//
// informal public interface
//
/**
* print some statistics
*/
public void dumpStats() {
int used=0;
final int MAX=25;
int maxdep=0;
int maxdepctr=0;
int[] stat = new int[MAX+1];
for( int ctr=0; ctr<buckets.length; ctr++) {
if(buckets[ctr] != -1) {
used++;
int len=0;
int next=buckets[ctr];
while(next!=-1) {
len++;
next=entries.getNext(next);
}
if(len>maxdep) {
maxdep=len;
maxdepctr=1;
} else if(len==maxdep) {
maxdepctr++;
}
if(len<MAX) {
stat[len]++;
} else {
stat[MAX]++;
}
}
}
System.out.println("number of used buckets " + used);
for(int i=0; i<MAX; i++) {
if(stat[i]!=0) System.out.println("stat[" + i + "]=" +
stat[i]);
}
if(stat[MAX]!=0) System.out.println("stat[>" + MAX + "]=" +
stat[MAX]);
System.out.println("maxdep=" + maxdep + " (" + maxdepctr + "
times)");
//System.out.println("used bucket ratio " +
((used*100.0)/capacity) + "%");
int capacity=entries.entriesDotNext.length;
System.out.println( "capacity " + capacity);
System.out.println( "number of buckets " + buckets.length);
System.out.println( "loadfactor (fact) " + ((capacity
*100.0)/buckets.length) + "%");
System.out.println( "loadfactor (should)" + ( loadFactor
*100.0) + "%");
System.out.println( "number of entries " + _itemCtr);
System.out.println( "actual loadfactor " + ((_itemCtr
*100.0)/capacity) + "%");
//System.out.println("ratio " + ((size *100.0)/capacity) +
"%");
System.out.println( "entries per bucket " + (_itemCtr * 1.0 /
buckets.length) + "");
}
/**
* print contents, plus some statistics
*/
public void dump() {
System.out.println(""
+ "contents of "
+ getClass().getName()
//+ this
);
//##
LongIterator iter = iterator();
while(iter.hasNext()) {
System.out.println( "item: " + iter.next() );
}
dumpStats();
}
//
//
//
/**
* use arrays of the appropriate type for key, and next.
* <br>for key, use whatever
* <br>for next, rather than using Entry (a ref to another Entry),
* we use int as index to the 'virtual' Entry.
* -1 is the pendant for a null-Entry.
* <br>
* we cannot rely on GC, so we must also
* <li>have a free-list
* <li>have a method delete() that must be manually invoked when an
Entry is not needed any more
* <br>since Entry objects are only internal, this 'ugliness' is not
visible outside,
* and the memory- and performance increase is worth the 'why don't
you use Entry-objects'-code.
*/
private class Entries {
private int freelist;
private long[] entriesDotKey;
private int[] entriesDotNext;
/**
* Constructor for Entries.
*/
Entries(int capacity) {
entriesDotKey = new long[capacity];
entriesDotNext = new int[capacity];
// init the free-list; it contains all items,
freelist=0;
for(int i=0; i<entriesDotNext.length; i++) {
entriesDotNext[i]=i+1;
}
entriesDotNext[entriesDotNext.length-1]=-1;
}
/**
* return a 'reference' to a new virtual Entry.
* <p>
* it is assumed that there is always some entry available.
* if not, resize() should have been invoked right before
invoking this method.
*/
int makeEntry(long key) {
if(freelist==-1) {
throw new IllegalStateException("free list is
empty");
}
int i=freelist;
freelist = entriesDotNext[i];
entriesDotNext[i] = -1;
entriesDotKey[i] = key;
return i; // remember, the index represents the
Entry-reference
}
/**
* mark an entry unused.
* <li>wipe out key and value
* <li>add to freelist
*/
void delete( int index) {
entriesDotKey[index]=0; // wipe out key-value
entriesDotNext[index]=freelist; // add as new head of
free-list
freelist=index;
}
// set- and get- methods for the virtual Entries
long getKey(int index) {
return entriesDotKey[index];
}
void setKey(int index, long key) {
entriesDotKey[index] = key;
}
int getNext(int index) {
return entriesDotNext[index];
}
void setNext(int index, int next) {
entriesDotNext[index] = next;
}
}
//
// iterator stuff
//
private class LocalIterator implements LongIterator {
//private LongMapNoentryclass map;
/**
* reference to the bucket holding the 'next' item.
* <br>
* It is the index into {@link #buckets}.
*/
private int next_bucketidx = 0;
//private int current_bucketidx = 0; we would need it for a
high-performance remove()
/**
* reference to the item that {@link #next()} would return.
* <br>
* It is the index into {@link super#entries}.
*/
private int next_entryidx = -1;
/**
* reference to the current item (the one that {@link #remove
()} would remove).
* <br>
* the difference to {@link #entryidx} is:
* <li>{@link #current_entryidx} is changed by invocations of
{@link #next()} only
* <li>{@link #next_entryidx} is changed by invocations of
{@link #next()} as well as {@link #hasNext()}
*/
private int current_entryidx = -1;
/**
* constructor. no need to pass the map, since we're an inner
class,
* and so we have access to our surrounding map anyway.
*
* the iterator is positioned 'before the first item'.
* this means:
* <li>must invoke hasNext() to handle the case of an empty map
* <li>must invoke next() at least once, before getKey() and
getValue() may be used.
*/
private LocalIterator( /* LongMapNoentryclass map */ ) {
//this.map = map;
advance();
current_entryidx = -1;
}
/**
* @see com.knuti.util.nativecollections.LongIterator#hasNext()
*/
public boolean hasNext() {
return next_entryidx != -1;
}
/**
* internal method, advance to next element.
* on return, {@link #next_entryidx} 'references' the
next-item.
* if there is no more, {@link #next_entryidx} will be -1
*/
protected void advance() {
// advance in bucket, if possible
if(next_entryidx!=-1) {
next_entryidx = entries.getNext(next_entryidx);
}
// find non-empty bucket (or, do nothing if bucket did
have next entry)
while( (next_entryidx == -1) && (next_bucketidx <
buckets.length) ) {
next_entryidx = buckets[next_bucketidx];
next_bucketidx++;
}
}
/**
* same as in <code>java.util.Iterator#next()</code>,
* except for the return type.
*
* @see com.knuti.util.nativecollections.MapEntryIterator#next
()
*
*/
public long next() {
if(!hasNext()) {
throw new NoSuchElementException();
}
current_entryidx = next_entryidx; // position the
'finger'
advance(); //
pre-compute 'next'
long res = entries.getKey( current_entryidx);
return res;
}
/**
* Remove the current element.
*
* @see
com.knuti.util.nativecollections.MapEntryIterator#remove()
*
* @throws UnsupportedOperationException
*
*/
public void remove() {
//if(true) throw new UnsupportedOperationException();
if (current_entryidx == -1) {
throw new IllegalStateException( "no current
element");
} else {
long key=entries.getKey( current_entryidx);
LongSet.this.removeElement( key);
current_entryidx = -1;
}
}
}
/**
* mini test, does nothing very useful
*/
public static void main(String[] args) {
// initial capacity 17,
// allow up to 3,14 entries per bucket before re-hashing.
// i.e. start with a hashtable of size 17/3.14 = 5
LongSet x = new LongSet(17,3.14f);
// test 1: store a lot of entries.
// key: remainder mod 997
for (int num = 12345; num < 123456; num+=31) {
long key=num%99997;
//long key=num%number;
//long key=num/number;
boolean changed=x.add( key);
if(changed) {
System.out.println( "new:" + key + "
<--------------------");
} else {
System.out.println( "old:" + key + " (remove
now)");
x.removeElement( key);
}
}
// show what we have
x.dump();
}
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
knuti
(despite my email address, i am writing this as a private person)
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org
Re: [PrimitiveCollections] implementations of e.g.
org.apache.commons.collections.primitives.LongCollection - I have one to
offer
Posted by Rodney Waldhoff <rw...@apache.org>.
On Fri, 19 Dec 2003, Knut Ploder wrote:
>
> Are there any implementations of the primitive collections?
> I mean some class that implements e.g.
> org.apache.commons.collections.primitives.LongCollection (other than the
> abstract org.apache.commons.collections.primitives.AbstractLongCollection).
>
> I just cannot find one; I tried
> * commons-primitives-1.0
> * collections-2.1
> but no luck.
There are two concrete implementations of LongCollection in
commons-primitives-1.0 (also present but deprecated in the
commons-collections HEAD), namely ArrayLongList
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayLongList.html>)
and ArrayUnsignedIntList
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayUnsignedIntList.html>).
More generally, see Array<Type>List for a concrete implementation of
<Type>Collection.
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/package-summary.html>).
(Actually, there's other concrete instances like CollectionLongCollection
and ListLongList in the adapters package and UnmodifiableLongCollection
and UnmodifiableLongList in the decorators package, but those need to be
"seeded" with an instance of Collection/List or LongCollection/LongList,
respectively.)
> I found a IntHashMap in [lang], which uses
>
>
> The reason for asking is, I have written a nice class LongSet, which
> happens to implement
> org.apache.commons.collections.primitives.LongCollection (all I had to do
> is add 'implements
> org.apache.commons.collections.primitives.LongCollection' :-)
>
> It is quite fast, and needs only little memory.
> just in case my code is not a duplication (but, as I said in the prolog, I
> couldn't find anything on jakarta) I would be happy to contribute my code.
Currently there is no implementation of <Type>Set (or for that matter
<Type>Map) in commons-primitives, or even a declaration of the <Type>Set
or <Type>Map interface. I haven't had a chance to look at your
implemenation in detail, but from your overview it sounds quite
reasonable.
> let me explain my approach:
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> I'm using some hashtable (nothing new).
> but for the collisions-chains, I do NOT instantiate Entry-objects; rather,
> I allocate 2 arrays of length capacity, one for the keys, one for the
> next-pointer, like:
>
> long[] keys = new long[capacity];
> int[] next = new int[capacity];
>
> this way I save quite a lot of memory (I have the memory-overhead of 2
> array-objects, rather than the overhead for k objects of the form
> class Entry {
> long key;
> Entry next;
> }
> which was my initial form of implementing the collision chain (well, I
> started with java.util.LinkedList, which requires Long-objects rather than
> long-primitives, requiring even more memory).
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> the idea can easily be adapted to Maps (in fact I have done a
> LongObjectMap).
> and - of course - it can be used for any other primitive type (I chose long
> because that's what i needed).
>
> and now the actual code:
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> package com.knuti.util.nativecollections;
>
> //import java.util.Iterator;
> import java.util.NoSuchElementException;
>
> import org.apache.commons.collections.primitives.AbstractLongCollection;
> import org.apache.commons.collections.primitives.LongIterator;
>
>
> /**
> * A Set with elements of type <code>long</code>.
> *
> * It uses a hashtable, but not a standard version from
> <code>java.util</code>.
> * My hashtable differs in the implementation for the collision chains:
> * <ul>
> * <li>I use an array of long, plus an array of int,
> * <li>rather than an ordinary LinkedList.
> * </ul>
> * This saves lots of memory (overhead of many small objects is replaced by
> overhead of 2 array-objects),
> * and causes little work for GC.
> *
> * @author Knut Ploder
> *
> * @version 1.1 adapt to jakarta collection primitives (replacing my -
> conceptionally identical - LongIterator)
> *
> */
> public class LongSet extends AbstractLongCollection {
>
>
> /**
> * The load above which rehashing occurs.
> * <br>
> * Here, load means the ratio <tt>buckets/capacity</tt>.
> * In other words, the (maximum) average length of a collision-chain,
> * or the average number of items in a bucket of the hashtable.
> * <br>note that the loadfactor may exceed 1.0
> */
> protected static final float DEFAULT_LOAD_FACTOR = 0.8f;
>
>
> /**
> * The default initial capacity for the hash table.
> * <br>
> * Here, capacity means the number of items the Set can hold.
> * Rehashing occurs when the are more than capacity items in the Set,
> * which is in contrast to many other implementaions,
> * where rehashing occurs already at size
> <tt>capacity*loadfactor</tt> items.
> */
> protected static final int DEFAULT_INITIAL_CAPACITY = 11;
>
>
> /**
> * @see {link #DEFAULT_LOAD_FACTOR}
> */
> protected float loadFactor;
>
>
> /**
> * The current number of items in the set.
> * <br>Do not mix up with capacity.
> */
> int _itemCtr;
>
>
> /**
> * the buckets of the hashtable.
> * they hold an index into {@link #entries}, or -1 to indicate null
> */
> private int[] buckets;
>
>
> /**
> * this is a replacement for the linked-lists
> * @see #Entries
> */
> private Entries entries;
>
>
> /**
> * Return the number of elements in the map.
> */
> public final int size() {
> return _itemCtr;
> }
>
>
>
> //
> // constructors
> //
>
>
>
> /**
> * Creates a new <code>LongSet</code> instance,
> * with the default capacity
> * and load factor.
> */
> public LongSet() {
> this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
> }
>
>
> /**
> * Creates a new <code>LongSet</code> instance,
> * with the default load factor,
> * that can hold up to <tt>initialCapacity</tt> elements
> * without triggering a rehash.
> *
> * @param initialCapacity an <code>int</code> value
> */
> public LongSet(int initialCapacity) {
> this(initialCapacity, DEFAULT_LOAD_FACTOR);
> }
>
>
> /**
> * Creates a new <code>LongSet</code> instance,
> * with load factor <tt>loadFactor</tt>,
> * that can hold <tt>initialCapacity</tt> elements
> * without triggering a rehash.
> *
> * @param initialCapacity an <code>int</code> value
> * @param loadFactor a <code>float</code> value
> */
> public LongSet(int initialCapacity, float loadFactor) {
> super();
> this.loadFactor = loadFactor;
> setUp(initialCapacity);
> }
>
>
>
> //
> // internal utility stuff
> //
>
>
>
> /**
> * Initializes the Set so that it can hold <tt>desiredCapacity</tt>
> elements.
> * <p>
> * the number of buckets is <tt>desiredCapacity / loadfactor</tt>.
> * <p>for example,
> * a desiredCapacity = 1000 means you want a Set that can hold (up
> to) 1000 entries without resize()ing.
> * if the loadfactor is 80%, you need 1000 / 0.8 = 1250 buckets.
> *
> * @param desiredCapacity an <code>int</code> value
> *
> */
> protected void setUp(int desiredCapacity) {
> int numbuckets;
> int numitems;
>
> // the most senseful version. do what we are asked to do
> numitems = desiredCapacity;
>
> // compute number of buckets
> numbuckets = Math.round( desiredCapacity / loadFactor);
>
> // allocate buckets
> buckets = new int[numbuckets];
> for( int i=0; i<buckets.length; i++) {
> buckets[i]=-1; // the indexing-counterpart for null
> }
>
> // pre-allocate memory for the maximum number of entries
> entries = new Entries( numitems);
>
> // initialise counter, how many elements there are in the map.
> this._itemCtr = 0;
>
> if(true)
> System.out.println("" + getClass().getName() + ": alloc
> desired for "
> + desiredCapacity + " items. "
> + "actually allocated "
> + buckets.length + " buckets / "
> + entries.entriesDotNext.length + " entries "
> );
> }
>
>
> /**
> * previously, this was called the hash-function.
> * but it is not; it just computes the bucket-index.
> * <p>
> * the hash-function would be a method of the key; for keys of type
> long,
> * which we do have here, the hashfunction simply is the long-value
> itself.
> */
> protected int bucketindex(long key) {
> int h = (int)(key % buckets.length);
> if(h < 0) {
> h = -h;
> }
> return h;
> }
>
>
> /**
> * resize to a new size.
> * the new size is taken as is, without checking for primeness or
> whatever.
> * in case of bad input, some java runtimeexception may occur,
> * e.g. NegativeArraySizeException
> */
> protected void resize( int newcapacity) {
> // save old stuff
> Entries oldentries = this.entries;
> int[] oldbuckets = buckets;
>
> System.out.println("resize: newcap = " + newcapacity);
>
> // allocate for new size
> setUp(newcapacity);
>
>
> // now, re-fill
> for( int b=0; b<oldbuckets.length; b++) {
> for(int entry = oldbuckets[b]; entry != -1; entry =
> oldentries.getNext(entry) ) {
> //trace("check index " + entry + ", in bucket " +
> h);
> long xkey = oldentries.getKey(entry);
> add( xkey);
> }
> }
> }
>
>
>
> /**
> * @see
> org.apache.commons.collections.primitives.LongCollection#contains(long)
> */
> //
> // public interface
> //
>
>
>
> /**
> * Check if the key is in the Set.
> */
> public boolean contains(long key) {
> int h = bucketindex(key);
> for (int entry = buckets[h]; entry != -1; entry =
> entries.getNext(entry) ) {
> if (entries.getKey(entry) == key) {
> return true;
> }
> }
> return false;
> }
>
>
> /**
> * @see
> org.apache.commons.collections.primitives.LongCollection#add(long)
> */
> public boolean add(long key) {
>
> // check if key is already in use
> int h = bucketindex(key);
> for (int entry = buckets[h]; entry != -1; entry =
> entries.getNext(entry)) {
> if (entries.getKey(entry) == key) { //if (entry.key
> == key) {
> return false; // not changed
> }
> }
>
> // resize if we're full
> if(this.entries.freelist == -1) {
>
> //assert: number of items in map == arraysize in
> entries-arrays
> if( entries.entriesDotKey.length != _itemCtr) {
> throw new IllegalStateException("assertion
> violated: "
> + "number of items in map (" + _itemCtr
> + ")"
> + "!="
> + "arraysize in entries-arrays: (" +
> entries.entriesDotKey.length + ")"
> );
> }
>
> int oldcapacity = this._itemCtr;
> int newcapacity = 2*oldcapacity;
> resize( newcapacity);
> // h depends on hashsize, so we must re-compute it!!!
> h = bucketindex(key);
> }
>
> // add the new entry
> this._itemCtr++;
> int entry = entries.makeEntry(key);
> entries.setNext( entry, buckets[h]);
> buckets[h] = entry;
>
> return true; // Set has changed
> }
>
>
> /**
> * Remove the specified key from the Set.
> *
> * @see
> org.apache.commons.collections.primitives.LongCollection#removeElement(long)
> */
> public boolean removeElement(long key) {
> int h = bucketindex(key);
>
> int entry = buckets[h];
> int prev = -1;
> while(entry != -1) {
> if(entries.getKey(entry) != key) {
> // not the key we want
> prev = entry;
> entry = entries.getNext(entry);
> } else {
> // found key
> if (prev == -1) {
> //first in list
> buckets[h]=entries.getNext(entry);
> } else {
> entries.setNext(prev,
> entries.getNext(entry));
> }
> // put item in free-list
> entries.delete(entry);
>
> _itemCtr--;
>
> return true; // set has changed
> }
> }
>
> // key not found -> set not changed
> return false;
>
> }
>
>
> /**
> * provide an iterator over the set.
> */
> public LongIterator iterator() {
> LongIterator iter = new LocalIterator();
> return iter;
> }
>
>
> //
> // informal public interface
> //
>
>
> /**
> * print some statistics
> */
> public void dumpStats() {
> int used=0;
> final int MAX=25;
> int maxdep=0;
> int maxdepctr=0;
> int[] stat = new int[MAX+1];
> for( int ctr=0; ctr<buckets.length; ctr++) {
> if(buckets[ctr] != -1) {
> used++;
> int len=0;
> int next=buckets[ctr];
> while(next!=-1) {
> len++;
> next=entries.getNext(next);
> }
> if(len>maxdep) {
> maxdep=len;
> maxdepctr=1;
> } else if(len==maxdep) {
> maxdepctr++;
> }
> if(len<MAX) {
> stat[len]++;
> } else {
> stat[MAX]++;
> }
> }
> }
> System.out.println("number of used buckets " + used);
> for(int i=0; i<MAX; i++) {
> if(stat[i]!=0) System.out.println("stat[" + i + "]=" +
> stat[i]);
> }
> if(stat[MAX]!=0) System.out.println("stat[>" + MAX + "]=" +
> stat[MAX]);
> System.out.println("maxdep=" + maxdep + " (" + maxdepctr + "
> times)");
>
> //System.out.println("used bucket ratio " +
> ((used*100.0)/capacity) + "%");
>
> int capacity=entries.entriesDotNext.length;
> System.out.println( "capacity " + capacity);
> System.out.println( "number of buckets " + buckets.length);
> System.out.println( "loadfactor (fact) " + ((capacity
> *100.0)/buckets.length) + "%");
> System.out.println( "loadfactor (should)" + ( loadFactor
> *100.0) + "%");
> System.out.println( "number of entries " + _itemCtr);
> System.out.println( "actual loadfactor " + ((_itemCtr
> *100.0)/capacity) + "%");
> //System.out.println("ratio " + ((size *100.0)/capacity) +
> "%");
> System.out.println( "entries per bucket " + (_itemCtr * 1.0 /
> buckets.length) + "");
> }
>
>
> /**
> * print contents, plus some statistics
> */
> public void dump() {
> System.out.println(""
> + "contents of "
> + getClass().getName()
> //+ this
> );
> //##
> LongIterator iter = iterator();
> while(iter.hasNext()) {
> System.out.println( "item: " + iter.next() );
> }
> dumpStats();
> }
>
>
> //
> //
> //
>
>
> /**
> * use arrays of the appropriate type for key, and next.
> * <br>for key, use whatever
> * <br>for next, rather than using Entry (a ref to another Entry),
> * we use int as index to the 'virtual' Entry.
> * -1 is the pendant for a null-Entry.
> * <br>
> * we cannot rely on GC, so we must also
> * <li>have a free-list
> * <li>have a method delete() that must be manually invoked when an
> Entry is not needed any more
> * <br>since Entry objects are only internal, this 'ugliness' is not
> visible outside,
> * and the memory- and performance increase is worth the 'why don't
> you use Entry-objects'-code.
> */
> private class Entries {
>
> private int freelist;
>
> private long[] entriesDotKey;
> private int[] entriesDotNext;
>
>
> /**
> * Constructor for Entries.
> */
> Entries(int capacity) {
> entriesDotKey = new long[capacity];
> entriesDotNext = new int[capacity];
>
> // init the free-list; it contains all items,
> freelist=0;
> for(int i=0; i<entriesDotNext.length; i++) {
> entriesDotNext[i]=i+1;
> }
> entriesDotNext[entriesDotNext.length-1]=-1;
> }
>
>
> /**
> * return a 'reference' to a new virtual Entry.
> * <p>
> * it is assumed that there is always some entry available.
> * if not, resize() should have been invoked right before
> invoking this method.
> */
> int makeEntry(long key) {
> if(freelist==-1) {
> throw new IllegalStateException("free list is
> empty");
> }
> int i=freelist;
> freelist = entriesDotNext[i];
> entriesDotNext[i] = -1;
> entriesDotKey[i] = key;
>
> return i; // remember, the index represents the
> Entry-reference
> }
>
>
> /**
> * mark an entry unused.
> * <li>wipe out key and value
> * <li>add to freelist
> */
> void delete( int index) {
> entriesDotKey[index]=0; // wipe out key-value
> entriesDotNext[index]=freelist; // add as new head of
> free-list
> freelist=index;
> }
>
> // set- and get- methods for the virtual Entries
>
> long getKey(int index) {
> return entriesDotKey[index];
> }
> void setKey(int index, long key) {
> entriesDotKey[index] = key;
> }
>
> int getNext(int index) {
> return entriesDotNext[index];
> }
> void setNext(int index, int next) {
> entriesDotNext[index] = next;
> }
>
> }
>
>
>
> //
> // iterator stuff
> //
>
>
> private class LocalIterator implements LongIterator {
>
>
> //private LongMapNoentryclass map;
>
>
> /**
> * reference to the bucket holding the 'next' item.
> * <br>
> * It is the index into {@link #buckets}.
> */
> private int next_bucketidx = 0;
>
> //private int current_bucketidx = 0; we would need it for a
> high-performance remove()
>
>
> /**
> * reference to the item that {@link #next()} would return.
> * <br>
> * It is the index into {@link super#entries}.
> */
> private int next_entryidx = -1;
>
>
> /**
> * reference to the current item (the one that {@link #remove
> ()} would remove).
> * <br>
> * the difference to {@link #entryidx} is:
> * <li>{@link #current_entryidx} is changed by invocations of
> {@link #next()} only
> * <li>{@link #next_entryidx} is changed by invocations of
> {@link #next()} as well as {@link #hasNext()}
> */
> private int current_entryidx = -1;
>
>
> /**
> * constructor. no need to pass the map, since we're an inner
> class,
> * and so we have access to our surrounding map anyway.
> *
> * the iterator is positioned 'before the first item'.
> * this means:
> * <li>must invoke hasNext() to handle the case of an empty map
> * <li>must invoke next() at least once, before getKey() and
> getValue() may be used.
> */
> private LocalIterator( /* LongMapNoentryclass map */ ) {
> //this.map = map;
> advance();
> current_entryidx = -1;
> }
>
>
> /**
> * @see com.knuti.util.nativecollections.LongIterator#hasNext()
> */
> public boolean hasNext() {
> return next_entryidx != -1;
> }
>
>
> /**
> * internal method, advance to next element.
> * on return, {@link #next_entryidx} 'references' the
> next-item.
> * if there is no more, {@link #next_entryidx} will be -1
> */
> protected void advance() {
> // advance in bucket, if possible
> if(next_entryidx!=-1) {
> next_entryidx = entries.getNext(next_entryidx);
> }
> // find non-empty bucket (or, do nothing if bucket did
> have next entry)
> while( (next_entryidx == -1) && (next_bucketidx <
> buckets.length) ) {
> next_entryidx = buckets[next_bucketidx];
> next_bucketidx++;
> }
> }
>
>
> /**
> * same as in <code>java.util.Iterator#next()</code>,
> * except for the return type.
> *
> * @see com.knuti.util.nativecollections.MapEntryIterator#next
> ()
> *
> */
> public long next() {
> if(!hasNext()) {
> throw new NoSuchElementException();
> }
> current_entryidx = next_entryidx; // position the
> 'finger'
> advance(); //
> pre-compute 'next'
> long res = entries.getKey( current_entryidx);
> return res;
> }
>
>
> /**
> * Remove the current element.
> *
> * @see
> com.knuti.util.nativecollections.MapEntryIterator#remove()
> *
> * @throws UnsupportedOperationException
> *
> */
> public void remove() {
> //if(true) throw new UnsupportedOperationException();
> if (current_entryidx == -1) {
> throw new IllegalStateException( "no current
> element");
> } else {
> long key=entries.getKey( current_entryidx);
> LongSet.this.removeElement( key);
> current_entryidx = -1;
> }
> }
>
> }
>
>
> /**
> * mini test, does nothing very useful
> */
> public static void main(String[] args) {
>
> // initial capacity 17,
> // allow up to 3,14 entries per bucket before re-hashing.
> // i.e. start with a hashtable of size 17/3.14 = 5
> LongSet x = new LongSet(17,3.14f);
>
> // test 1: store a lot of entries.
> // key: remainder mod 997
> for (int num = 12345; num < 123456; num+=31) {
> long key=num%99997;
> //long key=num%number;
> //long key=num/number;
> boolean changed=x.add( key);
> if(changed) {
> System.out.println( "new:" + key + "
> <--------------------");
> } else {
> System.out.println( "old:" + key + " (remove
> now)");
> x.removeElement( key);
> }
> }
> // show what we have
> x.dump();
> }
> }
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
>
>
> knuti
> (despite my email address, i am writing this as a private person)
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>
>
--
- Rod <http://radio.weblogs.com/0122027/>
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org