You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@commons.apache.org by Knut Ploder <kn...@at.ibm.com> on 2003/12/19 20:17:17 UTC

[PrimitiveCollections] implementations of e.g. org.apache.commons.collections.primitives.LongCollection - I have one to offer

Are there any implementations of the primitive collections?
I mean some class that implements e.g.
org.apache.commons.collections.primitives.LongCollection (other than the
abstract org.apache.commons.collections.primitives.AbstractLongCollection).

I just cannot find one; I tried
* commons-primitives-1.0
* collections-2.1
but no luck.
I found a IntHashMap in [lang], which uses


The reason for asking is, I have written a nice class LongSet, which
happens to implement
org.apache.commons.collections.primitives.LongCollection (all I had to do
is add   'implements
org.apache.commons.collections.primitives.LongCollection' :-)

It is quite fast, and needs only little memory.
just in case my code is not a duplication (but, as I said in the prolog, I
couldn't find anything on jakarta) I would be happy to contribute my code.

let me explain my approach:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
I'm using some hashtable (nothing new).
but for the collisions-chains, I do NOT instantiate Entry-objects; rather,
I allocate 2 arrays of length capacity, one for the keys, one for the
next-pointer, like:

 long[] keys = new long[capacity];
 int[] next = new int[capacity];

this way I save quite a lot of memory (I have the memory-overhead of 2
array-objects, rather than the overhead for k objects of the form
class Entry {
 long key;
 Entry next;
}
which was my initial form of implementing the collision chain (well, I
started with java.util.LinkedList, which requires Long-objects rather than
long-primitives, requiring even more memory).


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
the idea can easily be adapted to Maps (in fact I have done a
LongObjectMap).
and - of course - it can be used for any other primitive type (I chose long
because that's what i needed).

and now the actual code:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -
package com.knuti.util.nativecollections;

//import java.util.Iterator;
import java.util.NoSuchElementException;

import org.apache.commons.collections.primitives.AbstractLongCollection;
import org.apache.commons.collections.primitives.LongIterator;


/**
 * A Set with elements of type <code>long</code>.
 *
 * It uses a hashtable, but not a standard version from
<code>java.util</code>.
 * My hashtable differs in the implementation for the collision chains:
 * <ul>
 * <li>I use an array of long, plus an array of int,
 * <li>rather than an ordinary LinkedList.
 * </ul>
 * This saves lots of memory (overhead of many small objects is replaced by
overhead of 2 array-objects),
 * and causes little work for GC.
 *
 * @author Knut Ploder
 *
 * @version 1.1 adapt to jakarta collection primitives (replacing my -
conceptionally identical - LongIterator)
 *
 */
public class LongSet extends AbstractLongCollection {


      /**
       * The load above which rehashing occurs.
       * <br>
       * Here, load means the ratio <tt>buckets/capacity</tt>.
       * In other words, the (maximum) average length of a collision-chain,
       * or the average number of items in a bucket of the hashtable.
       * <br>note that the loadfactor may exceed 1.0
       */
      protected static final float DEFAULT_LOAD_FACTOR = 0.8f;


      /**
       * The default initial capacity for the hash table.
       * <br>
       * Here, capacity means the number of items the Set can hold.
       * Rehashing occurs when the are more than capacity items in the Set,
       * which is in contrast to many other implementaions,
       * where rehashing occurs already at size
<tt>capacity*loadfactor</tt> items.
       */
      protected static final int DEFAULT_INITIAL_CAPACITY = 11;


      /**
       * @see {link #DEFAULT_LOAD_FACTOR}
       */
      protected float loadFactor;


      /**
       * The current number of items in the set.
       * <br>Do not mix up with capacity.
       */
      int _itemCtr;


      /**
       * the buckets of the hashtable.
       * they hold an index into {@link #entries}, or -1 to indicate null
       */
      private int[] buckets;


      /**
       * this is a replacement for the linked-lists
       * @see #Entries
       */
      private Entries entries;


      /**
       * Return the number of elements in the map.
       */
      public final int size() {
            return _itemCtr;
      }



      //
      // constructors
      //



      /**
       * Creates a new <code>LongSet</code> instance,
       * with the default capacity
       * and load factor.
       */
      public LongSet() {
          this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
      }


      /**
       * Creates a new <code>LongSet</code> instance,
       * with the default load factor,
       * that can hold up to <tt>initialCapacity</tt> elements
       * without triggering a rehash.
       *
       * @param initialCapacity an <code>int</code> value
       */
      public LongSet(int initialCapacity) {
          this(initialCapacity, DEFAULT_LOAD_FACTOR);
      }


      /**
       * Creates a new <code>LongSet</code> instance,
       * with load factor <tt>loadFactor</tt>,
       * that can hold <tt>initialCapacity</tt> elements
       * without triggering a rehash.
       *
       * @param initialCapacity an <code>int</code> value
       * @param loadFactor a <code>float</code> value
       */
      public LongSet(int initialCapacity, float loadFactor) {
            super();
            this.loadFactor = loadFactor;
            setUp(initialCapacity);
      }



      //
      //  internal utility stuff
      //



      /**
       * Initializes the Set so that it can hold <tt>desiredCapacity</tt>
elements.
       * <p>
       * the number of buckets is <tt>desiredCapacity / loadfactor</tt>.
       * <p>for example,
       * a desiredCapacity = 1000 means you want a Set that can hold (up
to) 1000 entries without resize()ing.
       * if the loadfactor is 80%, you need 1000 / 0.8 = 1250 buckets.
       *
       * @param desiredCapacity an <code>int</code> value
       *
       */
      protected void setUp(int desiredCapacity) {
            int numbuckets;
            int numitems;

            // the most senseful version. do what we are asked to do
            numitems = desiredCapacity;

            // compute number of buckets
            numbuckets = Math.round( desiredCapacity / loadFactor);

            // allocate buckets
            buckets = new int[numbuckets];
            for( int i=0; i<buckets.length; i++) {
                  buckets[i]=-1; // the indexing-counterpart for null
            }

            // pre-allocate memory for the maximum number of entries
            entries = new Entries( numitems);

            // initialise counter, how many elements there are in the map.
            this._itemCtr = 0;

            if(true)
                  System.out.println("" + getClass().getName() + ": alloc
desired for "
                        + desiredCapacity + " items. "
                        + "actually allocated "
                        + buckets.length + " buckets / "
                        + entries.entriesDotNext.length + " entries "
                        );
      }


      /**
       * previously, this was called the hash-function.
       * but it is not; it just computes the bucket-index.
       * <p>
       * the hash-function would be a method of the key; for keys of type
long,
       * which we do have here, the hashfunction simply is the long-value
itself.
       */
      protected int bucketindex(long key) {
            int h = (int)(key % buckets.length);
            if(h < 0) {
                  h = -h;
            }
            return h;
      }


      /**
       * resize to a new size.
       * the new size is taken as is, without checking for primeness or
whatever.
       * in case of bad input, some java runtimeexception may occur,
       * e.g. NegativeArraySizeException
       */
      protected void resize( int newcapacity) {
            // save old stuff
            Entries oldentries = this.entries;
            int[] oldbuckets = buckets;

            System.out.println("resize: newcap = " + newcapacity);

            // allocate for new size
            setUp(newcapacity);


            // now, re-fill
            for( int b=0; b<oldbuckets.length; b++) {
                  for(int entry = oldbuckets[b]; entry != -1; entry =
oldentries.getNext(entry) ) {
                        //trace("check index " + entry + ", in bucket " +
h);
                        long xkey = oldentries.getKey(entry);
                        add( xkey);
                  }
            }
      }



      /**
       * @see
org.apache.commons.collections.primitives.LongCollection#contains(long)
       */
      //
      //  public interface
      //



      /**
       * Check if the key is in the Set.
       */
      public boolean contains(long key) {
            int h = bucketindex(key);
            for (int entry = buckets[h]; entry != -1; entry =
entries.getNext(entry) ) {
                  if (entries.getKey(entry) == key) {
                        return true;
                  }
            }
            return false;
      }


      /**
       * @see
org.apache.commons.collections.primitives.LongCollection#add(long)
       */
      public boolean add(long key) {

            // check if key is already in use
            int h = bucketindex(key);
            for (int entry = buckets[h]; entry != -1; entry =
entries.getNext(entry)) {
                  if (entries.getKey(entry) == key) {       //if (entry.key
== key) {
                        return false;     // not changed
                  }
            }

            // resize if we're full
            if(this.entries.freelist == -1) {

                  //assert: number of items in map == arraysize in
entries-arrays
                  if( entries.entriesDotKey.length != _itemCtr) {
                        throw new IllegalStateException("assertion
violated: "
                                    + "number of items in map (" + _itemCtr
+ ")"
                                    + "!="
                                    + "arraysize in entries-arrays: (" +
entries.entriesDotKey.length  + ")"
                                    );
                  }

                  int oldcapacity = this._itemCtr;
                  int newcapacity = 2*oldcapacity;
                  resize( newcapacity);
                  // h depends on hashsize, so we must re-compute it!!!
                  h = bucketindex(key);
            }

            // add the new entry
            this._itemCtr++;
            int entry = entries.makeEntry(key);
            entries.setNext( entry, buckets[h]);
            buckets[h] = entry;

            return true;      // Set has changed
      }


      /**
       * Remove the specified key from the Set.
       *
       * @see
org.apache.commons.collections.primitives.LongCollection#removeElement(long)
       */
      public boolean removeElement(long key) {
            int h = bucketindex(key);

            int entry = buckets[h];
            int prev = -1;
            while(entry != -1) {
                  if(entries.getKey(entry) != key) {
                        // not the key we want
                        prev = entry;
                        entry = entries.getNext(entry);
                  } else {
                        // found key
                        if (prev == -1) {
                              //first in list
                              buckets[h]=entries.getNext(entry);
                        } else {
                              entries.setNext(prev,
entries.getNext(entry));
                        }
                        // put item in free-list
                        entries.delete(entry);

                        _itemCtr--;

                        return true;      // set has changed
                  }
            }

            // key not found -> set not changed
            return false;

      }


      /**
       * provide an iterator over the set.
       */
      public LongIterator iterator() {
            LongIterator iter = new LocalIterator();
            return iter;
      }


      //
      //  informal public interface
      //


      /**
       * print some statistics
       */
      public void dumpStats() {
            int used=0;
            final int MAX=25;
            int maxdep=0;
            int maxdepctr=0;
            int[] stat = new int[MAX+1];
            for( int ctr=0; ctr<buckets.length; ctr++) {
                  if(buckets[ctr] != -1) {
                        used++;
                        int len=0;
                        int next=buckets[ctr];
                        while(next!=-1) {
                              len++;
                              next=entries.getNext(next);
                        }
                        if(len>maxdep) {
                              maxdep=len;
                              maxdepctr=1;
                        } else if(len==maxdep) {
                              maxdepctr++;
                        }
                        if(len<MAX) {
                              stat[len]++;
                        } else {
                              stat[MAX]++;
                        }
                  }
            }
            System.out.println("number of used buckets " + used);
            for(int i=0; i<MAX; i++) {
                  if(stat[i]!=0) System.out.println("stat[" + i + "]=" +
stat[i]);
            }
            if(stat[MAX]!=0) System.out.println("stat[>" + MAX + "]=" +
stat[MAX]);
            System.out.println("maxdep=" + maxdep + " (" + maxdepctr + "
times)");

            //System.out.println("used bucket ratio " +
((used*100.0)/capacity) + "%");

            int capacity=entries.entriesDotNext.length;
            System.out.println( "capacity           " + capacity);
            System.out.println( "number of buckets  " + buckets.length);
            System.out.println( "loadfactor (fact)  " + ((capacity
*100.0)/buckets.length) + "%");
            System.out.println( "loadfactor (should)" + ( loadFactor
*100.0) + "%");
            System.out.println( "number of entries  " + _itemCtr);
            System.out.println( "actual loadfactor  " + ((_itemCtr
*100.0)/capacity) + "%");
            //System.out.println("ratio " + ((size *100.0)/capacity) +
"%");
            System.out.println( "entries per bucket " + (_itemCtr * 1.0 /
buckets.length) + "");
      }


      /**
       * print contents, plus some statistics
       */
      public void dump() {
            System.out.println(""
                        + "contents of "
                        + getClass().getName()
                        //+ this
                        );
            //##
            LongIterator iter = iterator();
            while(iter.hasNext()) {
                  System.out.println( "item: " + iter.next() );
            }
            dumpStats();
      }


      //
      //
      //


      /**
       * use arrays of the appropriate type for key, and next.
       * <br>for key, use whatever
       * <br>for next, rather than using Entry (a ref to another Entry),
       * we use int as index to the 'virtual' Entry.
       * -1 is the pendant for a null-Entry.
       * <br>
       * we cannot rely on GC, so we must also
       * <li>have a free-list
       * <li>have a method delete() that must be manually invoked when an
Entry is not needed any more
       * <br>since Entry objects are only internal, this 'ugliness' is not
visible outside,
       * and the memory- and performance increase is worth the 'why don't
you use Entry-objects'-code.
       */
      private class Entries {

            private int freelist;

            private long[] entriesDotKey;
            private int[] entriesDotNext;


            /**
             * Constructor for Entries.
             */
            Entries(int capacity) {
                  entriesDotKey = new long[capacity];
                  entriesDotNext = new int[capacity];

                  // init the free-list; it contains all items,
                  freelist=0;
                  for(int i=0; i<entriesDotNext.length; i++) {
                        entriesDotNext[i]=i+1;
                  }
                  entriesDotNext[entriesDotNext.length-1]=-1;
            }


            /**
             * return a 'reference' to a new virtual Entry.
             * <p>
             * it is assumed that there is always some entry available.
             * if not, resize() should have been invoked right before
invoking this method.
             */
            int makeEntry(long key) {
                  if(freelist==-1) {
                        throw new IllegalStateException("free list is
empty");
                  }
                  int i=freelist;
                  freelist = entriesDotNext[i];
                  entriesDotNext[i] = -1;
                  entriesDotKey[i] = key;

                  return i;   // remember, the index represents the
Entry-reference
            }


            /**
             * mark an entry unused.
             * <li>wipe out key and value
             * <li>add to freelist
             */
            void delete( int index) {
                  entriesDotKey[index]=0;       // wipe out key-value
                  entriesDotNext[index]=freelist;     // add as new head of
free-list
                  freelist=index;
            }

            // set- and get- methods for the virtual Entries

            long getKey(int index) {
                  return entriesDotKey[index];
            }
            void setKey(int index, long key) {
                  entriesDotKey[index] = key;
            }

            int getNext(int index) {
                  return entriesDotNext[index];
            }
            void setNext(int index, int next) {
                  entriesDotNext[index] = next;
            }

      }



      //
      // iterator stuff
      //


      private class LocalIterator implements LongIterator {


            //private LongMapNoentryclass map;


            /**
             * reference to the bucket holding the 'next' item.
             * <br>
             * It is the index into {@link #buckets}.
             */
            private int next_bucketidx = 0;

            //private int current_bucketidx = 0; we would need it for a
high-performance remove()


            /**
             * reference to the item that {@link #next()} would return.
             * <br>
             * It is the index into {@link super#entries}.
             */
            private int next_entryidx = -1;


            /**
             * reference to the current item (the one that {@link #remove
()} would remove).
             * <br>
             * the difference to {@link #entryidx} is:
             * <li>{@link #current_entryidx} is changed by invocations of
{@link #next()} only
             * <li>{@link #next_entryidx} is changed by invocations of
{@link #next()} as well as {@link #hasNext()}
             */
            private int current_entryidx = -1;


            /**
             * constructor. no need to pass the map, since we're an inner
class,
             * and so we have access to our surrounding map anyway.
             *
             * the iterator is positioned 'before the first item'.
             * this means:
             * <li>must invoke hasNext() to handle the case of an empty map
             * <li>must invoke next() at least once, before getKey() and
getValue() may be used.
             */
            private LocalIterator( /* LongMapNoentryclass map */ ) {
                  //this.map = map;
                  advance();
                  current_entryidx = -1;
            }


            /**
             * @see com.knuti.util.nativecollections.LongIterator#hasNext()
             */
            public boolean hasNext() {
                  return next_entryidx != -1;
            }


            /**
             * internal method, advance to next element.
             * on return, {@link #next_entryidx} 'references' the
next-item.
             * if there is no more, {@link #next_entryidx} will be -1
             */
            protected void advance() {
                  // advance in bucket, if possible
                  if(next_entryidx!=-1) {
                        next_entryidx = entries.getNext(next_entryidx);
                  }
                  // find non-empty bucket (or, do nothing if bucket did
have next entry)
                  while( (next_entryidx == -1) && (next_bucketidx <
buckets.length) ) {
                        next_entryidx = buckets[next_bucketidx];
                        next_bucketidx++;
                  }
            }


            /**
             * same as in <code>java.util.Iterator#next()</code>,
             * except for the return type.
             *
             * @see com.knuti.util.nativecollections.MapEntryIterator#next
()
             *
             */
            public long next() {
                  if(!hasNext()) {
                        throw new NoSuchElementException();
                  }
                  current_entryidx = next_entryidx;   // position the
'finger'
                  advance();                                      //
pre-compute 'next'
                  long res = entries.getKey( current_entryidx);
                  return res;
            }


            /**
             * Remove the current element.
             *
             * @see
com.knuti.util.nativecollections.MapEntryIterator#remove()
             *
             * @throws UnsupportedOperationException
             *
             */
            public void remove() {
                  //if(true) throw new UnsupportedOperationException();
                  if (current_entryidx == -1) {
                        throw new IllegalStateException( "no current
element");
                  } else {
                        long key=entries.getKey( current_entryidx);
                        LongSet.this.removeElement( key);
                        current_entryidx = -1;
                  }
            }

      }


      /**
       * mini test, does nothing very useful
       */
      public static void main(String[] args) {

            // initial capacity 17,
            // allow up to 3,14 entries per bucket before re-hashing.
            // i.e. start with a hashtable of size 17/3.14 = 5
            LongSet x = new LongSet(17,3.14f);

            // test 1: store a lot of entries.
            // key: remainder mod 997
            for (int num = 12345; num < 123456; num+=31) {
                  long key=num%99997;
                  //long key=num%number;
                  //long key=num/number;
                  boolean changed=x.add( key);
                  if(changed) {
                        System.out.println( "new:" + key + "
<--------------------");
                  } else {
                        System.out.println( "old:" + key + " (remove
now)");
                        x.removeElement( key);
                  }
            }
            // show what we have
            x.dump();
      }
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - -


knuti
(despite my email address, i am writing this as a private person)




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [PrimitiveCollections] implementations of e.g. org.apache.commons.collections.primitives.LongCollection - I have one to offer

Posted by Rodney Waldhoff <rw...@apache.org>.
On Fri, 19 Dec 2003, Knut Ploder wrote:

>
> Are there any implementations of the primitive collections?
> I mean some class that implements e.g.
> org.apache.commons.collections.primitives.LongCollection (other than the
> abstract org.apache.commons.collections.primitives.AbstractLongCollection).
>
> I just cannot find one; I tried
> * commons-primitives-1.0
> * collections-2.1
> but no luck.

There are two concrete implementations of LongCollection in
commons-primitives-1.0 (also present but deprecated in the
commons-collections HEAD), namely ArrayLongList
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayLongList.html>)
and ArrayUnsignedIntList
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayUnsignedIntList.html>).

More generally, see Array<Type>List for a concrete implementation of
<Type>Collection.
(<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/package-summary.html>).

(Actually, there's other concrete instances like CollectionLongCollection
and ListLongList in the adapters package and UnmodifiableLongCollection
and UnmodifiableLongList in the decorators package, but those need to be
"seeded" with an instance of Collection/List or LongCollection/LongList,
respectively.)

> I found a IntHashMap in [lang], which uses
>
>
> The reason for asking is, I have written a nice class LongSet, which
> happens to implement
> org.apache.commons.collections.primitives.LongCollection (all I had to do
> is add   'implements
> org.apache.commons.collections.primitives.LongCollection' :-)
>
> It is quite fast, and needs only little memory.
> just in case my code is not a duplication (but, as I said in the prolog, I
> couldn't find anything on jakarta) I would be happy to contribute my code.

Currently there is no implementation of <Type>Set (or for that matter
<Type>Map) in commons-primitives, or even a declaration of the <Type>Set
or <Type>Map interface.  I haven't had a chance to look at your
implemenation in detail, but from your overview it sounds quite
reasonable.


> let me explain my approach:
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> I'm using some hashtable (nothing new).
> but for the collisions-chains, I do NOT instantiate Entry-objects; rather,
> I allocate 2 arrays of length capacity, one for the keys, one for the
> next-pointer, like:
>
>  long[] keys = new long[capacity];
>  int[] next = new int[capacity];
>
> this way I save quite a lot of memory (I have the memory-overhead of 2
> array-objects, rather than the overhead for k objects of the form
> class Entry {
>  long key;
>  Entry next;
> }
> which was my initial form of implementing the collision chain (well, I
> started with java.util.LinkedList, which requires Long-objects rather than
> long-primitives, requiring even more memory).
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> the idea can easily be adapted to Maps (in fact I have done a
> LongObjectMap).
> and - of course - it can be used for any other primitive type (I chose long
> because that's what i needed).
>
> and now the actual code:
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
> package com.knuti.util.nativecollections;
>
> //import java.util.Iterator;
> import java.util.NoSuchElementException;
>
> import org.apache.commons.collections.primitives.AbstractLongCollection;
> import org.apache.commons.collections.primitives.LongIterator;
>
>
> /**
>  * A Set with elements of type <code>long</code>.
>  *
>  * It uses a hashtable, but not a standard version from
> <code>java.util</code>.
>  * My hashtable differs in the implementation for the collision chains:
>  * <ul>
>  * <li>I use an array of long, plus an array of int,
>  * <li>rather than an ordinary LinkedList.
>  * </ul>
>  * This saves lots of memory (overhead of many small objects is replaced by
> overhead of 2 array-objects),
>  * and causes little work for GC.
>  *
>  * @author Knut Ploder
>  *
>  * @version 1.1 adapt to jakarta collection primitives (replacing my -
> conceptionally identical - LongIterator)
>  *
>  */
> public class LongSet extends AbstractLongCollection {
>
>
>       /**
>        * The load above which rehashing occurs.
>        * <br>
>        * Here, load means the ratio <tt>buckets/capacity</tt>.
>        * In other words, the (maximum) average length of a collision-chain,
>        * or the average number of items in a bucket of the hashtable.
>        * <br>note that the loadfactor may exceed 1.0
>        */
>       protected static final float DEFAULT_LOAD_FACTOR = 0.8f;
>
>
>       /**
>        * The default initial capacity for the hash table.
>        * <br>
>        * Here, capacity means the number of items the Set can hold.
>        * Rehashing occurs when the are more than capacity items in the Set,
>        * which is in contrast to many other implementaions,
>        * where rehashing occurs already at size
> <tt>capacity*loadfactor</tt> items.
>        */
>       protected static final int DEFAULT_INITIAL_CAPACITY = 11;
>
>
>       /**
>        * @see {link #DEFAULT_LOAD_FACTOR}
>        */
>       protected float loadFactor;
>
>
>       /**
>        * The current number of items in the set.
>        * <br>Do not mix up with capacity.
>        */
>       int _itemCtr;
>
>
>       /**
>        * the buckets of the hashtable.
>        * they hold an index into {@link #entries}, or -1 to indicate null
>        */
>       private int[] buckets;
>
>
>       /**
>        * this is a replacement for the linked-lists
>        * @see #Entries
>        */
>       private Entries entries;
>
>
>       /**
>        * Return the number of elements in the map.
>        */
>       public final int size() {
>             return _itemCtr;
>       }
>
>
>
>       //
>       // constructors
>       //
>
>
>
>       /**
>        * Creates a new <code>LongSet</code> instance,
>        * with the default capacity
>        * and load factor.
>        */
>       public LongSet() {
>           this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
>       }
>
>
>       /**
>        * Creates a new <code>LongSet</code> instance,
>        * with the default load factor,
>        * that can hold up to <tt>initialCapacity</tt> elements
>        * without triggering a rehash.
>        *
>        * @param initialCapacity an <code>int</code> value
>        */
>       public LongSet(int initialCapacity) {
>           this(initialCapacity, DEFAULT_LOAD_FACTOR);
>       }
>
>
>       /**
>        * Creates a new <code>LongSet</code> instance,
>        * with load factor <tt>loadFactor</tt>,
>        * that can hold <tt>initialCapacity</tt> elements
>        * without triggering a rehash.
>        *
>        * @param initialCapacity an <code>int</code> value
>        * @param loadFactor a <code>float</code> value
>        */
>       public LongSet(int initialCapacity, float loadFactor) {
>             super();
>             this.loadFactor = loadFactor;
>             setUp(initialCapacity);
>       }
>
>
>
>       //
>       //  internal utility stuff
>       //
>
>
>
>       /**
>        * Initializes the Set so that it can hold <tt>desiredCapacity</tt>
> elements.
>        * <p>
>        * the number of buckets is <tt>desiredCapacity / loadfactor</tt>.
>        * <p>for example,
>        * a desiredCapacity = 1000 means you want a Set that can hold (up
> to) 1000 entries without resize()ing.
>        * if the loadfactor is 80%, you need 1000 / 0.8 = 1250 buckets.
>        *
>        * @param desiredCapacity an <code>int</code> value
>        *
>        */
>       protected void setUp(int desiredCapacity) {
>             int numbuckets;
>             int numitems;
>
>             // the most senseful version. do what we are asked to do
>             numitems = desiredCapacity;
>
>             // compute number of buckets
>             numbuckets = Math.round( desiredCapacity / loadFactor);
>
>             // allocate buckets
>             buckets = new int[numbuckets];
>             for( int i=0; i<buckets.length; i++) {
>                   buckets[i]=-1; // the indexing-counterpart for null
>             }
>
>             // pre-allocate memory for the maximum number of entries
>             entries = new Entries( numitems);
>
>             // initialise counter, how many elements there are in the map.
>             this._itemCtr = 0;
>
>             if(true)
>                   System.out.println("" + getClass().getName() + ": alloc
> desired for "
>                         + desiredCapacity + " items. "
>                         + "actually allocated "
>                         + buckets.length + " buckets / "
>                         + entries.entriesDotNext.length + " entries "
>                         );
>       }
>
>
>       /**
>        * previously, this was called the hash-function.
>        * but it is not; it just computes the bucket-index.
>        * <p>
>        * the hash-function would be a method of the key; for keys of type
> long,
>        * which we do have here, the hashfunction simply is the long-value
> itself.
>        */
>       protected int bucketindex(long key) {
>             int h = (int)(key % buckets.length);
>             if(h < 0) {
>                   h = -h;
>             }
>             return h;
>       }
>
>
>       /**
>        * resize to a new size.
>        * the new size is taken as is, without checking for primeness or
> whatever.
>        * in case of bad input, some java runtimeexception may occur,
>        * e.g. NegativeArraySizeException
>        */
>       protected void resize( int newcapacity) {
>             // save old stuff
>             Entries oldentries = this.entries;
>             int[] oldbuckets = buckets;
>
>             System.out.println("resize: newcap = " + newcapacity);
>
>             // allocate for new size
>             setUp(newcapacity);
>
>
>             // now, re-fill
>             for( int b=0; b<oldbuckets.length; b++) {
>                   for(int entry = oldbuckets[b]; entry != -1; entry =
> oldentries.getNext(entry) ) {
>                         //trace("check index " + entry + ", in bucket " +
> h);
>                         long xkey = oldentries.getKey(entry);
>                         add( xkey);
>                   }
>             }
>       }
>
>
>
>       /**
>        * @see
> org.apache.commons.collections.primitives.LongCollection#contains(long)
>        */
>       //
>       //  public interface
>       //
>
>
>
>       /**
>        * Check if the key is in the Set.
>        */
>       public boolean contains(long key) {
>             int h = bucketindex(key);
>             for (int entry = buckets[h]; entry != -1; entry =
> entries.getNext(entry) ) {
>                   if (entries.getKey(entry) == key) {
>                         return true;
>                   }
>             }
>             return false;
>       }
>
>
>       /**
>        * @see
> org.apache.commons.collections.primitives.LongCollection#add(long)
>        */
>       public boolean add(long key) {
>
>             // check if key is already in use
>             int h = bucketindex(key);
>             for (int entry = buckets[h]; entry != -1; entry =
> entries.getNext(entry)) {
>                   if (entries.getKey(entry) == key) {       //if (entry.key
> == key) {
>                         return false;     // not changed
>                   }
>             }
>
>             // resize if we're full
>             if(this.entries.freelist == -1) {
>
>                   //assert: number of items in map == arraysize in
> entries-arrays
>                   if( entries.entriesDotKey.length != _itemCtr) {
>                         throw new IllegalStateException("assertion
> violated: "
>                                     + "number of items in map (" + _itemCtr
> + ")"
>                                     + "!="
>                                     + "arraysize in entries-arrays: (" +
> entries.entriesDotKey.length  + ")"
>                                     );
>                   }
>
>                   int oldcapacity = this._itemCtr;
>                   int newcapacity = 2*oldcapacity;
>                   resize( newcapacity);
>                   // h depends on hashsize, so we must re-compute it!!!
>                   h = bucketindex(key);
>             }
>
>             // add the new entry
>             this._itemCtr++;
>             int entry = entries.makeEntry(key);
>             entries.setNext( entry, buckets[h]);
>             buckets[h] = entry;
>
>             return true;      // Set has changed
>       }
>
>
>       /**
>        * Remove the specified key from the Set.
>        *
>        * @see
> org.apache.commons.collections.primitives.LongCollection#removeElement(long)
>        */
>       public boolean removeElement(long key) {
>             int h = bucketindex(key);
>
>             int entry = buckets[h];
>             int prev = -1;
>             while(entry != -1) {
>                   if(entries.getKey(entry) != key) {
>                         // not the key we want
>                         prev = entry;
>                         entry = entries.getNext(entry);
>                   } else {
>                         // found key
>                         if (prev == -1) {
>                               //first in list
>                               buckets[h]=entries.getNext(entry);
>                         } else {
>                               entries.setNext(prev,
> entries.getNext(entry));
>                         }
>                         // put item in free-list
>                         entries.delete(entry);
>
>                         _itemCtr--;
>
>                         return true;      // set has changed
>                   }
>             }
>
>             // key not found -> set not changed
>             return false;
>
>       }
>
>
>       /**
>        * provide an iterator over the set.
>        */
>       public LongIterator iterator() {
>             LongIterator iter = new LocalIterator();
>             return iter;
>       }
>
>
>       //
>       //  informal public interface
>       //
>
>
>       /**
>        * print some statistics
>        */
>       public void dumpStats() {
>             int used=0;
>             final int MAX=25;
>             int maxdep=0;
>             int maxdepctr=0;
>             int[] stat = new int[MAX+1];
>             for( int ctr=0; ctr<buckets.length; ctr++) {
>                   if(buckets[ctr] != -1) {
>                         used++;
>                         int len=0;
>                         int next=buckets[ctr];
>                         while(next!=-1) {
>                               len++;
>                               next=entries.getNext(next);
>                         }
>                         if(len>maxdep) {
>                               maxdep=len;
>                               maxdepctr=1;
>                         } else if(len==maxdep) {
>                               maxdepctr++;
>                         }
>                         if(len<MAX) {
>                               stat[len]++;
>                         } else {
>                               stat[MAX]++;
>                         }
>                   }
>             }
>             System.out.println("number of used buckets " + used);
>             for(int i=0; i<MAX; i++) {
>                   if(stat[i]!=0) System.out.println("stat[" + i + "]=" +
> stat[i]);
>             }
>             if(stat[MAX]!=0) System.out.println("stat[>" + MAX + "]=" +
> stat[MAX]);
>             System.out.println("maxdep=" + maxdep + " (" + maxdepctr + "
> times)");
>
>             //System.out.println("used bucket ratio " +
> ((used*100.0)/capacity) + "%");
>
>             int capacity=entries.entriesDotNext.length;
>             System.out.println( "capacity           " + capacity);
>             System.out.println( "number of buckets  " + buckets.length);
>             System.out.println( "loadfactor (fact)  " + ((capacity
> *100.0)/buckets.length) + "%");
>             System.out.println( "loadfactor (should)" + ( loadFactor
> *100.0) + "%");
>             System.out.println( "number of entries  " + _itemCtr);
>             System.out.println( "actual loadfactor  " + ((_itemCtr
> *100.0)/capacity) + "%");
>             //System.out.println("ratio " + ((size *100.0)/capacity) +
> "%");
>             System.out.println( "entries per bucket " + (_itemCtr * 1.0 /
> buckets.length) + "");
>       }
>
>
>       /**
>        * print contents, plus some statistics
>        */
>       public void dump() {
>             System.out.println(""
>                         + "contents of "
>                         + getClass().getName()
>                         //+ this
>                         );
>             //##
>             LongIterator iter = iterator();
>             while(iter.hasNext()) {
>                   System.out.println( "item: " + iter.next() );
>             }
>             dumpStats();
>       }
>
>
>       //
>       //
>       //
>
>
>       /**
>        * use arrays of the appropriate type for key, and next.
>        * <br>for key, use whatever
>        * <br>for next, rather than using Entry (a ref to another Entry),
>        * we use int as index to the 'virtual' Entry.
>        * -1 is the pendant for a null-Entry.
>        * <br>
>        * we cannot rely on GC, so we must also
>        * <li>have a free-list
>        * <li>have a method delete() that must be manually invoked when an
> Entry is not needed any more
>        * <br>since Entry objects are only internal, this 'ugliness' is not
> visible outside,
>        * and the memory- and performance increase is worth the 'why don't
> you use Entry-objects'-code.
>        */
>       private class Entries {
>
>             private int freelist;
>
>             private long[] entriesDotKey;
>             private int[] entriesDotNext;
>
>
>             /**
>              * Constructor for Entries.
>              */
>             Entries(int capacity) {
>                   entriesDotKey = new long[capacity];
>                   entriesDotNext = new int[capacity];
>
>                   // init the free-list; it contains all items,
>                   freelist=0;
>                   for(int i=0; i<entriesDotNext.length; i++) {
>                         entriesDotNext[i]=i+1;
>                   }
>                   entriesDotNext[entriesDotNext.length-1]=-1;
>             }
>
>
>             /**
>              * return a 'reference' to a new virtual Entry.
>              * <p>
>              * it is assumed that there is always some entry available.
>              * if not, resize() should have been invoked right before
> invoking this method.
>              */
>             int makeEntry(long key) {
>                   if(freelist==-1) {
>                         throw new IllegalStateException("free list is
> empty");
>                   }
>                   int i=freelist;
>                   freelist = entriesDotNext[i];
>                   entriesDotNext[i] = -1;
>                   entriesDotKey[i] = key;
>
>                   return i;   // remember, the index represents the
> Entry-reference
>             }
>
>
>             /**
>              * mark an entry unused.
>              * <li>wipe out key and value
>              * <li>add to freelist
>              */
>             void delete( int index) {
>                   entriesDotKey[index]=0;       // wipe out key-value
>                   entriesDotNext[index]=freelist;     // add as new head of
> free-list
>                   freelist=index;
>             }
>
>             // set- and get- methods for the virtual Entries
>
>             long getKey(int index) {
>                   return entriesDotKey[index];
>             }
>             void setKey(int index, long key) {
>                   entriesDotKey[index] = key;
>             }
>
>             int getNext(int index) {
>                   return entriesDotNext[index];
>             }
>             void setNext(int index, int next) {
>                   entriesDotNext[index] = next;
>             }
>
>       }
>
>
>
>       //
>       // iterator stuff
>       //
>
>
>       private class LocalIterator implements LongIterator {
>
>
>             //private LongMapNoentryclass map;
>
>
>             /**
>              * reference to the bucket holding the 'next' item.
>              * <br>
>              * It is the index into {@link #buckets}.
>              */
>             private int next_bucketidx = 0;
>
>             //private int current_bucketidx = 0; we would need it for a
> high-performance remove()
>
>
>             /**
>              * reference to the item that {@link #next()} would return.
>              * <br>
>              * It is the index into {@link super#entries}.
>              */
>             private int next_entryidx = -1;
>
>
>             /**
>              * reference to the current item (the one that {@link #remove
> ()} would remove).
>              * <br>
>              * the difference to {@link #entryidx} is:
>              * <li>{@link #current_entryidx} is changed by invocations of
> {@link #next()} only
>              * <li>{@link #next_entryidx} is changed by invocations of
> {@link #next()} as well as {@link #hasNext()}
>              */
>             private int current_entryidx = -1;
>
>
>             /**
>              * constructor. no need to pass the map, since we're an inner
> class,
>              * and so we have access to our surrounding map anyway.
>              *
>              * the iterator is positioned 'before the first item'.
>              * this means:
>              * <li>must invoke hasNext() to handle the case of an empty map
>              * <li>must invoke next() at least once, before getKey() and
> getValue() may be used.
>              */
>             private LocalIterator( /* LongMapNoentryclass map */ ) {
>                   //this.map = map;
>                   advance();
>                   current_entryidx = -1;
>             }
>
>
>             /**
>              * @see com.knuti.util.nativecollections.LongIterator#hasNext()
>              */
>             public boolean hasNext() {
>                   return next_entryidx != -1;
>             }
>
>
>             /**
>              * internal method, advance to next element.
>              * on return, {@link #next_entryidx} 'references' the
> next-item.
>              * if there is no more, {@link #next_entryidx} will be -1
>              */
>             protected void advance() {
>                   // advance in bucket, if possible
>                   if(next_entryidx!=-1) {
>                         next_entryidx = entries.getNext(next_entryidx);
>                   }
>                   // find non-empty bucket (or, do nothing if bucket did
> have next entry)
>                   while( (next_entryidx == -1) && (next_bucketidx <
> buckets.length) ) {
>                         next_entryidx = buckets[next_bucketidx];
>                         next_bucketidx++;
>                   }
>             }
>
>
>             /**
>              * same as in <code>java.util.Iterator#next()</code>,
>              * except for the return type.
>              *
>              * @see com.knuti.util.nativecollections.MapEntryIterator#next
> ()
>              *
>              */
>             public long next() {
>                   if(!hasNext()) {
>                         throw new NoSuchElementException();
>                   }
>                   current_entryidx = next_entryidx;   // position the
> 'finger'
>                   advance();                                      //
> pre-compute 'next'
>                   long res = entries.getKey( current_entryidx);
>                   return res;
>             }
>
>
>             /**
>              * Remove the current element.
>              *
>              * @see
> com.knuti.util.nativecollections.MapEntryIterator#remove()
>              *
>              * @throws UnsupportedOperationException
>              *
>              */
>             public void remove() {
>                   //if(true) throw new UnsupportedOperationException();
>                   if (current_entryidx == -1) {
>                         throw new IllegalStateException( "no current
> element");
>                   } else {
>                         long key=entries.getKey( current_entryidx);
>                         LongSet.this.removeElement( key);
>                         current_entryidx = -1;
>                   }
>             }
>
>       }
>
>
>       /**
>        * mini test, does nothing very useful
>        */
>       public static void main(String[] args) {
>
>             // initial capacity 17,
>             // allow up to 3,14 entries per bucket before re-hashing.
>             // i.e. start with a hashtable of size 17/3.14 = 5
>             LongSet x = new LongSet(17,3.14f);
>
>             // test 1: store a lot of entries.
>             // key: remainder mod 997
>             for (int num = 12345; num < 123456; num+=31) {
>                   long key=num%99997;
>                   //long key=num%number;
>                   //long key=num/number;
>                   boolean changed=x.add( key);
>                   if(changed) {
>                         System.out.println( "new:" + key + "
> <--------------------");
>                   } else {
>                         System.out.println( "old:" + key + " (remove
> now)");
>                         x.removeElement( key);
>                   }
>             }
>             // show what we have
>             x.dump();
>       }
> }
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - -
>
>
> knuti
> (despite my email address, i am writing this as a private person)
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>
>

-- 
- Rod <http://radio.weblogs.com/0122027/>

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org