You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Juozas Baliuka <ba...@centras.lt> on 2002/06/20 19:33:29 UTC

[Collection] algorythms was [Commons-Avalon:collections] code integration

Hi,
I want to ask about Map implementation (BucketMap).
Is it like HashMap (linear probing) or something different like "double
hashing".
If it is "linear probing", is it plans to implement alternatyve algorythms ?

<snip>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Juozas Baliuka <ba...@mwm.lt>.
Hi,
I as understand all of them are almost the same,
I need "fast" iteration and "add", I am not expert, but there are a lot of 
books,
I believe it is possible to implement "faster" Set for iteration.

At 13:15 2002.06.20 -0500, Michael A. Smith wrote:
>On Thu, 20 Jun 2002, Berin Loritsch wrote:
> > > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> > >
> > > Ok,
> > > I see it is "linear probing".
> > > We can implement "double hashing" I am not sure it is
> > > somethin better, but my book says "double hashing" is better
> > > it says "Empyricaly proved", I don't believe empyric, but we
> > > can try it.
> >
> >
> > Ok. I guess.  But please keep the original as a reference.
> >
> > I'm not sure what the difference is between them, but I can
> > tell you this: the less work done on each lookup, the quicker
> > the Map works.  Using the hashCode as an index to the array
> > is about as fast as it gets.
> >
> > It might be something like the difference between LinkedList
> > and ArrayList.
> >
> > Up to a certain number of elements, ArrayList smokes (or at
> > least beats) the LinkedList.  After that threshold, LinkedList
> > wins.  However, that threshold is so high in JDK 1.3/1.4
> > Hotspot JVMs that you'd be better off with ArrayList in 99%
> > of all applications.
> >
> > Just something to think about before going down that road...
>
>double hashing and linear probing are two different methods for
>resolving collisions in the hashtable.  Double hashing is a technique
>where you use two different hashing algorithms to determine two
>difffernt likely bucks that an element may be located.  Linear probing
>involves putting an element in the "next empty bucket" if the hashed
>bucket is already taken.  The implementation used here (and in java.util
>implementaions) is direct chaining hashing, where elements of the bucket
>are actually linked lists of the elements whose hashcode maps to
>that bucket.
>
>Linear probing and double hashing do not allow the number of
>elements to exceed the number of buckets in the hashtable and thus are
>not really appropriate for this collection.
>
>regards
>michael
>
>
>--
>To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
>For additional commands, e-mail: <ma...@jakarta.apache.org>



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


[Collections] HashSet was algorythms

Posted by Juozas Baliuka <ba...@centras.lt>.
Hi,

May be somebody wants to help me implement "fast" set ?
I added some incomplete implementation and want to see something like this
in Collections project. It is not synchronized and optimized at this time.



----- Original Message -----
From: "Berin Loritsch" <bl...@apache.org>
To: "'Jakarta Commons Developers List'" <co...@jakarta.apache.org>
Sent: Thursday, June 20, 2002 8:25 PM
Subject: RE: [Collection] algorythms was [Commons-Avalon:collections] code
integration


> > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> >
> > I will implement Set interface based on this algorythm, It
> > besause need "fast" implementation for "large" sets, I will
> > try to implement Set based on your Map too. I will submit
> > this implementation if it will be "faster".
>
> Ok.  The real test is this:
>
> 100 simultaneous threads, all requesting random values (actually
> a random value from a known set of values).
>
> How long does it take to process 10,000,000 accesses?
>
> For 100 simultaneous threads, that translates into 10,000
> iterations per thread.
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>

RE: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Berin Loritsch <bl...@apache.org>.
> From: Juozas Baliuka [mailto:baliuka@centras.lt] 
> 
> I will implement Set interface based on this algorythm, It 
> besause need "fast" implementation for "large" sets, I will 
> try to implement Set based on your Map too. I will submit 
> this implementation if it will be "faster".

Ok.  The real test is this:

100 simultaneous threads, all requesting random values (actually
a random value from a known set of values).

How long does it take to process 10,000,000 accesses?

For 100 simultaneous threads, that translates into 10,000
iterations per thread.


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Juozas Baliuka <ba...@centras.lt>.
I will implement Set interface based on this algorythm, It besause need
"fast" implementation for "large" sets, I will try to implement Set based on
your Map
too. I will submit this implementation if it will be "faster".

> > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> >
> > Ok,
> > I see it is "linear probing".
> > We can implement "double hashing" I am not sure it is
> > somethin better, but my book says "double hashing" is better
> > it says "Empyricaly proved", I don't believe empyric, but we
> > can try it.
>
>
> Ok. I guess.  But please keep the original as a reference.
>
> I'm not sure what the difference is between them, but I can
> tell you this: the less work done on each lookup, the quicker
> the Map works.  Using the hashCode as an index to the array
> is about as fast as it gets.
>
> It might be something like the difference between LinkedList
> and ArrayList.
>
> Up to a certain number of elements, ArrayList smokes (or at
> least beats) the LinkedList.  After that threshold, LinkedList
> wins.  However, that threshold is so high in JDK 1.3/1.4
> Hotspot JVMs that you'd be better off with ArrayList in 99%
> of all applications.
>
> Just something to think about before going down that road...
>
> >
> > > > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> > > >
> > > > Hi,
> > > > I want to ask about Map implementation (BucketMap).
> > > > Is it like HashMap (linear probing) or something different like
> > > > "double hashing". If it is "linear probing", is it plans to
> > > > implement alternatyve algorythms ?
> > >
> > > Not sure what you mean.  My implementation was rather simplistic.
> > > Basically there is an array of buckets. Each Bucket
> > represents a hash
> > > value (hashCode() % n). Synchronization occurs on the bucket.
> > >
> > > I don't pretend to be an algorithmic genious.  I just
> > > hacked it together and saw it did a better job than
> > > HashMap at removing the race conditions (because
> > > HashMap is not threadsafe), and better than a
> > > Synchronized Map because there was less thread contention.
> > >
> > > Anything more than that, I couldn't tell you.  I actually
> > > got the algorithm I used from a magazine article, and
> > > hacked in the Map support that you do see in there.
> > >
> > >
> > > --
> > > To unsubscribe, e-mail:
> > <ma...@jakarta.apache.org>
> > > For additional commands, e-mail:
> > <ma...@jakarta.apache.org>
> > >
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> > For
> > additional commands,
> > e-mail: <ma...@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by "Michael A. Smith" <ma...@apache.org>.
On Thu, 20 Jun 2002, Berin Loritsch wrote:
> > From: Juozas Baliuka [mailto:baliuka@centras.lt] 
> > 
> > Ok,
> > I see it is "linear probing".
> > We can implement "double hashing" I am not sure it is 
> > somethin better, but my book says "double hashing" is better 
> > it says "Empyricaly proved", I don't believe empyric, but we 
> > can try it.
> 
> 
> Ok. I guess.  But please keep the original as a reference.
> 
> I'm not sure what the difference is between them, but I can
> tell you this: the less work done on each lookup, the quicker
> the Map works.  Using the hashCode as an index to the array
> is about as fast as it gets.
> 
> It might be something like the difference between LinkedList
> and ArrayList.
> 
> Up to a certain number of elements, ArrayList smokes (or at
> least beats) the LinkedList.  After that threshold, LinkedList
> wins.  However, that threshold is so high in JDK 1.3/1.4
> Hotspot JVMs that you'd be better off with ArrayList in 99%
> of all applications.
> 
> Just something to think about before going down that road...

double hashing and linear probing are two different methods for 
resolving collisions in the hashtable.  Double hashing is a technique 
where you use two different hashing algorithms to determine two 
difffernt likely bucks that an element may be located.  Linear probing 
involves putting an element in the "next empty bucket" if the hashed 
bucket is already taken.  The implementation used here (and in java.util 
implementaions) is direct chaining hashing, where elements of the bucket 
are actually linked lists of the elements whose hashcode maps to 
that bucket.

Linear probing and double hashing do not allow the number of 
elements to exceed the number of buckets in the hashtable and thus are 
not really appropriate for this collection.

regards
michael


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Berin Loritsch <bl...@apache.org>.
> From: Juozas Baliuka [mailto:baliuka@centras.lt] 
> 
> Ok,
> I see it is "linear probing".
> We can implement "double hashing" I am not sure it is 
> somethin better, but my book says "double hashing" is better 
> it says "Empyricaly proved", I don't believe empyric, but we 
> can try it.


Ok. I guess.  But please keep the original as a reference.

I'm not sure what the difference is between them, but I can
tell you this: the less work done on each lookup, the quicker
the Map works.  Using the hashCode as an index to the array
is about as fast as it gets.

It might be something like the difference between LinkedList
and ArrayList.

Up to a certain number of elements, ArrayList smokes (or at
least beats) the LinkedList.  After that threshold, LinkedList
wins.  However, that threshold is so high in JDK 1.3/1.4
Hotspot JVMs that you'd be better off with ArrayList in 99%
of all applications.

Just something to think about before going down that road...

> 
> > > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> > >
> > > Hi,
> > > I want to ask about Map implementation (BucketMap).
> > > Is it like HashMap (linear probing) or something different like 
> > > "double hashing". If it is "linear probing", is it plans to 
> > > implement alternatyve algorythms ?
> >
> > Not sure what you mean.  My implementation was rather simplistic.  
> > Basically there is an array of buckets. Each Bucket 
> represents a hash 
> > value (hashCode() % n). Synchronization occurs on the bucket.
> >
> > I don't pretend to be an algorithmic genious.  I just
> > hacked it together and saw it did a better job than
> > HashMap at removing the race conditions (because
> > HashMap is not threadsafe), and better than a
> > Synchronized Map because there was less thread contention.
> >
> > Anything more than that, I couldn't tell you.  I actually
> > got the algorithm I used from a magazine article, and
> > hacked in the Map support that you do see in there.
> >
> >
> > --
> > To unsubscribe, e-mail:
> <ma...@jakarta.apache.org>
> > For additional commands, e-mail:
> <ma...@jakarta.apache.org>
> >
> 
> 
> --
> To unsubscribe, e-mail:   
> <mailto:commons-dev-> unsubscribe@jakarta.apache.org>
> For 
> additional commands, 
> e-mail: <ma...@jakarta.apache.org>
> 


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Juozas Baliuka <ba...@centras.lt>.
Ok,
I see it is "linear probing".
We can implement "double hashing" I am not sure it is somethin better, but
my book says "double hashing" is better it says "Empyricaly proved",
I don't believe empyric, but we can try it.

> > From: Juozas Baliuka [mailto:baliuka@centras.lt]
> >
> > Hi,
> > I want to ask about Map implementation (BucketMap).
> > Is it like HashMap (linear probing) or something different
> > like "double hashing". If it is "linear probing", is it plans
> > to implement alternatyve algorythms ?
>
> Not sure what you mean.  My implementation was rather
> simplistic.  Basically there is an array of buckets.
> Each Bucket represents a hash value (hashCode() % n).
> Synchronization occurs on the bucket.
>
> I don't pretend to be an algorithmic genious.  I just
> hacked it together and saw it did a better job than
> HashMap at removing the race conditions (because
> HashMap is not threadsafe), and better than a
> Synchronized Map because there was less thread contention.
>
> Anything more than that, I couldn't tell you.  I actually
> got the algorithm I used from a magazine article, and
> hacked in the Map support that you do see in there.
>
>
> --
> To unsubscribe, e-mail:
<ma...@jakarta.apache.org>
> For additional commands, e-mail:
<ma...@jakarta.apache.org>
>


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: [Collection] algorythms was [Commons-Avalon:collections] code integration

Posted by Berin Loritsch <bl...@apache.org>.
> From: Juozas Baliuka [mailto:baliuka@centras.lt] 
> 
> Hi,
> I want to ask about Map implementation (BucketMap).
> Is it like HashMap (linear probing) or something different 
> like "double hashing". If it is "linear probing", is it plans 
> to implement alternatyve algorythms ?

Not sure what you mean.  My implementation was rather
simplistic.  Basically there is an array of buckets.
Each Bucket represents a hash value (hashCode() % n).
Synchronization occurs on the bucket.

I don't pretend to be an algorithmic genious.  I just
hacked it together and saw it did a better job than
HashMap at removing the race conditions (because
HashMap is not threadsafe), and better than a
Synchronized Map because there was less thread contention.

Anything more than that, I couldn't tell you.  I actually
got the algorithm I used from a magazine article, and
hacked in the Map support that you do see in there.


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>