You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Stephen Colebourne <sc...@btopenworld.com> on 2003/11/03 01:08:13 UTC

[collections] Flat3Map, faster than HashMap

Well I think I've achieved a neat little class in Flat3Map, but I guess
others should decide that. (Its checked in, but if people don't like it I'll
remove it or rename it or otherwise modify it)

Flat3Map is a Map implementation that is optimised for size 3 and less. It
stores data in ordinary fields and does not create an array, map entry or
other complex object until size 3 is breached.

Once size 4 is reached, a HashMap is created and the Flat3Map behaves as per
a decorated HashMap.

Performance wise -
- Gets at size 3 or less are about 0-10% faster than HashMap.
- Puts at size 3 or less are over 4 times faster than HashMap.
- Performance 5% slower than HashMap once size 3 exceeded once.

The gains on put are probably down to object creation and therefore garbage
collection. The new MapIterator should be used to get the maximum advantage,
as it doesn't create MapEntry objects. The performance test class is also
checked in so you can try it out (you have to play with the comments in the
loop).

I also suspect, but can't think how to prove, that Fast3Map will be easier
for the garbage collector to dispose of as it contains no arrays or complex
types to hunt through for dependencies. Any ideas on how to test this?

Opinions? And should there be a Flat1Map, Flat5Map etc. And what about a
Flat3List??

Stephen




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [collections] Flat3Map, faster than HashMap

Posted by Stephen Colebourne <sc...@btopenworld.com>.
> My point was to question whether
> "performance tweaked" code is the type of thing that we want to add to
> commons.  I'm not convinced that classes specific to certain number of
> Map or List entries would be useful additions.  Obviously, I can only
> speak for myself - does anyone else have a need for such a thing?
Well we certainly have lots of other collections/maps that are tweaked in
other ways, for fast reads and slow writes, for validation, for observing
changes. One that handles small sizes well seems to be a good fit.

One variation would be to say that the map has a fixed size of 3 above which
it cannot go. It might make things clearer as to when to use it, but less
flexible.

> Did you try creating a HashMap using an
> initial capacity of 3 [ via new HashMap(3) ]?  In this case, I believe
> that only 3 objects are created on construction.  I think that setting
> the loadFactor to 1 [ via new HashMap(3,1)] would also ensure that the
> Map is only rehashed when the entries become greater than 3.
These parameters wouldn't be ideal. The initial allocation size of the
HashMap goes down from 120 bytes to 72 bytes (close to the 56 bytes).
However the Flat3Map retains a _constant_ size as it is updated with no
object creation.

Adding and removing from a HashMap always creates an Entry object at a cost
of 24 bytes and significant time (both creation and gc) for each new entry.

In my performance test, HashMap(3, 1) performs basically the same as
HashMap(), as I expected.

In some degree, the question is whether the gc time is proportional to the
number of objects it collects. And if it is, then Flat3Map is a good design,
and would probably scale to larger sizes.

A variation on the design is to store the keys and values in two parallel
Object arrays rather than Entry objects. Again this should have a
significant impact in avoiding object creation. I believe this might require
an 'open' hashing technique. Anyone know anything about this or have an
example implementation?

Stephen


> > Actually I just did some more tests:
> > Create new Flat3Map() vs create new HashMap()
> > 2910 vs 4060
> >
> > Create new Flat3Map() put 3 mappings vs create new HashMap() put 3
mappings
> > 4170 vs 8120
> >
> > Memory usage
> > Size 0: 56 bytes vs 120 bytes
> > Size 1: 56 bytes vs 144 bytes
> > Size 2: 56 bytes vs 168 bytes
> > Size 3: 56 bytes vs 192 bytes
> > Size 4: 272 bytes vs 216 bytes
> > Size 5: 296 bytes vs 240 bytes
> >
> > Object creation on construction:
> > None vs Entry[16]



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [collections] Flat3Map, faster than HashMap

Posted by __matthewHawthorne <ma...@phreaker.net>.
Comments inline...


Stephen Colebourne wrote:
> From: "__matthewHawthorne" <ma...@phreaker.net>
> 
>>When you say that Flat3Map is faster than HashMap for <= 3 gets/puts,
> 
> er <= size 3. You can get/put as often as you like, without exceeding size
> 3.

Sorry, I just phrased that wrong.  I knew that you meant size<=3



>>  My concern is always the cost required to obtain this situation.  I'll
>>always prefer a nice clean object oriented solution over one that is
>>tweaked for performance.  I'll never avoid using a complex type or
>>creating an object just because it will hurt performance.  Whether this
>>preference is practical in the real world, I'm not sure.
> 
> Well of course the point here is that as far as the API user is concerned
> there is no change - you just use a Map interface. Do you need to care how
> it is implemented?

You're right - once again I phrased this wrong.  From the user's 
perspective there is no difference.  My point was to question whether 
"performance tweaked" code is the type of thing that we want to add to 
commons.  I'm not convinced that classes specific to certain number of 
Map or List entries would be useful additions.  Obviously, I can only 
speak for myself - does anyone else have a need for such a thing?



>>Question: Did you try testing HashMap with different values for
>>initialCapacity and loadFactor?
> 
> Actually I just did some more tests:
> Create new Flat3Map() vs create new HashMap()
> 2910 vs 4060
> 
> Create new Flat3Map() put 3 mappings vs create new HashMap() put 3 mappings
> 4170 vs 8120
> 
> Memory usage
> Size 0: 56 bytes vs 120 bytes
> Size 1: 56 bytes vs 144 bytes
> Size 2: 56 bytes vs 168 bytes
> Size 3: 56 bytes vs 192 bytes
> Size 4: 272 bytes vs 216 bytes
> Size 5: 296 bytes vs 240 bytes
> 
> Object creation on construction:
> None vs Entry[16]

Maybe I wasn't specific enough.  Did you try creating a HashMap using an 
initial capacity of 3 [ via new HashMap(3) ]?  In this case, I believe 
that only 3 objects are created on construction.  I think that setting 
the loadFactor to 1 [ via new HashMap(3,1)] would also ensure that the 
Map is only rehashed when the entries become greater than 3.

I would imagine that using loadFactor and initialCapacity may provide 
performance close to Flat3Map.  It seems that Sun put them there for 
situations when the number of entries is known - the same situation 
you're in.


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [collections] Flat3Map, faster than HashMap

Posted by Stephen Colebourne <sc...@btopenworld.com>.
From: "__matthewHawthorne" <ma...@phreaker.net>
> When you say that Flat3Map is faster than HashMap for <= 3 gets/puts,
er <= size 3. You can get/put as often as you like, without exceeding size
3.

> I would think that you're talking about milliseconds.  And, if an
> application has a performance problem, I would suspect that it isn't
> coming from wasted milliseconds caused by small HashMaps.
>
> It's a proven fact that less object creation equals better performance.
Its this that the class targets more. Lots of small object creation and loss
is very expensive.

>   My concern is always the cost required to obtain this situation.  I'll
> always prefer a nice clean object oriented solution over one that is
> tweaked for performance.  I'll never avoid using a complex type or
> creating an object just because it will hurt performance.  Whether this
> preference is practical in the real world, I'm not sure.
Well of course the point here is that as far as the API user is concerned
there is no change - you just use a Map interface. Do you need to care how
it is implemented?

> My point is, I'd never see myself using a Flat3Map to improve my apps
> performance, mainly because any performance problems that exist are
> coming from somewhere else.  I'm not sure if there is a valid problem
> for which Flat3Map is a solution.
At my work we have large complex object models that can cope with large
amounts of data. But in probability, most of the time the lists/maps will be
small. This class adresses those cases.

> Question: Did you try testing HashMap with different values for
> initialCapacity and loadFactor?
Actually I just did some more tests:
Create new Flat3Map() vs create new HashMap()
2910 vs 4060

Create new Flat3Map() put 3 mappings vs create new HashMap() put 3 mappings
4170 vs 8120

Memory usage
Size 0: 56 bytes vs 120 bytes
Size 1: 56 bytes vs 144 bytes
Size 2: 56 bytes vs 168 bytes
Size 3: 56 bytes vs 192 bytes
Size 4: 272 bytes vs 216 bytes
Size 5: 296 bytes vs 240 bytes

Object creation on construction:
None vs Entry[16]

Stephen

>
> Any other thoughts?
>
>
>
>
> Stephen Colebourne wrote:
> > Well I think I've achieved a neat little class in Flat3Map, but I guess
> > others should decide that. (Its checked in, but if people don't like it
I'll
> > remove it or rename it or otherwise modify it)
> >
> > Flat3Map is a Map implementation that is optimised for size 3 and less.
It
> > stores data in ordinary fields and does not create an array, map entry
or
> > other complex object until size 3 is breached.
> >
> > Once size 4 is reached, a HashMap is created and the Flat3Map behaves as
per
> > a decorated HashMap.
> >
> > Performance wise -
> > - Gets at size 3 or less are about 0-10% faster than HashMap.
> > - Puts at size 3 or less are over 4 times faster than HashMap.
> > - Performance 5% slower than HashMap once size 3 exceeded once.
> >
> > The gains on put are probably down to object creation and therefore
garbage
> > collection. The new MapIterator should be used to get the maximum
advantage,
> > as it doesn't create MapEntry objects. The performance test class is
also
> > checked in so you can try it out (you have to play with the comments in
the
> > loop).
> >
> > I also suspect, but can't think how to prove, that Fast3Map will be
easier
> > for the garbage collector to dispose of as it contains no arrays or
complex
> > types to hunt through for dependencies. Any ideas on how to test this?
> >
> > Opinions? And should there be a Flat1Map, Flat5Map etc. And what about a
> > Flat3List??
> >
> > Stephen
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [collections] Flat3Map, faster than HashMap

Posted by Phil Steitz <ph...@steitz.com>.
__matthewHawthorne wrote:
> Performance is always a strange issue.
> 
> When you say that Flat3Map is faster than HashMap for <= 3 gets/puts, I 
> would think that you're talking about milliseconds.  And, if an 
> application has a performance problem, I would suspect that it isn't 
> coming from wasted milliseconds caused by small HashMaps.

Generally true; but 1000 milliseconds = 1 sec. -- i.e., it all depends 
how "hot" the code segment that you are optimizing is.

> 
> It's a proven fact that less object creation equals better performance. 
>  My concern is always the cost required to obtain this situation.  I'll 
> always prefer a nice clean object oriented solution over one that is 
> tweaked for performance.  

Optimized solutions with nice interfaces are the best of both worlds. 
Flat3Map just looks like a Map from the outside.

>I'll never avoid using a complex type or 
> creating an object just because it will hurt performance.  Whether this 
> preference is practical in the real world, I'm not sure.

Not always. Even with recent JDKs (which lots of users still don't get 
to use), heavily loaded web apps can run into problems if they create 
too many complex objects per request.

> 
> My point is, I'd never see myself using a Flat3Map to improve my apps 
> performance, mainly because any performance problems that exist are 
> coming from somewhere else.  I'm not sure if there is a valid problem 
> for which Flat3Map is a solution.

I agree with you here. I can't envision a scenario where I would use 
this class. A "lightweight HashMap" for things like JNDI or controller 
lookups might be useful; but not with such a small limit on the number 
of entries.  The natural thing to think about would be a configurable 
"breach point"; but I doubt that this could be implemented even as 
efficiently as what the HashMap implementation already provides.

An interesting question is for what value of n does a FlatnMap give 
worse performance than a HashMap?

> 
> Question: Did you try testing HashMap with different values for 
> initialCapacity and loadFactor?

> 
> Any other thoughts?
> 
> 
> 
> 
> Stephen Colebourne wrote:
> 
>> Well I think I've achieved a neat little class in Flat3Map, but I guess
>> others should decide that. (Its checked in, but if people don't like 
>> it I'll
>> remove it or rename it or otherwise modify it)
>>
>> Flat3Map is a Map implementation that is optimised for size 3 and 
>> less. It
>> stores data in ordinary fields and does not create an array, map entry or
>> other complex object until size 3 is breached.
>>
>> Once size 4 is reached, a HashMap is created and the Flat3Map behaves 
>> as per
>> a decorated HashMap.
>>
>> Performance wise -
>> - Gets at size 3 or less are about 0-10% faster than HashMap.
>> - Puts at size 3 or less are over 4 times faster than HashMap.
>> - Performance 5% slower than HashMap once size 3 exceeded once.
>>
>> The gains on put are probably down to object creation and therefore 
>> garbage
>> collection. The new MapIterator should be used to get the maximum 
>> advantage,
>> as it doesn't create MapEntry objects. The performance test class is also
>> checked in so you can try it out (you have to play with the comments 
>> in the
>> loop).
>>
>> I also suspect, but can't think how to prove, that Fast3Map will be 
>> easier
>> for the garbage collector to dispose of as it contains no arrays or 
>> complex
>> types to hunt through for dependencies. Any ideas on how to test this?

>>
>> Opinions? And should there be a Flat1Map, Flat5Map etc. And what about a
>> Flat3List??
>>
>> Stephen
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [collections] Flat3Map, faster than HashMap

Posted by __matthewHawthorne <ma...@phreaker.net>.
Performance is always a strange issue.

When you say that Flat3Map is faster than HashMap for <= 3 gets/puts, I 
would think that you're talking about milliseconds.  And, if an 
application has a performance problem, I would suspect that it isn't 
coming from wasted milliseconds caused by small HashMaps.

It's a proven fact that less object creation equals better performance. 
  My concern is always the cost required to obtain this situation.  I'll 
always prefer a nice clean object oriented solution over one that is 
tweaked for performance.  I'll never avoid using a complex type or 
creating an object just because it will hurt performance.  Whether this 
preference is practical in the real world, I'm not sure.

My point is, I'd never see myself using a Flat3Map to improve my apps 
performance, mainly because any performance problems that exist are 
coming from somewhere else.  I'm not sure if there is a valid problem 
for which Flat3Map is a solution.

Question: Did you try testing HashMap with different values for 
initialCapacity and loadFactor?

Any other thoughts?




Stephen Colebourne wrote:
> Well I think I've achieved a neat little class in Flat3Map, but I guess
> others should decide that. (Its checked in, but if people don't like it I'll
> remove it or rename it or otherwise modify it)
> 
> Flat3Map is a Map implementation that is optimised for size 3 and less. It
> stores data in ordinary fields and does not create an array, map entry or
> other complex object until size 3 is breached.
> 
> Once size 4 is reached, a HashMap is created and the Flat3Map behaves as per
> a decorated HashMap.
> 
> Performance wise -
> - Gets at size 3 or less are about 0-10% faster than HashMap.
> - Puts at size 3 or less are over 4 times faster than HashMap.
> - Performance 5% slower than HashMap once size 3 exceeded once.
> 
> The gains on put are probably down to object creation and therefore garbage
> collection. The new MapIterator should be used to get the maximum advantage,
> as it doesn't create MapEntry objects. The performance test class is also
> checked in so you can try it out (you have to play with the comments in the
> loop).
> 
> I also suspect, but can't think how to prove, that Fast3Map will be easier
> for the garbage collector to dispose of as it contains no arrays or complex
> types to hunt through for dependencies. Any ideas on how to test this?
> 
> Opinions? And should there be a Flat1Map, Flat5Map etc. And what about a
> Flat3List??
> 
> Stephen


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org