You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@harmony.apache.org by Aleksey Shipilev <al...@gmail.com> on 2008/04/18 08:50:37 UTC

[classlib][luni][performance] Improvements in Collections

Colleagues,

I had recently filed two JIRAs with improvements in Collections,
giving up to +30-40% to serialization benchmarks. Presumably they will
boost the performance across the all users since the optimization is
pretty general:
https://issues.apache.org/jira/browse/HARMONY-5761
https://issues.apache.org/jira/browse/HARMONY-5718

Would some classlib guru (Tim, Nathan, Tony?) review and commit them?

Thanks,
Aleksey.

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Thanks Jimmy!

MTHarness is custom-made multi-threaded harness capable of running
several workloads. For example such the workload could be SerialBench,
inspired by JBoss serialization benchmarks. You may find the bundle
and details here [1], I'm working now on integrating of this benchmark
into BTI.

The reason why I justified the patch with SerialBench is that
IdentityHashMap and WeakHashMap are both heavily used in
ObjectInputStream/ObjectOutputStream and ThreadLocal. These patches
give significant boosts on such the workload.

Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5632

On Fri, Apr 18, 2008 at 2:29 PM, Jimmy,Jing Lv <fi...@gmail.com> wrote:
> Hi,
>
>     Thanks Aleksey. I've read the patch and yes it is general (AND
>  operation is much faster than MOD opertion), I believe this fix does
>  help to performance.
>     What "serialization benchmarks" are you using (Sorry I don't know
>  what is MT/SerialBench)?  Can you post more detail about this?
>     If no objection, I'd like to commit the patch(after running the full test).
>
>  2008/4/18, Aleksey Shipilev <al...@gmail.com>:
>
>
> > Colleagues,
>  >
>  >  I had recently filed two JIRAs with improvements in Collections,
>  >  giving up to +30-40% to serialization benchmarks. Presumably they will
>  >  boost the performance across the all users since the optimization is
>  >  pretty general:
>  >  https://issues.apache.org/jira/browse/HARMONY-5761
>  >  https://issues.apache.org/jira/browse/HARMONY-5718
>  >
>  >  Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
>  >
>  >  Thanks,
>  >
>  > Aleksey.
>  >
>
>
>  --
>
>  Best Regards!
>
>  Jimmy, Jing Lv
>  China Software Development Lab, IBM
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by "Jimmy,Jing Lv" <fi...@gmail.com>.

Hi,

    Thanks Aleksey. I've read the patch and yes it is general (AND
operation is much faster than MOD opertion), I believe this fix does
help to performance.
    What "serialization benchmarks" are you using (Sorry I don't know
what is MT/SerialBench)?  Can you post more detail about this?
    If no objection, I'd like to commit the patch(after running the full test).

2008/4/18, Aleksey Shipilev <al...@gmail.com>:
> Colleagues,
>
>  I had recently filed two JIRAs with improvements in Collections,
>  giving up to +30-40% to serialization benchmarks. Presumably they will
>  boost the performance across the all users since the optimization is
>  pretty general:
>  https://issues.apache.org/jira/browse/HARMONY-5761
>  https://issues.apache.org/jira/browse/HARMONY-5718
>
>  Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
>
>  Thanks,
>
> Aleksey.
>


-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Ok, I had updated both JIRAs [1, 2]. I think we can commit WeakHashMap
ariphmetics [1], but postpone improvements in IdentityHashMap
ariphmetics [2].

We can start moving IdentityHashMap to HashMap [3] if nobody objects.

Thanks,
Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5761
[2] https://issues.apache.org/jira/browse/HARMONY-5718
[3] https://issues.apache.org/jira/browse/HARMONY-5771



On Mon, Apr 21, 2008 at 4:47 PM, Aleksey Shipilev
<al...@gmail.com> wrote:
> Hi Tim,
>
>  Yeah, I think it makes sense to get IdentityHashMap and WeakHashMap to
>  be internally consistent with HashMap implementation. It would be
>  better to just inline roundTo2K() method to computeCapacity() and
>  that's all.
>
>  I will update JIRAs a little later, if Jimmy didn't help out there already.
>
>  Thanks,
>  Aleksey.
>
>
>
>
>  On Mon, Apr 21, 2008 at 4:38 PM, Tim Ellison <t....@gmail.com> wrote:
>  > Just catching up with this thread...
>  >
>  >  So it seems everyone agrees that making the number of buckets a power of
>  > two, and calculating that number using the same mechanism as already applied
>  > to HashMap is the way to go?
>  >
>  >  Regards,
>  >  Tim
>  >
>  >
>  >
>  >  Nathan Beyer wrote:
>  >
>  > > That patch wasn't applied 'as-is', so I wouldn't agree to that. However, I
>  > > would agree to making WeakHashMap and IdendityHashMap consistent, if not
>  > > just use the same code, with HashMap.
>  > >
>  > > -Nathan
>  > >
>  > > On Fri, Apr 18, 2008 at 2:26 PM, Sergey Salishev <
>  > > sergey.i.salishev@gmail.com> wrote:
>  > >
>  > >
>  > > > Nathan,
>  > > >
>  > > > Do you have any problems with applying the
>  > > > https://issues.apache.org/jira/browse/HARMONY-4064 patch to the
>  > > > WeakHashMap?
>  > > >
>  > > > Thanks,
>  > > > Sergey.
>  > > >
>  > > >
>  > > > On Fri, Apr 18, 2008 at 11:21 PM, Aleksey Shipilev <
>  > > >
>  > aleksey.shipilev@gmail.com<ht...@gmail.com>>
>  > > > wrote:
>  > > >
>  > > >
>  > > > > Ok, let's start over. Current implementation of *HM does not guarantee
>  > > > > the storage size is 2^k. On the other hand such the requirement is the
>  > > > > _prerequisite_ for performance optimization done in the patch. Thus
>  > > > > the rounding code is the essential part of the patch and can't be
>  > > > > removed. Removal of this code will lead to performance degradations.
>  > > > > We can inline this method into computeCapacity for convenience reasons
>  > > > > (I'm sorry now I hadn't done that already).
>  > > > >
>  > > > > Is there any problem with my arguments?
>  > > > >
>  > > > > Thanks,
>  > > > > Aleksey.
>  > > > >
>  > > > > On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer
>  > <nd...@apache.org>>
>  > > > >
>  > > > wrote:
>  > > >
>  > > > >
>  > > > > >  I think the capacity calculation should be removed from the patch.
>  > > > > >
>  > > > >
>  > > > It's
>  > > >
>  > > > > NOT
>  > > > >
>  > > > > >  the performance boost, correct?
>  > > > > >
>  > > > >
>  > > >
>  > >
>  > >
>  >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Hi Tim,

Yeah, I think it makes sense to get IdentityHashMap and WeakHashMap to
be internally consistent with HashMap implementation. It would be
better to just inline roundTo2K() method to computeCapacity() and
that's all.

I will update JIRAs a little later, if Jimmy didn't help out there already.

Thanks,
Aleksey.


On Mon, Apr 21, 2008 at 4:38 PM, Tim Ellison <t....@gmail.com> wrote:
> Just catching up with this thread...
>
>  So it seems everyone agrees that making the number of buckets a power of
> two, and calculating that number using the same mechanism as already applied
> to HashMap is the way to go?
>
>  Regards,
>  Tim
>
>
>
>  Nathan Beyer wrote:
>
> > That patch wasn't applied 'as-is', so I wouldn't agree to that. However, I
> > would agree to making WeakHashMap and IdendityHashMap consistent, if not
> > just use the same code, with HashMap.
> >
> > -Nathan
> >
> > On Fri, Apr 18, 2008 at 2:26 PM, Sergey Salishev <
> > sergey.i.salishev@gmail.com> wrote:
> >
> >
> > > Nathan,
> > >
> > > Do you have any problems with applying the
> > > https://issues.apache.org/jira/browse/HARMONY-4064 patch to the
> > > WeakHashMap?
> > >
> > > Thanks,
> > > Sergey.
> > >
> > >
> > > On Fri, Apr 18, 2008 at 11:21 PM, Aleksey Shipilev <
> > >
> aleksey.shipilev@gmail.com<ht...@gmail.com>>
> > > wrote:
> > >
> > >
> > > > Ok, let's start over. Current implementation of *HM does not guarantee
> > > > the storage size is 2^k. On the other hand such the requirement is the
> > > > _prerequisite_ for performance optimization done in the patch. Thus
> > > > the rounding code is the essential part of the patch and can't be
> > > > removed. Removal of this code will lead to performance degradations.
> > > > We can inline this method into computeCapacity for convenience reasons
> > > > (I'm sorry now I hadn't done that already).
> > > >
> > > > Is there any problem with my arguments?
> > > >
> > > > Thanks,
> > > > Aleksey.
> > > >
> > > > On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer
> <nd...@apache.org>>
> > > >
> > > wrote:
> > >
> > > >
> > > > >  I think the capacity calculation should be removed from the patch.
> > > > >
> > > >
> > > It's
> > >
> > > > NOT
> > > >
> > > > >  the performance boost, correct?
> > > > >
> > > >
> > >
> >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Tim Ellison <t....@gmail.com>.

Just catching up with this thread...

So it seems everyone agrees that making the number of buckets a power of 
two, and calculating that number using the same mechanism as already 
applied to HashMap is the way to go?

Regards,
Tim

Nathan Beyer wrote:
> That patch wasn't applied 'as-is', so I wouldn't agree to that. However, I
> would agree to making WeakHashMap and IdendityHashMap consistent, if not
> just use the same code, with HashMap.
> 
> -Nathan
> 
> On Fri, Apr 18, 2008 at 2:26 PM, Sergey Salishev <
> sergey.i.salishev@gmail.com> wrote:
> 
>> Nathan,
>>
>> Do you have any problems with applying the
>> https://issues.apache.org/jira/browse/HARMONY-4064 patch to the
>> WeakHashMap?
>>
>> Thanks,
>> Sergey.
>>
>>
>> On Fri, Apr 18, 2008 at 11:21 PM, Aleksey Shipilev <
>> aleksey.shipilev@gmail.com<ht...@gmail.com>>
>> wrote:
>>
>>> Ok, let's start over. Current implementation of *HM does not guarantee
>>> the storage size is 2^k. On the other hand such the requirement is the
>>> _prerequisite_ for performance optimization done in the patch. Thus
>>> the rounding code is the essential part of the patch and can't be
>>> removed. Removal of this code will lead to performance degradations.
>>> We can inline this method into computeCapacity for convenience reasons
>>> (I'm sorry now I hadn't done that already).
>>>
>>> Is there any problem with my arguments?
>>>
>>> Thanks,
>>> Aleksey.
>>>
>>> On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer <nd...@apache.org>>
>> wrote:
>>>>  I think the capacity calculation should be removed from the patch.
>> It's
>>> NOT
>>>>  the performance boost, correct?
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

That patch wasn't applied 'as-is', so I wouldn't agree to that. However, I
would agree to making WeakHashMap and IdendityHashMap consistent, if not
just use the same code, with HashMap.

-Nathan

On Fri, Apr 18, 2008 at 2:26 PM, Sergey Salishev <
sergey.i.salishev@gmail.com> wrote:

> Nathan,
>
> Do you have any problems with applying the
> https://issues.apache.org/jira/browse/HARMONY-4064 patch to the
> WeakHashMap?
>
> Thanks,
> Sergey.
>
>
> On Fri, Apr 18, 2008 at 11:21 PM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
>
> > Ok, let's start over. Current implementation of *HM does not guarantee
> > the storage size is 2^k. On the other hand such the requirement is the
> > _prerequisite_ for performance optimization done in the patch. Thus
> > the rounding code is the essential part of the patch and can't be
> > removed. Removal of this code will lead to performance degradations.
> > We can inline this method into computeCapacity for convenience reasons
> > (I'm sorry now I hadn't done that already).
> >
> > Is there any problem with my arguments?
> >
> > Thanks,
> > Aleksey.
> >
> > On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > >  I think the capacity calculation should be removed from the patch.
> It's
> > NOT
> > >  the performance boost, correct?
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Sergey Salishev <se...@gmail.com>.

Nathan,

Do you have any problems with applying the
https://issues.apache.org/jira/browse/HARMONY-4064 patch to the WeakHashMap?

Thanks,
Sergey.


On Fri, Apr 18, 2008 at 11:21 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Ok, let's start over. Current implementation of *HM does not guarantee
> the storage size is 2^k. On the other hand such the requirement is the
> _prerequisite_ for performance optimization done in the patch. Thus
> the rounding code is the essential part of the patch and can't be
> removed. Removal of this code will lead to performance degradations.
> We can inline this method into computeCapacity for convenience reasons
> (I'm sorry now I hadn't done that already).
>
> Is there any problem with my arguments?
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer <nd...@apache.org> wrote:
> >  I think the capacity calculation should be removed from the patch. It's
> NOT
> >  the performance boost, correct?
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Ok, let's start over. Current implementation of *HM does not guarantee
the storage size is 2^k. On the other hand such the requirement is the
_prerequisite_ for performance optimization done in the patch. Thus
the rounding code is the essential part of the patch and can't be
removed. Removal of this code will lead to performance degradations.
We can inline this method into computeCapacity for convenience reasons
(I'm sorry now I hadn't done that already).

Is there any problem with my arguments?

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 11:14 PM, Nathan Beyer <nd...@apache.org> wrote:
>  I think the capacity calculation should be removed from the patch. It's NOT
>  the performance boost, correct?

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

If all you're talking about is the capacity calculation then, yes. If you're
talking about the entire patch, then no.

I think the capacity calculation should be removed from the patch. It's NOT
the performance boost, correct?

On Fri, Apr 18, 2008 at 2:10 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Ok, Nathan, let's stop for a while.
> Am I right that the problem is different-style implementation of
> basically the same thing in all three HashMaps? Am I right that we
> haven't any disagreements on the performance impact of such the
> optimization?
>
> I can elaborate on these issues if there are still misunderstandings :)
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 11:06 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > On Fri, Apr 18, 2008 at 1:53 PM, Sergey Salishev <
> >
> > sergey.i.salishev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >
> > > Nathan,
> >  >
> >  > The "load factor" does't impact the capacity growth scale. It only
> governs
> >  > the persentage of the free space reserved inside the current
> capacity. So
> >  > if
> >  > the initial capacity is X the capacity growth scale would be 2X 4X
> 8X...
> >  > with all load factor values.
> >
> >
> >  Correct, it determines when the capacity should increase.
> >
> >
> >
> >  >
> >  >
> >  > The changes proposed by Alexey for WeakHashMap are already applied to
> >  > HashMap almost a year ago.
> >
> >
> >  That is not what I understand. The majority of the patch has nothing to
> do
> >  with the capacity calculation. Additionally, it's not the same; the
> method
> >  names are different, the algorithm is slightly different, etc. If this
> >  capacity calculation is good for all three HashMaps, then put the exact
> same
> >  code in all three HashMaps. I'll vote for consistency and clarity
> >  (java.util.HashMap's 'calculateCapacity' method is much clearer).
> >
> >  -Nathan
> >
> >
> >  >
> >  >
> >  > Thanks.
> >  > Sergey.
> >
> > >
> >  > On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <nd...@apache.org>
> <ht...@apache.org>>
> >  > wrote:
> >  >
> >  > > On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
> >
> > > > sergey.i.salishev@gmail.com<ht...@gmail.com>
> <https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com
> >>
> >  > wrote:
> >  > >
> >  > > > Nathan
> >  > > >
> >
> >
> > > > > I agree with  you. This can be a problem. In my practice the
> biggest
> >  > > > problems were with 1-2 sized hash maps taking 16 size arrays.
> Probably
> >  > > the
> >  > > > best place to implement % optimization would bi CPU:)
> >  > > > We can try the different approach. The HashMap growth is governed
> only
> >  > > by
> >  > > > initial capacity so in the HashMap constructor one can say is 2^k
> or
> >  > > not.
> >  > >
> >  > >
> >  > > What about 'load factor'? That governs the growth as well.
> >  > >
> >  > > -Nathan
> >  > >
> >  > >
> >  > > >
> >  > > >
> >  > > > We can write something like
> >  > > >
> >  > > > <code>
> >  > > > final boolean powOf2 = isPowOf2(capacity);
> >  > > >
> >  > > > final int mod(int index, int length) {
> >  > > >     return powOf2? index & (length - 1) : index % length;
> >  > > > }
> >  > > > <code>
> >  > > >
> >  > > > In theory the common path for the workload should be specialized
> by
> >  > JIT.
> >  > > >
> >  > > > Ooops. When looking into HashMap code I've noticed someone
> already did
> >  > > > this.
> >  > > > https://issues.apache.org/jira/browse/HARMONY-4064
> >  > > > HashMap effectively does the rounding to the neares 2^k.
> >  > > >
> >  > > > Thanks.
> >  > > > Sergey.
> >  > > > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <nd...@apache.org>
> <ht...@apache.org>
> >  > <
> >  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> >  > > > wrote:
> >  > > >
> >  > > > > I know that people will notice; I have personal experience with
> >  > > systems
> >  > > > > that
> >  > > > > included custom Map implementations were written because
> HashMaps
> >  > grew
> >  > > > too
> >  > > > > large for small data sets (less than 2000, actually) and wasted
> a
> >  > lot
> >  > > of
> >  > > > > memory for the unnecessary capacity. Even the use of the
> capacity
> >  > and
> >  > > > load
> >  > > > > factor didn't provide enough compensation in these cases.
> >  > > > >
> >  > > > > -Nathan
> >  > > > >
> >  > > > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> >  > > > > sergey.i.salishev@gmail.com<ht...@gmail.com>
> <ht...@gmail.com>
> >  > <
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com
> >  > >>
> >  > >  > wrote:
> >  > > > >
> >  > > > > > Nathan,
> >  > > > > >
> >  > > > > > I don't think anyone will notice the hash map size rounding.
> It
> >  > can
> >  > > > lead
> >  > > > > > to
> >  > > > > > some memory overhead in very rare cases the user creates hash
> map
> >  > > with
> >  > > > > the
> >  > > > > > exact size. But in the most common case where the map is
> created
> >  > > with
> >  > > > > > default size the rounding will not change the behavior at all
> as
> >  > > it's
> >  > > > in
> >  > > > > > agreement with the standard 2x growth policy. On the other
> hand
> >  > size
> >  > > > > > rounding gives substantial performance boost on all gets.
> >  > > > > >
> >  > > > > > Thanks.
> >  > > > > > Sergey.
> >  > > > > >
> >  > > > > >
> >  > > > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <
> ndbeyer@apache.org<ht...@apache.org>
> <ht...@apache.org>
> >  > <
> >  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
> >  > > > <
> >  > > > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org
> >>
> >  > > > > > wrote:
> >  > > > > >
> >  > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> >  > > > > > >
> >  > > > > > > Again, I don't agree with the capacity rounding in the
> patch
> >  > > > attached
> >  > > > > to
> >  > > > > > > this issue. I do like the change to the internal data
> structure;
> >  > > use
> >  > > > > two
> >  > > > > > > arrays for key/value instead a single array. It makes the
> code
> >  > > > easier
> >  > > > > to
> >  > > > > > > read.
> >  > > > > > >
> >  > > > > > > -Nathan
> >  > > > > > >
> >  > > > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> >  > > > > > > aleksey.shipilev@gmail.com<ht...@gmail.com>
> <ht...@gmail.com>
> >  > <
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
> >  > > > <
> >  > > > >
> >  > >
> https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
> >  > > > >>
> >  > > > >  > wrote:
> >  > > > > > >
> >  > > > > > >  > Colleagues,
> >  > > > > > > >
> >  > > > > > > > I had recently filed two JIRAs with improvements in
> >  > Collections,
> >  > > > > > > > giving up to +30-40% to serialization benchmarks.
> Presumably
> >  > > they
> >  > > > > will
> >  > > > > > > > boost the performance across the all users since the
> >  > > optimization
> >  > > > is
> >  > > > > > > > pretty general:
> >  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
> >  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> >  > > > > > > >
> >  > > > > > > > Would some classlib guru (Tim, Nathan, Tony?) review and
> >  > commit
> >  > > > > them?
> >  > > > > > > >
> >  > > > > > > > Thanks,
> >  > > > > > > > Aleksey.
> >  > > > > > > >
> >  > > > > > >
> >  > > > > >
> >  > > > >
> >  > > >
> >  > >
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Ok, Nathan, let's stop for a while.
Am I right that the problem is different-style implementation of
basically the same thing in all three HashMaps? Am I right that we
haven't any disagreements on the performance impact of such the
optimization?

I can elaborate on these issues if there are still misunderstandings :)

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 11:06 PM, Nathan Beyer <nd...@apache.org> wrote:
> On Fri, Apr 18, 2008 at 1:53 PM, Sergey Salishev <
>
> sergey.i.salishev@gmail.com> wrote:
>
>
> > Nathan,
>  >
>  > The "load factor" does't impact the capacity growth scale. It only governs
>  > the persentage of the free space reserved inside the current capacity. So
>  > if
>  > the initial capacity is X the capacity growth scale would be 2X 4X 8X...
>  > with all load factor values.
>
>
>  Correct, it determines when the capacity should increase.
>
>
>
>  >
>  >
>  > The changes proposed by Alexey for WeakHashMap are already applied to
>  > HashMap almost a year ago.
>
>
>  That is not what I understand. The majority of the patch has nothing to do
>  with the capacity calculation. Additionally, it's not the same; the method
>  names are different, the algorithm is slightly different, etc. If this
>  capacity calculation is good for all three HashMaps, then put the exact same
>  code in all three HashMaps. I'll vote for consistency and clarity
>  (java.util.HashMap's 'calculateCapacity' method is much clearer).
>
>  -Nathan
>
>
>  >
>  >
>  > Thanks.
>  > Sergey.
>
> >
>  > On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <nd...@apache.org>>
>  > wrote:
>  >
>  > > On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
>
> > > sergey.i.salishev@gmail.com<ht...@gmail.com>>
>  > wrote:
>  > >
>  > > > Nathan
>  > > >
>
>
> > > > I agree with  you. This can be a problem. In my practice the biggest
>  > > > problems were with 1-2 sized hash maps taking 16 size arrays. Probably
>  > > the
>  > > > best place to implement % optimization would bi CPU:)
>  > > > We can try the different approach. The HashMap growth is governed only
>  > > by
>  > > > initial capacity so in the HashMap constructor one can say is 2^k or
>  > > not.
>  > >
>  > >
>  > > What about 'load factor'? That governs the growth as well.
>  > >
>  > > -Nathan
>  > >
>  > >
>  > > >
>  > > >
>  > > > We can write something like
>  > > >
>  > > > <code>
>  > > > final boolean powOf2 = isPowOf2(capacity);
>  > > >
>  > > > final int mod(int index, int length) {
>  > > >     return powOf2? index & (length - 1) : index % length;
>  > > > }
>  > > > <code>
>  > > >
>  > > > In theory the common path for the workload should be specialized by
>  > JIT.
>  > > >
>  > > > Ooops. When looking into HashMap code I've noticed someone already did
>  > > > this.
>  > > > https://issues.apache.org/jira/browse/HARMONY-4064
>  > > > HashMap effectively does the rounding to the neares 2^k.
>  > > >
>  > > > Thanks.
>  > > > Sergey.
>  > > > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <nd...@apache.org>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
>  > > > wrote:
>  > > >
>  > > > > I know that people will notice; I have personal experience with
>  > > systems
>  > > > > that
>  > > > > included custom Map implementations were written because HashMaps
>  > grew
>  > > > too
>  > > > > large for small data sets (less than 2000, actually) and wasted a
>  > lot
>  > > of
>  > > > > memory for the unnecessary capacity. Even the use of the capacity
>  > and
>  > > > load
>  > > > > factor didn't provide enough compensation in these cases.
>  > > > >
>  > > > > -Nathan
>  > > > >
>  > > > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
>  > > > > sergey.i.salishev@gmail.com<ht...@gmail.com>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com
>  > >>
>  > >  > wrote:
>  > > > >
>  > > > > > Nathan,
>  > > > > >
>  > > > > > I don't think anyone will notice the hash map size rounding. It
>  > can
>  > > > lead
>  > > > > > to
>  > > > > > some memory overhead in very rare cases the user creates hash map
>  > > with
>  > > > > the
>  > > > > > exact size. But in the most common case where the map is created
>  > > with
>  > > > > > default size the rounding will not change the behavior at all as
>  > > it's
>  > > > in
>  > > > > > agreement with the standard 2x growth policy. On the other hand
>  > size
>  > > > > > rounding gives substantial performance boost on all gets.
>  > > > > >
>  > > > > > Thanks.
>  > > > > > Sergey.
>  > > > > >
>  > > > > >
>  > > > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
>  > > > <
>  > > > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
>  > > > > > wrote:
>  > > > > >
>  > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
>  > > > > > >
>  > > > > > > Again, I don't agree with the capacity rounding in the patch
>  > > > attached
>  > > > > to
>  > > > > > > this issue. I do like the change to the internal data structure;
>  > > use
>  > > > > two
>  > > > > > > arrays for key/value instead a single array. It makes the code
>  > > > easier
>  > > > > to
>  > > > > > > read.
>  > > > > > >
>  > > > > > > -Nathan
>  > > > > > >
>  > > > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
>  > > > > > > aleksey.shipilev@gmail.com<ht...@gmail.com>
>  > <
>  > > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
>  > > > <
>  > > > >
>  > > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
>  > > > >>
>  > > > >  > wrote:
>  > > > > > >
>  > > > > > >  > Colleagues,
>  > > > > > > >
>  > > > > > > > I had recently filed two JIRAs with improvements in
>  > Collections,
>  > > > > > > > giving up to +30-40% to serialization benchmarks. Presumably
>  > > they
>  > > > > will
>  > > > > > > > boost the performance across the all users since the
>  > > optimization
>  > > > is
>  > > > > > > > pretty general:
>  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
>  > > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
>  > > > > > > >
>  > > > > > > > Would some classlib guru (Tim, Nathan, Tony?) review and
>  > commit
>  > > > > them?
>  > > > > > > >
>  > > > > > > > Thanks,
>  > > > > > > > Aleksey.
>  > > > > > > >
>  > > > > > >
>  > > > > >
>  > > > >
>  > > >
>  > >
>  >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

On Fri, Apr 18, 2008 at 1:53 PM, Sergey Salishev <
sergey.i.salishev@gmail.com> wrote:

> Nathan,
>
> The "load factor" does't impact the capacity growth scale. It only governs
> the persentage of the free space reserved inside the current capacity. So
> if
> the initial capacity is X the capacity growth scale would be 2X 4X 8X...
> with all load factor values.


Correct, it determines when the capacity should increase.


>
>
> The changes proposed by Alexey for WeakHashMap are already applied to
> HashMap almost a year ago.


That is not what I understand. The majority of the patch has nothing to do
with the capacity calculation. Additionally, it's not the same; the method
names are different, the algorithm is slightly different, etc. If this
capacity calculation is good for all three HashMaps, then put the exact same
code in all three HashMaps. I'll vote for consistency and clarity
(java.util.HashMap's 'calculateCapacity' method is much clearer).

-Nathan


>
>
> Thanks.
> Sergey.
>
> On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
>
> > On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
> > sergey.i.salishev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> > > Nathan
> > >
> > > I agree with  you. This can be a problem. In my practice the biggest
> > > problems were with 1-2 sized hash maps taking 16 size arrays. Probably
> > the
> > > best place to implement % optimization would bi CPU:)
> > > We can try the different approach. The HashMap growth is governed only
> > by
> > > initial capacity so in the HashMap constructor one can say is 2^k or
> > not.
> >
> >
> > What about 'load factor'? That governs the growth as well.
> >
> > -Nathan
> >
> >
> > >
> > >
> > > We can write something like
> > >
> > > <code>
> > > final boolean powOf2 = isPowOf2(capacity);
> > >
> > > final int mod(int index, int length) {
> > >     return powOf2? index & (length - 1) : index % length;
> > > }
> > > <code>
> > >
> > > In theory the common path for the workload should be specialized by
> JIT.
> > >
> > > Ooops. When looking into HashMap code I've noticed someone already did
> > > this.
> > > https://issues.apache.org/jira/browse/HARMONY-4064
> > > HashMap effectively does the rounding to the neares 2^k.
> > >
> > > Thanks.
> > > Sergey.
> > > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <nd...@apache.org>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > > wrote:
> > >
> > > > I know that people will notice; I have personal experience with
> > systems
> > > > that
> > > > included custom Map implementations were written because HashMaps
> grew
> > > too
> > > > large for small data sets (less than 2000, actually) and wasted a
> lot
> > of
> > > > memory for the unnecessary capacity. Even the use of the capacity
> and
> > > load
> > > > factor didn't provide enough compensation in these cases.
> > > >
> > > > -Nathan
> > > >
> > > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> > > > sergey.i.salishev@gmail.com<ht...@gmail.com>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com
> >>
> >  > wrote:
> > > >
> > > > > Nathan,
> > > > >
> > > > > I don't think anyone will notice the hash map size rounding. It
> can
> > > lead
> > > > > to
> > > > > some memory overhead in very rare cases the user creates hash map
> > with
> > > > the
> > > > > exact size. But in the most common case where the map is created
> > with
> > > > > default size the rounding will not change the behavior at all as
> > it's
> > > in
> > > > > agreement with the standard 2x growth policy. On the other hand
> size
> > > > > rounding gives substantial performance boost on all gets.
> > > > >
> > > > > Thanks.
> > > > > Sergey.
> > > > >
> > > > >
> > > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
> > > <
> > > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > > > > wrote:
> > > > >
> > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > > > >
> > > > > > Again, I don't agree with the capacity rounding in the patch
> > > attached
> > > > to
> > > > > > this issue. I do like the change to the internal data structure;
> > use
> > > > two
> > > > > > arrays for key/value instead a single array. It makes the code
> > > easier
> > > > to
> > > > > > read.
> > > > > >
> > > > > > -Nathan
> > > > > >
> > > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> > > > > > aleksey.shipilev@gmail.com<ht...@gmail.com>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
> > > <
> > > >
> > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
> > > >>
> > > >  > wrote:
> > > > > >
> > > > > >  > Colleagues,
> > > > > > >
> > > > > > > I had recently filed two JIRAs with improvements in
> Collections,
> > > > > > > giving up to +30-40% to serialization benchmarks. Presumably
> > they
> > > > will
> > > > > > > boost the performance across the all users since the
> > optimization
> > > is
> > > > > > > pretty general:
> > > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
> > > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > > > > >
> > > > > > > Would some classlib guru (Tim, Nathan, Tony?) review and
> commit
> > > > them?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Aleksey.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Sergey Salishev <se...@gmail.com>.

Nathan,

The "load factor" does't impact the capacity growth scale. It only governs
the persentage of the free space reserved inside the current capacity. So if
the initial capacity is X the capacity growth scale would be 2X 4X 8X...
with all load factor values.

The changes proposed by Alexey for WeakHashMap are already applied to
HashMap almost a year ago.

Thanks.
Sergey.

On Fri, Apr 18, 2008 at 10:41 PM, Nathan Beyer <nd...@apache.org> wrote:

> On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
> sergey.i.salishev@gmail.com> wrote:
>
> > Nathan
> >
> > I agree with  you. This can be a problem. In my practice the biggest
> > problems were with 1-2 sized hash maps taking 16 size arrays. Probably
> the
> > best place to implement % optimization would bi CPU:)
> > We can try the different approach. The HashMap growth is governed only
> by
> > initial capacity so in the HashMap constructor one can say is 2^k or
> not.
>
>
> What about 'load factor'? That governs the growth as well.
>
> -Nathan
>
>
> >
> >
> > We can write something like
> >
> > <code>
> > final boolean powOf2 = isPowOf2(capacity);
> >
> > final int mod(int index, int length) {
> >     return powOf2? index & (length - 1) : index % length;
> > }
> > <code>
> >
> > In theory the common path for the workload should be specialized by JIT.
> >
> > Ooops. When looking into HashMap code I've noticed someone already did
> > this.
> > https://issues.apache.org/jira/browse/HARMONY-4064
> > HashMap effectively does the rounding to the neares 2^k.
> >
> > Thanks.
> > Sergey.
> > On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <ndbeyer@apache.org<
> https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > wrote:
> >
> > > I know that people will notice; I have personal experience with
> systems
> > > that
> > > included custom Map implementations were written because HashMaps grew
> > too
> > > large for small data sets (less than 2000, actually) and wasted a lot
> of
> > > memory for the unnecessary capacity. Even the use of the capacity and
> > load
> > > factor didn't provide enough compensation in these cases.
> > >
> > > -Nathan
> > >
> > > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> > > sergey.i.salishev@gmail.com<
> https://mail.google.com/mail?view=cm&tf=0&to=sergey.i.salishev@gmail.com>>
>  > wrote:
> > >
> > > > Nathan,
> > > >
> > > > I don't think anyone will notice the hash map size rounding. It can
> > lead
> > > > to
> > > > some memory overhead in very rare cases the user creates hash map
> with
> > > the
> > > > exact size. But in the most common case where the map is created
> with
> > > > default size the rounding will not change the behavior at all as
> it's
> > in
> > > > agreement with the standard 2x growth policy. On the other hand size
> > > > rounding gives substantial performance boost on all gets.
> > > >
> > > > Thanks.
> > > > Sergey.
> > > >
> > > >
> > > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <ndbeyer@apache.org<
> https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>
> > <
> > > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > > > wrote:
> > > >
> > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > > >
> > > > > Again, I don't agree with the capacity rounding in the patch
> > attached
> > > to
> > > > > this issue. I do like the change to the internal data structure;
> use
> > > two
> > > > > arrays for key/value instead a single array. It makes the code
> > easier
> > > to
> > > > > read.
> > > > >
> > > > > -Nathan
> > > > >
> > > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> > > > > aleksey.shipilev@gmail.com<
> https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>
> > <
> > >
> https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
> > >>
> > >  > wrote:
> > > > >
> > > > >  > Colleagues,
> > > > > >
> > > > > > I had recently filed two JIRAs with improvements in Collections,
> > > > > > giving up to +30-40% to serialization benchmarks. Presumably
> they
> > > will
> > > > > > boost the performance across the all users since the
> optimization
> > is
> > > > > > pretty general:
> > > > > > https://issues.apache.org/jira/browse/HARMONY-5761
> > > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > > > >
> > > > > > Would some classlib guru (Tim, Nathan, Tony?) review and commit
> > > them?
> > > > > >
> > > > > > Thanks,
> > > > > > Aleksey.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

On Fri, Apr 18, 2008 at 12:57 PM, Sergey Salishev <
sergey.i.salishev@gmail.com> wrote:

> Nathan
>
> I agree with  you. This can be a problem. In my practice the biggest
> problems were with 1-2 sized hash maps taking 16 size arrays. Probably the
> best place to implement % optimization would bi CPU:)
> We can try the different approach. The HashMap growth is governed only by
> initial capacity so in the HashMap constructor one can say is 2^k or not.


What about 'load factor'? That governs the growth as well.

-Nathan


>
>
> We can write something like
>
> <code>
> final boolean powOf2 = isPowOf2(capacity);
>
> final int mod(int index, int length) {
>     return powOf2? index & (length - 1) : index % length;
> }
> <code>
>
> In theory the common path for the workload should be specialized by JIT.
>
> Ooops. When looking into HashMap code I've noticed someone already did
> this.
> https://issues.apache.org/jira/browse/HARMONY-4064
> HashMap effectively does the rounding to the neares 2^k.
>
> Thanks.
> Sergey.
> On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
>
> > I know that people will notice; I have personal experience with systems
> > that
> > included custom Map implementations were written because HashMaps grew
> too
> > large for small data sets (less than 2000, actually) and wasted a lot of
> > memory for the unnecessary capacity. Even the use of the capacity and
> load
> > factor didn't provide enough compensation in these cases.
> >
> > -Nathan
> >
> > On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> > sergey.i.salishev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> > > Nathan,
> > >
> > > I don't think anyone will notice the hash map size rounding. It can
> lead
> > > to
> > > some memory overhead in very rare cases the user creates hash map with
> > the
> > > exact size. But in the most common case where the map is created with
> > > default size the rounding will not change the behavior at all as it's
> in
> > > agreement with the standard 2x growth policy. On the other hand size
> > > rounding gives substantial performance boost on all gets.
> > >
> > > Thanks.
> > > Sergey.
> > >
> > >
> > > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > > wrote:
> > >
> > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > >
> > > > Again, I don't agree with the capacity rounding in the patch
> attached
> > to
> > > > this issue. I do like the change to the internal data structure; use
> > two
> > > > arrays for key/value instead a single array. It makes the code
> easier
> > to
> > > > read.
> > > >
> > > > -Nathan
> > > >
> > > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> > > > aleksey.shipilev@gmail.com<ht...@gmail.com>
> <
> > https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com
> >>
> >  > wrote:
> > > >
> > > >  > Colleagues,
> > > > >
> > > > > I had recently filed two JIRAs with improvements in Collections,
> > > > > giving up to +30-40% to serialization benchmarks. Presumably they
> > will
> > > > > boost the performance across the all users since the optimization
> is
> > > > > pretty general:
> > > > > https://issues.apache.org/jira/browse/HARMONY-5761
> > > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > > >
> > > > > Would some classlib guru (Tim, Nathan, Tony?) review and commit
> > them?
> > > > >
> > > > > Thanks,
> > > > > Aleksey.
> > > > >
> > > >
> > >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Sergey Salishev <se...@gmail.com>.

Nathan

I agree with  you. This can be a problem. In my practice the biggest
problems were with 1-2 sized hash maps taking 16 size arrays. Probably the
best place to implement % optimization would bi CPU:)
We can try the different approach. The HashMap growth is governed only by
initial capacity so in the HashMap constructor one can say is 2^k or not.

We can write something like

<code>
final boolean powOf2 = isPowOf2(capacity);

final int mod(int index, int length) {
     return powOf2? index & (length - 1) : index % length;
}
<code>

In theory the common path for the workload should be specialized by JIT.

Ooops. When looking into HashMap code I've noticed someone already did this.
https://issues.apache.org/jira/browse/HARMONY-4064
HashMap effectively does the rounding to the neares 2^k.

Thanks.
Sergey.
On Fri, Apr 18, 2008 at 8:55 PM, Nathan Beyer <nd...@apache.org> wrote:

> I know that people will notice; I have personal experience with systems
> that
> included custom Map implementations were written because HashMaps grew too
> large for small data sets (less than 2000, actually) and wasted a lot of
> memory for the unnecessary capacity. Even the use of the capacity and load
> factor didn't provide enough compensation in these cases.
>
> -Nathan
>
> On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
> sergey.i.salishev@gmail.com> wrote:
>
> > Nathan,
> >
> > I don't think anyone will notice the hash map size rounding. It can lead
> > to
> > some memory overhead in very rare cases the user creates hash map with
> the
> > exact size. But in the most common case where the map is created with
> > default size the rounding will not change the behavior at all as it's in
> > agreement with the standard 2x growth policy. On the other hand size
> > rounding gives substantial performance boost on all gets.
> >
> > Thanks.
> > Sergey.
> >
> >
> > On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <ndbeyer@apache.org<
> https://mail.google.com/mail?view=cm&tf=0&to=ndbeyer@apache.org>>
> > wrote:
> >
> > > https://issues.apache.org/jira/browse/HARMONY-5718
> > >
> > > Again, I don't agree with the capacity rounding in the patch attached
> to
> > > this issue. I do like the change to the internal data structure; use
> two
> > > arrays for key/value instead a single array. It makes the code easier
> to
> > > read.
> > >
> > > -Nathan
> > >
> > > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> > > aleksey.shipilev@gmail.com<
> https://mail.google.com/mail?view=cm&tf=0&to=aleksey.shipilev@gmail.com>>
>  > wrote:
> > >
> > >  > Colleagues,
> > > >
> > > > I had recently filed two JIRAs with improvements in Collections,
> > > > giving up to +30-40% to serialization benchmarks. Presumably they
> will
> > > > boost the performance across the all users since the optimization is
> > > > pretty general:
> > > > https://issues.apache.org/jira/browse/HARMONY-5761
> > > > https://issues.apache.org/jira/browse/HARMONY-5718
> > > >
> > > > Would some classlib guru (Tim, Nathan, Tony?) review and commit
> them?
> > > >
> > > > Thanks,
> > > > Aleksey.
> > > >
> > >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

I know that people will notice; I have personal experience with systems that
included custom Map implementations were written because HashMaps grew too
large for small data sets (less than 2000, actually) and wasted a lot of
memory for the unnecessary capacity. Even the use of the capacity and load
factor didn't provide enough compensation in these cases.

-Nathan

On Fri, Apr 18, 2008 at 11:46 AM, Sergey Salishev <
sergey.i.salishev@gmail.com> wrote:

> Nathan,
>
> I don't think anyone will notice the hash map size rounding. It can lead
> to
> some memory overhead in very rare cases the user creates hash map with the
> exact size. But in the most common case where the map is created with
> default size the rounding will not change the behavior at all as it's in
> agreement with the standard 2x growth policy. On the other hand size
> rounding gives substantial performance boost on all gets.
>
> Thanks.
> Sergey.
>
>
> On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
>
> > https://issues.apache.org/jira/browse/HARMONY-5718
> >
> > Again, I don't agree with the capacity rounding in the patch attached to
> > this issue. I do like the change to the internal data structure; use two
> > arrays for key/value instead a single array. It makes the code easier to
> > read.
> >
> > -Nathan
> >
> > On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> > aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >  > Colleagues,
> > >
> > > I had recently filed two JIRAs with improvements in Collections,
> > > giving up to +30-40% to serialization benchmarks. Presumably they will
> > > boost the performance across the all users since the optimization is
> > > pretty general:
> > > https://issues.apache.org/jira/browse/HARMONY-5761
> > > https://issues.apache.org/jira/browse/HARMONY-5718
> > >
> > > Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
> > >
> > > Thanks,
> > > Aleksey.
> > >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Sergey Salishev <se...@gmail.com>.

Nathan,

I don't think anyone will notice the hash map size rounding. It can lead to
some memory overhead in very rare cases the user creates hash map with the
exact size. But in the most common case where the map is created with
default size the rounding will not change the behavior at all as it's in
agreement with the standard 2x growth policy. On the other hand size
rounding gives substantial performance boost on all gets.

Thanks.
Sergey.

On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org> wrote:

> https://issues.apache.org/jira/browse/HARMONY-5718
>
> Again, I don't agree with the capacity rounding in the patch attached to
> this issue. I do like the change to the internal data structure; use two
> arrays for key/value instead a single array. It makes the code easier to
> read.
>
> -Nathan
>
> On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com> wrote:
>
>  > Colleagues,
> >
> > I had recently filed two JIRAs with improvements in Collections,
> > giving up to +30-40% to serialization benchmarks. Presumably they will
> > boost the performance across the all users since the optimization is
> > pretty general:
> > https://issues.apache.org/jira/browse/HARMONY-5761
> > https://issues.apache.org/jira/browse/HARMONY-5718
> >
> > Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
> >
> > Thanks,
> > Aleksey.
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

If this isn't part of the performance boost claimed, then pull it out of
this issue.

-Nathan

On Fri, Apr 18, 2008 at 12:10 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nathan,
>
> The method name is inaccurate, right, it can be named like you want.
> The reason of performance boost _is_not_ in this method (nevertheless
> this method probably is fastest available for rounding). This method
> is utility and I see no point in moving it to Math.
>
> The reason of the performance boost is:
> > On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >> I'm pushing the idea that
> >> this optimization is general because it moves long-latency modulo (%)
> >> operations to really cheap mask (&), but to guarantee this, we should
> >> guarantee the storage is 2^k, where k is 0..N.
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 9:03 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > So the method 'roundTo2k' is incorrectly named, then? Should it be
> >  'roundTo2ToTheK'? Isn't there a Math method that should be used to do
> this,
> >  which in turn should be the focus of optimization? I'd rather see such
> a
> >  "general" optimization pushed into Math and then have all of the
> XHashMap
> >  implementations use the Math methods consistently.
> >
> >  -Nathan
> >
> >  On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> >
> > aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >
> > > Nathan,
> >  >
> >  > I think we have a sort of misunderstanding here. I'm not rounding to
> >  > 2000, rather I round to next power of 2. I'm pushing the idea that
> >  > this optimization is general because it moves long-latency modulo (%)
> >  > operations to really cheap mask (&), but to guarantee this, we should
> >  > guarantee the storage is 2^k, where k is 0..N.
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >  > On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org>
> <ht...@apache.org>>
> >
> >
> > > wrote:
> >  > >  I'm sure this optimization shows an improvement in the
> serialization
> >  > use
> >  > >  case, but you'd be hard pressed to say that this improvement will
> make
> >  > 80%
> >  > >  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If
> you
> >  > want
> >  > >  to round to 2000 to improve serialization, then do that in the
> >  > >  serialization.
> >  > >
> >  > >  I don't think this should be applied as is.
> >  > >
> >  > >  -Nathan
> >  > >
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

Yes, I'm well aware of all of hashing algorithms. I'm talking about the
patches --- the code in the patch wasn't clear. I'm very particular about
what code does and what is says it does.

In any case, didn't you just comment that this capacity rounding WAS NOT
what gives the performance boost? I'm confused.

-Nathan

On Fri, Apr 18, 2008 at 1:59 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nathan,
>
> I'm afraid you had messed up two things: "computing the n-th power of
> 2" and "rounding to next power of 2". First one could be implemented
> using Math.pow(...), while second does not. This method is essential
> part of the issue since it provides the rounding facility that is used
> further for performance optimization.
>
> There is nice Wikipedia article on hash tables describing load factor
> and its impact on performance: http://en.wikipedia.org/wiki/Hash_table
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 10:40 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > BTW - I wasn't suggesting moving this to Math, I was asking why write a
> >  method for executing a 2^K that's already in Math [1]. I'm not keen on
> >  optimizing things in an language that's compiled to an interpreted
> bytecode
> >  -- that's the whole point of JITs. Keep the code clean, correct and
> easy to
> >  maintain.
> >
> >  [1]
> >
> http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Math.html#pow(double,%20double)<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Math.html#pow%28double,%20double%29>
> >
> >  On Fri, Apr 18, 2008 at 12:10 PM, Aleksey Shipilev <
> >
> > aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >  > Nathan,
> >  >
> >
> > > The method name is inaccurate, right, it can be named like you want.
> >  > The reason of performance boost _is_not_ in this method (nevertheless
> >  > this method probably is fastest available for rounding). This method
> >  > is utility and I see no point in moving it to Math.
> >  >
> >  > The reason of the performance boost is:
> >  > > On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> >  > aleksey.shipilev@gmail.com<ht...@gmail.com>
> <ht...@gmail.com>>
> >
> > > wrote:
> >  > >> I'm pushing the idea that
> >  > >> this optimization is general because it moves long-latency modulo
> (%)
> >  > >> operations to really cheap mask (&), but to guarantee this, we
> should
> >  > >> guarantee the storage is 2^k, where k is 0..N.
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >
> > > On Fri, Apr 18, 2008 at 9:03 PM, Nathan Beyer <nd...@apache.org>
> <ht...@apache.org>>
> >  > wrote:
> >  > > So the method 'roundTo2k' is incorrectly named, then? Should it be
> >  > >  'roundTo2ToTheK'? Isn't there a Math method that should be used to
> do
> >  > this,
> >  > >  which in turn should be the focus of optimization? I'd rather see
> such
> >  > a
> >  > >  "general" optimization pushed into Math and then have all of the
> >  > XHashMap
> >  > >  implementations use the Math methods consistently.
> >  > >
> >  > >  -Nathan
> >  > >
> >  > >  On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> >  > >
> >  > > aleksey.shipilev@gmail.com<ht...@gmail.com>
> <ht...@gmail.com>>
> >
> >
> > > wrote:
> >  > >
> >  > >
> >  > > > Nathan,
> >  > >  >
> >  > >  > I think we have a sort of misunderstanding here. I'm not
> rounding to
> >  > >  > 2000, rather I round to next power of 2. I'm pushing the idea
> that
> >  > >  > this optimization is general because it moves long-latency
> modulo (%)
> >  > >  > operations to really cheap mask (&), but to guarantee this, we
> should
> >  > >  > guarantee the storage is 2^k, where k is 0..N.
> >  > >  >
> >  > >  > Thanks,
> >  > >  > Aleksey.
> >  > >  >
> >  > >  > On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <
> ndbeyer@apache.org<ht...@apache.org>
> <ht...@apache.org>
> >  > <ht...@apache.org>>
> >  > >
> >  > >
> >  > > > wrote:
> >  > >  > >  I'm sure this optimization shows an improvement in the
> >  > serialization
> >  > >  > use
> >  > >  > >  case, but you'd be hard pressed to say that this improvement
> will
> >  > make
> >  > >  > 80%
> >  > >  > >  of all uses of HashMap, WeakHashMap and IdentityHashMap
> better. If
> >  > you
> >  > >  > want
> >  > >  > >  to round to 2000 to improve serialization, then do that in
> the
> >  > >  > >  serialization.
> >  > >  > >
> >  > >  > >  I don't think this should be applied as is.
> >  > >  > >
> >  > >  > >  -Nathan
> >  > >  > >
> >  > >  >
> >  > >
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Nathan,

I'm afraid you had messed up two things: "computing the n-th power of
2" and "rounding to next power of 2". First one could be implemented
using Math.pow(...), while second does not. This method is essential
part of the issue since it provides the rounding facility that is used
further for performance optimization.

There is nice Wikipedia article on hash tables describing load factor
and its impact on performance: http://en.wikipedia.org/wiki/Hash_table

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 10:40 PM, Nathan Beyer <nd...@apache.org> wrote:
> BTW - I wasn't suggesting moving this to Math, I was asking why write a
>  method for executing a 2^K that's already in Math [1]. I'm not keen on
>  optimizing things in an language that's compiled to an interpreted bytecode
>  -- that's the whole point of JITs. Keep the code clean, correct and easy to
>  maintain.
>
>  [1]
>  http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Math.html#pow(double,%20double)
>
>  On Fri, Apr 18, 2008 at 12:10 PM, Aleksey Shipilev <
>
> aleksey.shipilev@gmail.com> wrote:
>
>  > Nathan,
>  >
>
> > The method name is inaccurate, right, it can be named like you want.
>  > The reason of performance boost _is_not_ in this method (nevertheless
>  > this method probably is fastest available for rounding). This method
>  > is utility and I see no point in moving it to Math.
>  >
>  > The reason of the performance boost is:
>  > > On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
>  > aleksey.shipilev@gmail.com<ht...@gmail.com>>
>
> > wrote:
>  > >> I'm pushing the idea that
>  > >> this optimization is general because it moves long-latency modulo (%)
>  > >> operations to really cheap mask (&), but to guarantee this, we should
>  > >> guarantee the storage is 2^k, where k is 0..N.
>  >
>  > Thanks,
>  > Aleksey.
>  >
>
> > On Fri, Apr 18, 2008 at 9:03 PM, Nathan Beyer <nd...@apache.org>>
>  > wrote:
>  > > So the method 'roundTo2k' is incorrectly named, then? Should it be
>  > >  'roundTo2ToTheK'? Isn't there a Math method that should be used to do
>  > this,
>  > >  which in turn should be the focus of optimization? I'd rather see such
>  > a
>  > >  "general" optimization pushed into Math and then have all of the
>  > XHashMap
>  > >  implementations use the Math methods consistently.
>  > >
>  > >  -Nathan
>  > >
>  > >  On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
>  > >
>  > > aleksey.shipilev@gmail.com<ht...@gmail.com>>
>
>
> > wrote:
>  > >
>  > >
>  > > > Nathan,
>  > >  >
>  > >  > I think we have a sort of misunderstanding here. I'm not rounding to
>  > >  > 2000, rather I round to next power of 2. I'm pushing the idea that
>  > >  > this optimization is general because it moves long-latency modulo (%)
>  > >  > operations to really cheap mask (&), but to guarantee this, we should
>  > >  > guarantee the storage is 2^k, where k is 0..N.
>  > >  >
>  > >  > Thanks,
>  > >  > Aleksey.
>  > >  >
>  > >  > On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org>
>  > <ht...@apache.org>>
>  > >
>  > >
>  > > > wrote:
>  > >  > >  I'm sure this optimization shows an improvement in the
>  > serialization
>  > >  > use
>  > >  > >  case, but you'd be hard pressed to say that this improvement will
>  > make
>  > >  > 80%
>  > >  > >  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If
>  > you
>  > >  > want
>  > >  > >  to round to 2000 to improve serialization, then do that in the
>  > >  > >  serialization.
>  > >  > >
>  > >  > >  I don't think this should be applied as is.
>  > >  > >
>  > >  > >  -Nathan
>  > >  > >
>  > >  >
>  > >
>  >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

BTW - I wasn't suggesting moving this to Math, I was asking why write a
method for executing a 2^K that's already in Math [1]. I'm not keen on
optimizing things in an language that's compiled to an interpreted bytecode
-- that's the whole point of JITs. Keep the code clean, correct and easy to
maintain.

[1]
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Math.html#pow(double,%20double)

On Fri, Apr 18, 2008 at 12:10 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nathan,
>
> The method name is inaccurate, right, it can be named like you want.
> The reason of performance boost _is_not_ in this method (nevertheless
> this method probably is fastest available for rounding). This method
> is utility and I see no point in moving it to Math.
>
> The reason of the performance boost is:
> > On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >> I'm pushing the idea that
> >> this optimization is general because it moves long-latency modulo (%)
> >> operations to really cheap mask (&), but to guarantee this, we should
> >> guarantee the storage is 2^k, where k is 0..N.
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 9:03 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > So the method 'roundTo2k' is incorrectly named, then? Should it be
> >  'roundTo2ToTheK'? Isn't there a Math method that should be used to do
> this,
> >  which in turn should be the focus of optimization? I'd rather see such
> a
> >  "general" optimization pushed into Math and then have all of the
> XHashMap
> >  implementations use the Math methods consistently.
> >
> >  -Nathan
> >
> >  On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
> >
> > aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >
> > > Nathan,
> >  >
> >  > I think we have a sort of misunderstanding here. I'm not rounding to
> >  > 2000, rather I round to next power of 2. I'm pushing the idea that
> >  > this optimization is general because it moves long-latency modulo (%)
> >  > operations to really cheap mask (&), but to guarantee this, we should
> >  > guarantee the storage is 2^k, where k is 0..N.
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >  > On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org>
> <ht...@apache.org>>
> >
> >
> > > wrote:
> >  > >  I'm sure this optimization shows an improvement in the
> serialization
> >  > use
> >  > >  case, but you'd be hard pressed to say that this improvement will
> make
> >  > 80%
> >  > >  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If
> you
> >  > want
> >  > >  to round to 2000 to improve serialization, then do that in the
> >  > >  serialization.
> >  > >
> >  > >  I don't think this should be applied as is.
> >  > >
> >  > >  -Nathan
> >  > >
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Nathan,

The method name is inaccurate, right, it can be named like you want.
The reason of performance boost _is_not_ in this method (nevertheless
this method probably is fastest available for rounding). This method
is utility and I see no point in moving it to Math.

The reason of the performance boost is:
> On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <al...@gmail.com> wrote:
>> I'm pushing the idea that
>> this optimization is general because it moves long-latency modulo (%)
>> operations to really cheap mask (&), but to guarantee this, we should
>> guarantee the storage is 2^k, where k is 0..N.

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 9:03 PM, Nathan Beyer <nd...@apache.org> wrote:
> So the method 'roundTo2k' is incorrectly named, then? Should it be
>  'roundTo2ToTheK'? Isn't there a Math method that should be used to do this,
>  which in turn should be the focus of optimization? I'd rather see such a
>  "general" optimization pushed into Math and then have all of the XHashMap
>  implementations use the Math methods consistently.
>
>  -Nathan
>
>  On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
>
> aleksey.shipilev@gmail.com> wrote:
>
>
> > Nathan,
>  >
>  > I think we have a sort of misunderstanding here. I'm not rounding to
>  > 2000, rather I round to next power of 2. I'm pushing the idea that
>  > this optimization is general because it moves long-latency modulo (%)
>  > operations to really cheap mask (&), but to guarantee this, we should
>  > guarantee the storage is 2^k, where k is 0..N.
>  >
>  > Thanks,
>  > Aleksey.
>  >
>  > On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org>>
>
>
> > wrote:
>  > >  I'm sure this optimization shows an improvement in the serialization
>  > use
>  > >  case, but you'd be hard pressed to say that this improvement will make
>  > 80%
>  > >  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If you
>  > want
>  > >  to round to 2000 to improve serialization, then do that in the
>  > >  serialization.
>  > >
>  > >  I don't think this should be applied as is.
>  > >
>  > >  -Nathan
>  > >
>  >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

So the method 'roundTo2k' is incorrectly named, then? Should it be
'roundTo2ToTheK'? Isn't there a Math method that should be used to do this,
which in turn should be the focus of optimization? I'd rather see such a
"general" optimization pushed into Math and then have all of the XHashMap
implementations use the Math methods consistently.

-Nathan

On Fri, Apr 18, 2008 at 11:51 AM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nathan,
>
> I think we have a sort of misunderstanding here. I'm not rounding to
> 2000, rather I round to next power of 2. I'm pushing the idea that
> this optimization is general because it moves long-latency modulo (%)
> operations to really cheap mask (&), but to guarantee this, we should
> guarantee the storage is 2^k, where k is 0..N.
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> >  I'm sure this optimization shows an improvement in the serialization
> use
> >  case, but you'd be hard pressed to say that this improvement will make
> 80%
> >  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If you
> want
> >  to round to 2000 to improve serialization, then do that in the
> >  serialization.
> >
> >  I don't think this should be applied as is.
> >
> >  -Nathan
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Nathan,

I think we have a sort of misunderstanding here. I'm not rounding to
2000, rather I round to next power of 2. I'm pushing the idea that
this optimization is general because it moves long-latency modulo (%)
operations to really cheap mask (&), but to guarantee this, we should
guarantee the storage is 2^k, where k is 0..N.

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 8:44 PM, Nathan Beyer <nd...@apache.org> wrote:
>  I'm sure this optimization shows an improvement in the serialization use
>  case, but you'd be hard pressed to say that this improvement will make 80%
>  of all uses of HashMap, WeakHashMap and IdentityHashMap better. If you want
>  to round to 2000 to improve serialization, then do that in the
>  serialization.
>
>  I don't think this should be applied as is.
>
>  -Nathan
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

If it's used in HashMap, then I think that's a mistake as well, but I'd have
to look at the exact details to know. The capacity and load factory are
externalized for a reason, so that a consumer can use them to adjust the
performance and resource consumptions to their needs.

I'm sure this optimization shows an improvement in the serialization use
case, but you'd be hard pressed to say that this improvement will make 80%
of all uses of HashMap, WeakHashMap and IdentityHashMap better. If you want
to round to 2000 to improve serialization, then do that in the
serialization.

I don't think this should be applied as is.

-Nathan

On Fri, Apr 18, 2008 at 11:34 AM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Nathan,
>
> 1. Spec does not require *HashMap should be exactly of given capacity,
> it's just should provide the capacity requested, which obviously right
> if we are rounding to next 2^k. This parameter is merely a hint to
> collection on what size it should be ready to, nobody could (or
> should) guarantee the exact map capacity, it does not visible from
> outside, it's all implementation business.
>
> 2. This hint can dramatically improve the performance of *HashMap, as
> demonstrated on benchmarks. This solution is already used to boost the
> performance of j.u.HashMap.
>
> What do you think?
>
> Thanks,
> Aleksey.
>
> On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org>>
> wrote:
> > https://issues.apache.org/jira/browse/HARMONY-5718
> >
> >  Again, I don't agree with the capacity rounding in the patch attached
> to
> >  this issue. I do like the change to the internal data structure; use
> two
> >  arrays for key/value instead a single array. It makes the code easier
> to
> >  read.
> >
> >
> >  -Nathan
> >
> >  On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
> >  aleksey.shipilev@gmail.com<ht...@gmail.com>>
> wrote:
> >
> >
> >
> > > Colleagues,
> >  >
> >  > I had recently filed two JIRAs with improvements in Collections,
> >  > giving up to +30-40% to serialization benchmarks. Presumably they
> will
> >  > boost the performance across the all users since the optimization is
> >  > pretty general:
> >  > https://issues.apache.org/jira/browse/HARMONY-5761
> >  > https://issues.apache.org/jira/browse/HARMONY-5718
> >  >
> >  > Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Aleksey Shipilev <al...@gmail.com>.

Nathan,

1. Spec does not require *HashMap should be exactly of given capacity,
it's just should provide the capacity requested, which obviously right
if we are rounding to next 2^k. This parameter is merely a hint to
collection on what size it should be ready to, nobody could (or
should) guarantee the exact map capacity, it does not visible from
outside, it's all implementation business.

2. This hint can dramatically improve the performance of *HashMap, as
demonstrated on benchmarks. This solution is already used to boost the
performance of j.u.HashMap.

What do you think?

Thanks,
Aleksey.

On Fri, Apr 18, 2008 at 8:13 PM, Nathan Beyer <nd...@apache.org> wrote:
> https://issues.apache.org/jira/browse/HARMONY-5718
>
>  Again, I don't agree with the capacity rounding in the patch attached to
>  this issue. I do like the change to the internal data structure; use two
>  arrays for key/value instead a single array. It makes the code easier to
>  read.
>
>
>  -Nathan
>
>  On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
>  aleksey.shipilev@gmail.com> wrote:
>
>
>
> > Colleagues,
>  >
>  > I had recently filed two JIRAs with improvements in Collections,
>  > giving up to +30-40% to serialization benchmarks. Presumably they will
>  > boost the performance across the all users since the optimization is
>  > pretty general:
>  > https://issues.apache.org/jira/browse/HARMONY-5761
>  > https://issues.apache.org/jira/browse/HARMONY-5718
>  >
>  > Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
>  >
>  > Thanks,
>  > Aleksey.
>  >
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

https://issues.apache.org/jira/browse/HARMONY-5718

Again, I don't agree with the capacity rounding in the patch attached to
this issue. I do like the change to the internal data structure; use two
arrays for key/value instead a single array. It makes the code easier to
read.

-Nathan

On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Colleagues,
>
> I had recently filed two JIRAs with improvements in Collections,
> giving up to +30-40% to serialization benchmarks. Presumably they will
> boost the performance across the all users since the optimization is
> pretty general:
> https://issues.apache.org/jira/browse/HARMONY-5761
> https://issues.apache.org/jira/browse/HARMONY-5718
>
> Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
>
> Thanks,
> Aleksey.
>

Re: [classlib][luni][performance] Improvements in Collections

Posted by Nathan Beyer <nd...@apache.org>.

 https://issues.apache.org/jira/browse/HARMONY-5761

I'm not sure I would agree with this approach, at least not the bit about
rounding the capacity that's explicitly passed. That seems like a specific
optimization of WeakHashMap for the way it gets used during serialization. I
think it would be more appropriate to perform this capacity calculation in
the classes that actually instantiate the WeakHashMap.

-Nathan

On Fri, Apr 18, 2008 at 1:50 AM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Colleagues,
>
> I had recently filed two JIRAs with improvements in Collections,
> giving up to +30-40% to serialization benchmarks. Presumably they will
> boost the performance across the all users since the optimization is
> pretty general:
> https://issues.apache.org/jira/browse/HARMONY-5761
> https://issues.apache.org/jira/browse/HARMONY-5718
>
> Would some classlib guru (Tim, Nathan, Tony?) review and commit them?
>
> Thanks,
> Aleksey.
>