You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@harmony.apache.org by Aleksey Shipilev <al...@gmail.com> on 2008/04/18 20:53:42 UTC

[classlib][luni][performance] IdentityHashMap implementation

Hi there,

I had instrumented the current implementation of j.u.IdentityHashMap
and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
collision rate there (frequency of sutiations when objects have
identical identityHashCode) is rather high - 8 collisions per one
get() on average with load factor of 0.75.

That mean identityHashCode does not disperse elements well and the
map's performance is subject to clustering. Moreover, the collision
chains are sometimes so big that they're covering a half of storage
array and IHM looses access performance degrading to linear search.
And also the performance of essentially the same IHMs (in terms of
contents) may vary because of different clustering layouts.

That's essentially the disadvantage of open-addressed hash tables and
there are known ways to overwhelm this problem [1]. But before
actually researching on this topic I need to answer some
"java-spec-legal" questions:
 1. Is it required for IdentityHashMap to use open addressing?
 2. Is it required for IdentityHashMap to use linear probing as
collision resolution scheme?
 3. What's the reason of having the separate implementation of
IdentityHashMap while it can be implemented with overriding of
"equality" operation in ordinary HashMap?
 4. Can identityHashCode be salted with some additional hash to break
clustering?

I'm actually interested in performance optimization of IHM, so I will
appreciate anyone comments on this topic.

Thanks,
Aleksey.

[1] http://en.wikipedia.org/wiki/Hash_table

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Tim Ellison <t....@gmail.com>.
Aleksey Shipilev wrote:
> Good point, Nathan! It was DRLVM, but I'll try on IBM VME too.

FWIW the IBM VME available for Harmony doesn't do a good job of 
distributing the identityHashCode.

> Anyway, I had incorporated this option in (4) question - can we just
> don't trust VM's System.identityHashCode and dope the hashcode with
> our own salt?

We had this discussion a awhile ago, and since we don't know what the 
distribution of the VM hashCode implementation will be, the only thing 
to do is assume that the VM is producing a good distribution, and let 
them suffer the consequences if they do not.

Regards,
Tim


Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
According to profile on MT/SerialBench, collision rate drops significantly!

Collision rate per key lookup (min / avg / max):
 a. legacy IdentityHashMap (open address, linear probing): 0 / 7.4 / 75
 b. new IdentityHashMap (based on HashMap, chaining): 0 / 2.8 / 5

I believe this is the reason for performance boost. No optimization
for identityHashCode was done yet.
I also think it makes sense to commit this implementation. Nathan,
Tim, Tony, Jimmy?

Thanks,
Aleksey.

On Mon, Apr 21, 2008 at 9:08 PM, Aleksey Shipilev
<al...@gmail.com> wrote:
> So, the IdentityHashMap based on HashMap is available at HARMONY-5771.
>
>  This change gives +20% on MT/SerialBench and +80% on MT/ThreadLocalBench.
>  It also incorporates arithmetic improvements [2] because HashMap
>  implementation already have them.
>
>  Should we commit this code [1] and then merge the overlapping code in
>  AbstractMap?
>
>  Thanks,
>  Aleksey.
>
>  [1] https://issues.apache.org/jira/browse/HARMONY-5771
>  [2] https://issues.apache.org/jira/browse/HARMONY-5718
>
>
>
>  On Mon, Apr 21, 2008 at 4:50 PM, Aleksey Shipilev
>  <al...@gmail.com> wrote:
>  > Tim,
>  >
>  >  I'm thinking now about making IdentityHashMap to be similar with
>  >  HashMap. AFAIK, there are spec problems with overriding of existing
>  >  HashMap implementation on equality operation, so I want to go this
>  >  way:
>  >   1. Copy-paste HashMap implementation to IdentityHashMap
>  >   2. Change the equality checks and hashCode() calls to
>  >  IdentityHashMap-specific ones
>  >   3. Merge the common code in AbstractMap to reduce code duplication.
>  >
>  >  Performance data can be available at (2), and no committing is
>  >  required on that step.
>  >
>  >  How this sounds?
>  >
>  >  Thanks,
>  >  Aleksey.
>  >
>  >
>  >
>  >  On Mon, Apr 21, 2008 at 4:45 PM, Tim Ellison <t....@gmail.com> wrote:
>  >  > Alexey Varlamov wrote:
>  >  >
>  >  > > 2008/4/21, Aleksey Shipilev <al...@gmail.com>:
>  >  > >
>  >  > > > Well, the implementation note is what confusing me.
>  >  > > > Can we ignore it and implement our own IdentityHashMap instead of
>  >  > > > open-addressed + linear probed + joint key/values array?
>  >  > > >
>  >  > >
>  >  > > I believe so (given that we respect other spec requirements). The
>  >  > > implementation note is a hint, not a part of actual spec IMO.
>  >  > >
>  >  >
>  >  >  Before committing to replace our current implementation, I'm interested to
>  >  > see some experiment and measurement of new ideas.  What could be done?
>  >  >
>  >  >  Regards,
>  >  >  Tim
>  >  >
>  >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
So, the IdentityHashMap based on HashMap is available at HARMONY-5771.

This change gives +20% on MT/SerialBench and +80% on MT/ThreadLocalBench.
It also incorporates arithmetic improvements [2] because HashMap
implementation already have them.

Should we commit this code [1] and then merge the overlapping code in
AbstractMap?

Thanks,
Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5771
[2] https://issues.apache.org/jira/browse/HARMONY-5718

On Mon, Apr 21, 2008 at 4:50 PM, Aleksey Shipilev
<al...@gmail.com> wrote:
> Tim,
>
>  I'm thinking now about making IdentityHashMap to be similar with
>  HashMap. AFAIK, there are spec problems with overriding of existing
>  HashMap implementation on equality operation, so I want to go this
>  way:
>   1. Copy-paste HashMap implementation to IdentityHashMap
>   2. Change the equality checks and hashCode() calls to
>  IdentityHashMap-specific ones
>   3. Merge the common code in AbstractMap to reduce code duplication.
>
>  Performance data can be available at (2), and no committing is
>  required on that step.
>
>  How this sounds?
>
>  Thanks,
>  Aleksey.
>
>
>
>  On Mon, Apr 21, 2008 at 4:45 PM, Tim Ellison <t....@gmail.com> wrote:
>  > Alexey Varlamov wrote:
>  >
>  > > 2008/4/21, Aleksey Shipilev <al...@gmail.com>:
>  > >
>  > > > Well, the implementation note is what confusing me.
>  > > > Can we ignore it and implement our own IdentityHashMap instead of
>  > > > open-addressed + linear probed + joint key/values array?
>  > > >
>  > >
>  > > I believe so (given that we respect other spec requirements). The
>  > > implementation note is a hint, not a part of actual spec IMO.
>  > >
>  >
>  >  Before committing to replace our current implementation, I'm interested to
>  > see some experiment and measurement of new ideas.  What could be done?
>  >
>  >  Regards,
>  >  Tim
>  >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Tim,

I'm thinking now about making IdentityHashMap to be similar with
HashMap. AFAIK, there are spec problems with overriding of existing
HashMap implementation on equality operation, so I want to go this
way:
 1. Copy-paste HashMap implementation to IdentityHashMap
 2. Change the equality checks and hashCode() calls to
IdentityHashMap-specific ones
 3. Merge the common code in AbstractMap to reduce code duplication.

Performance data can be available at (2), and no committing is
required on that step.

How this sounds?

Thanks,
Aleksey.

On Mon, Apr 21, 2008 at 4:45 PM, Tim Ellison <t....@gmail.com> wrote:
> Alexey Varlamov wrote:
>
> > 2008/4/21, Aleksey Shipilev <al...@gmail.com>:
> >
> > > Well, the implementation note is what confusing me.
> > > Can we ignore it and implement our own IdentityHashMap instead of
> > > open-addressed + linear probed + joint key/values array?
> > >
> >
> > I believe so (given that we respect other spec requirements). The
> > implementation note is a hint, not a part of actual spec IMO.
> >
>
>  Before committing to replace our current implementation, I'm interested to
> see some experiment and measurement of new ideas.  What could be done?
>
>  Regards,
>  Tim
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Tim Ellison <t....@gmail.com>.
Alexey Varlamov wrote:
> 2008/4/21, Aleksey Shipilev <al...@gmail.com>:
>> Well, the implementation note is what confusing me.
>> Can we ignore it and implement our own IdentityHashMap instead of
>> open-addressed + linear probed + joint key/values array?
> 
> I believe so (given that we respect other spec requirements). The
> implementation note is a hint, not a part of actual spec IMO.

Before committing to replace our current implementation, I'm interested 
to see some experiment and measurement of new ideas.  What could be done?

Regards,
Tim

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Alexey Varlamov <al...@gmail.com>.
2008/4/21, Aleksey Shipilev <al...@gmail.com>:
> Well, the implementation note is what confusing me.
> Can we ignore it and implement our own IdentityHashMap instead of
> open-addressed + linear probed + joint key/values array?

I believe so (given that we respect other spec requirements). The
implementation note is a hint, not a part of actual spec IMO.

>
> On Mon, Apr 21, 2008 at 8:50 AM, Alexey Varlamov
> <al...@gmail.com> wrote:
> >  Implementation note: This is a simple linear-probe hash table, as
> >  described for example in texts by Sedgewick and Knuth. The array
> >  alternates holding keys and values. (This has better locality for
> >  large tables than does using separate arrays.) For many JRE
> >  implementations and operation mixes, this class will yield better
> >  performance than HashMap (which uses chaining rather than
> >  linear-probing)."
>
> Thanks,
> Aleksey.
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Well, the implementation note is what confusing me.
Can we ignore it and implement our own IdentityHashMap instead of
open-addressed + linear probed + joint key/values array?

On Mon, Apr 21, 2008 at 8:50 AM, Alexey Varlamov
<al...@gmail.com> wrote:
>  Implementation note: This is a simple linear-probe hash table, as
>  described for example in texts by Sedgewick and Knuth. The array
>  alternates holding keys and values. (This has better locality for
>  large tables than does using separate arrays.) For many JRE
>  implementations and operation mixes, this class will yield better
>  performance than HashMap (which uses chaining rather than
>  linear-probing)."

Thanks,
Aleksey.

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Alexey Varlamov <al...@gmail.com>.
2008/4/19, Aleksey Shipilev <al...@gmail.com>:
> Good point, Nathan! It was DRLVM, but I'll try on IBM VME too.
>
> Anyway, I had incorporated this option in (4) question - can we just
> don't trust VM's System.identityHashCode and dope the hashcode with
> our own salt?

IMO this is mere impl detail and would not not violate specification,
so feel free to experiment. Below are most relevant parts of the spec:

"This class provides all of the optional map operations, and permits
null values and the null key. This class makes no guarantees as to the
order of the map; in particular, it does not guarantee that the order
will remain constant over time.

This class provides constant-time performance for the basic operations
(get and put), assuming the system identity hash function
(System.identityHashCode(Object)) disperses elements properly among
the buckets.

Implementation note: This is a simple linear-probe hash table, as
described for example in texts by Sedgewick and Knuth. The array
alternates holding keys and values. (This has better locality for
large tables than does using separate arrays.) For many JRE
implementations and operation mixes, this class will yield better
performance than HashMap (which uses chaining rather than
linear-probing)."

--
Alexey

>
>
> On Fri, Apr 18, 2008 at 11:10 PM, Nathan Beyer <nb...@gmail.com> wrote:
> > Did you run these tests with IBM's VME or DRLVM? IdendityHashMap's
> >  distribution of hash code is solely based on System.identityHashCode's
> >  results. We need to determine if the VM is doing the best thing first.
> >
> >  -Nathan
> >
> >  On Fri, Apr 18, 2008 at 1:53 PM, Aleksey Shipilev <
> >
> > aleksey.shipilev@gmail.com> wrote:
> >
> >
> >
> > > Hi there,
> >  >
> >  > I had instrumented the current implementation of j.u.IdentityHashMap
> >  > and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
> >  > collision rate there (frequency of sutiations when objects have
> >  > identical identityHashCode) is rather high - 8 collisions per one
> >  > get() on average with load factor of 0.75.
> >  >
> >  > That mean identityHashCode does not disperse elements well and the
> >  > map's performance is subject to clustering. Moreover, the collision
> >  > chains are sometimes so big that they're covering a half of storage
> >  > array and IHM looses access performance degrading to linear search.
> >  > And also the performance of essentially the same IHMs (in terms of
> >  > contents) may vary because of different clustering layouts.
> >  >
> >  > That's essentially the disadvantage of open-addressed hash tables and
> >  > there are known ways to overwhelm this problem [1]. But before
> >  > actually researching on this topic I need to answer some
> >  > "java-spec-legal" questions:
> >  >  1. Is it required for IdentityHashMap to use open addressing?
> >  >  2. Is it required for IdentityHashMap to use linear probing as
> >  > collision resolution scheme?
> >  >  3. What's the reason of having the separate implementation of
> >  > IdentityHashMap while it can be implemented with overriding of
> >  > "equality" operation in ordinary HashMap?
> >  >  4. Can identityHashCode be salted with some additional hash to break
> >  > clustering?
> >  >
> >  > I'm actually interested in performance optimization of IHM, so I will
> >  > appreciate anyone comments on this topic.
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >  > [1] http://en.wikipedia.org/wiki/Hash_table
> >  >
> >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Good point, Nathan! It was DRLVM, but I'll try on IBM VME too.

Anyway, I had incorporated this option in (4) question - can we just
don't trust VM's System.identityHashCode and dope the hashcode with
our own salt?


On Fri, Apr 18, 2008 at 11:10 PM, Nathan Beyer <nb...@gmail.com> wrote:
> Did you run these tests with IBM's VME or DRLVM? IdendityHashMap's
>  distribution of hash code is solely based on System.identityHashCode's
>  results. We need to determine if the VM is doing the best thing first.
>
>  -Nathan
>
>  On Fri, Apr 18, 2008 at 1:53 PM, Aleksey Shipilev <
>
> aleksey.shipilev@gmail.com> wrote:
>
>
>
> > Hi there,
>  >
>  > I had instrumented the current implementation of j.u.IdentityHashMap
>  > and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
>  > collision rate there (frequency of sutiations when objects have
>  > identical identityHashCode) is rather high - 8 collisions per one
>  > get() on average with load factor of 0.75.
>  >
>  > That mean identityHashCode does not disperse elements well and the
>  > map's performance is subject to clustering. Moreover, the collision
>  > chains are sometimes so big that they're covering a half of storage
>  > array and IHM looses access performance degrading to linear search.
>  > And also the performance of essentially the same IHMs (in terms of
>  > contents) may vary because of different clustering layouts.
>  >
>  > That's essentially the disadvantage of open-addressed hash tables and
>  > there are known ways to overwhelm this problem [1]. But before
>  > actually researching on this topic I need to answer some
>  > "java-spec-legal" questions:
>  >  1. Is it required for IdentityHashMap to use open addressing?
>  >  2. Is it required for IdentityHashMap to use linear probing as
>  > collision resolution scheme?
>  >  3. What's the reason of having the separate implementation of
>  > IdentityHashMap while it can be implemented with overriding of
>  > "equality" operation in ordinary HashMap?
>  >  4. Can identityHashCode be salted with some additional hash to break
>  > clustering?
>  >
>  > I'm actually interested in performance optimization of IHM, so I will
>  > appreciate anyone comments on this topic.
>  >
>  > Thanks,
>  > Aleksey.
>  >
>  > [1] http://en.wikipedia.org/wiki/Hash_table
>  >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Nathan Beyer <nb...@gmail.com>.
Did you run these tests with IBM's VME or DRLVM? IdendityHashMap's
distribution of hash code is solely based on System.identityHashCode's
results. We need to determine if the VM is doing the best thing first.

-Nathan

On Fri, Apr 18, 2008 at 1:53 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Hi there,
>
> I had instrumented the current implementation of j.u.IdentityHashMap
> and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
> collision rate there (frequency of sutiations when objects have
> identical identityHashCode) is rather high - 8 collisions per one
> get() on average with load factor of 0.75.
>
> That mean identityHashCode does not disperse elements well and the
> map's performance is subject to clustering. Moreover, the collision
> chains are sometimes so big that they're covering a half of storage
> array and IHM looses access performance degrading to linear search.
> And also the performance of essentially the same IHMs (in terms of
> contents) may vary because of different clustering layouts.
>
> That's essentially the disadvantage of open-addressed hash tables and
> there are known ways to overwhelm this problem [1]. But before
> actually researching on this topic I need to answer some
> "java-spec-legal" questions:
>  1. Is it required for IdentityHashMap to use open addressing?
>  2. Is it required for IdentityHashMap to use linear probing as
> collision resolution scheme?
>  3. What's the reason of having the separate implementation of
> IdentityHashMap while it can be implemented with overriding of
> "equality" operation in ordinary HashMap?
>  4. Can identityHashCode be salted with some additional hash to break
> clustering?
>
> I'm actually interested in performance optimization of IHM, so I will
> appreciate anyone comments on this topic.
>
> Thanks,
> Aleksey.
>
> [1] http://en.wikipedia.org/wiki/Hash_table
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Alexey Petrenko <al...@gmail.com>.
Aleksey,

I would support Mark here. Clear patch is very important because it
makes our lives much easier and we are not required to skip 70K of
irrelevant text to see 2 relevant lines :)
So I think that your idea is good but should be better delivered :)

Take a look at "Good Issue Resolution Guideline" for example.
http://harmony.apache.org/issue_resolution_guideline.html

SY, Alexey

2008/4/23, Mark Hindess <ma...@googlemail.com>:
>
> On 23 April 2008 at 0:36, "Aleksey Shipilev" <al...@gmail.com>
> wrote:
> > Hi Endre,
> >
> > On Tue, Apr 22, 2008 at 11:30 PM, Endre St=F8lsvik <En...@stolsvik.com> wro=
> > te:
> > > Aleksey Shipilev wrote:
> > > > The reason behind all that changes is that entire IdentityHashMap
> > > > implementation was thrown away and replaced by HashMap
> > >
> > >  Isn't it possible to actually record this fact using SVN, by
> > > deleting the file, and then adding it again (or svn copy it from
> > > HashMap) - so that it doesn't look like a *change*, but more what it
> > > actually is: a remove, and then an add (actually, a copy)?
> >
> > Unfortunately, that's not usable, you might play around to see why. If
> > you find a solution, please let me know :)
>
> Fortunately, it is not compulsory to create patches with "svn diff".
> I've just done:
>
> 1) apply your patch to a fresh checkout
> 2) cp modules/luni/src/main/java/java/util/HashMap.java \
>      modules/luni/src/main/java/java/util/IdentityHashMap.java.orig
> 3) diff -u modules/luni/src/main/java/java/util/IdentityHashMap.java.orig \
>           modules/luni/src/main/java/java/util/IdentityHashMap.java
>
> The resulting patch would be much more suitable for attachment to a JIRA.
>
> I'd still fix a few things about HashMap.java first though.
> -Mark.
>
>
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Mark Hindess <ma...@googlemail.com>.
On 23 April 2008 at 0:36, "Aleksey Shipilev" <al...@gmail.com> 
wrote:
> Hi Endre,
> 
> On Tue, Apr 22, 2008 at 11:30 PM, Endre St=F8lsvik <En...@stolsvik.com> wro=
> te:
> > Aleksey Shipilev wrote:
> > > The reason behind all that changes is that entire IdentityHashMap
> > > implementation was thrown away and replaced by HashMap
> >
> >  Isn't it possible to actually record this fact using SVN, by
> > deleting the file, and then adding it again (or svn copy it from
> > HashMap) - so that it doesn't look like a *change*, but more what it
> > actually is: a remove, and then an add (actually, a copy)?
>
> Unfortunately, that's not usable, you might play around to see why. If
> you find a solution, please let me know :)

Fortunately, it is not compulsory to create patches with "svn diff".
I've just done:

1) apply your patch to a fresh checkout
2) cp modules/luni/src/main/java/java/util/HashMap.java \
      modules/luni/src/main/java/java/util/IdentityHashMap.java.orig
3) diff -u modules/luni/src/main/java/java/util/IdentityHashMap.java.orig \
           modules/luni/src/main/java/java/util/IdentityHashMap.java

The resulting patch would be much more suitable for attachment to a JIRA.

I'd still fix a few things about HashMap.java first though.
-Mark.



Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Hi Endre,

On Tue, Apr 22, 2008 at 11:30 PM, Endre Stølsvik <En...@stolsvik.com> wrote:
> Aleksey Shipilev wrote:
> > The reason behind all that changes is that entire IdentityHashMap
> > implementation was thrown away and replaced by HashMap
>  Isn't it possible to actually record this fact using SVN, by deleting the
> file, and then adding it again (or svn copy it from HashMap) - so that it
> doesn't look like a *change*, but more what it actually is: a remove, and
> then an add (actually, a copy)?
Unfortunately, that's not usable, you might play around to see why. If
you find a solution, please let me know :)

The reason is, svn diff is preparing the patch by diffing
IdentityHashMap.java and IdentityHashMap.java.svn-base, so it does not
matter how would you replace the contents of IdentityHashMap.java -
the results will be the same patch. Moreover, if you'll do "svn rm /
svn copy" the history for HashMap.java will be forwarded to
IdentityHashMap.java and "svn diff" will not generate anything.
Despite the fact this change might be more clean while committing (and
probably should be exploited by committer), it's hard for
non-committer to share the patch in JIRA. I don't know though whether
such change will discard the IdentityHashMap modification history.


Thanks,
Aleksey.

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Endre Stølsvik <En...@stolsvik.com>.
Aleksey Shipilev wrote:
> Good comments, Mark, thanks! But I have to chill your fury a little :)
> 
> The reason behind all that changes is that entire IdentityHashMap
> implementation was thrown away and replaced by HashMap

Isn't it possible to actually record this fact using SVN, by deleting 
the file, and then adding it again (or svn copy it from HashMap) - so 
that it doesn't look like a *change*, but more what it actually is: a 
remove, and then an add (actually, a copy)?

Endre.

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Nathan Beyer <nd...@apache.org>.
On Tue, Apr 22, 2008 at 4:03 PM, Mark Hindess <ma...@googlemail.com>
wrote:

>
> On 22 April 2008 at 23:02, "Aleksey Shipilev" <al...@gmail.com>
> >
> wrote:
> >
> > Good comments, Mark, thanks! But I have to chill your fury a little :)
>
> I'm might be a little bothered but I'm certainly not furious.  I just
> wish to point out issues so that we can make things better.  I'd not
> want to see the patch applied in its current form but I've already
> moved on from the "clean" issue - by proving to myself that it could be
> cleaned up - and I'm looking at the "non-breaking" part now.
>
> > The reason behind all that changes is that entire IdentityHashMap
> > implementation was thrown away and replaced by HashMap, then the
> > relevant changes that transform it back to IdentityHashMap were done
> > (changing hashCode() to System.identityHashCode(), equals to ==, etc).
>
> sigh... this should really have been made clear in the JIRA - see below
> for an example of how this could be done.
>
> > So if you diff the HashMap.java and IdentityHashMap.java after the
> > patch applied, you should notice very small difference. That's a
> > good sign because thus we can easily identify and generalize the
> > common code in both implementations, as we suggested to do. That's the
> > generalization we talked about with Nathan and Tim, right?
>
> Well, I've not looked at HashMap.java but I suspect that it might have
> methods ordered alphabetically too which is horrible and should be
> fixed.
>
> Also, when I reverted the spurious javadoc changes back to the
> original form, I replaced a number of instances of the specific word
> IdentityHashMap with the less specific word Map.  Given the context
> doesn't detract from the meaning and does make it easier to read.
> Rather than make the IdentityHashMap javadoc more like the HashMap
> javadoc I suggest we change the HashMap javadoc in this way too.  (This
> is a personal preference but it would make the significant differences
> between HashMap.java and IdentityHashMap.java much more obvious.)
>
> In short, making them similar is great but lets make them both better
> rather than make IdentityHashMap worse.


I agree with this approach; clean HashMap, copy to IdentityHashMap, make
Identity-specific changes.


>
>
> > Keeping this in mind, you can see that the assumption of "minimal
> > change" can't be preserved
>
> Nonsense.  I have a 33% smaller cleaner patch created in ~10 minutes
> using emacs and ediff mode to prove it.  And armed with the extra
> information that was missing from the JIRA I can suggest an even better
> way see below.
>
> > and reading out the patch itself is pretty useless.
>
> I disagree; I read it and I don't think it was a useless exercise.  If
> my reading of the patch resulted in what you referred to above as "good
> comments" then clearly it wasn't useless.
>
> Of course, it is not the whole story but then I was never under the
> illusion that it was and I am moving on to look at the result of the
> patch next.  Personally, I think people underestimate the value of
> reading raw patches.  You do get to see just the changes - in a clean
> patch anyway - and "bad smells"[0] are often easier to see this way.
>

Agreed. This is something that I didn't quite get at first, but after
working with several different projects and being committer here, the power
of patches and diffs has certainly won me over. I have a much greater
respect for SCM tools now.

I love the "code smells" reference -- this is a turn-of-phrase that I
frequently use.

-Nathan


>
> > Rather, it makes sense to apply the patch to local copy and then
> > see whether the differences between IHM and HM are enough, whether
> > resulting IHM implementation contradicts with spec, whether IHM passes
> > the tests and runs the workloads well and stuff.
>
> I'm getting to that part.  You stated that the patch was "clean,
> non-breaking, consistent implementation, ready to be committed" and I
> was testing those assertions in that order.  Although I've probably made
> my mind up about the last one too.
>
> > Of course, to avoid these confusions I can rewrite entire Collections
> > at once and then contribute a mega-patch, but I don't think this is
> > the proper way of doing things.
>
> That's an interesting goal but I don't believe mega-patches are required
> to achieve it.  Before I became a committer, I did a huge amount of
> work unifying the native code and I did it in *lots* of small steps.
> Each step was clear.  I sometimes submitted a series of patches (and/or
> scripts) for a single JIRA to make the steps as transparent as possible.
> For example, for HARMONY-5771, you might do something like:
>
>  1) Attached a script (or documented) doing:
>
>    svn copy modules/luni/src/main/java/java/util/HashMap.java \
>    modules/luni/src/main/java/java/util/IdentityHashMap.java
>
>  2) Attach a patch to be applied after the copy with the changes.
>
> I think what you had done would be much clearer and I might not have
> started this discussion.  Certainly reading the patch would have been
> much easier and the *important* changes that really need reviewing would
> have been obvious not cluttered like the current patch.[1]
>
> Having said that, I think that would still be the wrong thing to do in
> this case and you'd need:
>
>  0) Attach a patch to fix some ugly things about HashMap
>
> to avoid duplicating the IMHO less wonderful aspects of HashMap in
> IdentityHashMap.
>
> > I prefer gradual updates instead.
>
> Well, at least, we can agree about that. ;-)
>
> -Mark.
>
> [0]
> http://sis36.berkeley.edu/projects/streek/agile/bad-smells-in-code.html
>
> [1] The svn diff on the commits list would still be a mess but hopefully
> the committer would use a log message that made it clear there was a
> more readable version in the JIRA.
>
> > On Tue, Apr 22, 2008 at 9:02 PM, Mark Hindess
> > <ma...@googlemail.com>>
> wrote:
> > >
> > >  On 22 April 2008 at 15:23, "Aleksey Shipilev" <
> aleksey.shipilev@gmail.com<ht...@gmail.com>
> >
> > >  wrote:
> > >
> > > > Tim,
> > >  >
> > >  > I support your proposal and try to make distinction between
> > >  > proof-of-concept and clean patches.
> > >  >
> > >  > Performance improvements could be notorious work, requiring a lot
> of
> > >  > resources. The bad thing is, they could give nothing after the work
> is
> > >  > done, that's why we are doing thought experiments and
> proof-of-concept
> > >  > patches before actually implementing something. Proof-of-concept
> may
> > >  > also give a hint on what maximum improvement we can get and set the
> > >  > expectations for clean patch.
> > >  >
> > >  > So we are not looking for hiding the problems, rather we're
> exploring
> > >  > what could it worth.
> > >  >
> > >  > F.ex. the experiments on IdentityHashMap performance finally led to
> > >  > clean, non-breaking, consistent implementation, ready to be
> committed
> > >  > patch [1]. Can we focus on reviewing it?
> > >
> > >  "clean [snip] patch"?  Okay, I'll take the bait...
> > >
> > >  1) It is good to fix the assert in the test case, but if you are
> going to
> > >  change it at all you might as well replace:
> > >
> > >   assertTrue("TestB. removed entries incorrectly", map.size() == 11 &&
> > >  !result);
> > >
> > >  with:
> > >
> > >   assertEquals("TestB. removed entries incorrectly, size is not
> correct",
> > >                11, map.size());
> > >   assertFalse("TestB. removed entries incorrectly, result is not
> correct",
> > >               result);
> > >
> > >
> > >  2) I think the IdentityHashMap javadoc comment change removes too
> much
> > >  important information.
> > >
> > >
> > >  3) I don't understand why you make arbitrary whitespace changes -
> such
> > >  as rewriting:
> > >
> > >  public class IdentityHashMap<K, V> extends AbstractMap<K, V>
> implements
> > >         Map<K, V>, Serializable, Cloneable {
> > >
> > >  to be longer than 80-characters.
> > >
> > >
> > >  4) You rename DEFAULT_MAX_SIZE to DEFAULT_SIZE but I don't really see
> > >  the relevance of this to the changes.
> > >
> > >
> > >  5) You seem to remove a number of correct comments from the
> > >  declarations of threshold, DEFAULT_MAX_SIZE, loadFactor, and
> modCount.
> > >
> > >  I almost I gave up reading the patch at this point out of frustration
> > >  at the number of irrelevant changes.  Such as:
> > >
> > >  a) the re-ordering of often unchanged public methods - previously the
> > >  public methods were grouped by similar functionality which seems
> logical
> > >  to me but now they are in alphabetical order.  This makes the patch
> much
> > >  bigger and harder to read.  If there is a good reason to re-order the
> > >  methods, do it in a separate patch.
> > >
> > >
> > >  b) The arbitrary changes to javadoc like:
> > >
> > >      /**
> > >  -     * Searches this Map for the specified value.
> > >  -     *
> > >  -     *
> > >  +     * Searches this IdentityHashMap for the specified value.
> > >  +     *
> > >       * @param value
> > >       *            the object to search for
> > >  -     * @return true if <code>value</code> is a value of this Map,
> false
> > >  +     * @return true if <code>value</code> is a value of this
> IdentityHash
> > Map,
> > >  false
> > >       *         otherwise
> > >       */
> > >
> > >  c) The arbitrary renaming of IdentityHashMapEntry to Entry.
> > >
> > >  Is it unrealistic to expect clean patches with justifications for
> > >  changes?  Can we optimise patches as well as code?
> > >
> > >  If I apply the patch and then revert the whitespace changes,
> > >  javadoc reformatting, renaming of DEFAULT_MAX_SIZE, renaming of
> > >  IdentityHashMapEntry, etc. the the resulting svn diff contains at
> least
> > >  33% fewer +/- lines than the one on the JIRA and is still
> functionally
> > >  the same.
> > >
> > >  As someone who tries to read svn commit messages (and all committers
> or
> > >  would-be-committers should be doing this to some extent!) patches
> like
> > >  this are a nightmare.
> > >
> > >  Regards,
> > >  -Mark.
> > >
> > >
> > >
> > >  >
> > >  > Thanks,
> > >  > Aleksey.
> > >  >
> > >  > [1] https://issues.apache.org/jira/browse/HARMONY-5771
> > >  >
> > >  > On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <
> t.p.ellison@gmail.com<ht...@gmail.com>>
> wr
> > ote:
> > >  > > Aleksey Shipilev wrote:
> > >  > >
> > >  > > > Right, Tim. It was just the proof-of-concept "Does
> IdentityHashMap
> > >  > > > tuning worth it?".
> > >  > > >
> > >  > >
> > >  > >  I think it is unhelpful to mix actual, concrete suggestions for
> perfo
> > rmanc
> > >  > e
> > >  > > improvements, with proof-of-concept ideas that break the
> specification
> >  or
> > >  > > security.
> > >  > >
> > >  > >  Please can people mark clearly in their mail whether they are
> suggest
> > ing
> > >  > > the idea only as "a thought experiment", otherwise somebody may
> apply
> > the
> > >  > > change without properly understanding the consequences.
> > >  > >
> > >  > >  I realize the onus is on the committer to ensure it is safe but
> hidin
> > g suc
> > >  > h
> > >  > > problems is, as I said, unhelpful.
> > >  > >
> > >  > >  Thanks,
> > >  > >  Tim
> > >  > >
> > >  > >
> > >  > >
> > >  > >
> > >  > >
> > >  > > > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <
> t.p.ellison@gmail.com<ht...@gmail.com>
> >
> > >  > > wrote:
> > >  > > >
> > >  > > > > Sergey Salishev wrote:
> > >  > > > >
> > >  > > > >
> > >  > > > > > It also should be noticed that replacing the
> IdentityHashMap wit
> > h
> > >  > > HashMap
> > >  > > > > >
> > >  > > > > in
> > >  > > > >
> > >  > > > > > Thread impl gives 4.5x boost on ThreadLocalBench.
> > >  > > > > >
> > >  > > > > >
> > >  > > > >  But it would be wrong since it opens a security
> vulnerability.
> > >  > > > >
> > >  > > > >  See:
> > >  > > > >  http://markmail.org/message/rwq3kif5n33wp65m
> > >  > > > >  http://svn.apache.org/viewvc?view=rev&revision=477355
> > >  > > > >
> > >  > > > >
> > >  > > > >  Regards,
> > >  > > > >  Tim
> > >  > > > >
> > >  > > > >
> > >  > > >
> > >  > > >
> > >  > >
> > >  >
> > >
> > >
> > >
> >
>
>
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Mark Hindess <ma...@googlemail.com>.
On 22 April 2008 at 23:02, "Aleksey Shipilev" <al...@gmail.com> 
wrote:
>
> Good comments, Mark, thanks! But I have to chill your fury a little :)

I'm might be a little bothered but I'm certainly not furious.  I just
wish to point out issues so that we can make things better.  I'd not
want to see the patch applied in its current form but I've already
moved on from the "clean" issue - by proving to myself that it could be
cleaned up - and I'm looking at the "non-breaking" part now.

> The reason behind all that changes is that entire IdentityHashMap
> implementation was thrown away and replaced by HashMap, then the
> relevant changes that transform it back to IdentityHashMap were done
> (changing hashCode() to System.identityHashCode(), equals to ==, etc).

sigh... this should really have been made clear in the JIRA - see below
for an example of how this could be done.

> So if you diff the HashMap.java and IdentityHashMap.java after the
> patch applied, you should notice very small difference. That's a
> good sign because thus we can easily identify and generalize the
> common code in both implementations, as we suggested to do. That's the
> generalization we talked about with Nathan and Tim, right?

Well, I've not looked at HashMap.java but I suspect that it might have
methods ordered alphabetically too which is horrible and should be
fixed.

Also, when I reverted the spurious javadoc changes back to the
original form, I replaced a number of instances of the specific word
IdentityHashMap with the less specific word Map.  Given the context
doesn't detract from the meaning and does make it easier to read.
Rather than make the IdentityHashMap javadoc more like the HashMap
javadoc I suggest we change the HashMap javadoc in this way too.  (This
is a personal preference but it would make the significant differences
between HashMap.java and IdentityHashMap.java much more obvious.)

In short, making them similar is great but lets make them both better
rather than make IdentityHashMap worse.

> Keeping this in mind, you can see that the assumption of "minimal
> change" can't be preserved

Nonsense.  I have a 33% smaller cleaner patch created in ~10 minutes
using emacs and ediff mode to prove it.  And armed with the extra
information that was missing from the JIRA I can suggest an even better
way see below.

> and reading out the patch itself is pretty useless.

I disagree; I read it and I don't think it was a useless exercise.  If
my reading of the patch resulted in what you referred to above as "good
comments" then clearly it wasn't useless.

Of course, it is not the whole story but then I was never under the
illusion that it was and I am moving on to look at the result of the
patch next.  Personally, I think people underestimate the value of
reading raw patches.  You do get to see just the changes - in a clean
patch anyway - and "bad smells"[0] are often easier to see this way.

> Rather, it makes sense to apply the patch to local copy and then
> see whether the differences between IHM and HM are enough, whether
> resulting IHM implementation contradicts with spec, whether IHM passes
> the tests and runs the workloads well and stuff.

I'm getting to that part.  You stated that the patch was "clean,
non-breaking, consistent implementation, ready to be committed" and I
was testing those assertions in that order.  Although I've probably made
my mind up about the last one too.

> Of course, to avoid these confusions I can rewrite entire Collections
> at once and then contribute a mega-patch, but I don't think this is
> the proper way of doing things.

That's an interesting goal but I don't believe mega-patches are required
to achieve it.  Before I became a committer, I did a huge amount of
work unifying the native code and I did it in *lots* of small steps.
Each step was clear.  I sometimes submitted a series of patches (and/or
scripts) for a single JIRA to make the steps as transparent as possible.
For example, for HARMONY-5771, you might do something like:

  1) Attached a script (or documented) doing:

    svn copy modules/luni/src/main/java/java/util/HashMap.java \
    modules/luni/src/main/java/java/util/IdentityHashMap.java

  2) Attach a patch to be applied after the copy with the changes.

I think what you had done would be much clearer and I might not have
started this discussion.  Certainly reading the patch would have been
much easier and the *important* changes that really need reviewing would
have been obvious not cluttered like the current patch.[1]

Having said that, I think that would still be the wrong thing to do in
this case and you'd need:

  0) Attach a patch to fix some ugly things about HashMap

to avoid duplicating the IMHO less wonderful aspects of HashMap in
IdentityHashMap.

> I prefer gradual updates instead.

Well, at least, we can agree about that. ;-)

-Mark.

[0] http://sis36.berkeley.edu/projects/streek/agile/bad-smells-in-code.html

[1] The svn diff on the commits list would still be a mess but hopefully
the committer would use a log message that made it clear there was a
more readable version in the JIRA.

> On Tue, Apr 22, 2008 at 9:02 PM, Mark Hindess
> <ma...@googlemail.com> wrote:
> >
> >  On 22 April 2008 at 15:23, "Aleksey Shipilev" <al...@gmail.com>
> >  wrote:
> >
> > > Tim,
> >  >
> >  > I support your proposal and try to make distinction between
> >  > proof-of-concept and clean patches.
> >  >
> >  > Performance improvements could be notorious work, requiring a lot of
> >  > resources. The bad thing is, they could give nothing after the work is
> >  > done, that's why we are doing thought experiments and proof-of-concept
> >  > patches before actually implementing something. Proof-of-concept may
> >  > also give a hint on what maximum improvement we can get and set the
> >  > expectations for clean patch.
> >  >
> >  > So we are not looking for hiding the problems, rather we're exploring
> >  > what could it worth.
> >  >
> >  > F.ex. the experiments on IdentityHashMap performance finally led to
> >  > clean, non-breaking, consistent implementation, ready to be committed
> >  > patch [1]. Can we focus on reviewing it?
> >
> >  "clean [snip] patch"?  Okay, I'll take the bait...
> >
> >  1) It is good to fix the assert in the test case, but if you are going to
> >  change it at all you might as well replace:
> >
> >   assertTrue("TestB. removed entries incorrectly", map.size() == 11 &&
> >  !result);
> >
> >  with:
> >
> >   assertEquals("TestB. removed entries incorrectly, size is not correct",
> >                11, map.size());
> >   assertFalse("TestB. removed entries incorrectly, result is not correct",
> >               result);
> >
> >
> >  2) I think the IdentityHashMap javadoc comment change removes too much
> >  important information.
> >
> >
> >  3) I don't understand why you make arbitrary whitespace changes - such
> >  as rewriting:
> >
> >  public class IdentityHashMap<K, V> extends AbstractMap<K, V> implements
> >         Map<K, V>, Serializable, Cloneable {
> >
> >  to be longer than 80-characters.
> >
> >
> >  4) You rename DEFAULT_MAX_SIZE to DEFAULT_SIZE but I don't really see
> >  the relevance of this to the changes.
> >
> >
> >  5) You seem to remove a number of correct comments from the
> >  declarations of threshold, DEFAULT_MAX_SIZE, loadFactor, and modCount.
> >
> >  I almost I gave up reading the patch at this point out of frustration
> >  at the number of irrelevant changes.  Such as:
> >
> >  a) the re-ordering of often unchanged public methods - previously the
> >  public methods were grouped by similar functionality which seems logical
> >  to me but now they are in alphabetical order.  This makes the patch much
> >  bigger and harder to read.  If there is a good reason to re-order the
> >  methods, do it in a separate patch.
> >
> >
> >  b) The arbitrary changes to javadoc like:
> >
> >      /**
> >  -     * Searches this Map for the specified value.
> >  -     *
> >  -     *
> >  +     * Searches this IdentityHashMap for the specified value.
> >  +     *
> >       * @param value
> >       *            the object to search for
> >  -     * @return true if <code>value</code> is a value of this Map, false
> >  +     * @return true if <code>value</code> is a value of this IdentityHash
> Map,
> >  false
> >       *         otherwise
> >       */
> >
> >  c) The arbitrary renaming of IdentityHashMapEntry to Entry.
> >
> >  Is it unrealistic to expect clean patches with justifications for
> >  changes?  Can we optimise patches as well as code?
> >
> >  If I apply the patch and then revert the whitespace changes,
> >  javadoc reformatting, renaming of DEFAULT_MAX_SIZE, renaming of
> >  IdentityHashMapEntry, etc. the the resulting svn diff contains at least
> >  33% fewer +/- lines than the one on the JIRA and is still functionally
> >  the same.
> >
> >  As someone who tries to read svn commit messages (and all committers or
> >  would-be-committers should be doing this to some extent!) patches like
> >  this are a nightmare.
> >
> >  Regards,
> >  -Mark.
> >
> >
> >
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >  > [1] https://issues.apache.org/jira/browse/HARMONY-5771
> >  >
> >  > On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <t....@gmail.com> wr
> ote:
> >  > > Aleksey Shipilev wrote:
> >  > >
> >  > > > Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
> >  > > > tuning worth it?".
> >  > > >
> >  > >
> >  > >  I think it is unhelpful to mix actual, concrete suggestions for perfo
> rmanc
> >  > e
> >  > > improvements, with proof-of-concept ideas that break the specification
>  or
> >  > > security.
> >  > >
> >  > >  Please can people mark clearly in their mail whether they are suggest
> ing
> >  > > the idea only as "a thought experiment", otherwise somebody may apply 
> the
> >  > > change without properly understanding the consequences.
> >  > >
> >  > >  I realize the onus is on the committer to ensure it is safe but hidin
> g suc
> >  > h
> >  > > problems is, as I said, unhelpful.
> >  > >
> >  > >  Thanks,
> >  > >  Tim
> >  > >
> >  > >
> >  > >
> >  > >
> >  > >
> >  > > > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com>
> >  > > wrote:
> >  > > >
> >  > > > > Sergey Salishev wrote:
> >  > > > >
> >  > > > >
> >  > > > > > It also should be noticed that replacing the IdentityHashMap wit
> h
> >  > > HashMap
> >  > > > > >
> >  > > > > in
> >  > > > >
> >  > > > > > Thread impl gives 4.5x boost on ThreadLocalBench.
> >  > > > > >
> >  > > > > >
> >  > > > >  But it would be wrong since it opens a security vulnerability.
> >  > > > >
> >  > > > >  See:
> >  > > > >  http://markmail.org/message/rwq3kif5n33wp65m
> >  > > > >  http://svn.apache.org/viewvc?view=rev&revision=477355
> >  > > > >
> >  > > > >
> >  > > > >  Regards,
> >  > > > >  Tim
> >  > > > >
> >  > > > >
> >  > > >
> >  > > >
> >  > >
> >  >
> >
> >
> >
> 



Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Good comments, Mark, thanks! But I have to chill your fury a little :)

The reason behind all that changes is that entire IdentityHashMap
implementation was thrown away and replaced by HashMap, then the
relevant changes that transform it back to IdentityHashMap were done
(changing hashCode() to System.identityHashCode(), equals to ==, etc).
So if you diff the HashMap.java and IdentityHashMap.java after the
patch applied, you should notice very small difference. That's a good
sign because thus we can easily identify and generalize the common
code in both implementations, as we suggested to do. That's the
generalization we talked about with Nathan and Tim, right?

Keeping this in mind, you can see that the assumption of "minimal
change" can't be preserved and reading out the patch itself is pretty
useless. Rather, it makes sense to apply the patch to local copy and
then see whether the differences between IHM and HM are enough,
whether resulting IHM implementation contradicts with spec, whether
IHM passes the tests and runs the workloads well and stuff.

Of course, to avoid these confusions I can rewrite entire Collections
at once and then contribute a mega-patch, but I don't think this is
the proper way of doing things. I prefer gradual updates instead.

Thanks,
Aleksey.

On Tue, Apr 22, 2008 at 9:02 PM, Mark Hindess
<ma...@googlemail.com> wrote:
>
>  On 22 April 2008 at 15:23, "Aleksey Shipilev" <al...@gmail.com>
>  wrote:
>
> > Tim,
>  >
>  > I support your proposal and try to make distinction between
>  > proof-of-concept and clean patches.
>  >
>  > Performance improvements could be notorious work, requiring a lot of
>  > resources. The bad thing is, they could give nothing after the work is
>  > done, that's why we are doing thought experiments and proof-of-concept
>  > patches before actually implementing something. Proof-of-concept may
>  > also give a hint on what maximum improvement we can get and set the
>  > expectations for clean patch.
>  >
>  > So we are not looking for hiding the problems, rather we're exploring
>  > what could it worth.
>  >
>  > F.ex. the experiments on IdentityHashMap performance finally led to
>  > clean, non-breaking, consistent implementation, ready to be committed
>  > patch [1]. Can we focus on reviewing it?
>
>  "clean [snip] patch"?  Okay, I'll take the bait...
>
>  1) It is good to fix the assert in the test case, but if you are going to
>  change it at all you might as well replace:
>
>   assertTrue("TestB. removed entries incorrectly", map.size() == 11 &&
>  !result);
>
>  with:
>
>   assertEquals("TestB. removed entries incorrectly, size is not correct",
>                11, map.size());
>   assertFalse("TestB. removed entries incorrectly, result is not correct",
>               result);
>
>
>  2) I think the IdentityHashMap javadoc comment change removes too much
>  important information.
>
>
>  3) I don't understand why you make arbitrary whitespace changes - such
>  as rewriting:
>
>  public class IdentityHashMap<K, V> extends AbstractMap<K, V> implements
>         Map<K, V>, Serializable, Cloneable {
>
>  to be longer than 80-characters.
>
>
>  4) You rename DEFAULT_MAX_SIZE to DEFAULT_SIZE but I don't really see
>  the relevance of this to the changes.
>
>
>  5) You seem to remove a number of correct comments from the
>  declarations of threshold, DEFAULT_MAX_SIZE, loadFactor, and modCount.
>
>  I almost I gave up reading the patch at this point out of frustration
>  at the number of irrelevant changes.  Such as:
>
>  a) the re-ordering of often unchanged public methods - previously the
>  public methods were grouped by similar functionality which seems logical
>  to me but now they are in alphabetical order.  This makes the patch much
>  bigger and harder to read.  If there is a good reason to re-order the
>  methods, do it in a separate patch.
>
>
>  b) The arbitrary changes to javadoc like:
>
>      /**
>  -     * Searches this Map for the specified value.
>  -     *
>  -     *
>  +     * Searches this IdentityHashMap for the specified value.
>  +     *
>       * @param value
>       *            the object to search for
>  -     * @return true if <code>value</code> is a value of this Map, false
>  +     * @return true if <code>value</code> is a value of this IdentityHashMap,
>  false
>       *         otherwise
>       */
>
>  c) The arbitrary renaming of IdentityHashMapEntry to Entry.
>
>  Is it unrealistic to expect clean patches with justifications for
>  changes?  Can we optimise patches as well as code?
>
>  If I apply the patch and then revert the whitespace changes,
>  javadoc reformatting, renaming of DEFAULT_MAX_SIZE, renaming of
>  IdentityHashMapEntry, etc. the the resulting svn diff contains at least
>  33% fewer +/- lines than the one on the JIRA and is still functionally
>  the same.
>
>  As someone who tries to read svn commit messages (and all committers or
>  would-be-committers should be doing this to some extent!) patches like
>  this are a nightmare.
>
>  Regards,
>  -Mark.
>
>
>
>  >
>  > Thanks,
>  > Aleksey.
>  >
>  > [1] https://issues.apache.org/jira/browse/HARMONY-5771
>  >
>  > On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <t....@gmail.com> wrote:
>  > > Aleksey Shipilev wrote:
>  > >
>  > > > Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
>  > > > tuning worth it?".
>  > > >
>  > >
>  > >  I think it is unhelpful to mix actual, concrete suggestions for performanc
>  > e
>  > > improvements, with proof-of-concept ideas that break the specification or
>  > > security.
>  > >
>  > >  Please can people mark clearly in their mail whether they are suggesting
>  > > the idea only as "a thought experiment", otherwise somebody may apply the
>  > > change without properly understanding the consequences.
>  > >
>  > >  I realize the onus is on the committer to ensure it is safe but hiding suc
>  > h
>  > > problems is, as I said, unhelpful.
>  > >
>  > >  Thanks,
>  > >  Tim
>  > >
>  > >
>  > >
>  > >
>  > >
>  > > > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com>
>  > > wrote:
>  > > >
>  > > > > Sergey Salishev wrote:
>  > > > >
>  > > > >
>  > > > > > It also should be noticed that replacing the IdentityHashMap with
>  > > HashMap
>  > > > > >
>  > > > > in
>  > > > >
>  > > > > > Thread impl gives 4.5x boost on ThreadLocalBench.
>  > > > > >
>  > > > > >
>  > > > >  But it would be wrong since it opens a security vulnerability.
>  > > > >
>  > > > >  See:
>  > > > >  http://markmail.org/message/rwq3kif5n33wp65m
>  > > > >  http://svn.apache.org/viewvc?view=rev&revision=477355
>  > > > >
>  > > > >
>  > > > >  Regards,
>  > > > >  Tim
>  > > > >
>  > > > >
>  > > >
>  > > >
>  > >
>  >
>
>
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Endre Stølsvik <En...@stolsvik.com>.
> a) the re-ordering of often unchanged public methods - previously the
> public methods were grouped by similar functionality which seems logical
> to me but now they are in alphabetical order.  This makes the patch much
> bigger and harder to read.  If there is a good reason to re-order the
> methods, do it in a separate patch.

If it makes ANY SENSE todo such things, then you are NOT using good 
enough tools. Have your IDE run the "Code outline" in alpha order if it 
pleases you. The file itself should obviously be in functional groups.

Ordering after functional groups is _the_ way, not _a_ way.

'javadoc' also understand this: when making the html, in the first 
block, listing the methods with the first sentence, it reorders the 
methods into alpha order, while the second block with the methods with 
the full javadoc is ordered as in the class. Perfect logic.

Alphabetical ordering of code - I'm getting aggressive thoughts by the 
mere idea. Such evilness must be fought.

Endre.

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Mark Hindess <ma...@googlemail.com>.
On 22 April 2008 at 15:23, "Aleksey Shipilev" <al...@gmail.com> 
wrote:
> Tim,
> 
> I support your proposal and try to make distinction between
> proof-of-concept and clean patches.
> 
> Performance improvements could be notorious work, requiring a lot of
> resources. The bad thing is, they could give nothing after the work is
> done, that's why we are doing thought experiments and proof-of-concept
> patches before actually implementing something. Proof-of-concept may
> also give a hint on what maximum improvement we can get and set the
> expectations for clean patch.
> 
> So we are not looking for hiding the problems, rather we're exploring
> what could it worth.
> 
> F.ex. the experiments on IdentityHashMap performance finally led to
> clean, non-breaking, consistent implementation, ready to be committed
> patch [1]. Can we focus on reviewing it?

"clean [snip] patch"?  Okay, I'll take the bait...

1) It is good to fix the assert in the test case, but if you are going to
change it at all you might as well replace:

  assertTrue("TestB. removed entries incorrectly", map.size() == 11 && 
!result);

with:

  assertEquals("TestB. removed entries incorrectly, size is not correct",
               11, map.size());
  assertFalse("TestB. removed entries incorrectly, result is not correct",
              result);


2) I think the IdentityHashMap javadoc comment change removes too much
important information.


3) I don't understand why you make arbitrary whitespace changes - such
as rewriting:

public class IdentityHashMap<K, V> extends AbstractMap<K, V> implements
        Map<K, V>, Serializable, Cloneable {

to be longer than 80-characters.


4) You rename DEFAULT_MAX_SIZE to DEFAULT_SIZE but I don't really see
the relevance of this to the changes.


5) You seem to remove a number of correct comments from the
declarations of threshold, DEFAULT_MAX_SIZE, loadFactor, and modCount.

I almost I gave up reading the patch at this point out of frustration
at the number of irrelevant changes.  Such as:

a) the re-ordering of often unchanged public methods - previously the
public methods were grouped by similar functionality which seems logical
to me but now they are in alphabetical order.  This makes the patch much
bigger and harder to read.  If there is a good reason to re-order the
methods, do it in a separate patch.


b) The arbitrary changes to javadoc like:

     /**
-     * Searches this Map for the specified value.
-     * 
-     * 
+     * Searches this IdentityHashMap for the specified value.
+     *
      * @param value
      *            the object to search for
-     * @return true if <code>value</code> is a value of this Map, false
+     * @return true if <code>value</code> is a value of this IdentityHashMap, 
false
      *         otherwise
      */

c) The arbitrary renaming of IdentityHashMapEntry to Entry.

Is it unrealistic to expect clean patches with justifications for
changes?  Can we optimise patches as well as code?

If I apply the patch and then revert the whitespace changes,
javadoc reformatting, renaming of DEFAULT_MAX_SIZE, renaming of
IdentityHashMapEntry, etc. the the resulting svn diff contains at least
33% fewer +/- lines than the one on the JIRA and is still functionally
the same.

As someone who tries to read svn commit messages (and all committers or
would-be-committers should be doing this to some extent!) patches like
this are a nightmare.

Regards,
-Mark.

> 
> Thanks,
> Aleksey.
> 
> [1] https://issues.apache.org/jira/browse/HARMONY-5771
> 
> On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <t....@gmail.com> wrote:
> > Aleksey Shipilev wrote:
> >
> > > Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
> > > tuning worth it?".
> > >
> >
> >  I think it is unhelpful to mix actual, concrete suggestions for performanc
> e
> > improvements, with proof-of-concept ideas that break the specification or
> > security.
> >
> >  Please can people mark clearly in their mail whether they are suggesting
> > the idea only as "a thought experiment", otherwise somebody may apply the
> > change without properly understanding the consequences.
> >
> >  I realize the onus is on the committer to ensure it is safe but hiding suc
> h
> > problems is, as I said, unhelpful.
> >
> >  Thanks,
> >  Tim
> >
> >
> >
> >
> >
> > > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com>
> > wrote:
> > >
> > > > Sergey Salishev wrote:
> > > >
> > > >
> > > > > It also should be noticed that replacing the IdentityHashMap with
> > HashMap
> > > > >
> > > > in
> > > >
> > > > > Thread impl gives 4.5x boost on ThreadLocalBench.
> > > > >
> > > > >
> > > >  But it would be wrong since it opens a security vulnerability.
> > > >
> > > >  See:
> > > >  http://markmail.org/message/rwq3kif5n33wp65m
> > > >  http://svn.apache.org/viewvc?view=rev&revision=477355
> > > >
> > > >
> > > >  Regards,
> > > >  Tim
> > > >
> > > >
> > >
> > >
> >
> 



Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Tim,

I support your proposal and try to make distinction between
proof-of-concept and clean patches.

Performance improvements could be notorious work, requiring a lot of
resources. The bad thing is, they could give nothing after the work is
done, that's why we are doing thought experiments and proof-of-concept
patches before actually implementing something. Proof-of-concept may
also give a hint on what maximum improvement we can get and set the
expectations for clean patch.

So we are not looking for hiding the problems, rather we're exploring
what could it worth.

F.ex. the experiments on IdentityHashMap performance finally led to
clean, non-breaking, consistent implementation, ready to be committed
patch [1]. Can we focus on reviewing it?

Thanks,
Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5771

On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <t....@gmail.com> wrote:
> Aleksey Shipilev wrote:
>
> > Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
> > tuning worth it?".
> >
>
>  I think it is unhelpful to mix actual, concrete suggestions for performance
> improvements, with proof-of-concept ideas that break the specification or
> security.
>
>  Please can people mark clearly in their mail whether they are suggesting
> the idea only as "a thought experiment", otherwise somebody may apply the
> change without properly understanding the consequences.
>
>  I realize the onus is on the committer to ensure it is safe but hiding such
> problems is, as I said, unhelpful.
>
>  Thanks,
>  Tim
>
>
>
>
>
> > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com>
> wrote:
> >
> > > Sergey Salishev wrote:
> > >
> > >
> > > > It also should be noticed that replacing the IdentityHashMap with
> HashMap
> > > >
> > > in
> > >
> > > > Thread impl gives 4.5x boost on ThreadLocalBench.
> > > >
> > > >
> > >  But it would be wrong since it opens a security vulnerability.
> > >
> > >  See:
> > >  http://markmail.org/message/rwq3kif5n33wp65m
> > >  http://svn.apache.org/viewvc?view=rev&revision=477355
> > >
> > >
> > >  Regards,
> > >  Tim
> > >
> > >
> >
> >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Tim Ellison <t....@gmail.com>.
Aleksey Shipilev wrote:
> Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
> tuning worth it?".

I think it is unhelpful to mix actual, concrete suggestions for 
performance improvements, with proof-of-concept ideas that break the 
specification or security.

Please can people mark clearly in their mail whether they are suggesting 
the idea only as "a thought experiment", otherwise somebody may apply 
the change without properly understanding the consequences.

I realize the onus is on the committer to ensure it is safe but hiding 
such problems is, as I said, unhelpful.

Thanks,
Tim


> On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com> wrote:
>> Sergey Salishev wrote:
>>
>>> It also should be noticed that replacing the IdentityHashMap with HashMap
>> in
>>> Thread impl gives 4.5x boost on ThreadLocalBench.
>>>
>>  But it would be wrong since it opens a security vulnerability.
>>
>>  See:
>>   http://markmail.org/message/rwq3kif5n33wp65m
>>   http://svn.apache.org/viewvc?view=rev&revision=477355
>>
>>
>>  Regards,
>>  Tim
>>
> 

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
tuning worth it?".

On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t....@gmail.com> wrote:
> Sergey Salishev wrote:
>
> > It also should be noticed that replacing the IdentityHashMap with HashMap
> in
> > Thread impl gives 4.5x boost on ThreadLocalBench.
> >
>
>  But it would be wrong since it opens a security vulnerability.
>
>  See:
>   http://markmail.org/message/rwq3kif5n33wp65m
>   http://svn.apache.org/viewvc?view=rev&revision=477355
>
>
>  Regards,
>  Tim
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Tim Ellison <t....@gmail.com>.
Sergey Salishev wrote:
> It also should be noticed that replacing the IdentityHashMap with HashMap in
> Thread impl gives 4.5x boost on ThreadLocalBench.

But it would be wrong since it opens a security vulnerability.

See:
   http://markmail.org/message/rwq3kif5n33wp65m
   http://svn.apache.org/viewvc?view=rev&revision=477355


Regards,
Tim

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Aleksey Shipilev <al...@gmail.com>.
Yep, I second that.

On Fri, Apr 18, 2008 at 11:01 PM, Sergey Salishev
<se...@gmail.com> wrote:
> Hi,
>
>  It also should be noticed that replacing the IdentityHashMap with HashMap in
>  Thread impl gives 4.5x boost on ThreadLocalBench.
>
>  Thanks.
>  Sergey.
>
>
>
>  On Fri, Apr 18, 2008 at 10:53 PM, Aleksey Shipilev <
>  aleksey.shipilev@gmail.com> wrote:
>
>  > Hi there,
>  >
>  > I had instrumented the current implementation of j.u.IdentityHashMap
>  > and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
>  > collision rate there (frequency of sutiations when objects have
>  > identical identityHashCode) is rather high - 8 collisions per one
>  > get() on average with load factor of 0.75.
>  >
>  > That mean identityHashCode does not disperse elements well and the
>  > map's performance is subject to clustering. Moreover, the collision
>  > chains are sometimes so big that they're covering a half of storage
>  > array and IHM looses access performance degrading to linear search.
>  > And also the performance of essentially the same IHMs (in terms of
>  > contents) may vary because of different clustering layouts.
>  >
>  > That's essentially the disadvantage of open-addressed hash tables and
>  > there are known ways to overwhelm this problem [1]. But before
>  > actually researching on this topic I need to answer some
>  > "java-spec-legal" questions:
>  >  1. Is it required for IdentityHashMap to use open addressing?
>  >  2. Is it required for IdentityHashMap to use linear probing as
>  > collision resolution scheme?
>  >  3. What's the reason of having the separate implementation of
>  > IdentityHashMap while it can be implemented with overriding of
>  > "equality" operation in ordinary HashMap?
>  >  4. Can identityHashCode be salted with some additional hash to break
>  > clustering?
>  >
>  > I'm actually interested in performance optimization of IHM, so I will
>  > appreciate anyone comments on this topic.
>  >
>  > Thanks,
>  > Aleksey.
>  >
>  > [1] http://en.wikipedia.org/wiki/Hash_table
>  >
>

Re: [classlib][luni][performance] IdentityHashMap implementation

Posted by Sergey Salishev <se...@gmail.com>.
Hi,

It also should be noticed that replacing the IdentityHashMap with HashMap in
Thread impl gives 4.5x boost on ThreadLocalBench.

Thanks.
Sergey.

On Fri, Apr 18, 2008 at 10:53 PM, Aleksey Shipilev <
aleksey.shipilev@gmail.com> wrote:

> Hi there,
>
> I had instrumented the current implementation of j.u.IdentityHashMap
> and ran it through ThreadLocalBench and SerialBench. Surprisingly, the
> collision rate there (frequency of sutiations when objects have
> identical identityHashCode) is rather high - 8 collisions per one
> get() on average with load factor of 0.75.
>
> That mean identityHashCode does not disperse elements well and the
> map's performance is subject to clustering. Moreover, the collision
> chains are sometimes so big that they're covering a half of storage
> array and IHM looses access performance degrading to linear search.
> And also the performance of essentially the same IHMs (in terms of
> contents) may vary because of different clustering layouts.
>
> That's essentially the disadvantage of open-addressed hash tables and
> there are known ways to overwhelm this problem [1]. But before
> actually researching on this topic I need to answer some
> "java-spec-legal" questions:
>  1. Is it required for IdentityHashMap to use open addressing?
>  2. Is it required for IdentityHashMap to use linear probing as
> collision resolution scheme?
>  3. What's the reason of having the separate implementation of
> IdentityHashMap while it can be implemented with overriding of
> "equality" operation in ordinary HashMap?
>  4. Can identityHashCode be salted with some additional hash to break
> clustering?
>
> I'm actually interested in performance optimization of IHM, so I will
> appreciate anyone comments on this topic.
>
> Thanks,
> Aleksey.
>
> [1] http://en.wikipedia.org/wiki/Hash_table
>