You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Liu, Tie" <ti...@baml.com> on 2014/01/02 20:21:41 UTC
propose change to java Schema.computeHash implementation to avoid unnessary object
allocation/hashcode calculation
Hi,
In current Schma.java we have following implementation:
public final int hashCode() {
if (hashCode == NO_HASHCODE)
hashCode = computeHash();
return hashCode;
}
int computeHash() { return getType().hashCode() + props.hashCode(); }
While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the computeHash method is not. But the computeHash method is being called from Schema$Field.hashCode and the subclasses hashCode implementation like following:
public int hashCode() { return name.hashCode() + schema.computeHash(); } //this is from Schema$Field class
This is causing the the calculation of hashCode getting called unnecessarily extensively. The proposed changed is to add the "if" check inside the computeHash method instead:
int computeHash()
{
if (hashCode == NO_HASHCODE)
{
hashCode = getType().hashCode() + props.hashCode();
}
return hashCode;
}
We did a simple test to compare the performance difference, below is a summary of the heap snapshot of comparing the difference:
As a test I wrote a small program that creates a HashMap<Schema.Field, Integer>() and enters a loop simply identifying whether various Schema.Field instances are keys in the map. Obviously this is a pathological test case, but when running with the current implementation of Schema.Field it has (in about 30 seconds) used up nearly 8 GBytes of heap in instantiating intermediate objects associated with calling Schema.computeHash():
Heap
PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000)
from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
PSPermGen total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000)
When running with the modified implementation (and no other change) all the object allocation vanishes:
Heap
PSYoungGen total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000)
from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
PSPermGen total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000)
As a side-effect the test runs x3 faster with the modified hashCode() implementation.
Please let us know how you think of this proposal. If it's ok, I can create a jira and submit a patch for the change.
Thanks,
Tony
----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message.
RE: propose change to java Schema.computeHash implementation to avoid unnessary object
allocation/hashcode calculation
Posted by "Liu, Tie" <ti...@baml.com>.
Hi Doug,
The jira is created and the patch is attached: https://issues.apache.org/jira/browse/AVRO-1428
Please let me know if you have any comment on the patch.
Thanks,
Tony
-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Friday, January 03, 2014 3:56 PM
To: dev@avro.apache.org
Cc: Murray, Paul; Rose, Jonathan
Subject: Re: propose change to java Schema.computeHash implementation to avoid unnessary object allocation/hashcode calculation
Tony,
This sounds like it could be a useful change. I have some concerns, but these would better be addressed in Jira once a patch has been posted.
https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute
Thanks,
Doug
On Thu, Jan 2, 2014 at 11:21 AM, Liu, Tie <ti...@baml.com> wrote:
> Hi,
> In current Schma.java we have following implementation:
> public final int hashCode() {
> if (hashCode == NO_HASHCODE)
> hashCode = computeHash();
> return hashCode;
> }
>
> int computeHash() { return getType().hashCode() + props.hashCode();
> }
>
> While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the computeHash method is not. But the computeHash method is being called from Schema$Field.hashCode and the subclasses hashCode implementation like following:
> public int hashCode() { return name.hashCode() + schema.computeHash();
> } //this is from Schema$Field class
>
> This is causing the the calculation of hashCode getting called unnecessarily extensively. The proposed changed is to add the "if" check inside the computeHash method instead:
>
> int computeHash()
> {
> if (hashCode == NO_HASHCODE)
> {
> hashCode = getType().hashCode() + props.hashCode();
> }
> return hashCode;
> }
>
>
> We did a simple test to compare the performance difference, below is a summary of the heap snapshot of comparing the difference:
>
> As a test I wrote a small program that creates a HashMap<Schema.Field, Integer>() and enters a loop simply identifying whether various Schema.Field instances are keys in the map. Obviously this is a pathological test case, but when running with the current implementation of Schema.Field it has (in about 30 seconds) used up nearly 8 GBytes of heap in instantiating intermediate objects associated with calling Schema.computeHash():
>
> Heap
> PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
> eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000)
> from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
> to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
> ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
> object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
> PSPermGen total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
> object space 21504K, 26% used
> [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000)
>
> When running with the modified implementation (and no other change) all the object allocation vanishes:
>
> Heap
> PSYoungGen total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
> eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000)
> from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
> to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
> ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
> object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
> PSPermGen total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
> object space 21504K, 26% used
> [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000)
>
> As a side-effect the test runs x3 faster with the modified hashCode() implementation.
>
> Please let us know how you think of this proposal. If it's ok, I can create a jira and submit a patch for the change.
>
> Thanks,
>
> Tony
>
>
>
> ----------------------------------------------------------------------
> This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message.
----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message.
Re: propose change to java Schema.computeHash implementation to avoid
unnessary object allocation/hashcode calculation
Posted by Doug Cutting <cu...@apache.org>.
Tony,
This sounds like it could be a useful change. I have some concerns,
but these would better be addressed in Jira once a patch has been
posted.
https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute
Thanks,
Doug
On Thu, Jan 2, 2014 at 11:21 AM, Liu, Tie <ti...@baml.com> wrote:
> Hi,
> In current Schma.java we have following implementation:
> public final int hashCode() {
> if (hashCode == NO_HASHCODE)
> hashCode = computeHash();
> return hashCode;
> }
>
> int computeHash() { return getType().hashCode() + props.hashCode(); }
>
> While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the computeHash method is not. But the computeHash method is being called from Schema$Field.hashCode and the subclasses hashCode implementation like following:
> public int hashCode() { return name.hashCode() + schema.computeHash(); } //this is from Schema$Field class
>
> This is causing the the calculation of hashCode getting called unnecessarily extensively. The proposed changed is to add the "if" check inside the computeHash method instead:
>
> int computeHash()
> {
> if (hashCode == NO_HASHCODE)
> {
> hashCode = getType().hashCode() + props.hashCode();
> }
> return hashCode;
> }
>
>
> We did a simple test to compare the performance difference, below is a summary of the heap snapshot of comparing the difference:
>
> As a test I wrote a small program that creates a HashMap<Schema.Field, Integer>() and enters a loop simply identifying whether various Schema.Field instances are keys in the map. Obviously this is a pathological test case, but when running with the current implementation of Schema.Field it has (in about 30 seconds) used up nearly 8 GBytes of heap in instantiating intermediate objects associated with calling Schema.computeHash():
>
> Heap
> PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
> eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000)
> from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
> to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
> ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
> object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
> PSPermGen total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
> object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000)
>
> When running with the modified implementation (and no other change) all the object allocation vanishes:
>
> Heap
> PSYoungGen total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
> eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000)
> from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
> to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
> ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
> object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
> PSPermGen total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
> object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000)
>
> As a side-effect the test runs x3 faster with the modified hashCode() implementation.
>
> Please let us know how you think of this proposal. If it's ok, I can create a jira and submit a patch for the change.
>
> Thanks,
>
> Tony
>
>
>
> ----------------------------------------------------------------------
> This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message.