Posted to dev@avro.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/03 05:33:00 UTC
[jira] [Work logged] (AVRO-3527) Generated equals() and hashCode() for SpecificRecords
[ https://issues.apache.org/jira/browse/AVRO-3527?focusedWorklogId=777936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777936 ]
ASF GitHub Bot logged work on AVRO-3527:
----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Jun/22 05:32
Start Date: 03/Jun/22 05:32
Worklog Time Spent: 10m
Work Description: steven-aerts opened a new pull request, #1708:
URL: https://github.com/apache/avro/pull/1708
Update the compiler to generate the implementations of the `.equals()` and `.hashCode()` functions, instead of relying on the
implementation in GenericData. This improves the performance of those functions significantly.
The generated implementations are a factor of 10 to 20 faster for `.equals()` and a factor of 5 to 10 faster for `.hashCode()`.
The generated implementation produces the same hashCode as GenericData, which is validated by existing tests.
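To illustrate the idea, here is a minimal sketch of field-by-field methods for a hypothetical two-field record. The class `UserSketch`, its fields, and the helper `genericStyleHashCode()` are invented for illustration; this is not the code the compiler emits, and the 31-multiplier fold starting from 1 is an assumption about how the generic hash combines field hashes.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a generated record class (not actual Avro output).
final class UserSketch {
    final String name;
    final int age;

    UserSketch(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Generic-style hash: boxes every field and folds them in a loop.
    // Arrays.hashCode starts at 1 and applies 31 * h + elementHash per element.
    int genericStyleHashCode() {
        return Arrays.hashCode(new Object[] {name, age});
    }

    // Generated-style equals: compares fields directly, no schema walk, no boxing.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserSketch)) return false;
        UserSketch other = (UserSketch) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    // Generated-style hashCode: the same fold as above, unrolled per field.
    @Override
    public int hashCode() {
        int result = 1;
        result = 31 * result + (name == null ? 0 : name.hashCode());
        result = 31 * result + Integer.hashCode(age);
        return result;
    }

    public static void main(String[] args) {
        UserSketch a = new UserSketch("ann", 30);
        UserSketch b = new UserSketch("ann", 30);
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == a.genericStyleHashCode());
    }
}
```

Because the unrolled version applies the same fold in the same field order, both hash methods agree, which is the property the sentence above says the existing tests validate for the real implementation.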
Results of the perf test before the change:
```
Benchmark             Mode   Cnt         Score             Error  Units
SpecficTest.equals    thrpt    3  12598610.194  +/- 11160265.279  ops/s
SpecficTest.hashCode  thrpt    3  24729446.862  +/- 29051332.794  ops/s
```
Results using generated functions:
```
Benchmark             Mode   Cnt          Score              Error  Units
SpecficTest.equals    thrpt    3  211314296.950  +/- 104154793.126  ops/s
SpecficTest.hashCode  thrpt    3  180349506.632  +/- 143639246.771  ops/s
```
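As a quick check of the claimed factors, the speedup can be computed from the two score columns above. The class and method names in this sketch are hypothetical; only the four scores come from the benchmark output.

```java
// Hypothetical helper: ratio of throughput after the change to throughput before.
final class SpeedupSketch {
    static double speedup(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores copied from the two benchmark runs above (ops/s).
        System.out.printf("equals:   %.1fx%n", speedup(12598610.194, 211314296.950));
        System.out.printf("hashCode: %.1fx%n", speedup(24729446.862, 180349506.632));
    }
}
```

This gives roughly 16.8 for `equals()` and 7.3 for `hashCode()`, inside the stated 10 to 20 and 5 to 10 ranges. With only 3 iterations and wide error margins, these ratios are indicative rather than precise.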
### Jira
- [x] My PR addresses the following: [AVRO-3527](https://issues.apache.org/jira/browse/AVRO-3527) Generated equals() and hashCode() for SpecificRecords
### Tests
- [x] My PR adds the following unit tests:
* TestUtf8#testHashCodeSameAsString()
* TestGeneratedCode#ignoredFields()
* JMH test for SpecificRecords `equals()` and `hashCode()`
### Commits
- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and classes in the PR contain Javadoc that explains what they do
Issue Time Tracking
-------------------
Worklog Id: (was: 777936)
Remaining Estimate: 0h
Time Spent: 10m
> Generated equals() and hashCode() for SpecificRecords
> -----------------------------------------------------
>
> Key: AVRO-3527
> URL: https://issues.apache.org/jira/browse/AVRO-3527
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Steven Aerts
> Priority: Major
> Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, flame_graph.jpeg
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When profiling our production system, we found that it was spending almost 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic, almost all of the time is spent in those functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> By generating {{.equals()}} and {{.hashCode()}}, all this overhead disappeared and the application became 35% faster overall.
> On other Avro-heavy applications we also saw noticeable performance gains from this improvement where we hadn't expected them.
> A generated implementation of {{.hashCode()}} is 5 to 10 times faster than its generic counterpart. For {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)