Posted to dev@avro.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/06/03 05:33:00 UTC

[jira] [Work logged] (AVRO-3527) Generated equals() and hashCode() for SpecificRecords

     [ https://issues.apache.org/jira/browse/AVRO-3527?focusedWorklogId=777936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777936 ]

ASF GitHub Bot logged work on AVRO-3527:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/22 05:32
            Start Date: 03/Jun/22 05:32
    Worklog Time Spent: 10m 
      Work Description: steven-aerts opened a new pull request, #1708:
URL: https://github.com/apache/avro/pull/1708

   Update the compiler to generate the implementations of the `.equals()` and `.hashCode()` functions, instead of relying on the
   implementation in GenericData.  This improves the performance of those functions significantly.
   
   The generated implementations are a factor of 10 to 20 faster for `.equals()` and a factor of 5 to 10 faster for `.hashCode()`.
   
   The generated implementation produces the same hashCode as GenericData, which is validated by existing tests.
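   
   To illustrate the idea, here is a minimal, self-contained sketch (not the code emitted by the compiler in this PR). The `User` class, its fields, and the `genericEquals`/`genericHashCode` helpers are hypothetical; the 31-based accumulation and the "null hashes to 0" rule are assumptions made for the sketch. It contrasts a positional, boxing loop with direct per-field code that yields the same results:
   
   ```java
   import java.util.Objects;
   
   public class GeneratedEqualsSketch {
   
       /** Hypothetical record class with three fields; not taken from the PR. */
       static final class User {
           final String name;            // nullable
           final int age;
           final Integer favoriteNumber; // nullable
   
           User(String name, int age, Integer favoriteNumber) {
               this.name = name;
               this.age = age;
               this.favoriteNumber = favoriteNumber;
           }
   
           // Positional accessor, as a generic, schema-driven path would use.
           Object get(int pos) {
               switch (pos) {
                   case 0: return name;
                   case 1: return age; // boxes the int on every call
                   case 2: return favoriteNumber;
                   default: throw new IndexOutOfBoundsException("Invalid index: " + pos);
               }
           }
   
           // "Generated" style: compare fields directly, primitives first.
           @Override
           public boolean equals(Object o) {
               if (this == o) return true;
               if (!(o instanceof User)) return false;
               User other = (User) o;
               return age == other.age
                   && Objects.equals(name, other.name)
                   && Objects.equals(favoriteNumber, other.favoriteNumber);
           }
   
           // "Generated" style: same accumulation as the generic loop, unrolled.
           @Override
           public int hashCode() {
               int result = 1;
               result = 31 * result + (name == null ? 0 : name.hashCode());
               result = 31 * result + Integer.hashCode(age);
               result = 31 * result + (favoriteNumber == null ? 0 : favoriteNumber.hashCode());
               return result;
           }
       }
   
       static final int FIELD_COUNT = 3;
   
       // Generic style: walk the fields by position and compare boxed values.
       static boolean genericEquals(User a, User b) {
           for (int i = 0; i < FIELD_COUNT; i++) {
               if (!Objects.equals(a.get(i), b.get(i))) return false;
           }
           return true;
       }
   
       // Generic style: accumulate the hash over boxed field values.
       static int genericHashCode(User u) {
           int result = 1;
           for (int i = 0; i < FIELD_COUNT; i++) {
               Object v = u.get(i);
               result = 31 * result + (v == null ? 0 : v.hashCode());
           }
           return result;
       }
   
       public static void main(String[] args) {
           User a = new User("alice", 30, null);
           User b = new User("alice", 30, null);
           User c = new User("alice", 31, 7);
           System.out.println(a.equals(b) == genericEquals(a, b));
           System.out.println(a.equals(c) == genericEquals(a, c));
           System.out.println(a.hashCode() == genericHashCode(a));
           System.out.println(c.hashCode() == genericHashCode(c));
       }
   }
   ```
   
   The direct version avoids the positional dispatch and the boxing of `age`, which is the kind of overhead a generated implementation can skip while keeping the hash value unchanged.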
   
   Results of the perf test before the change:
   
   ```
   Benchmark              Mode  Cnt          Score             Error  Units
   SpecficTest.equals    thrpt    3   12598610.194 +/-  11160265.279  ops/s
   SpecficTest.hashCode  thrpt    3   24729446.862 +/-  29051332.794  ops/s
   ```
   
   Results using generated functions:
   
   ```
   Benchmark              Mode  Cnt          Score             Error  Units
   SpecficTest.equals    thrpt    3  211314296.950 +/- 104154793.126  ops/s
   SpecficTest.hashCode  thrpt    3  180349506.632 +/- 143639246.771  ops/s
   ```
   
   ### Jira
   
   - [x] My PR addresses the following: [AVRO-3527](https://issues.apache.org/jira/browse/AVRO-3527) Generated equals() and hashCode() for SpecificRecords
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   * TestUtf8#testHashCodeSameAsString()
   * TestGeneratedCode#ignoredFields()
   * JMH test for SpecificRecords `equals()` and `hashCode()`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and classes in the PR contain Javadoc that explains what they do
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 777936)
    Remaining Estimate: 0h
            Time Spent: 10m

> Generated equals() and hashCode() for SpecificRecords
> -----------------------------------------------------
>
>                 Key: AVRO-3527
>                 URL: https://issues.apache.org/jira/browse/AVRO-3527
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Steven Aerts
>            Priority: Major
>         Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt, flame_graph.jpeg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When profiling our production system, we found that it was spending almost 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic we see that almost all time is spent in those functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> By generating {{.equals()}} and {{.hashCode()}}, all this overhead disappeared and the application became 35% faster overall.
> In other Avro-heavy applications we also saw noticeable performance gains from this improvement, in places where we hadn't expected them.
> A generated implementation of {{.hashCode()}} is 5 to 10 times faster than its generic counterpart; for {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)