You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/06/03 05:32:51 UTC

[GitHub] [avro] steven-aerts opened a new pull request, #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

steven-aerts opened a new pull request, #1708:
URL: https://github.com/apache/avro/pull/1708

   Update the compiler to generate the implementation of the `.equals()` and `.hashCode() function, instead of relying on the
   implementation of GenericData.  This improves the performance of those functions significantly.
   
   The generated implementations are factor 10 to 20 faster for `.equals()` and a factor 5 to 10 for `.hashCode()`.
   
   The implementation generates the same hashCode as the genericData, which is validated by existing tests
   
   Result of Perf test before the change:
   
   ```
   Benchmark              Mode  Cnt          Score             Error  Units
   SpecficTest.equals    thrpt    3   12598610.194 +/-  11160265.279  ops/s
   SpecficTest.hashCode  thrpt    3   24729446.862 +/-  29051332.794  ops/s
   ```
   
   Results using generated functions:
   
   ```
   Benchmark              Mode  Cnt          Score             Error  Units
   SpecficTest.equals    thrpt    3  211314296.950 +/- 104154793.126  ops/s
   SpecficTest.hashCode  thrpt    3  180349506.632 +/- 143639246.771  ops/s
   ```
   
   ### Jira
   
   - [x] My PR addresses the following: [AVRO-3527](https://issues.apache.org/jira/browse/AVRO-3527) Generated equals() and hashCode() for SpecificRecords
   
   ### Tests
   
   - [x] My PR adds the following unit tests:
   * TestUtf8#testHashCodeSameAsString()
   * TestGeneratedCode#ignoredFields()
   * JMH test for SpecificRecords `equals()` and `hashCode()`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] opwvhk commented on a diff in pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
opwvhk commented on code in PR #1708:
URL: https://github.com/apache/avro/pull/1708#discussion_r888739620


##########
lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/FieldTest.java:
##########
@@ -16,6 +16,8 @@
 @org.apache.avro.specific.AvroGenerated
 public class FieldTest extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
   private static final long serialVersionUID = 4609235620572341636L;
+
+

Review Comment:
   Where do these empty lines come from?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] steven-aerts commented on a diff in pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
steven-aerts commented on code in PR #1708:
URL: https://github.com/apache/avro/pull/1708#discussion_r888787906


##########
lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/FieldTest.java:
##########
@@ -16,6 +16,8 @@
 @org.apache.avro.specific.AvroGenerated
 public class FieldTest extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
   private static final long serialVersionUID = 4609235620572341636L;
+
+

Review Comment:
   @opwvhk, they were [introduced 9 months ago](https://github.com/apache/avro/pull/885/files#diff-fe8e85cc928036e36a87661cc5cc15a4ba6bddaff0b283910dee74209758fa51) with the introduction of the custom logical types.
   
   The test ignores white-spaces changes so it is only visible now.  Do you want me to do something about it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] opwvhk commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
opwvhk commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1150927691

   > I used to think that Avro required that the code-generator version _must_ be in sync with the avro library version, because that's the way my company always did it, but I've learned since that it's not always the case.
   
   This is the case for ANTLR, and is a common practice because some generated code is so tightly coupled to the library, that using the same version is a requirement to ensure compatibility. This is not the case for Avro.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] opwvhk commented on a diff in pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
opwvhk commented on code in PR #1708:
URL: https://github.com/apache/avro/pull/1708#discussion_r888813943


##########
lang/java/tools/src/test/compiler/output-string/avro/examples/baseball/FieldTest.java:
##########
@@ -16,6 +16,8 @@
 @org.apache.avro.specific.AvroGenerated
 public class FieldTest extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
   private static final long serialVersionUID = 4609235620572341636L;
+
+

Review Comment:
   Not unless spotless complains about it. It doesn't bother me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] github-code-scanning[bot] commented on a diff in pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #1708:
URL: https://github.com/apache/avro/pull/1708#discussion_r891495081


##########
lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java:
##########
@@ -1131,6 +1131,46 @@
     return word;
   }
 
+  public boolean canGenerateEqualsAndHashCode(Schema schema) {
+    return getUsedCustomLogicalTypeFactories(schema).isEmpty();
+  }
+
+  public boolean isPrimitiveType(Schema schema) {
+    return !isUnboxedJavaTypeNullable(schema) && getConvertedLogicalType(schema) == null;
+  }
+
+  public String hashCodeFor(Schema schema, String name) {
+    switch (javaUnbox(schema)) {

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SpecificCompiler.javaUnbox](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/2708)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] steven-aerts commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
steven-aerts commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1193776812

   @RyanSkraba is there still something I can do to help to get this change submitted?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1148926792

   Thanks for the PR, this looks really interesting!  I set the target for 1.12.0 because it's a change in how classes are generated... but I'm curious whether that should really be the case.  I really don't think there's any reason this could be a major change!  What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1149852870

   > Do you see that as a reason to introduce it in a major version only?
   
   Thanks!  I think it's a good idea to bump it to the next major version, but we usually do one later in the year.
   
   I used to think that Avro required that the code-generator version *must* be in sync with the avro library version, because that's the way my company always did it, but I've learned since that it's not always the case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] steven-aerts commented on a diff in pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
steven-aerts commented on code in PR #1708:
URL: https://github.com/apache/avro/pull/1708#discussion_r891942775


##########
lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SpecificCompiler.java:
##########
@@ -1131,6 +1131,46 @@
     return word;
   }
 
+  public boolean canGenerateEqualsAndHashCode(Schema schema) {
+    return getUsedCustomLogicalTypeFactories(schema).isEmpty();
+  }
+
+  public boolean isPrimitiveType(Schema schema) {
+    return !isUnboxedJavaTypeNullable(schema) && getConvertedLogicalType(schema) == null;
+  }
+
+  public String hashCodeFor(Schema schema, String name) {
+    switch (javaUnbox(schema)) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] steven-aerts commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
steven-aerts commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1149496779

   @RyanSkraba there is one small corner case, which let me doubt.
   
   When using `CharSequence` as `stringType`, the `equals` method uses a newly introduced `Utf8#compareSequences`.  As we could not use `java.lang.CharSequence#compare` which is only [available since java 11](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/CharSequence.html#compare(java.lang.CharSequence,java.lang.CharSequence)).
   
   If we choose to introduce this change as a minor version.  It would mean that for the `Charsequence` `stringType`, the code generated by an 1.11.1 compiler is incompatible with a 1.11.0 avro library.  Everything else should work in that constellation.
   
   Do you see that as a reason to introduce it in a major version only?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1708: AVRO-3527: Generated equals() and hashCode() for SpecificRecords

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1708:
URL: https://github.com/apache/avro/pull/1708#issuecomment-1196435707

   Hey, thanks so much for your patience!  The RC1 for 1.11.1 is out for testing, so hopefully we should be getting around to PRs targetted for 1.12.0 in August.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org