You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2021/08/04 21:13:27 UTC

[GitHub] [avro] belugabehr opened a new pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

belugabehr opened a new pull request #1301:
URL: https://github.com/apache/avro/pull/1301


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [AVRO-3184](https://issues.apache.org/jira/browse/AVRO-3184) issues 
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   Performance, no change in functionality.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r701247652



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
 
   private static final GenericData INSTANCE = new GenericData();
 
+  private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();

Review comment:
       Quick note: this should be `PRIMITIVE`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1301:
URL: https://github.com/apache/avro/pull/1301#issuecomment-919405585


   @RyanSkraba Merge it! It could be improved later if needed!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702953724



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       I guess the subsequent code wouldn't be **entirely** dead, since somebody might have an implementation that turns their special `MyInteger` class into a `Type.INT`.  Hypothetically, this change would make it impossible for a subclass to specify that all `java.lang.Integer` should be represented by a `Type.LONG`, I guess! 
   
   While I doubt anybody is actually doing that, why don't we play it with a bit of security and do: 
   
   ```
   String primativeType = getPrimitiveTypeCache().get(datum.getClass());
   ```
   
   where `getPrimitiveTypeCache()` is protected and returns `PRIMATIVE_DATUMS_TYPES` by default?  That adds one method call but would give a subclass an "escape route" if they do have a specialised behaviour.  Alternatively, a protected data member that is initialised to the static hash instance perhaps?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702642170



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
 
   private static final GenericData INSTANCE = new GenericData();
 
+  private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();

Review comment:
       That's true, but `java.lang.Class` does not override `#equals()`, so it inherits `Object#equals()` which uses `==`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] belugabehr commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702475769



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       Hello,
   
   It was my desire to remove these methods, but I feared I would be breaking backwards compatibility (even more than this already does).  It may still be possible for developers to override these methods. For example:
   
   ```java
       if (isInteger(datum))
         return Type.INT.getName();
   ```
   
   A developer could allow for the primitive case to be handled by the cache, but also extend the functionality to allow for other logical types to, in this case, return an INT type by overriding `isInteger`.  I'm happy to take away that ability if it offers no real-world use cases, but I didn't want to be so heavy-handed with it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on pull request #1301:
URL: https://github.com/apache/avro/pull/1301#issuecomment-916141604


   Nice catch for the Utf8 instance!  If you're happy with the state of this change, let's merge it.
   
   We should mention in the release notes that there might be a migration for any subclass of SpecificData that uses isXxxx primitives.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r698395051



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
 
   private static final GenericData INSTANCE = new GenericData();
 
+  private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();

Review comment:
       Any reason to use IdentityHashMap in favour of HashMap ?

##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       Shouldn't we remove the later checks for `isInteger(datum)`, `isLong(datum)`, etc. ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] belugabehr commented on pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1301:
URL: https://github.com/apache/avro/pull/1301#issuecomment-916991862


   @RyanSkraba All ready for merge! :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702650851



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       In addition, `PRIMATIVE_DATUM_TYPES` handles primitives' wrapper classes, like `Integer.class`. Does it need to handle `Integer.TYPE` too ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r701997015



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       I couldn't find any implementations that override these methods.   They should be safe to remove, but it looks like the intention was that subclasses of GenericData would be able to change their behaviour by overriding these methods. 
   
   That expectation (which nobody uses) would be broken by the primitive cache anyway... 
   
   @belugabehr what do you think?  Should these methods be cleaned up?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba merged pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba merged pull request #1301:
URL: https://github.com/apache/avro/pull/1301


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702649923



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -892,6 +902,9 @@ public int resolveUnion(Schema union, Object datum) {
   protected String getSchemaName(Object datum) {
     if (datum == null || datum == JsonProperties.NULL_VALUE)
       return Type.NULL.getName();
+    String primativeType = PRIMATIVE_DATUM_TYPES.get(datum.getClass());
+    if (primativeType != null)
+      return primativeType;

Review comment:
       I am not sure whether I was clear enough.
   https://github.com/apache/avro/blob/cc0eb0bee35681f33a0b5324b106ef744be86d1f/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L905-L907 is the new `if` branch.
   If I understand it correctly if the datum is any of the "primitive" types then the following `if` branches won't ever be reached: https://github.com/apache/avro/blob/cc0eb0bee35681f33a0b5324b106ef744be86d1f/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L918-L931
   So, even if a user overrides `isInteger()` it won't be called at all, because of https://github.com/apache/avro/blob/cc0eb0bee35681f33a0b5324b106ef744be86d1f/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L905-L907
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] RyanSkraba commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702957775



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
 
   private static final GenericData INSTANCE = new GenericData();
 
+  private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();

Review comment:
       They're equivalent -- let's go with @belugabehr 's choice!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] belugabehr commented on pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1301:
URL: https://github.com/apache/avro/pull/1301#issuecomment-916198144


   Yup.  Should be good to go now.  Thanks for all your helpful feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] belugabehr commented on a change in pull request #1301: AVRO-3184: Cache Datum Type Strings in Resolve Union

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #1301:
URL: https://github.com/apache/avro/pull/1301#discussion_r702476115



##########
File path: lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
##########
@@ -69,6 +69,16 @@
 
   private static final GenericData INSTANCE = new GenericData();
 
+  private static final Map<Class<?>, String> PRIMATIVE_DATUM_TYPES = new IdentityHashMap<>();

Review comment:
       It's faster to use `IdentityHashMap` as it uses the primitive `==` method instead of `equals`.  When coming up with this solution, I did some research and from everything I can find, it seems that as long as the classes comes from the same class loader (as is the case here since it's populated by the class itself and not an outside reference), then this should be perfectly acceptable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org