You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/03 06:20:54 UTC

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #4953: Core, API: Replace unneccessary uses of hash with hashCode

amogh-jahagirdar opened a new pull request, #4953:
URL: https://github.com/apache/iceberg/pull/4953

   To get rid of some build warnings 
   
   `warning: [ObjectsHashCodeUnnecessaryVarargs] java.util.Objects.hash(non-varargs) should be replaced with java.util.Objects.hashCode(value) to avoid unnecessary varargs array allocations.
       return Objects.hash(fields);
                          ^
       (see https://github.com/palantir/gradle-baseline#baseline-error-prone-checks)
     Did you mean 'return Objects.hashCode(fields)`
   
   In some hashCode implementations we unnecessarily use Objects.hash which accepts variadic arguments. Under the hood, this will create an array and do Arrays.hash. However, in these cases since we're passing in a single object reference, hashCode suffices, and we can avoid ant unnecessary array allocations,
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #4953: Core, API: Replace unneccessary uses of hash with hashCode

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889123826


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   1.) Yeah I was thinking about this. It would change the value of the hashcode compared to using hash, because before we're doing an array allocation and then hashing that, now we're doing the hash on the object itself. There's an example of this https://www.baeldung.com/java-objects-hash-vs-objects-hashcode .
   
   However it still would retain the property that any equal objects would have the same hashcode within the same JVM, which is sufficient. Across different JVMs (in a distributed context) we aren't guaranteed consistent hashcode for the same object. But this is an issue regardless of the method that's used https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html
   
   
   2.) That's a good point, although I was kind of against suppressing mostly because it seemed really unnecessary. In the case of more fields, I think it makes sense to just tackle that then because presumably it would be part of a bigger change anyways. I'm not sure about if for these classes new fields would ever be added but of course, hard to predict that 
   
    Let me know what you think ! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150334748

   No problem at all folks, want to make sure we're reaching the right decision! :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #4953: Core, API: Replace unnecessary hash uses with hashCode

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150351731

   Thanks for the reviews all! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1146858012

   I'm okay with either solution. I don't think that we guarantee stability of `hashCode`, nor should we.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #4953: Core, API: Replace unneccessary uses of hash with hashCode

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889075617


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   I think we should instead suppress the warning, because:
   1. this might change the hashcode of the class object? We should verify
   2. if more fields are added, we need to get back to use `Objects.hash` again



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889214205


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   Updated the error prone config! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1146738542

   Though I do appreciate the cleanup in general đź‘Ť
   
   Just I agree most of all with Jack’s original argument.
   
   > I think we should instead suppress the warning, because:
   
   I think we should instead suppress the warning, because:
   1. this might change the hashcode of the class object? We should verify
   2. if more fields are added, we need to get back to use Objects.hash again
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #4953: Core, API: Replace unneccessary uses of hash with hashCode

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889170956


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   Thanks for the explanation. I see this line in your blog:
   
   
   > The problem is that although the documentation says hashCode() doesn’t provide a consistency guarantee, the Java standard library behaves as if it did provide the guarantee. People start relying on it, and since backwards-compatibility is rated so highly in the Java community, it will probably never ever be changed, even though the documentation would allow it to be changed. So the JVM gets the worst of both worlds: a hash table implementation that is open to DoS attacks, but also a hash function that can’t always safely be used for communication between processes. :(
   > 
   > Therefore…
   > 
   > So what I’d like to ask for is this: if you’re building a distributed framework based on the JVM, please don’t use Java’s hashCode() for anything that needs to work across different processes. Because it’ll look like it works fine when you use it with strings and numbers, and then someday a brave soul will use (e.g.) a protocol buffers object, and then spend days banging their head against a wall trying to figure out why messages are getting sent to the wrong servers.
   
   
   In that case, it feels like we should really update the errorprone config to not report warning for not using hashCode...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150335877

   Oh @amogh-jahagirdar you might want to update the PR title again! That way if anybody is looking through old PRs, it will be reflective of the actual changes! Not a huge deal but worth mentioning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1147754785

   > I don't think that we guarantee stability of hashCode, nor should we.
   
   Yes that's a good point, it's hard to guarantee and also not worth the effort. In that case I think we can just move to use `hashCode`, sorry for the back and forth.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #4953: Core, API: Replace unneccessary uses of hash with hashCode

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889170956


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   Thanks for the explanation. I see this line in your blog:
   
   ```
   The problem is that although the documentation says hashCode() doesn’t provide a consistency guarantee, the Java standard library behaves as if it did provide the guarantee. People start relying on it, and since backwards-compatibility is rated so highly in the Java community, it will probably never ever be changed, even though the documentation would allow it to be changed. So the JVM gets the worst of both worlds: a hash table implementation that is open to DoS attacks, but also a hash function that can’t always safely be used for communication between processes. :(
   
   Therefore…
   
   So what I’d like to ask for is this: if you’re building a distributed framework based on the JVM, please don’t use Java’s hashCode() for anything that needs to work across different processes. Because it’ll look like it works fine when you use it with strings and numbers, and then someday a brave soul will use (e.g.) a protocol buffers object, and then spend days banging their head against a wall trying to figure out why messages are getting sent to the wrong servers.
   ```
   
   In that case, it feels like we should really update the errorprone config to not report warning for not using hashCode...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #4953: Core, API: Replace unnecessary hash uses with hashCode

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150339109

   > Oh @amogh-jahagirdar you might want to update the PR title again! That way if anybody is looking through old PRs, it will be reflective of the actual changes! Not a huge deal but worth mentioning.
   
   Definitely, done! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #4953: Core, API: Replace unnecessary hash uses with hashCode

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #4953:
URL: https://github.com/apache/iceberg/pull/4953


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1147876252

   @kbendick any additional comment?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #4953: Core, API: Replace unneccessary uses of hash with hashCode

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#discussion_r889123826


##########
api/src/main/java/org/apache/iceberg/util/CharSequenceSet.java:
##########
@@ -172,7 +172,7 @@ public boolean equals(Object o) {
 
   @Override
   public int hashCode() {
-    return Objects.hash(wrapperSet);

Review Comment:
   1.) Yeah I was thinking about this. It would change the value of the hashcode compared to using hash, because before we're doing an array allocation and then hashing that, now we're doing the hash on the object itself. There's an example of this https://www.baeldung.com/java-objects-hash-vs-objects-hashcode .
   
   However it still would retain the property that any equal objects would have the same hashcode within the same JVM, which is sufficient. Across different JVMs (in a distributed context) we aren't guaranteed consistent hashcode for the same object. But this is an issue regardless of the method that's used https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html
   
   
   2.) That's a good point, although I was kind of against suppressing mostly because it seemed really unnecessary. In the case of more fields, I think it makes sense to just tackle that then. I'm not sure about if for these classes new fields would ever be added but of course, hard to predict that 
   
    Let me know what you think ! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1146737859

   I’m worried about suppressing this rule entirely. Does that mean future classes won’t report this warning when they’re being written? Those classes would then be doing an unnecessary allocation to an array pretty often.
   
   If so, I’d prefer suppressing this specific instance if we don’t care to change it. Having looked at the blog briefly, I understand the argument that the JVM hash code truly isn’t consistent across versions etc.
   
   But in the general and common case this rule still warns against an unnecessary array allocation, right? It seems strange to globally block that rule over locally suppressing or just saying it’s fine to change.
   
   Is there some bigger argument I’m missing?
   
   > So what I’d like to ask for is this: if you’re building a distributed framework based on the JVM, please don’t use Java’s hashCode() for anything that needs to work across different processes. Because it’ll look like it works fine when you use it with strings and numbers, and then someday a brave soul will use (e.g.) a protocol buffers object, and then spend days banging their head against a wall trying to figure out why messages are getting sent to the wrong servers.
   
   While this very well may be true about Java hashCode and I won’t dispute your argument.
   
   But can we confirm that this isn’t being used this way in this library at times and downstream in this way and assumed somewhat deterministic? Not in my opinion. Iceberg is used in many places. I’d prefer to add a local suppression, but if we want to change it then just change it here and keep the warning.
   
   It’s good advice to not call `Objects.hash` with only one argument because of the array allocation still.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4953: Ignore ObjectsHashCodeUnnecessaryVarargs error prone warning

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150333821

   > > I don't think that we guarantee stability of hashCode, nor should we.
   > 
   > Yes that's a good point, it's hard to guarantee and also not worth the effort. In that case I think we can just move to use `hashCode`, sorry for the back and forth.
   
   Nope. No additional comments. This looks good to me.
   
   I agree we don't guarantee hashCode (nor should we), but the rule is beneficial to avoid the unnecessary array allocation. Thanks for this and sorry for the back and forth @amogh-jahagirdar!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #4953: Core, API: Replace unnecessary hash uses with hashCode

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #4953:
URL: https://github.com/apache/iceberg/pull/4953#issuecomment-1150350306

   Looks good to me, thanks for the review! @kbendick @rdblue @singhpk234 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org