You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/12/03 22:37:00 UTC

[jira] [Commented] (AVRO-2905) org.apache.avro.util.Utf8 bug in `hashCode`

    [ https://issues.apache.org/jira/browse/AVRO-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243559#comment-17243559 ] 

David Mollitor commented on AVRO-2905:
--------------------------------------

Woops.  I am just seeing this.  Thanks for catching my mistake.

 

I just wanted to say that the "hasHash" field is something I noticed being done in the JDK String code, so I wanted to keep this code close to what they are doing.  I wasn't just making it up. :)

 

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/String.java#L167-L171

> org.apache.avro.util.Utf8 bug in `hashCode`
> -------------------------------------------
>
>                 Key: AVRO-2905
>                 URL: https://issues.apache.org/jira/browse/AVRO-2905
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.10.0
>            Reporter: Kyoungha Min
>            Assignee: Kyoungha Min
>            Priority: Major
>             Fix For: 1.10.1
>
>
> Bug : utf8 hash reset is not correctly done, as a result, returns different hash when the values are logically equal.
>  
> cause commit : 
>  [https://github.com/apache/avro/commit/d5d1cd70e920f45468bf696d3668b496a9adbab3#diff-7001731cdafe4fc24f86e71c7e8a0b61]
> [https://github.com/apache/avro/commit/8e345c48e8f9708c27af63754965777314dac85c#diff-7001731cdafe4fc24f86e71c7e8a0b61]
>  
> reproduce : 
> {code:java}
>  org.apache.avro.util.Utf8 utf8 = new org.apache.avro.util.Utf8();
>  utf8.set("hello");
>  System.out.println(utf8.hashCode()); // 99162322
>  utf8.set("hello");
>  System.out.println(utf8.hashCode()); // -1538739392
> {code}
>  
> Suggested fix.
>  # get rid of `hasHash` field. In my opinion, it leads to more bug-prone by adding additional variable to take care of. `hash != 0` should be enough to check the presence of hash. Redundant computation for `0 hash` elements should be negligible.
>  # reset `hash = 0` on every mutation call.
>  # not needed, but I want to see some minor optimization like avoiding memory access by creating local variable (int hash = 0; .... this.hash = hash).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)