Posted to issues@calcite.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2020/03/02 02:23:00 UTC

[jira] [Commented] (CALCITE-3836) The hash codes of RelNodes are unreliable

    [ https://issues.apache.org/jira/browse/CALCITE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048739#comment-17048739 ] 

Liya Fan commented on CALCITE-3836:
-----------------------------------

[~zabetak] and [~julianhyde], thanks a lot for your valuable feedback, and sorry for my late reply.

Your reasoning makes sense to me, and maybe we should not be overly pessimistic about JVM implementations. 

However, even if the hash code remains unchanged during a run, I do not think it is a good idea to rely on the default identity hash code. The identity hash code is essentially an arbitrary value that may differ from one run of the program to the next. Anything that depends on it, such as the iteration order of a {{HashMap}} or {{HashSet}} keyed by RelNodes, can then behave differently across runs. That makes problems hard to reproduce and, in turn, hard to diagnose and debug.

Do we need to consider the above factor?
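To make the reproducibility concern concrete, here is a minimal sketch assuming two hypothetical node classes (neither is a Calcite class): {{IdHashedNode}} derives its hash code from an integer id, while {{IdentityHashedNode}} keeps the default {{Object#hashCode}}. Both are put into a {{HashSet}}, whose iteration order follows the hash codes. With id-based hashing the order depends only on the ids, so it is the same on every run; with identity hashing it is not guaranteed to be.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdHashDemo {
  /** Hypothetical node (not Calcite's AbstractRelNode) whose hash code is derived from its id. */
  static final class IdHashedNode {
    final int id;

    IdHashedNode(int id) {
      this.id = id;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(id); // depends only on the id
    }
    // equals() is inherited from Object (identity), which stays consistent with hashCode().
  }

  /** Hypothetical node that keeps the default identity hash code from Object. */
  static final class IdentityHashedNode {
    final int id;

    IdentityHashedNode(int id) {
      this.id = id;
    }
  }

  /** Returns the ids in the order a HashSet of id-hashed nodes iterates over them. */
  static List<Integer> idHashedOrder(int... ids) {
    Set<IdHashedNode> set = new HashSet<>();
    for (int id : ids) {
      set.add(new IdHashedNode(id));
    }
    List<Integer> order = new ArrayList<>();
    for (IdHashedNode node : set) {
      order.add(node.id);
    }
    return order;
  }

  /** Same as above, but the iteration order follows the identity hash codes. */
  static List<Integer> identityHashedOrder(int... ids) {
    Set<IdentityHashedNode> set = new HashSet<>();
    for (int id : ids) {
      set.add(new IdentityHashedNode(id));
    }
    List<Integer> order = new ArrayList<>();
    for (IdentityHashedNode node : set) {
      order.add(node.id);
    }
    return order;
  }

  public static void main(String[] args) {
    // Determined by the ids alone, so every run prints the same list.
    System.out.println(idHashedOrder(3, 1, 2));
    // Determined by identity hash codes, which are not guaranteed to match across runs.
    System.out.println(identityHashedOrder(3, 1, 2));
  }
}
```

The exact id-based order is a detail of {{HashMap}}'s bucket layout, but it is a pure function of the ids, which is what matters when trying to reproduce a problem.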

> The hash codes of RelNodes are unreliable
> -----------------------------------------
>
>                 Key: CALCITE-3836
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3836
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Liya Fan
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> For all subclasses of AbstractRelNode, {{hashCode}} is inherited from {{AbstractRelNode#hashCode}}, because that method is declared final. 
> {{AbstractRelNode#hashCode}} delegates to {{Object#hashCode}}, which is called the identity hash code. The details of the identity hash code depend on the specific JVM implementation. For many JVMs, the implementation is based on the object's address in memory. The problem is that the address of an object may change in a JVM, due to GC, memory compaction, etc. So the hash code of an object may change even if the content of the object has not changed (this can be confirmed from the JavaDoc of {{Object#hashCode}}). 
> This problem may cause severe issues that are hard to diagnose and debug, such as an object that is in a hash table but cannot be retrieved, or duplicate objects in a hash map. 
> To solve the problem, we compute the hash code solely from the node id. This is consistent with the previous semantics and solves the above problem. 
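The lookup failure and duplicate entries described in the issue can be reproduced with any key whose hash code changes while it is stored in a {{HashMap}}. Since Java code cannot force an identity hash code to change, the sketch below simulates the change with a hypothetical {{UnstableKey}} class (not part of Calcite) whose hash code is read from a mutable field. Like RelNodes, it uses identity {{equals}}.

```java
import java.util.HashMap;
import java.util.Map;

public class UnstableHashDemo {
  /** Hypothetical key (not a Calcite class) whose hash code can change while it sits in a map. */
  static final class UnstableKey {
    int seed; // mutable stand-in for whatever the hash code is derived from

    UnstableKey(int seed) {
      this.seed = seed;
    }

    @Override
    public int hashCode() {
      return seed;
    }
    // equals() is inherited from Object, i.e. identity comparison.
  }

  /** What we observe after the key's hash code has changed. */
  static final class Outcome {
    final boolean foundByLookup; // map.containsKey(key)
    final boolean foundByScan; // iterating the key set finds the same object
    final int sizeAfterReinsert; // map.size() after putting the same key again

    Outcome(boolean foundByLookup, boolean foundByScan, int sizeAfterReinsert) {
      this.foundByLookup = foundByLookup;
      this.foundByScan = foundByScan;
      this.sizeAfterReinsert = sizeAfterReinsert;
    }
  }

  static Outcome simulate() {
    Map<UnstableKey, String> map = new HashMap<>();
    UnstableKey key = new UnstableKey(1);
    map.put(key, "first");
    key.seed = 2; // hash code changes while the key is stored in the map
    boolean foundByLookup = map.containsKey(key); // searches using the new hash code
    boolean foundByScan = map.keySet().stream().anyMatch(k -> k == key);
    map.put(key, "second"); // not found by hash, so a second entry is added
    return new Outcome(foundByLookup, foundByScan, map.size());
  }

  public static void main(String[] args) {
    Outcome outcome = simulate();
    System.out.println("containsKey: " + outcome.foundByLookup); // containsKey: false
    System.out.println("found by scan: " + outcome.foundByScan); // found by scan: true
    System.out.println("size after re-insert: " + outcome.sizeAfterReinsert); // size after re-insert: 2
  }
}
```

A hash code computed from an immutable node id cannot change after construction, so neither symptom can occur with it.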



--
This message was sent by Atlassian Jira
(v8.3.4#803005)