Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/01/03 10:34:12 UTC

[GitHub] [flink] liyafan82 commented on a change in pull request #10756: [FLINK-15465][FLINK-11964][table-runtime-blink] Fix hash table bugs

URL: https://github.com/apache/flink/pull/10756#discussion_r362764978
 
 

 ##########
 File path: flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/hashtable/BaseHybridHashTable.java
 ##########
 @@ -516,4 +511,10 @@ public static int hash(int hashCode, int level) {
 		return code >= 0 ? code : -(code + 1);
 	}
 
+	/**
+	 * Partition level hash again, for avoid two layer hash conflict.
+	 */
+	static int partitionLevelHash(int hash) {
+		return hash ^ (hash >>> 16);
+	}
 
 Review comment:
   @JingsongLi Thanks a lot for bringing this up. 
   I agree with you that this is computationally efficient. 
   
   However, in my experience, this may not be a good hash function. In practice, we usually encounter small integers whose upper 16 bits are all 0 (that is, 0 <= hash < 65536). For those values hash >>> 16 == 0, so hash ^ (hash >>> 16) is simply equal to hash.
   
   IMO, hash ^ (hash << 16) is much better, as it copies the low bits of small integers into the high bits and so spreads them across the whole integer range.
   
   That being said, this is just my personal suggestion. The final choice should depend on the real scenario.
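   
   To make the difference concrete, here is a minimal, self-contained sketch comparing the two variants on a few small inputs. The class name PartitionHashDemo and the method names foldHighIntoLow / foldLowIntoHigh are made up for this illustration and are not part of the PR; only the two expressions come from the discussion above.

```java
public class PartitionHashDemo {

    // Variant from the diff under review: XORs the high 16 bits into the low 16 bits.
    // For 0 <= hash < 65536 the shifted value is 0, so the input comes back unchanged.
    static int foldHighIntoLow(int hash) {
        return hash ^ (hash >>> 16);
    }

    // Variant suggested in this comment (hypothetical name): XORs the low 16 bits
    // into the high 16 bits. The low 16 bits of the result equal those of the input.
    static int foldLowIntoHigh(int hash) {
        return hash ^ (hash << 16);
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only.
        int[] samples = {1, 2, 42, 65535, 65536};
        for (int h : samples) {
            System.out.printf(
                "hash=%d  hash^(hash>>>16)=%d  hash^(hash<<16)=%d%n",
                h, foldHighIntoLow(h), foldLowIntoHigh(h));
        }
    }
}
```

   Note that hash ^ (hash << 16) leaves the low 16 bits as they were and maps multiples of 65536 to themselves, so which variant actually helps depends on which bits the table uses to select a partition. That is not visible in this diff hunk.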
