Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/03/04 12:33:10 UTC

[GitHub] [drill] weijietong commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types

weijietong commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-469236472
 
 
   The IntegerHashing method is also used in ClickHouse for integer types (see the intHash32 method in https://github.com/yandex/ClickHouse/blob/master/dbms/src/Common/HashTable/Hash.h). ClickHouse (CK) picks its hashing method at a fine granularity, according to the data type and key width, which is worth learning from. As you mentioned, Murmur3Hash does not perform well on short integer keys, so it is better to use IntegerHash for integer keys.
   
   I had already read the Boost implementation discussion you mentioned. Still, I think it is reasonable that Boost, as a base library, keeps its current implementation.
   
   The reason I keep the seed out of the hash32 function and bring in Boost's hash_combine method is that I want to change the current hashing strategy later. For the multi-key case, I plan to move from the row-iterate model, hash32(hash32(hash32)), to a column-combine model, `hash32() hash_combine hash32() hash_combine hash32()`. The row-iterate model has a data dependency between successive hash32 calls, which hurts CPU pipeline performance.
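
   To make the two models concrete, here is a minimal Java sketch, not the code in this PR. The class and method names are hypothetical, the hash32 body is the Murmur3 finalizer used only as a stand-in integer mixer (it is not the PR's IntegerHashing), and hashCombine follows the well-known Boost formula `seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2)`.

```java
// Hypothetical sketch: names and the stand-in mixer are assumptions, not Drill code.
final class HashCombineSketch {

    // Stand-in seedless 32-bit integer mixer (Murmur3 finalizer).
    public static int hash32(int key) {
        int h = key;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Seeded variant for the row-iterate model: the previous hash is folded into the input.
    public static int hash32(int key, int seed) {
        return hash32(key ^ seed);
    }

    // Boost-style hash_combine; >>> stands in for the unsigned shift used in C++.
    public static int hashCombine(int seed, int hash) {
        return seed ^ (hash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Row-iterate model: each hash32 call needs the result of the previous one,
    // so the expensive mixing steps for one row form a serial chain.
    public static int[] rowIterate(int[][] columns, int rows) {
        int[] out = new int[rows];
        for (int r = 0; r < rows; r++) {
            int h = 0;
            for (int[] col : columns) {
                h = hash32(col[r], h);
            }
            out[r] = h;
        }
        return out;
    }

    // Column-combine model: hash32 depends only on the key value, so each column
    // can be hashed in a tight loop; only the cheap combine step uses the running value.
    public static int[] columnCombine(int[][] columns, int rows) {
        int[] out = new int[rows];
        for (int[] col : columns) {
            for (int r = 0; r < rows; r++) {
                out[r] = hashCombine(out[r], hash32(col[r]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] columns = { {1, 2, 3}, {10, 20, 30} }; // two key columns, three rows
        System.out.println(java.util.Arrays.toString(rowIterate(columns, 3)));
        System.out.println(java.util.Arrays.toString(columnCombine(columns, 3)));
    }
}
```

   The two models produce different hash values, so both sides of a join or exchange would have to use the same one. Whether the column-combine loop is actually faster would need to be measured.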
   
   Other hashing methods I know of can be found at https://github.com/benalexau/hash-bench, a collection of Java hashing methods. The benchmark I ran showed that city_1_1 from https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/master/src/main/java/net/openhft/hashing/LongHashFunction.java performs best at key widths of 32 and 64 bytes.
   
   I also wonder whether we could do the join-key data type implication at the project node later, so that the HashJoin and Exchange nodes can also benefit from this PR.
   
   
   
   
   
   
   
