Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/03/06 06:56:30 UTC

[GitHub] [drill] weijietong edited a comment on issue #1662: DRILL-6825: apply different hash algorithms to different data types

URL: https://github.com/apache/drill/pull/1662#issuecomment-469987352
 
 
   @Ben-Zvi I ran a benchmark comparing IntegerHash and MurmurHash3_32. IntegerHash is roughly 1.7 times as fast as MurmurHash3_32 in average time per operation (about 1.66x for int keys and 1.74x for long keys):
   ```
   HashBenchInteger.hashInteger  IntegerHash       N/A  avgt    5   3.987 ± 0.162  ns/op
   HashBenchInteger.hashInteger   Murmur3_32       N/A  avgt    5   6.626 ± 0.085  ns/op
   HashBenchInteger.hashLong     IntegerHash       N/A  avgt    5   4.903 ± 0.320  ns/op
   HashBenchInteger.hashLong      Murmur3_32       N/A  avgt    5   8.525 ± 0.649  ns/op
   ```
   The city_1_1 hash algorithm mentioned above is faster than Murmur3_32 but still slower than IntegerHash.
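For context on why a type-specific integer hash can be cheaper per call, here is a minimal Java sketch. It is not the IntegerHash or Murmur3_32 code from this PR: as an assumption, the MurmurHash3 finalizers (`fmix32`/`fmix64`) stand in for a dedicated integer hash and are contrasted with the full MurmurHash3_x86_32 applied to a single 4-byte key. The class name `IntHashSketch` is hypothetical.

```java
/**
 * Hypothetical sketch, not Drill's code: contrasts a full MurmurHash3_x86_32
 * over one 4-byte key with finalizer-only integer hashes (fmix32 / fmix64).
 */
final class IntHashSketch {
    // MurmurHash3_x86_32 block constants.
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // MurmurHash3 32-bit finalizer: xor-shifts and two odd multiplies (a bijection on int).
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // MurmurHash3 64-bit finalizer, used here as a stand-in for a long-key integer hash.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Full MurmurHash3_x86_32 for exactly one 4-byte little-endian block:
    // block mix and length xor, followed by the same fmix32 finalizer.
    static int murmur3_32(int key, int seed) {
        int k = key * C1;
        k = Integer.rotateLeft(k, 15);
        k *= C2;

        int h = seed ^ k;
        h = Integer.rotateLeft(h, 13);
        h = h * 5 + 0xe6546b64;

        h ^= 4; // total input length in bytes
        return fmix32(h);
    }

    public static void main(String[] args) {
        // Print a few hashes side by side (values shown as unsigned hex).
        for (int key = 0; key < 3; key++) {
            System.out.printf("key=%d fmix32=%08x murmur3_32=%08x fmix64=%016x%n",
                    key, fmix32(key), murmur3_32(key, 0), fmix64(key));
        }
    }
}
```

Both paths end in the same finalizer, so the full Murmur3_32 path does strictly more work per key (an extra multiply-rotate-multiply on the block plus the seed-mixing step). That is one plausible source of the per-call gap in the benchmark above, though the PR's IntegerHash may use a different mixing function.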
   
   I also compared two SQL queries, running each once with IntegerHash and once with MurmurHash3_32 as the hash for the long data types. Hashing is the hotspot in both queries:
   Q1:
   ```
   select c.c_custkey, count(*)
   from dfs.`/tpch100/customer` c
   group by c.c_custkey
   ```
   Q2:
   ```
   select c.c_custkey,c.c_nationkey, count(*)
   from dfs.`/tpch100/customer` c
   group by c.c_custkey,c.c_nationkey
   ```
   The dataset is TPC-H at scale factor 100. The results:
   For Q1:
   IntegerHash is about 6% faster than MurmurHash3_32 (IntegerHash: 14.029 sec, MurmurHash3_32: 14.870 sec).
   For Q2:
   IntegerHash is about 17% faster than MurmurHash3_32 (IntegerHash: 20.499 sec, MurmurHash3_32: 23.921 sec).
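The percentages follow from the measured times. Depending on the baseline, the gain can be expressed either as the reduction in elapsed time relative to MurmurHash3_32 (left) or as the speedup of IntegerHash (right):

```latex
\begin{aligned}
\text{Q1:}\quad & \frac{14.870 - 14.029}{14.870} \approx 5.7\%, \qquad & \frac{14.870}{14.029} - 1 \approx 6.0\% \\
\text{Q2:}\quad & \frac{23.921 - 20.499}{23.921} \approx 14.3\%, \qquad & \frac{23.921}{20.499} - 1 \approx 16.7\%
\end{aligned}
```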
   
   This suggests that the more integer-typed columns a query groups by, the larger the performance gain.
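One way to see why the gain could grow with the number of integer grouping keys: a hash aggregate typically computes a single hash per row by hashing each key column in turn, feeding the previous result in as the seed for the next column. Under that assumption, per-row hashing work scales with the number of key columns, so any per-call saving is multiplied. The sketch below is hypothetical: the class `GroupKeyHashSketch`, `hashLong`, `hashRow`, and the seed-folding constant are illustrative choices, and the `fmix64` finalizer stands in for whatever IntegerHash actually does.

```java
/**
 * Hypothetical sketch of per-row hashing for a multi-column GROUP BY:
 * each long key is hashed with the previous key's hash as its seed.
 * Names and the seeding scheme are illustrative assumptions, not Drill's code.
 */
final class GroupKeyHashSketch {
    // Odd 64-bit constant (2^64 divided by the golden ratio) used to spread the seed.
    private static final long SEED_MIX = 0x9E3779B97F4A7C15L;

    // MurmurHash3 64-bit finalizer, standing in for a dedicated integer hash.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Hypothetical seeded hash for one long key: fold in the seed, then finalize.
    static long hashLong(long value, long seed) {
        return fmix64(value ^ (seed * SEED_MIX));
    }

    // One hash call per grouping column, so per-row hashing cost grows
    // linearly with the number of key columns.
    static long hashRow(long... keys) {
        long h = 0L; // initial seed
        for (long key : keys) {
            h = hashLong(key, h); // previous hash seeds the next column
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hashRow(42L));     // one key column, like Q1
        System.out.println(hashRow(42L, 7L)); // two key columns, like Q2
    }
}
```

With a single key (as in Q1) each row costs one hash call; with two keys (as in Q2) it costs two. This is consistent with, though not proof of, the larger gain observed for Q2.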