Posted to issues@quickstep.apache.org by "Harshad Deshmukh (JIRA)" <ji...@apache.org> on 2017/03/24 02:05:44 UTC

[jira] [Commented] (QUICKSTEP-85) Don't use hash table load factor when exact estimate is known

    [ https://issues.apache.org/jira/browse/QUICKSTEP-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939565#comment-15939565 ] 

Harshad Deshmukh commented on QUICKSTEP-85:
-------------------------------------------

I tried the change in the branch - https://github.com/hbdeshmukh/incubator-quickstep/tree/ht-load-factor 
The results are not encouraging - https://docs.google.com/spreadsheets/d/1FPC-bIG666sW6-Cfqw9Z9aaUtx69d_kIxqvtDIjUMHI/edit#gid=0

Some queries improve by less than 10%, and others degrade by a similar amount.

The hash table sizes compare as follows. Here "MB" stands for the master branch (not megabytes) and "LF" stands for the load factor branch. All sizes are in megabytes.
----
Query2
MB:  915.528, 1220.7, 0.00133514, 45.7766
LF:  915.528, 1220.7, 0.00133514, 45.7766

Query3
MB:  6866.46
LF:  6866.46

Query5
MB:  686.646, 6866.46, 0.00133514, 53.406
LF:  572.205, 6866.46, 0.00133514, 53.406

Query7
MB:  0.00133514, 45.7766, 6866.46, 0.00133514, 686.646
LF:  0.00133514, 45.7766, 5722.05, 0.00133514, 686.646

Query8
MB:  6866.46, 0.00133514, 45.7766
LF:  6866.46, 0.00122833, 45.7766

Query9
MB:  0.00133514, 45.7766, 4272.46, 6866.46
LF:  0.00122833, 45.7766, 4272.46, 5722.05

Query10
MB:  0.00133514, 686.646, 6866.46, 686.646
LF:  0.00122833, 686.646, 6866.46, 572.205

Query12
MB:  6866.46
LF:  5722.05

Query14
MB:  915.528
LF:  762.94

Query15
MB:  0.000167847, 45.7766
LF:  0.000167847, 38.1472

Query16
MB:  915.528
LF:  915.528

Query17
MB:  2746.76
LF:  2746.76

Query18
MB:  686.646, 6866.46
LF:  572.205, 6866.46

Query19
MB:  915.528
LF:  762.94

Query20
MB:  4272.46
LF:  4272.46

Query21
MB:  27467.6, 45.7766, 27467.6
LF:  27467.6, 45.7766, 22889.6
----

There were some gaps in my understanding of the load factor. It turns out that the memory savings from changing the load factor are not large, so the gains are not substantial. I am going to revisit this issue later. If anyone has more ideas, please feel free to chip in.
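
As a rough check on the figures listed above, the relative saving for any MB/LF pair can be computed directly. The helper below is only an illustrative sketch; the name SavingsPercent is hypothetical and is not part of Quickstep.

```cpp
#include <cassert>

// Hypothetical helper (not part of Quickstep): percentage by which a hash
// table shrank on the load factor branch relative to the master branch.
// Both arguments are sizes in megabytes, as listed in the comparison.
double SavingsPercent(double master_size_mb, double load_factor_size_mb) {
  assert(master_size_mb > 0.0);
  return (1.0 - load_factor_size_mb / master_size_mb) * 100.0;
}
```

Plugging in the pairs that changed (for example 6866.46 vs 5722.05, 915.528 vs 762.94, or 45.7766 vs 38.1472) gives a saving of roughly 17% per table, and about 8% for the very small tables (0.00133514 vs 0.00122833). That is well short of the halving one might expect from dropping a load factor of 0.5.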

> Don't use hash table load factor when exact estimate is known
> -------------------------------------------------------------
>
>                 Key: QUICKSTEP-85
>                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-85
>             Project: Apache Quickstep
>          Issue Type: Improvement
>          Components: Storage
>            Reporter: Harshad Deshmukh
>            Assignee: Harshad Deshmukh
>
> By default, the join hash tables use a load factor of 0.5, which means we allocate double the estimated space required by the hash table. When the estimate is accurate (either the stats are perfect or the hash table is being built on a stored table), we should not use this load factor, thereby saving space.
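
The proposal in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not Quickstep's actual sizing code: the function name SlotsToAllocate and the estimate_is_exact flag are hypothetical, and the real hash table may account for additional per-bucket or per-slot overhead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sizing helper (not Quickstep's API). With a load factor of
// 0.5, an estimate of N entries leads to 2N slots. When the estimate is
// known to be exact, the load factor is skipped and N slots are allocated.
std::size_t SlotsToAllocate(std::size_t estimated_entries,
                            double load_factor,
                            bool estimate_is_exact) {
  assert(load_factor > 0.0 && load_factor <= 1.0);
  if (estimate_is_exact) {
    return estimated_entries;
  }
  return static_cast<std::size_t>(
      std::ceil(static_cast<double>(estimated_entries) / load_factor));
}
```

For instance, SlotsToAllocate(1000, 0.5, false) asks for 2000 slots, while SlotsToAllocate(1000, 0.5, true) asks for 1000.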


