You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/08/27 13:42:00 UTC

[jira] [Commented] (IMPALA-7635) Reduce size of hash tables in-memory by packing buckets more densely

    [ https://issues.apache.org/jira/browse/IMPALA-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405834#comment-17405834 ] 

ASF subversion and git services commented on IMPALA-7635:
---------------------------------------------------------

Commit 2040b2621f73de4dfaba80dd938221758952183f in impala's branch refs/heads/master from Amogh Margoor
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2040b26 ]

IMPALA-7635: Reducing HashTable size by packing it's buckets efficiently.

HashTable implementation in Impala comprises of contiguous array
of Buckets and each Bucket contains either data or pointer to
linked list of duplicate entries named DuplicateNode.
These are the structures of Bucket and DuplicateNode:

  struct DuplicateNode {
    bool matched;
    DuplicateNode* next;
    HtData htdata;
  };

  struct Bucket {
    bool filled;
    bool matched;
    bool hasDuplicates;
    uint32_t hash;
    union {
      HtData htdata;
      DuplicateNode* duplicates;
    } bucketData;
  };

Size of Bucket is currently 16 bytes and size of DuplicateNode is
24 bytes. If we can remove the booleans from both struct size of
Bucket would reduce to 12 bytes and DuplicateNode will be 16 bytes.
One of the ways we can remove booleans is to fold it into pointers
already part of struct. Pointers store addresses and on
architectures like x86 and ARM the linear address is only 48 bits
long. With level 5 paging Intel is planning to expand it to 57-bit
long which means we can use most significant 7 bits i.e., 58 to 64
bits to store these booleans. This patch reduces the size of Bucket
and DuplicateNode by implementing this folding. However, there is
another requirement regarding Size of Bucket to be power of 2 and
also for the number of buckets in Hash table to be power of 2.
These requirements are for the following reasons:
1. Memory Allocator allocates memory in power of 2 to avoid
   internal fragmentation. Hence, num of buckets * sizeof(Buckets)
   should be power of 2.
2. Number of buckets being power of 2 enables faster modulo
   operation i.e., instead of slow modulo: (hash % N), faster
   (hash & (N-1)) can be used.

Due to this, 4 bytes 'hash' field from Bucket is removed and
stored separately in new array hash_array_ in HashTable.
This ensures sizeof(Bucket) is 8 which is power of 2.

New Classes:
------------
As a part of patch, TaggedPointer is introduced which is a template
class to store a pointer and 7-bit tag together in 64 bit integer.
This structure contains the ownership of the pointer and will take care
of allocation and deallocation of the object being pointed to.
However derived classes can opt out of the ownership of the object
and let the client manage it. It's derived classes for Bucket and
DuplicateNode do the same. These classes are TaggedBucketData and
TaggedDuplicateNode.

Benchmark:
----------
As a part of this patch a new Micro Benchmark for HashTable has
been introduced, which will help in measuring these:
1. Runtime for building hash table and probing it.
2. Memory consumed after building the Table.
This would help measuring the impact of changes to the HashTable's
data structure and algorithm.
Saw 25-30% reduction in memory consumed and no significant
difference in performance (0.91X-1.2X).

Other Benchmarks:
1. Billion row Synthetic benchmark on single node, single daemon:
   a. 2-3% improvement in Join GEOMEAN for Probe benchmark.
   b. 17% and 21% reduction in PeakMemoryUsage and
      CumulativeBytes allocated respectively
2. TPCH-42: 0-1.5% improvement in GEOMEAN runtime

Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Reviewed-on: http://gerrit.cloudera.org:8080/17592
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Reduce size of hash tables in-memory by packing buckets more densely
> --------------------------------------------------------------------
>
>                 Key: IMPALA-7635
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7635
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Amogh Margoor
>            Priority: Major
>              Labels: perf
>
> Currently the hash tables used for hash join and aggregation use 16 bytes per bucket and 24 bytes per additional duplicate for a key:
> {code}
>   /// Linked list of entries used for duplicates.
>   struct DuplicateNode {
>     /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
>     /// row in the DuplicateNode has been matched.
>     /// From an abstraction point of view, this is an awkward place to store this
>     /// information.
>     /// TODO: Fold this flag in the next pointer below.
>     bool matched;
>     /// Chain to next duplicate node, NULL when end of list.
>     DuplicateNode* next;
>     HtData htdata;
>   };
>   struct Bucket {
>     /// Whether this bucket contains a vaild entry, or it is empty.
>     bool filled;
>     /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
>     /// row in the bucket has been matched.
>     /// From an abstraction point of view, this is an awkward place to store this
>     /// information but it is efficient. This space is otherwise unused.
>     bool matched;
>     /// Used in case of duplicates. If true, then the bucketData union should be used as
>     /// 'duplicates'.
>     bool hasDuplicates;
>     /// Cache of the hash for data.
>     /// TODO: Do we even have to cache the hash value?
>     uint32_t hash;
>     /// Either the data for this bucket or the linked list of duplicates.
>     union {
>       HtData htdata;
>       DuplicateNode* duplicates;
>     } bucketData;
>   };
> {code}
> There are some comments in the code that suggest folding the boolean values into the upper bits of the pointers (since on amd64 the address space is only 48 bits, but moving to 57 bits apparently - see https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf). That would reduce the bucket to 12 bytes of actual data.
> This would give us the opportunity to reduce memory requirements of joins and the pressure on caches significantly, provided we can work out the implementation issues and the cost of the bit manipulation doesn't exceed the benefit (my intuition is that cache effects are way more important but I could be wrong).
> Here's a rough idea of what we could do:
> # Implement folding of booleans into the pointer and mark struct Bucket as packed so that it doesn't just undo the work with additional padding.
> # Modifying Hashtable to work with the new bucket structure. This needs a little thought since the bucket allocations must be a power-of-two size in bytes, but we also need the hash table entries to be a power-of-two in order for masking the hash to get the bucket number to work. I think either we could just leave wasted space in the buffer or switch to a non-power-of-two number of buckets and using an alternative method of getting the bucket from the hash: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
> # Run benchmarks to see if it's beneficial. The effect probably depends on the data set size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org