You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2021/03/18 22:14:00 UTC
[jira] [Created] (IMPALA-10595) Consider using a batched DuplicateNode for the hash table

Joe McDonnell created IMPALA-10595:
--------------------------------------

             Summary: Consider using a batched DuplicateNode for the hash table
                 Key: IMPALA-10595
                 URL: https://issues.apache.org/jira/browse/IMPALA-10595
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.0
            Reporter: Joe McDonnell


hash-table.h's DuplicateNode structure is currently 24 bytes:

 
{noformat}
  /// Linked list of entries used for duplicates.
  struct DuplicateNode {
    /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
    /// row in the DuplicateNode has been matched.
    /// From an abstraction point of view, this is an awkward place to store this
    /// information.
    /// TODO: Fold this flag in the next pointer below.
    bool matched;

    /// padding to an 8 byte boundary

    /// Chain to next duplicate node, NULL when end of list.
    DuplicateNode* next;
    HtData htdata;
  };{noformat}
The matched boolean occupies 8 bytes due to the alignment of next and htdata. Apart from the space usage, this linked list is also not very cache friendly during iteration. Multiple DuplicateNodes are allocated from a single chunk of memory, so it is possible that there is some locality by happenstance, but there are no guarantees. One way to make this more cache friendly is to batch multiple duplicates into a single structure. For example, consider this structure:

 
{noformat}
  /// Linked list of entries used for duplicates.
  struct DuplicateNode {
    /// Used for full outer and right {outer, anti, semi} joins. Indicates whether the
    /// row in the DuplicateNode has been matched.
    /// From an abstraction point of view, this is an awkward place to store this
    /// information.
    /// TODO: Fold this flag in the next pointer below.
    bool matched[BATCH_SIZE];

    HtData htdata[BATCH_SIZE];
    /// Chain to next duplicate node, NULL when end of list.
    DuplicateNode* next;
  };{noformat}
This amortizes the matched boolean and the next pointer over multiple data pointers, reducing the overhead. If BATCH_SIZE exceeded 8, the matched could be packed further as individual bits on a single field if needed.

BATCH_SIZE=2 => 32 bytes

BATCH_SIZE=3 => 40 bytes

BATCH_SIZE=4 => 48 bytes
||Num Duplicates||BATCH_SIZE=1||BATCH_SIZE=2||BATCH_SIZE=3||BATCH_SIZE=4||
|1|0 bytes|0 bytes|0 bytes|0 bytes|
|2|48 bytes|32 bytes|40 bytes|48 bytes|
|3|72 bytes|64 bytes|40 bytes|48 bytes|
|4|96 bytes|64 bytes|80 bytes|48 bytes|
|5|120 bytes|96 bytes|80 bytes|96 bytes|
|6|144 bytes|96 bytes|80 bytes|96 bytes|
|7|168 bytes|128 bytes|120 bytes|96 bytes|
|8|192 bytes|128 bytes|120 bytes|96 bytes|
|9|216 bytes|160 bytes|120 bytes|144 bytes|
|10|240 bytes|160 bytes|160 bytes|144 bytes|

It seems like BATCH_SIZE=3 or 4 could make the duplicates code more cache friendly without regressing memory usage or requiring any complicated logic. BATCH_SIZE=6 corresponds to 64 bytes and a single cache line.

Of course, there are also ways to make the logic more complicated (e.g. use a certain batch size for the first two or three and then switch to larger ones).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org