You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2019/01/05 04:49:59 UTC

[GitHub] Ben-Zvi opened a new pull request #1598: DRILL-6880: For Hash-Join hash-table build - treat null keys as an equal match

Ben-Zvi opened a new pull request #1598: DRILL-6880: For Hash-Join hash-table build - treat null keys as an equal match 
URL: https://github.com/apache/drill/pull/1598
 
 
   
      When a new key is inserted into the hash-table's, a matching bucket is found (i.e., from its hash value modulo the table's size), but that bucket may contain an old key(s). A generated code method ("isKeyMatchInternalBuild") is used to compare and find if the two keys are identical.
   
   Previously a special check in that method addressed the case where (any pair of matching) columns where both null; returning "no match" if so.  In such a case, the hash-join would have entered the new key as a new entry at that bucket, leading to a growing useless chain that could end up in O(n^2) work (as each new key checks against all old ones).
   
     *The fix*: Generate a new method - "areBothKeysNull" - to perform that special check prior to the actual key match; used only for hash-join during build time.  Then this method returns the **inverse logic** - return a **"match"** when any two matching key-columns are both null.  This would avoid the useless chain in the hash-table. (A linked list would be created in the Hash-Join Helper, but that's an O(1) work that would be too complex/inefficient to avoid).
   
      Also note that the case of the keys like: <1,null>, <2,null>, <3,null>, <4,null>, ...... would not be addressed by the above change, as most of these keys would end up in different buckets, but that is still O(1) work, and no long chain.
   
     *The implementation*: Customize the internal method _setupIsKeyMatchInternal_ to also generate the new method _areBothKeysNull_ . The other methods it generates should not change from before, except _isKeyMatchInternalBuild_ where the special "both null" check is now eliminated.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services