Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/10/01 22:39:00 UTC

[jira] [Work logged] (HIVE-24221) Use vectorizable expression to combine multiple columns in semijoin bloom filters

     [ https://issues.apache.org/jira/browse/HIVE-24221?focusedWorklogId=493721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493721 ]

ASF GitHub Bot logged work on HIVE-24221:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Oct/20 22:38
            Start Date: 01/Oct/20 22:38
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request #1544:
URL: https://github.com/apache/hive/pull/1544


   
   ### What changes were proposed in this pull request?
   
   Use hash(hash(hash(a,b),c),d) instead of hash(a,b,c,d) when constructing
   the multi-col semijoin reducer.
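
As a rough illustration of the rewrite described above, the sketch below folds a list of columns into nested two-argument hash calls. It is not the Hive implementation: `NestedHashSketch`, `hash2`, `nestedHash`, and `nestedExpr` are hypothetical names, and `java.util.Objects.hash` is only a stand-in for a binary hash function, not GenericUDFMurmurHash.

```java
import java.util.Objects;

public class NestedHashSketch {

    // Hypothetical two-argument hash. Objects.hash is a stand-in here,
    // NOT the murmur hash that Hive uses.
    public static int hash2(Object left, Object right) {
        return Objects.hash(left, right);
    }

    // Combines the values left to right using only binary hash calls:
    // hash2(hash2(hash2(v0, v1), v2), v3) ...
    public static int nestedHash(Object... values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        int acc = hash2(values[0], values[1]);
        for (int i = 2; i < values.length; i++) {
            acc = hash2(acc, values[i]);
        }
        return acc;
    }

    // Builds the textual form of the nested expression for the given column names.
    public static String nestedExpr(String... columns) {
        if (columns.length < 2) {
            throw new IllegalArgumentException("need at least two columns");
        }
        String expr = "hash(" + columns[0] + "," + columns[1] + ")";
        for (int i = 2; i < columns.length; i++) {
            expr = "hash(" + expr + "," + columns[i] + ")";
        }
        return expr;
    }

    public static void main(String[] args) {
        // Shows the shape of the nested expression for four columns.
        System.out.println(nestedExpr("a", "b", "c", "d"));
        // The fold agrees with writing the nested calls out by hand.
        System.out.println(
            nestedHash(1, "x", 2.5, true) == hash2(hash2(hash2(1, "x"), 2.5), true));
    }
}
```

A nested combination generally yields different values than a single n-ary call over the same inputs, so whichever form is chosen would have to be applied consistently wherever the bloom filter is built and probed.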
   
   ### Why are the changes needed?
   To enable fully vectorized execution of multi-column semijoin reducers.
   
   ### Does this PR introduce _any_ user-facing change?
   Only changes in EXPLAIN plans.
   
   ### How was this patch tested?
   `mvn test -Dtest=TestTezPerfCliDriver -Dqfile="query50.q"`
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 493721)
    Remaining Estimate: 0h
            Time Spent: 10m

> Use vectorizable expression to combine multiple columns in semijoin bloom filters
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-24221
>                 URL: https://issues.apache.org/jira/browse/HIVE-24221
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>         Environment: 
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, multi-column semijoin reducers use an n-ary call to GenericUDFMurmurHash to combine multiple values into one, which is then used as an entry in the bloom filter. However, there are no vectorized operators that handle n-ary inputs, and the vectorized implementation of GenericUDFMurmurHash introduced in HIVE-23976 has the same limitation.
> The goal of this issue is to find an alternative way, built only from vectorized operators, to combine multiple values into the single value that is passed to the bloom filter.


