Posted to dev@hive.apache.org by "Mustafa İman (Jira)" <ji...@apache.org> on 2020/12/09 01:41:00 UTC

[jira] [Created] (HIVE-24510) Vectorize compute_bit_vector

Mustafa İman created HIVE-24510:
-----------------------------------

             Summary: Vectorize compute_bit_vector
                 Key: HIVE-24510
                 URL: https://issues.apache.org/jira/browse/HIVE-24510
             Project: Hive
          Issue Type: Improvement
            Reporter: Mustafa İman
            Assignee: Mustafa İman


After https://issues.apache.org/jira/browse/HIVE-23530, almost all compute stats functions are vectorizable. The only function that is not vectorizable is "compute_bit_vector", which is used for NDV statistics computation. This causes "create table as select" and "insert overwrite select" queries to run in non-vectorized mode.

Even a very naive implementation of vectorized compute_bit_vector gives about a 50% performance improvement on simple "insert overwrite select" queries. That is because the entire mapper or reducer can then run in vectorized mode.
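
To illustrate what a "very naive" vectorized variant might look like, here is a minimal sketch. It is not Hive's code: the class BitVectorSketch, the LongBatch type, and the linear-counting bitmap are all hypothetical stand-ins chosen for brevity, whereas Hive's real compute_bit_vector relies on its own NDV sketch classes and column vector types. The point is only that the batch path sets the same bits as the row path, one batch per call instead of one row per call.

```java
import java.util.BitSet;

// Hypothetical toy NDV estimator; NOT Hive's compute_bit_vector implementation.
final class BitVectorSketch {
    // Hypothetical, simplified stand-in for a columnar batch of long values.
    static final class LongBatch {
        final long[] values;
        final boolean[] isNull;
        final int size;

        LongBatch(long[] values, boolean[] isNull, int size) {
            this.values = values;
            this.isNull = isNull;
            this.size = size;
        }
    }

    private static final int M = 1 << 16; // number of bits in the bitmap
    private final BitSet bits = new BitSet(M);

    // Map a value to a bucket in [0, M) with a simple multiplicative hash.
    private static int bucket(long v) {
        long h = v * 0x9E3779B97F4A7C15L;
        h ^= (h >>> 32);
        return (int) (h & (M - 1));
    }

    // Row mode: one call per row, nulls are skipped.
    void addRow(Long v) {
        if (v != null) {
            bits.set(bucket(v));
        }
    }

    // Naive batch mode: one call per batch, with a tight loop over the array.
    void addBatch(LongBatch batch) {
        for (int i = 0; i < batch.size; i++) {
            if (!batch.isNull[i]) {
                bits.set(bucket(batch.values[i]));
            }
        }
    }

    // Linear-counting estimate: -M * ln(fraction of bits still zero).
    double estimate() {
        int zero = M - bits.cardinality();
        if (zero == 0) {
            return Double.POSITIVE_INFINITY; // bitmap saturated
        }
        return -M * Math.log((double) zero / M);
    }

    public static void main(String[] args) {
        long[] vals = {1, 2, 2, 3, 0, 3};
        boolean[] nulls = {false, false, false, false, true, false};
        BitVectorSketch sketch = new BitVectorSketch();
        sketch.addBatch(new LongBatch(vals, nulls, vals.length));
        System.out.println(Math.round(sketch.estimate()));
    }
}
```

Even in this toy form, the batch method does no per-row dispatch or boxing. More importantly for the issue described here, an aggregate that accepts batches no longer forces the rest of the operator pipeline back into row mode.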

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)