Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/02/20 07:26:44 UTC

[jira] [Commented] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

    [ https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874144#comment-15874144 ] 

Gopal V commented on HIVE-15987:
--------------------------------

-1 for the Hive-2.x branch storage-api implementation. We can consider this for the Hive-3.0 branch, since it breaks external interfaces to ORC and third-party vectorized UDFs.

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> -------------------------------------------------------
>
>                 Key: HIVE-15987
>                 URL: https://issues.apache.org/jira/browse/HIVE-15987
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>              Labels: incompatibleChange
>
> Most data operations in Hive involve null operations. The current implementation of ColumnVector.isNull uses a boolean array, which takes 8 bits per boolean. BitSet is a more compact representation, as it uses 1 bit per boolean with a backing long array. Logical operations on longs are also much faster than on bytes, since they need fewer instructions per byte of data. So it should bring an 8x or greater speedup for null operations.
> However, there are also several cases that this change will make slower. For example, simple reads will require more instructions per row. So the change should include benchmark tests to show its performance impact.
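
For illustration only, here is a minimal Java sketch of the two null-mask layouts the description compares. It is not Hive code: the class NullMaskSketch and its methods countNulls and orMasks are hypothetical names, and only java.util.BitSet from the JDK is used. It shows why a combined mask can be built a word at a time with BitSet, and why a single-row read goes through BitSet.get (index arithmetic plus shift and mask) rather than a plain array load.

```java
import java.util.BitSet;

// Hypothetical stand-in for the two isNull layouts; not part of Hive's storage-api.
public class NullMaskSketch {

    // Current layout per the issue: one boolean array element per row.
    public static int countNulls(boolean[] isNull, int size) {
        int n = 0;
        for (int i = 0; i < size; i++) {
            if (isNull[i]) {   // simple read: a single array load
                n++;
            }
        }
        return n;
    }

    // Proposed layout: one bit per row, packed into a backing long[] by BitSet.
    public static int countNulls(BitSet isNull, int size) {
        int n = 0;
        for (int i = 0; i < size; i++) {
            if (isNull.get(i)) {   // simple read: word lookup, then shift and mask
                n++;
            }
        }
        return n;
    }

    // Output row is null if either input row is null, computed one row at a time.
    public static boolean[] orMasks(boolean[] a, boolean[] b, int size) {
        boolean[] out = new boolean[size];
        for (int i = 0; i < size; i++) {
            out[i] = a[i] || b[i];
        }
        return out;
    }

    // Same operation on BitSets: BitSet.or combines the backing longs, 64 rows per word.
    public static BitSet orMasks(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.or(b);
        return out;
    }

    public static void main(String[] args) {
        int size = 1024;   // arbitrary batch size chosen for this sketch
        boolean[] a = new boolean[size];
        boolean[] b = new boolean[size];
        BitSet ba = new BitSet(size);
        BitSet bb = new BitSet(size);
        for (int i = 0; i < size; i++) {
            if (i % 3 == 0) { a[i] = true; ba.set(i); }
            if (i % 5 == 0) { b[i] = true; bb.set(i); }
        }
        boolean[] r = orMasks(a, b, size);
        BitSet br = orMasks(ba, bb);
        System.out.println("nulls (boolean[]): " + countNulls(r, size));
        System.out.println("nulls (BitSet):    " + countNulls(br, size));
    }
}
```

Both layouts produce the same mask; the sketch makes no claim about which is faster, which is what the requested benchmarks would need to establish. It also hints at the compatibility concern raised in the comment: code that reads isNull[i] directly would not compile against a BitSet field.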



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)