Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/11/04 09:45:00 UTC

[jira] [Updated] (HIVE-24354) ColumnVector should declare abstract convenience methods for getting values

     [ https://issues.apache.org/jira/browse/HIVE-24354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-24354:
--------------------------------
    Description: 
While writing HIVE-24245, I found that ColumnVector doesn't have any methods for getting a value from the vector, such as:
{code}
ColumnVector.getValue(n) // get the nth element (open question: mutable view, immutable view, or copy?)
ColumnVector.getHash(n)  // get the Murmur hash of the nth element
{code}

Because of this, I ended up writing a separate vectorized UDAF for each data type, and the only difference between them was the single line that obtains a value from the vector. The existing vector expressions follow a pattern where the whole expression, logic and loops alike, is copied for every type (something I already considered in the scope of HIVE-21465), but I don't like that approach. When I create an abstract vectorized UDAF and extend it for specific data types, I accept the overhead of a function call for every single value, but I don't think this violates basic vectorization principles: we still operate on vectors, so, for example, the object inspection overhead is already eliminated.
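
A minimal sketch of the per-type approach described above, assuming heavily simplified stand-ins for Hive's LongColumnVector and DoubleColumnVector (the real classes also track nulls and repeating values). The aggregate shown, a count of distinct value hashes, and all class and method names are hypothetical. The loop is written once in an abstract base class, and each subclass contributes only the one line that reads a value from its vector type:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified stand-ins for Hive's vector classes
// (no null or isRepeating handling).
class LongColumnVector {
  final long[] vector;

  LongColumnVector(long... values) { this.vector = values; }
}

class DoubleColumnVector {
  final double[] vector;

  DoubleColumnVector(double... values) { this.vector = values; }
}

// The loop is written once; each subclass supplies only how to read and hash
// the nth value of its vector type.
abstract class AbstractDistinctHashAggregator<V> {
  private final Set<Long> hashes = new HashSet<>();

  // The single per-type line: obtain (and hash) a value from the vector.
  protected abstract long hashOf(V vector, int n);

  void aggregate(V vector, int size) {
    for (int i = 0; i < size; i++) {
      hashes.add(hashOf(vector, i)); // one virtual call per value
    }
  }

  int distinctCount() { return hashes.size(); }
}

class LongDistinctHashAggregator extends AbstractDistinctHashAggregator<LongColumnVector> {
  @Override
  protected long hashOf(LongColumnVector v, int n) { return Long.hashCode(v.vector[n]); }
}

class DoubleDistinctHashAggregator extends AbstractDistinctHashAggregator<DoubleColumnVector> {
  @Override
  protected long hashOf(DoubleColumnVector v, int n) { return Double.hashCode(v.vector[n]); }
}
```

Every new data type still needs another subclass even though the loop never changes; the virtual hashOf call per value is the function-call overhead mentioned above.
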
I propose convenience methods like the getValue(n) and getHash(n) examples above, which would define a strict contract for retrieving data from a ColumnVector, in particular its nth element.
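
One way such a contract could look, sketched under explicit assumptions: ColumnVector below is a hypothetical simplified class, not Hive's actual one; getValue(n) is assumed to return a value the caller may keep (a boxed primitive or a defensive copy), which is only one possible answer to the mutable/copy question; and getHash(n) uses MurmurHash3's 64-bit finalizer (fmix64) as a stand-in mixer rather than any Hive hashing utility. With the per-type line moved into the vector classes, a single non-abstract aggregator serves every type:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified ColumnVector declaring the proposed convenience
// methods. This is NOT Hive's actual class or API.
abstract class ColumnVector {
  // Returns the nth value. Contract assumed here: the result is safe to keep,
  // i.e. a boxed primitive or a defensive copy, never a view of internal buffers.
  abstract Object getValue(int n);

  // Returns a 64-bit hash of the nth value; equal values must hash equally.
  abstract long getHash(int n);

  // MurmurHash3's 64-bit finalizer (fmix64), used here as a stand-in mixer.
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }
}

class LongColumnVector extends ColumnVector {
  final long[] vector;

  LongColumnVector(long... values) { this.vector = values; }

  @Override Object getValue(int n) { return vector[n]; } // autoboxed Long

  @Override long getHash(int n) { return fmix64(vector[n]); }
}

// Modeled on the slice layout of Hive's BytesColumnVector: entry n is
// length[n] bytes of vector[n], starting at offset start[n].
class BytesColumnVector extends ColumnVector {
  final byte[][] vector;
  final int[] start;
  final int[] length;

  BytesColumnVector(byte[][] vector, int[] start, int[] length) {
    this.vector = vector;
    this.start = start;
    this.length = length;
  }

  @Override Object getValue(int n) {
    // Defensive copy: later writes to the shared buffer do not affect the caller.
    return Arrays.copyOfRange(vector[n], start[n], start[n] + length[n]);
  }

  @Override long getHash(int n) {
    // Simple polynomial hash over the slice, finalized with fmix64
    // (an illustration, not a full Murmur3 implementation).
    long h = 0;
    for (int i = start[n]; i < start[n] + length[n]; i++) {
      h = 31 * h + vector[n][i];
    }
    return fmix64(h);
  }
}

// A single aggregator for every type: the per-type line now lives in ColumnVector.
class DistinctHashAggregator {
  private final Set<Long> hashes = new HashSet<>();

  void aggregate(ColumnVector v, int size) {
    for (int i = 0; i < size; i++) {
      hashes.add(v.getHash(i));
    }
  }

  int distinctCount() { return hashes.size(); }
}
```

Returning a copy from getValue costs an allocation per call for string and binary data. A read-only view (for example, ByteBuffer.wrap(array, offset, length).asReadOnlyBuffer()) would avoid that, but the caller could then observe later reuse of the vector's buffers; that trade-off is exactly what the contract would need to pin down.
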


> ColumnVector should declare abstract convenience methods for getting values
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24354
>                 URL: https://issues.apache.org/jira/browse/HIVE-24354
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)