Posted to issues@hive.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/11/04 09:45:00 UTC
[jira] [Updated] (HIVE-24354) ColumnVector should declare abstract convenience methods for getting values
[ https://issues.apache.org/jira/browse/HIVE-24354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-24354:
--------------------------------
Description:
While working on HIVE-24245, I found that ColumnVector doesn't have any methods for getting a value from the vector, for example:
{code}
ColumnVector.getValue(n) // get the nth element (mutable? immutable? a copy?)
ColumnVector.getHash(n) // get the murmur hash for the nth element
{code}
Because of this, I ended up writing separate vectorized UDAFs for different data types, where the only difference was the single line that obtains a value from the vector. In the current vector expressions I see a pattern where the whole expression, including its logic and its loops, is duplicated for each type (something I was already thinking about in the scope of HIVE-21465), and I don't like that approach. When I create an abstract vectorized UDAF and extend it for specific data types, I accept the overhead of a function call for every single value, but I don't think this violates basic vectorization principles: we still operate on vectors, so, for example, the object inspection overhead is already eliminated.
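To make the trade-off concrete, here is a minimal sketch of the abstract vectorized UDAF approach described above. It does not use Hive classes: Vec, LongVec, DoubleVec, AbstractVectorSum and its subclasses are hypothetical stand-ins, with noNulls, isRepeating and isNull fields modeled on Hive's ColumnVector. The shared loop lives in the base class, and each per-type subclass differs only in valueAt, which costs one virtual call per row.

{code:java}
// Hypothetical minimal column vectors: values plus Hive-style null/repeat flags.
abstract class Vec {
    boolean noNulls = true;      // true when no entry is null
    boolean isRepeating = false; // true when every row equals row 0
    final boolean[] isNull;

    Vec(int n) { isNull = new boolean[n]; }
}

class LongVec extends Vec {
    final long[] vector;
    LongVec(long... v) { super(v.length); vector = v; }
}

class DoubleVec extends Vec {
    final double[] vector;
    DoubleVec(double... v) { super(v.length); vector = v; }
}

// Shared aggregation loop; subclasses supply only the per-row accessor.
abstract class AbstractVectorSum<V extends Vec> {
    double sum = 0;
    long count = 0;

    // The single type-specific line: one virtual call per value.
    abstract double valueAt(V col, int row);

    void aggregate(V col, int size) {
        for (int i = 0; i < size; i++) {
            int row = col.isRepeating ? 0 : i;               // repeating vectors store only row 0
            if (!col.noNulls && col.isNull[row]) continue;   // skip nulls
            sum += valueAt(col, row);
            count++;
        }
    }
}

class LongVectorSum extends AbstractVectorSum<LongVec> {
    @Override double valueAt(LongVec col, int row) { return col.vector[row]; }
}

class DoubleVectorSum extends AbstractVectorSum<DoubleVec> {
    @Override double valueAt(DoubleVec col, int row) { return col.vector[row]; }
}
{code}

For example, aggregating new LongVec(1, 2, 3) with row 1 marked null gives sum 4.0 and count 2. Every new type still needs its own subclass, which is the duplication the proposal tries to remove.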
I propose convenience methods such as getValue(n) and getHash(n), which would define a strict contract for retrieving data from a ColumnVector, in particular the nth element of the vector.
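A minimal sketch of what the proposed contract could look like, assuming the abstract methods live on a common base class. ContractVector, its subclasses, row, isNullAt and DistinctHashCounter are hypothetical names; getValue returning a boxed copy is only one possible answer to the mutability question, and getHash uses MurmurHash3's 64-bit finalizer (fmix64) as a stand-in rather than Hive's own Murmur3 utility. With such a contract, a single aggregator handles every type without per-type subclasses.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical base class; getValue/getHash follow the proposal's names.
abstract class ContractVector {
    boolean noNulls = true;
    boolean isRepeating = false;
    final boolean[] isNull;

    ContractVector(int size) { isNull = new boolean[size]; }

    // Resolves isRepeating so callers never have to.
    protected int row(int n) { return isRepeating ? 0 : n; }

    boolean isNullAt(int n) { return !noNulls && isNull[row(n)]; }

    // Assumed contract: returns a boxed copy; mutating it does not touch the vector.
    abstract Object getValue(int n);

    // Assumed contract: 64-bit hash of the nth value.
    abstract long getHash(int n);

    // MurmurHash3's 64-bit finalizer (fmix64), a bijection on long.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
}

class ContractLongVector extends ContractVector {
    final long[] vector;
    ContractLongVector(long... v) { super(v.length); vector = v; }
    @Override Object getValue(int n) { return vector[row(n)]; }
    @Override long getHash(int n) { return fmix64(vector[row(n)]); }
}

// Hypothetical: Hive's BytesColumnVector stores byte ranges, not Strings.
class ContractStringVector extends ContractVector {
    final String[] vector;
    ContractStringVector(String... v) { super(v.length); vector = v; }
    @Override Object getValue(int n) { return vector[row(n)]; }
    // String.hashCode() mixed through fmix64; collisions remain possible.
    @Override long getHash(int n) { return fmix64(vector[row(n)].hashCode()); }
}

// One aggregator for every type: counts distinct hashes of non-null rows
// (exact for longs since fmix64 is a bijection, approximate for strings).
class DistinctHashCounter {
    private final Set<Long> hashes = new HashSet<>();

    void aggregate(ContractVector col, int size) {
        for (int i = 0; i < size; i++) {
            if (!col.isNullAt(i)) hashes.add(col.getHash(i));
        }
    }

    int count() { return hashes.size(); }
}
{code}

Boxing in getValue allocates per row, so a hot loop would likely prefer getHash or typed accessors; that choice is exactly the open question in the getValue comment of the proposal.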
> ColumnVector should declare abstract convenience methods for getting values
> ---------------------------------------------------------------------------
>
> Key: HIVE-24354
> URL: https://issues.apache.org/jira/browse/HIVE-24354
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)