Posted to issues@hive.apache.org by "Colin Ma (JIRA)" <ji...@apache.org> on 2017/10/23 06:43:00 UTC
[jira] [Commented] (HIVE-16198) Vectorize GenericUDFIndex for ARRAY
[ https://issues.apache.org/jira/browse/HIVE-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214720#comment-16214720 ]
Colin Ma commented on HIVE-16198:
---------------------------------
Hi [~teddy.choi], [~mmccline], because of HIVE-17133, I rebased the patch onto Hive 2.3.0 with some minor changes. To evaluate the performance improvement, I used the following table:
{code}
hive> describe temperature_orc_5g;
t_date string
city string
temperatures array<double>
hive> show tblproperties temperature_orc_5g;
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
numFiles 20
numRows 100000000
rawDataSize 24100000000
totalSize 1793960785
{code}
Tested with Hive on Spark using the query {color:#59afe1}select city, avg(temperatures\[0\]), avg(temperatures\[5\]) from temperature_orc_5g where temperatures\[2\] > 20 group by city limit 10{color}. The results are as follows:
|| ||Disable vectorization||Enable vectorization||
|execution time|{color:#d04437}34s{color}|{color:#14892c}26s{color}|
Specifically, the detailed time cost for the same task, which processes 15154763 rows, is shown in the following table:
|| ||Disable vectorization||Enable vectorization||
|Time with RecordReader|{color:#d04437}8.9s{color}|{color:#14892c}5.9s{color}|
|Time with filter operator|{color:#d04437}3.1s{color}|{color:#14892c}0.1s{color}|
|Time with groupBy and followup operators|10.8s|11.5s|
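The two configurations compared above can be sketched as the session below. This is only an illustration: it assumes the standard {{hive.vectorized.execution.enabled}} property is the switch between the two runs, which this comment does not state, and any other session settings used in the test are not shown.
{code}
-- Run 1: vectorization disabled (assumed switch, not confirmed by the comment)
set hive.vectorized.execution.enabled=false;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city
limit 10;

-- Run 2: same query with vectorization enabled
set hive.vectorized.execution.enabled=true;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city
limit 10;
{code}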
I think the improvement is obvious. Do you know why the patch has not been committed yet? Thanks.
> Vectorize GenericUDFIndex for ARRAY
> -----------------------------------
>
> Key: HIVE-16198
> URL: https://issues.apache.org/jira/browse/HIVE-16198
> Project: Hive
> Issue Type: Sub-task
> Components: UDF, Vectorization
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Attachments: HIVE-16198.1.patch, HIVE-16198.2.patch, HIVE-16198.3.patch
>
>
> Vectorize GenericUDFIndex for array data type.