Posted to issues@hive.apache.org by "Colin Ma (JIRA)" <ji...@apache.org> on 2017/10/23 06:43:00 UTC

[jira] [Commented] (HIVE-16198) Vectorize GenericUDFIndex for ARRAY

    [ https://issues.apache.org/jira/browse/HIVE-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214720#comment-16214720 ] 

Colin Ma commented on HIVE-16198:
---------------------------------

Hi [~teddy.choi], [~mmccline]. Because of HIVE-17133, I rebased the patch onto Hive 2.3.0 with some minor changes. To evaluate the performance improvement, I used the following table:
{code}
hive> describe temperature_orc_5g;
t_date                  string
city                    string
temperatures            array<double>
hive> show tblproperties temperature_orc_5g;
COLUMN_STATS_ACCURATE   {"BASIC_STATS":"true"}
numFiles                20
numRows                 100000000
rawDataSize             24100000000
totalSize               1793960785
{code}
Tested with Hive on Spark, using the query {color:#59afe1}select city, avg(temperatures\[0\]), avg(temperatures\[5\]) from temperature_orc_5g where temperatures\[2\] > 20 group by city limit 10{color}. The results are as follows:
|| ||Disable vectorization||Enable vectorization||
|execution time|{color:#d04437}34s{color}|{color:#14892c}26s{color}|
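For anyone who wants to repeat the comparison, the two runs can be sketched roughly as below. This is only a sketch: it assumes vectorization is switched through {{hive.vectorized.execution.enabled}} and that the engine is selected with {{hive.execution.engine}}; other session settings used for the measurements above are not listed here.
{code}
-- Assumption: Hive on Spark, as in the test above.
set hive.execution.engine=spark;

-- Run 1: vectorization disabled.
set hive.vectorized.execution.enabled=false;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city
limit 10;

-- Run 2: vectorization enabled (the array index expressions
-- are only vectorized with the patch from this issue applied).
set hive.vectorized.execution.enabled=true;
select city, avg(temperatures[0]), avg(temperatures[5])
from temperature_orc_5g
where temperatures[2] > 20
group by city
limit 10;
{code}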
Specifically, the detailed time cost for the same task, which processes 15154763 rows, is shown in the following table:
|| ||Disable vectorization||Enable vectorization||
|Time with RecordReader|{color:#d04437}8.9s{color}|{color:#14892c}5.9s{color}|
|Time with filter operator|{color:#d04437}3.1s{color}|{color:#14892c}0.1s{color}|
|Time with groupBy and followup operators|10.8s|11.5s|
I think the improvement is obvious. Do you know why the patch has not been committed yet? Thanks.

> Vectorize GenericUDFIndex for ARRAY
> -----------------------------------
>
>                 Key: HIVE-16198
>                 URL: https://issues.apache.org/jira/browse/HIVE-16198
>             Project: Hive
>          Issue Type: Sub-task
>          Components: UDF, Vectorization
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>         Attachments: HIVE-16198.1.patch, HIVE-16198.2.patch, HIVE-16198.3.patch
>
>
> Vectorize GenericUDFIndex for array data type.


