Posted to issues@hive.apache.org by "Rui Li (JIRA)" <ji...@apache.org> on 2015/06/29 15:37:04 UTC

[jira] [Updated] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-11108:
--------------------------
    Attachment: HIVE-11108.1-spark.patch

The patch enables vectorization for SparkHashTableSinkOperator.
I ran some local tests. The end-to-end performance gain is not very noticeable, since HTS usually processes only the small tables. For the specific stage that contains HTS, however, performance can improve by about 2x in some cases, e.g. when the work is computing min/max.

> HashTableSinkOperator doesn't support vectorization [Spark Branch]
> ------------------------------------------------------------------
>
>                 Key: HIVE-11108
>                 URL: https://issues.apache.org/jira/browse/HIVE-11108
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-11108.1-spark.patch
>
>
> This prevents any BaseWork containing HTS from being vectorized. The problem is essentially specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks.
> We should determine whether it makes sense to have HTS support vectorization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)