Posted to dev@hivemall.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2017/01/26 09:32:24 UTC

[jira] [Updated] (HIVEMALL-20) Improve the performance of Hive integration in Spark

     [ https://issues.apache.org/jira/browse/HIVEMALL-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro updated HIVEMALL-20:
-------------------------------------
    Labels: Spark  (was: )

> Improve the performance of Hive integration in Spark
> ----------------------------------------------------
>
>                 Key: HIVEMALL-20
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-20
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>              Labels: Spark
>
> Most of the Hivemall functions depend on Hive interfaces (UDF, GenericUDF, GenericUDTF, ...), but Spark currently incurs overhead when calling these interfaces (see the benchmark at https://github.com/myui/hivemall/blob/master/spark/spark-2.0/src/test/scala/org/apache/spark/sql/hive/benchmark/MiscBenchmark.scala). As a result, some functions such as sigmoid and each_top_k have been re-implemented as native Spark functionality. This re-implementation seems to hurt maintainability, so we would be better off reducing these call overheads in Spark itself. This ticket tracks all the related activities.
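
[Editor's note: the ticket names sigmoid and each_top_k as functions that were re-implemented natively in Spark. As a rough, language-agnostic sketch of what these functions compute (not the actual Hivemall or Spark implementation, which operates on Catalyst rows), the semantics can be illustrated in plain Python — here `each_top_k` is a hypothetical helper keeping the k highest-scoring rows per group:]

```python
import heapq
import math
from collections import defaultdict

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def each_top_k(rows, k):
    """Sketch of per-group top-k selection.

    `rows` is an iterable of (group, score, payload) tuples; returns,
    for each group, the k highest-scoring (score, payload) pairs in
    descending score order. Uses a bounded min-heap per group so only
    k candidates are retained at any time.
    """
    heaps = defaultdict(list)
    for group, score, payload in rows:
        heap = heaps[group]
        if len(heap) < k:
            heapq.heappush(heap, (score, payload))
        elif score > heap[0][0]:
            # New row beats the current k-th best; replace it.
            heapq.heapreplace(heap, (score, payload))
    return {g: sorted(h, reverse=True) for g, h in heaps.items()}
```

[A native Spark expression can evaluate such logic directly on internal rows with code generation, whereas routing each row through the Hive UDF interfaces adds per-call conversion overhead — which is the cost this ticket aims to reduce.]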



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)