Posted to dev@hive.apache.org by "Jimmy Xiang (JIRA)" <ji...@apache.org> on 2015/02/02 01:35:34 UTC

[jira] [Commented] (HIVE-9492) Enable caching in MapInput for Spark

    [ https://issues.apache.org/jira/browse/HIVE-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300798#comment-14300798 ] 

Jimmy Xiang commented on HIVE-9492:
-----------------------------------

I'm having trouble posting it on RB right now; I'll try again later. As for the configuration parameter, it is there so that caching can be disabled to avoid the overhead when it doesn't help.

> Enable caching in MapInput for Spark
> ------------------------------------
>
>                 Key: HIVE-9492
>                 URL: https://issues.apache.org/jira/browse/HIVE-9492
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Jimmy Xiang
>             Fix For: spark-branch
>
>         Attachments: HIVE-9492.1-spark.patch, HIVE-9492.2-spark.patch, prototype.patch
>
>
> Because of the IOContext problem (HIVE-8920, HIVE-9084), RDD caching is currently disabled in MapInput. Prototyping shows that the problem can be solved. Thus, we should formalize the prototype and enable the caching. A good query to test this is:
> {code}
> from (select * from dec union all select * from dec2) s
> insert overwrite table dec3 select s.name, sum(s.value) group by s.name
> insert overwrite table dec4 select s.name, s.value order by s.value;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)