Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2014/07/31 00:20:40 UTC

[jira] [Commented] (HIVE-7503) Support Hive's multi-table insert query with Spark

    [ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080076#comment-14080076 ] 

Xuefu Zhang commented on HIVE-7503:
-----------------------------------

Assigned to myself for initial research.

> Support Hive's multi-table insert query with Spark
> --------------------------------------------------
>
>                 Key: HIVE-7503
>                 URL: https://issues.apache.org/jira/browse/HIVE-7503
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>
> For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert.  When we implement this with Spark, it would be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things worse, the source of the inserts may be re-computed multiple times unless it's staged. Even with staging, the inserts will still happen sequentially, hurting performance.
> This task is to find out what it takes in Spark to enable this without requiring that the source be staged or the inserts run sequentially. If this has to be solved on the Hive side, find an optimal way to do it. (A sketch illustrating the current behavior follows below.)
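
To make the problem concrete, here is a minimal sketch of how a multi-insert source behaves when translated naively to the Spark RDD API. This is not the design proposed for this JIRA; the paths, app name, and data layout are illustrative assumptions, and the code only demonstrates the re-computation and sequential-write behavior described above.

    // Minimal sketch only; paths, app name, and data layout are assumptions.
    // The Hive query being modeled is a multi-insert of the form:
    //   FROM src
    //   INSERT OVERWRITE TABLE dest1 SELECT key   WHERE key < 100
    //   INSERT OVERWRITE TABLE dest2 SELECT value WHERE key >= 100
    import org.apache.spark.{SparkConf, SparkContext}

    object MultiInsertSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("multi-insert-sketch"))

        // Shared source of both inserts. Without cache(), each write action
        // below recomputes the full lineage from /tmp/src, which is the
        // re-computation problem mentioned in the description.
        val src = sc.textFile("/tmp/src").map(_.split("\t")).cache()

        // Each write is its own Spark action, so even with the source cached
        // the two "inserts" run one after the other rather than concurrently.
        src.filter(r => r(0).toInt < 100).map(r => r(0)).saveAsTextFile("/tmp/dest1")
        src.filter(r => r(0).toInt >= 100).map(r => r(1)).saveAsTextFile("/tmp/dest2")

        sc.stop()
      }
    }

Even with the source cached, each saveAsTextFile call is a separate action, so the writes run back to back. Submitting the actions from separate threads would let Spark's scheduler overlap the jobs, but that is a workaround on top of the current API rather than the first-class multi-output support this issue asks about.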



--
This message was sent by Atlassian JIRA
(v6.2#6252)