Posted to issues@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2016/02/17 15:50:18 UTC

[jira] [Updated] (HIVE-8208) Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch]

     [ https://issues.apache.org/jira/browse/HIVE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8208:
------------------------------
    Fix Version/s:     (was: spark-branch)

> Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch]
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-8208
>                 URL: https://issues.apache.org/jira/browse/HIVE-8208
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Chao Sun
>
> Currently, the multi-table insertion patch breaks the operator tree whenever a TableScanOperator leads to multiple FileSinkOperators. It identifies the lowest common ancestor (LCA) and breaks the tree there, creating as many child SparkTasks as there are FileSinkOperators.
> However, in the following situation it's better not to break the operator tree:
> Of all the paths from these FileSinkOperators to the LCA, a ReduceSinkOperator exists on at most one path (0 or 1 of them).
> In this case, the work can be done in one Spark job, and there is no need to break the operator tree.
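
The condition described above could be checked roughly as follows. This is only a sketch, written in Python for brevity (Hive itself is Java), and it is not taken from the patch: the function name `can_stay_in_one_job` and the representation of each path as a list of operator type names are assumptions made for illustration.

```python
from typing import List

REDUCE_SINK = "ReduceSinkOperator"


def can_stay_in_one_job(paths_to_lca: List[List[str]]) -> bool:
    """Return True if the operator tree does not need to be broken.

    paths_to_lca holds one entry per FileSinkOperator: the operator type
    names met when walking from that FileSinkOperator up to the LCA
    (a hypothetical representation, chosen only for this sketch).
    """
    # Count the paths that contain at least one ReduceSinkOperator.
    paths_with_reduce_sink = sum(
        1 for path in paths_to_lca if REDUCE_SINK in path
    )
    # The tree can stay whole when 0 or 1 paths contain a ReduceSinkOperator.
    return paths_with_reduce_sink <= 1


if __name__ == "__main__":
    # Two inserts, only one of which involves a shuffle.
    paths = [
        ["FileSinkOperator", "SelectOperator"],
        ["FileSinkOperator", "GroupByOperator", "ReduceSinkOperator"],
    ]
    print(can_stay_in_one_job(paths))
```

When two or more paths contain a ReduceSinkOperator, the function returns False, which corresponds to the existing behavior of breaking the tree at the LCA.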



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)