Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2014/09/11 20:23:33 UTC

[jira] [Created] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

Xuefu Zhang created HIVE-8054:
---------------------------------

             Summary: Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
                 Key: HIVE-8054
                 URL: https://issues.apache.org/jira/browse/HIVE-8054
             Project: Hive
          Issue Type: Improvement
          Components: Spark
            Reporter: Xuefu Zhang


Option hive.optimize.union.remove, introduced in HIVE-3276, removes union operators from the operator graph in certain cases as an optimization to reduce the number of MR jobs. While this makes sense for MR, the optimization is actually harmful to an execution engine such as Spark, which natively supports union without requiring additional jobs. Removing the union operator creates disjoint operator graphs, each of which generates a job, so the optimization requires more jobs to run the query, not to mention the additional complexity of handling linked FS descriptors.

I propose that we disable this optimization when the execution engine is Spark.
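
As a rough illustration of the proposed rule only (not Hive's actual code), the check could look something like the sketch below. A plain dict stands in for the Hive configuration object, the function name effective_union_remove is hypothetical, and the fallback values used when a key is absent ("false" and "mr") are assumptions of this sketch.

```python
# Hypothetical sketch: a plain dict stands in for Hive's configuration object.
UNION_REMOVE = "hive.optimize.union.remove"
EXECUTION_ENGINE = "hive.execution.engine"


def effective_union_remove(conf):
    """Return whether the union-remove optimization should be applied."""
    # Fallback values here are assumptions made for this sketch.
    requested = conf.get(UNION_REMOVE, "false").lower() == "true"
    engine = conf.get(EXECUTION_ENGINE, "mr").lower()
    # Proposed rule: never remove union operators when running on Spark,
    # regardless of what the user requested.
    if engine == "spark":
        return False
    return requested
```

With this rule, a session that sets hive.optimize.union.remove=true keeps the optimization on MR but has it silently turned off once hive.execution.engine=spark.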



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)