Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/01/04 15:44:58 UTC

[jira] [Created] (HIVE-15539) Optimize complex multi-insert queries in Calcite

Jesus Camacho Rodriguez created HIVE-15539:
----------------------------------------------

             Summary: Optimize complex multi-insert queries in Calcite
                 Key: HIVE-15539
                 URL: https://issues.apache.org/jira/browse/HIVE-15539
             Project: Hive
          Issue Type: Improvement
          Components: Parser
    Affects Versions: 2.2.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


Currently, multi-insert queries are not optimized by Calcite. Proper integration with Calcite would include creating a _spool_ operator whose output is reused by every _insert_ statement; however, the _spool_ operator has not been added to Calcite yet (CALCITE-481).

In the meantime, since the complex logic of a multi-insert query is in its FROM clause, we can optimize the FROM clause with Calcite and connect the optimized result to the rest of the original query.

Initially, we will recognize three different cases:
- The FROM clause is trivial (e.g., a table reference) or not supported. No need to optimize it with Calcite.
- The FROM clause is a subquery. Optimize the subquery with Calcite.
- The FROM clause is a join. Rewrite the join as a subquery and optimize it with Calcite. Change the references in the INSERT statements to refer to the subquery columns.
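
To make the three cases concrete, here is a minimal Python sketch of the dispatch and of the join-to-subquery rewrite. It is a toy model over strings, not Hive or Calcite code: the classes TableRef, Subquery and Join, the functions plan_from_clause and rewrite_refs, the subquery alias "sq" and the column naming scheme (table_column) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TableRef:          # hypothetical: FROM src
    name: str


@dataclass
class Subquery:          # hypothetical: FROM (SELECT ...) alias
    sql: str
    alias: str


@dataclass
class Join:              # hypothetical: FROM left JOIN right ON condition
    left: str
    right: str
    condition: str
    columns: List[Tuple[str, str]]  # (table, column) pairs used by the INSERT branches


def plan_from_clause(source) -> Tuple[str, str, Dict[str, str]]:
    """Return (action, FROM text, column map) for the three cases in the list above."""
    if isinstance(source, Subquery):
        # Case 2: the subquery itself would be handed to the optimizer.
        return ("optimize", f"({source.sql}) {source.alias}", {})
    if isinstance(source, Join):
        # Case 3: wrap the join in a subquery that exposes each used column
        # under a unique name, then remember how to re-point references.
        exposed = {f"{t}.{c}": f"{t}_{c}" for t, c in source.columns}
        select_list = ", ".join(f"{ref} AS {name}" for ref, name in exposed.items())
        inner = (f"SELECT {select_list} FROM {source.left} "
                 f"JOIN {source.right} ON {source.condition}")
        alias = "sq"  # assumed alias for the generated subquery
        column_map = {ref: f"{alias}.{name}" for ref, name in exposed.items()}
        return ("optimize", f"({inner}) {alias}", column_map)
    # Case 1: trivial (table reference) or unsupported, left untouched.
    name = source.name if isinstance(source, TableRef) else str(source)
    return ("skip", name, {})


def rewrite_refs(expr: str, column_map: Dict[str, str]) -> str:
    """Re-point table.column references in an INSERT branch to subquery columns."""
    if not column_map:
        return expr
    keys = sorted(column_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: column_map[m.group(1)], expr)


join = Join("a", "b", "a.k = b.k", [("a", "x"), ("b", "y")])
action, from_sql, column_map = plan_from_clause(join)
branch = rewrite_refs("SELECT a.x WHERE b.y > 10", column_map)
```

For this join, from_sql is the subquery text that would be optimized, and branch is one INSERT branch with its column references re-pointed at the subquery. A real implementation would work on the parsed query tree rather than on strings.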

This should also benefit the processing of MERGE statements, since Hive treats MERGE statements as multi-insert queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)