Posted to dev@calcite.apache.org by "Stamatis Zampetakis (JIRA)" <ji...@apache.org> on 2019/07/31 15:26:00 UTC

[jira] [Created] (CALCITE-3221) Add a sort-merge union algorithm

Stamatis Zampetakis created CALCITE-3221:
--------------------------------------------

             Summary: Add a sort-merge union algorithm
                 Key: CALCITE-3221
                 URL: https://issues.apache.org/jira/browse/CALCITE-3221
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.19.0
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Currently, the union operation offered by Calcite is based on a {{HashSet}} (see [EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747]) and requires reading all rows into memory before returning a single result.

Apart from the increased memory consumption, the operator is blocking and also destroys the order of its inputs.
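
For contrast, here is a deliberately simplified Java sketch of a hash-based distinct union over two {{java.util.Iterator}} inputs. It is not Calcite's {{EnumerableDefaults.union}} (which works on linq4j {{Enumerable}}s), and the names {{HashUnionSketch}} and {{hashUnion}} are hypothetical. It only illustrates the points above: both inputs are drained into the set before the first row can be returned, every distinct row is held in memory, and the output follows hash-table order rather than the order of the inputs.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/** Simplified, hypothetical sketch of a hash-based UNION (DISTINCT); not Calcite code. */
final class HashUnionSketch {

  /**
   * Returns the distinct rows of both inputs. Both inputs are fully drained
   * into the set before the returned iterator can yield anything, so the
   * operator is blocking, keeps every distinct row in memory, and yields rows
   * in hash-table order rather than in the order of its inputs.
   */
  static <T> Iterator<T> hashUnion(Iterator<T> left, Iterator<T> right) {
    Set<T> seen = new HashSet<>();
    left.forEachRemaining(seen::add);   // consume all of the left input
    right.forEachRemaining(seen::add);  // consume all of the right input
    return seen.iterator();
  }
}
```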

The goal of this issue is to add a new union algorithm (EnumerableMergeUnion?) that exploits the fact that the inputs are sorted; such an algorithm consumes less memory and retains the order of its inputs.
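
A minimal sketch of the idea, assuming both inputs are already sorted by the same comparator and that the union has set (distinct) semantics. It uses plain {{java.util.Iterator}} instead of Calcite's {{Enumerable}}, and the names {{MergeUnionSketch}} and {{mergeUnion}} are hypothetical; it is not an implementation of EnumerableMergeUnion. Like a merge join, it advances whichever input currently has the smaller head. Because equal rows end up adjacent in the merged stream, comparing each row with the last emitted one is enough to drop duplicates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a sort-merge UNION (DISTINCT); not Calcite's EnumerableMergeUnion. */
final class MergeUnionSketch {

  /**
   * Lazily merges two inputs that are both sorted by {@code cmp}, emitting each
   * distinct row once, in sorted order. Only a constant number of rows (the two
   * input heads, a pending row and the last emitted row) is held at any time.
   */
  static <T> Iterator<T> mergeUnion(
      Iterator<T> left, Iterator<T> right, Comparator<? super T> cmp) {
    return new Iterator<T>() {
      // Current head of each input; the flag says whether the head is valid.
      private boolean hasLeft = left.hasNext();
      private T leftHead = hasLeft ? left.next() : null;
      private boolean hasRight = right.hasNext();
      private T rightHead = hasRight ? right.next() : null;
      // Last row handed out; duplicates are adjacent, so this suffices to skip them.
      private boolean emittedAny = false;
      private T last = null;
      // Row found by hasNext() but not yet returned by next().
      private boolean hasPending = false;
      private T pending = null;

      /** Removes and returns the smaller head (left wins ties); needs hasLeft || hasRight. */
      private T pop() {
        T row;
        if (hasLeft && (!hasRight || cmp.compare(leftHead, rightHead) <= 0)) {
          row = leftHead;
          hasLeft = left.hasNext();
          leftHead = hasLeft ? left.next() : null;
        } else {
          row = rightHead;
          hasRight = right.hasNext();
          rightHead = hasRight ? right.next() : null;
        }
        return row;
      }

      @Override
      public boolean hasNext() {
        // Pop rows until one differs from the last emitted row or both inputs are exhausted.
        while (!hasPending && (hasLeft || hasRight)) {
          T row = pop();
          if (!emittedAny || cmp.compare(last, row) != 0) {
            pending = row;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        hasPending = false;
        emittedAny = true;
        last = pending;
        return pending;
      }
    };
  }

  public static void main(String[] args) {
    Iterator<Integer> union = mergeUnion(
        List.of(1, 3, 3, 5, 7).iterator(),
        List.of(2, 3, 6, 7, 8).iterator(),
        Comparator.naturalOrder());
    List<Integer> out = new ArrayList<>();
    union.forEachRemaining(out::add);
    System.out.println(out); // prints [1, 2, 3, 5, 6, 7, 8]
  }
}
```

Rows are produced as soon as the two input heads can be compared, so this sketch is not blocking and its output keeps the common sort order. For UNION ALL, the duplicate check against the last emitted row would simply be dropped, leaving a plain two-way merge.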

The existing merge join implementation will most likely be useful here.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)