Posted to issues@calcite.apache.org by "Stamatis Zampetakis (JIRA)" <ji...@apache.org> on 2019/07/31 22:50:00 UTC

[jira] [Comment Edited] (CALCITE-3221) Add a sort-merge union algorithm

    [ https://issues.apache.org/jira/browse/CALCITE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897439#comment-16897439 ] 

Stamatis Zampetakis edited comment on CALCITE-3221 at 7/31/19 10:49 PM:
------------------------------------------------------------------------

I thought of a couple of generalizations. (Feel free to disregard.)

First, allow this operator to take more than 2 inputs.

Second, allow it to compute aggregate functions, and allow it to take as few as 1 input. With aggregate functions and 1 input, this becomes the sort-based aggregate algorithm we've wanted forever (see CALCITE-853). With aggregate functions and 2 or more inputs, it becomes a combined UNION ALL and GROUP BY (Union all=true followed by Aggregate).
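To make the second generalization concrete, here is a minimal Java sketch of a merge over N >= 1 sorted inputs that also computes one aggregate, COUNT(*) per key. It is only an illustration under stated assumptions: {{MergeAggregateSketch}}, {{Cursor}} and {{mergeCount}} are hypothetical names and not part of Calcite, every input is assumed to be already sorted by the same comparator, and rows are reduced to their grouping key. For brevity the groups are collected into a list; an actual operator would presumably emit each group as soon as the key changes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch, not Calcite code: k-way merge combined with COUNT(*) per key. */
public final class MergeAggregateSketch {
  private MergeAggregateSketch() {}

  /** One-row lookahead over a single sorted input. */
  private static final class Cursor<K> {
    final Iterator<? extends K> rows;
    K head;

    Cursor(Iterator<? extends K> rows) {
      this.rows = rows;
      this.head = rows.next(); // caller guarantees the input is non-empty
    }

    /** Moves to the next row; returns false when the input is exhausted. */
    boolean advance() {
      if (rows.hasNext()) {
        head = rows.next();
        return true;
      }
      return false;
    }
  }

  /**
   * Merges any number of inputs (including just one) that are sorted by cmp
   * and returns (key, count) groups in key order.
   */
  public static <K> List<Map.Entry<K, Long>> mergeCount(
      List<? extends Iterator<? extends K>> inputs, Comparator<? super K> cmp) {
    // The heap always exposes the input whose current row has the smallest key.
    Comparator<Cursor<K>> byHead = (a, b) -> cmp.compare(a.head, b.head);
    PriorityQueue<Cursor<K>> heap = new PriorityQueue<>(byHead);
    for (Iterator<? extends K> in : inputs) {
      if (in.hasNext()) {
        heap.add(new Cursor<K>(in));
      }
    }

    List<Map.Entry<K, Long>> groups = new ArrayList<>();
    K current = null;
    long count = 0;
    while (!heap.isEmpty()) {
      Cursor<K> c = heap.poll();
      // A key change closes the current group: sorted inputs guarantee
      // that no later row can belong to it.
      if (count > 0 && cmp.compare(c.head, current) != 0) {
        groups.add(new AbstractMap.SimpleEntry<K, Long>(current, count));
        count = 0;
      }
      current = c.head;
      count++;
      if (c.advance()) {
        heap.add(c);
      }
    }
    if (count > 0) {
      groups.add(new AbstractMap.SimpleEntry<K, Long>(current, count));
    }
    return groups;
  }
}
```

With a single input this behaves like a sort-based aggregate; with two or more inputs it behaves like UNION ALL followed by GROUP BY, and it holds only one row per input plus the running group in memory.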


was (Author: julianhyde):
I thought of a couple of generalizations. (Feel free to disregard.)

First, allow this operator to take more than 2 inputs.

Second, allow it to compute aggregate functions, and allow it to take as few as 1 input. With aggregate functions and 1 input, this becomes the sort-based aggregate algorithm we've wanted forever (see CALCITE-8530). With aggregate functions and 2 or more inputs, it becomes a combined UNION ALL and GROUP BY (Union all=true followed by Aggregate).

> Add a sort-merge union algorithm
> --------------------------------
>
>                 Key: CALCITE-3221
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3221
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Minor
>
> Currently, the union operation offered by Calcite is based on a {{HashSet}} (see [EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747]) and must read all rows into memory before returning a single result.
> Besides consuming more memory, the operator is blocking and destroys the order of its inputs.
> The goal of this issue is to add a new union algorithm (EnumerableMergeUnion ?) that exploits the fact that the inputs are sorted, so that it consumes less memory and retains the order of its inputs.
> The existing implementation of the merge join will most likely be useful here.
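
As an illustration of the idea in the description, the following Java sketch merges two sorted inputs lazily and drops duplicates, so that it keeps only one lookahead row per input in memory and preserves the sort order. It is not the proposed EnumerableMergeUnion and does not use Calcite's API: {{MergeUnionSketch}} and {{mergeUnion}} are hypothetical names. The sketch assumes that both inputs are sorted by the same comparator, that rows are never null, and that rows comparing as equal are duplicates.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Calcite code: streaming union of two sorted inputs. */
public final class MergeUnionSketch {
  private MergeUnionSketch() {}

  /** Lazily merges two inputs sorted by cmp, dropping duplicate rows. */
  public static <T> Iterator<T> mergeUnion(
      Iterator<? extends T> left, Iterator<? extends T> right, Comparator<? super T> cmp) {
    return new Iterator<T>() {
      // One-row lookahead per input; null marks an exhausted input
      // (assumption of this sketch: rows themselves are never null).
      private T l = left.hasNext() ? left.next() : null;
      private T r = right.hasNext() ? right.next() : null;
      private T lastEmitted; // null until the first row has been returned

      @Override public boolean hasNext() {
        skipDuplicates();
        return l != null || r != null;
      }

      @Override public T next() {
        skipDuplicates();
        if (l == null && r == null) {
          throw new NoSuchElementException();
        }
        T out;
        if (r == null || (l != null && cmp.compare(l, r) <= 0)) {
          out = l;
          l = left.hasNext() ? left.next() : null;
        } else {
          out = r;
          r = right.hasNext() ? right.next() : null;
        }
        lastEmitted = out;
        return out;
      }

      // Because both inputs are sorted, any duplicate of the last emitted row
      // can only sit at the head of an input, so it is enough to check there.
      private void skipDuplicates() {
        if (lastEmitted == null) {
          return;
        }
        while (l != null && cmp.compare(l, lastEmitted) == 0) {
          l = left.hasNext() ? left.next() : null;
        }
        while (r != null && cmp.compare(r, lastEmitted) == 0) {
          r = right.hasNext() ? right.next() : null;
        }
      }
    };
  }

  public static void main(String[] args) {
    Iterator<Integer> merged = mergeUnion(
        Arrays.asList(1, 3, 3, 5).iterator(),
        Arrays.asList(2, 3, 6).iterator(),
        Comparator.<Integer>naturalOrder());
    while (merged.hasNext()) {
      System.out.println(merged.next());
    }
  }
}
```

Unlike the {{HashSet}}-based approach, the first result is available after reading a single row from each input, at the cost of requiring sorted inputs.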



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)