You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Haisheng Yuan (JIRA)" <ji...@apache.org> on 2019/07/08 09:47:00 UTC
[jira] [Updated] (CALCITE-3181) Support limit per group in Window
[ https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haisheng Yuan updated CALCITE-3181:
-----------------------------------
Description:
We have a lot of queries like the following to retrieve top N tuples per group:
{code:java}
SELECT x, y FROM
(SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
AS rn FROM t1) t2 WHERE rn <= 3;
{code}
The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort.
In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is {{row_number()}}. We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later).
{code:java}
Filter (rn <= 3)
+- Window (window#0={Partition by x order by y ROW_NUMBER()})
{code}
to
{code:java}
Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
{code}
Thoughts? Objections?
was:
We have a lot of queries like the following to retrieve top N tuples per group:
{code:java}
SELECT x, y FROM
(SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
AS rn FROM t1) t2 WHERE rn <= 3;
{code}
The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort.
In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is {{row_number()}}. We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later).
Thoughts? Objections?
> Support limit per group in Window
> ---------------------------------
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Haisheng Yuan
> Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per group:
> {code:java}
> SELECT x, y FROM
> (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
> AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort.
> In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is {{row_number()}}. We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later).
> {code:java}
> Filter (rn <= 3)
> +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)