You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2018/01/12 04:10:06 UTC
[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ...
LIMIT ..."
[ https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323515#comment-16323515 ]
Julian Hyde commented on CALCITE-1317:
--------------------------------------
This is similar to the paper [WinMagic: Subquery Elimination Using Window Aggregation (Zuarte et al, 2003)|https://pdfs.semanticscholar.org/0bfa/e505ad588d00d4b204acf8ba4b5646eac244.pdf], which converts MAX in a correlated sub-query into OVER. For example it rewrites
{code}SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM tpcd.lineitem, tpcd.part
WHERE p_partkey = l_partkey AND
p_brand = 'Brand#23' AND
p_container = 'MED BOX' AND
l_quantity<(SELECT 0.2*avg(l_quantity)
FROM tpcd.lineitem
WHERE l_partkey = p_partkey);{code} to {code}WITH WinMagic AS
(SELECT l_extendedprice, l_quantity,
avg(l_quantity)over(partition by p_partkey)
AS avg_l_quantity
FROM tpcd.lineitem, tpcd.part
WHERE p_partkey = l_partkey and
p_brand = 'Brand#23' and
p_container = 'MED BOX' )
SELECT SUM(l_extendedprice) / 7.0 as avg_yearly
FROM WinMagic
WHERE l_quantity < 0.2 * avg_l_quantity;{code}
> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -----------------------------------------
>
> Key: CALCITE-1317
> URL: https://issues.apache.org/jira/browse/CALCITE-1317
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
>
> We could optimize
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL values) and we can rewrite a query to find the highest paid employee(s) in each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
> SELECT max(sal)
> FROM emp AS e2
> WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
> SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
> FROM emp
> WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find the top N within any prefix of the sort key, not just the top N overall.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)