You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2018/01/12 04:10:06 UTC

[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ... LIMIT ..."

    [ https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323515#comment-16323515 ] 

Julian Hyde commented on CALCITE-1317:
--------------------------------------

This is similar to the paper [WinMagic: Subquery Elimination Using Window Aggregation (Zuarte et al, 2003)|https://pdfs.semanticscholar.org/0bfa/e505ad588d00d4b204acf8ba4b5646eac244.pdf], which converts MAX in a correlated sub-query into OVER. For example it rewrites

{code}SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM   tpcd.lineitem, tpcd.part
WHERE p_partkey = l_partkey AND
      p_brand = 'Brand#23' AND
      p_container = 'MED BOX' AND
      l_quantity<(SELECT 0.2*avg(l_quantity)
                  FROM tpcd.lineitem
                  WHERE l_partkey = p_partkey);{code} to {code}WITH WinMagic AS
  (SELECT l_extendedprice, l_quantity,
    avg(l_quantity)over(partition by p_partkey)
              AS avg_l_quantity
   FROM  tpcd.lineitem, tpcd.part
   WHERE p_partkey = l_partkey and
         p_brand = 'Brand#23' and
         p_container =  'MED BOX' )
 SELECT SUM(l_extendedprice) / 7.0 as avg_yearly
 FROM WinMagic
 WHERE l_quantity < 0.2 * avg_l_quantity;{code}

> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -----------------------------------------
>
>                 Key: CALCITE-1317
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1317
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> We could optimize 
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL values) and we can rewrite a query to find the highest paid employee(s) in each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
>   SELECT max(sal)
>   FROM emp AS e2
>   WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
>   SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
>   FROM emp
>   WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find the top N within any prefix of the sort key, not just the top N overall.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)