You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Krishnakant Agrawal <kk...@gmail.com> on 2018/12/26 10:44:55 UTC

DISTINCT not being handled correctly in RelToSqlConverter

Hi All,

When creating a RelNode for a Query with a DISTINCT keyword in it, I use
the relBuilder.distinct() api to apply distinct.
It creates a LogicalAggregate with all the Fields of the LogicalProject as
the GroupKey.

This is a problem when one of those projections is a windowing function
(i.e SUM(col1) over (partition by  col2) ).

The group by key now contains an aggregate function which is wrong.

The output is something like( SqlNode.toSqlString() ):-
SELECT sum(col1) OVER (partition by  col2) from t1 group by sum(col1) over
(partition by  col2).

I have a fix ready for this.
Basically, sub-querying the projection containing the aggregate function
and apply the Group By(due to DISTINCT) outside the sub-query.

Please let me know if this is an actual bug or my assumptions are wrong.

Thanks,
KrishnaKant

Re: DISTINCT not being handled correctly in RelToSqlConverter

Posted by Krishnakant Agrawal <kk...@gmail.com>.
Hi Julian,

You voiced my thoughts exactly.
I tried on several DB's too. Nowhere is an Agg function allowed in Group By
clause.
Probably, we need to Check for Agg function in Group By keys and set
needNew as true, if needed.
That code is not so straight-forward as RelNodes can get fairly complex and
the actual Expression for the Group Key can be buried several RelNodes down
the RelTree.

Logged a JIRA:- https://issues.apache.org/jira/browse/CALCITE-2757

Thanks,
KrishnaKant



On Fri, Dec 28, 2018 at 1:28 AM Julian Hyde <jh...@apache.org> wrote:

> Sounds like a bug. Please log it.
>
> Re-stating what you just said. My understanding is that
>
>   SELECT DISTINCT sum(x) OVER (PARTITION BY y) FROM t
>
> is valid (per SQL standard) but
>
>   SELECT sum(x) OVER (PARTITION BY y)
>   FROM t
>   GROUP BY sum(x) OVER (PARTITION BY y)
>
> is not. For example, given the query
>
>   select sum(deptno) over (partition by loc)
>   from dept
>   group by  sum(deptno) over (partition by loc);
>
> Oracle gives
>
>   ORA-00934: group function is not allowed here
>
> Therefore we should generate a sub-query, something like this:
>
>   SELECT c1
>   FROM (
>     SELECT sum(deptno) OVER (PARTITION BY loc)
>     FROM dept) AS t
>   GROUP BY c1;
>
> RelToSqlConverter has a mechanism to figure out whether a sub-select is
> necessary. See needNew[1]. The fix is probably in that code.
>
> Julian
>
> [1]
> https://github.com/apache/calcite/blob/9d50e6d7418579c5a73d872e6aec5924ed97c239/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1027
>
>
>
> > On Dec 26, 2018, at 2:44 AM, Krishnakant Agrawal <kk...@gmail.com>
> wrote:
> >
> > Hi All,
> >
> > When creating a RelNode for a Query with a DISTINCT keyword in it, I use
> > the relBuilder.distinct() api to apply distinct.
> > It creates a LogicalAggregate with all the Fields of the LogicalProject
> as
> > the GroupKey.
> >
> > This is a problem when one of those projections is a windowing function
> > (i.e SUM(col1) over (partition by  col2) ).
> >
> > The group by key now contains an aggregate function which is wrong.
> >
> > The output is something like( SqlNode.toSqlString() ):-
> > SELECT sum(col1) OVER (partition by  col2) from t1 group by sum(col1)
> over
> > (partition by  col2).
> >
> > I have a fix ready for this.
> > Basically, sub-querying the projection containing the aggregate function
> > and apply the Group By(due to DISTINCT) outside the sub-query.
> >
> > Please let me know if this is an actual bug or my assumptions are wrong.
> >
> > Thanks,
> > KrishnaKant
>
>

Re: DISTINCT not being handled correctly in RelToSqlConverter

Posted by Julian Hyde <jh...@apache.org>.
Sounds like a bug. Please log it.

Re-stating what you just said. My understanding is that 

  SELECT DISTINCT sum(x) OVER (PARTITION BY y) FROM t

is valid (per SQL standard) but

  SELECT sum(x) OVER (PARTITION BY y)
  FROM t
  GROUP BY sum(x) OVER (PARTITION BY y)

is not. For example, given the query

  select sum(deptno) over (partition by loc)
  from dept
  group by  sum(deptno) over (partition by loc);

Oracle gives

  ORA-00934: group function is not allowed here

Therefore we should generate a sub-query, something like this:

  SELECT c1
  FROM (
    SELECT sum(deptno) OVER (PARTITION BY loc)
    FROM dept) AS t
  GROUP BY c1;

RelToSqlConverter has a mechanism to figure out whether a sub-select is necessary. See needNew[1]. The fix is probably in that code.

Julian

[1] https://github.com/apache/calcite/blob/9d50e6d7418579c5a73d872e6aec5924ed97c239/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1027



> On Dec 26, 2018, at 2:44 AM, Krishnakant Agrawal <kk...@gmail.com> wrote:
> 
> Hi All,
> 
> When creating a RelNode for a Query with a DISTINCT keyword in it, I use
> the relBuilder.distinct() api to apply distinct.
> It creates a LogicalAggregate with all the Fields of the LogicalProject as
> the GroupKey.
> 
> This is a problem when one of those projections is a windowing function
> (i.e SUM(col1) over (partition by  col2) ).
> 
> The group by key now contains an aggregate function which is wrong.
> 
> The output is something like( SqlNode.toSqlString() ):-
> SELECT sum(col1) OVER (partition by  col2) from t1 group by sum(col1) over
> (partition by  col2).
> 
> I have a fix ready for this.
> Basically, sub-querying the projection containing the aggregate function
> and apply the Group By(due to DISTINCT) outside the sub-query.
> 
> Please let me know if this is an actual bug or my assumptions are wrong.
> 
> Thanks,
> KrishnaKant