You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Krishnakant Agrawal <kk...@gmail.com> on 2018/12/26 10:44:55 UTC
DISTINCT not being handled correctly in RelToSqlConverter
Hi All,
When creating a RelNode for a Query with a DISTINCT keyword in it, I use
the relBuilder.distinct() api to apply distinct.
It creates a LogicalAggregate with all the Fields of the LogicalProject as
the GroupKey.
This is a problem when one of those projections is a windowing function
(i.e SUM(col1) over (partition by col2) ).
The group by key now contains an aggregate function which is wrong.
The output is something like( SqlNode.toSqlString() ):-
SELECT sum(col1) OVER (partition by col2) from t1 group by sum(col1) over
(partition by col2).
I have a fix ready for this.
Basically, sub-querying the projection containing the aggregate function
and apply the Group By(due to DISTINCT) outside the sub-query.
Please let me know if this is an actual bug or my assumptions are wrong.
Thanks,
KrishnaKant
Re: DISTINCT not being handled correctly in RelToSqlConverter
Posted by Krishnakant Agrawal <kk...@gmail.com>.
Hi Julian,
You voiced my thoughts exactly.
I tried on several DB's too. Nowhere is an Agg function allowed in Group By
clause.
Probably, we need to Check for Agg function in Group By keys and set
needNew as true, if needed.
That code is not so straight-forward as RelNodes can get fairly complex and
the actual Expression for the Group Key can be buried several RelNodes down
the RelTree.
Logged a JIRA:- https://issues.apache.org/jira/browse/CALCITE-2757
Thanks,
KrishnaKant
On Fri, Dec 28, 2018 at 1:28 AM Julian Hyde <jh...@apache.org> wrote:
> Sounds like a bug. Please log it.
>
> Re-stating what you just said. My understanding is that
>
> SELECT DISTINCT sum(x) OVER (PARTITION BY y) FROM t
>
> is valid (per SQL standard) but
>
> SELECT sum(x) OVER (PARTITION BY y)
> FROM t
> GROUP BY sum(x) OVER (PARTITION BY y)
>
> is not. For example, given the query
>
> select sum(deptno) over (partition by loc)
> from dept
> group by sum(deptno) over (partition by loc);
>
> Oracle gives
>
> ORA-00934: group function is not allowed here
>
> Therefore we should generate a sub-query, something like this:
>
> SELECT c1
> FROM (
> SELECT sum(deptno) OVER (PARTITION BY loc)
> FROM dept) AS t
> GROUP BY c1;
>
> RelToSqlConverter has a mechanism to figure out whether a sub-select is
> necessary. See needNew[1]. The fix is probably in that code.
>
> Julian
>
> [1]
> https://github.com/apache/calcite/blob/9d50e6d7418579c5a73d872e6aec5924ed97c239/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1027
>
>
>
> > On Dec 26, 2018, at 2:44 AM, Krishnakant Agrawal <kk...@gmail.com>
> wrote:
> >
> > Hi All,
> >
> > When creating a RelNode for a Query with a DISTINCT keyword in it, I use
> > the relBuilder.distinct() api to apply distinct.
> > It creates a LogicalAggregate with all the Fields of the LogicalProject
> as
> > the GroupKey.
> >
> > This is a problem when one of those projections is a windowing function
> > (i.e SUM(col1) over (partition by col2) ).
> >
> > The group by key now contains an aggregate function which is wrong.
> >
> > The output is something like( SqlNode.toSqlString() ):-
> > SELECT sum(col1) OVER (partition by col2) from t1 group by sum(col1)
> over
> > (partition by col2).
> >
> > I have a fix ready for this.
> > Basically, sub-querying the projection containing the aggregate function
> > and apply the Group By(due to DISTINCT) outside the sub-query.
> >
> > Please let me know if this is an actual bug or my assumptions are wrong.
> >
> > Thanks,
> > KrishnaKant
>
>
Re: DISTINCT not being handled correctly in RelToSqlConverter
Posted by Julian Hyde <jh...@apache.org>.
Sounds like a bug. Please log it.
Re-stating what you just said. My understanding is that
SELECT DISTINCT sum(x) OVER (PARTITION BY y) FROM t
is valid (per SQL standard) but
SELECT sum(x) OVER (PARTITION BY y)
FROM t
GROUP BY sum(x) OVER (PARTITION BY y)
is not. For example, given the query
select sum(deptno) over (partition by loc)
from dept
group by sum(deptno) over (partition by loc);
Oracle gives
ORA-00934: group function is not allowed here
Therefore we should generate a sub-query, something like this:
SELECT c1
FROM (
SELECT sum(deptno) OVER (PARTITION BY loc)
FROM dept) AS t
GROUP BY c1;
RelToSqlConverter has a mechanism to figure out whether a sub-select is necessary. See needNew[1]. The fix is probably in that code.
Julian
[1] https://github.com/apache/calcite/blob/9d50e6d7418579c5a73d872e6aec5924ed97c239/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1027
> On Dec 26, 2018, at 2:44 AM, Krishnakant Agrawal <kk...@gmail.com> wrote:
>
> Hi All,
>
> When creating a RelNode for a Query with a DISTINCT keyword in it, I use
> the relBuilder.distinct() api to apply distinct.
> It creates a LogicalAggregate with all the Fields of the LogicalProject as
> the GroupKey.
>
> This is a problem when one of those projections is a windowing function
> (i.e SUM(col1) over (partition by col2) ).
>
> The group by key now contains an aggregate function which is wrong.
>
> The output is something like( SqlNode.toSqlString() ):-
> SELECT sum(col1) OVER (partition by col2) from t1 group by sum(col1) over
> (partition by col2).
>
> I have a fix ready for this.
> Basically, sub-querying the projection containing the aggregate function
> and apply the Group By(due to DISTINCT) outside the sub-query.
>
> Please let me know if this is an actual bug or my assumptions are wrong.
>
> Thanks,
> KrishnaKant