You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@calcite.apache.org by "Zhong Yu (JIRA)" <ji...@apache.org> on 2018/02/27 13:39:00 UTC

[jira] [Created] (CALCITE-2195) AggregateJoinTransposeRule fails to aggregate over unique column

Zhong Yu created CALCITE-2195:
---------------------------------

             Summary: AggregateJoinTransposeRule fails to aggregate over unique column
                 Key: CALCITE-2195
                 URL: https://issues.apache.org/jira/browse/CALCITE-2195
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.15.0
            Reporter: Zhong Yu
            Assignee: Julian Hyde
             Fix For: 1.16.0


The following query, in which "A.sal" is unique,
{code:java}
select sum(A.sal)
from (select distinct sal from sales.emp) as A
join sales.emp as B on A.sal=B.sal
{code}
 causes AggregateJoinTransposeRule to fail with message
{code:java}
java.lang.AssertionError: type mismatch:
aggCall type:
INTEGER
inferred type:
BIGINT
{code}
Apparently, this is a bug in the rule when `unique` is true on the A side, in which case the rule does not aggregate on the A side, `leftSubTotal==null`, causing `splitter.topSplit()` to only sum over `count(*)` coming from the B side.

A solution would be to introduce `splitter.singleton()` on the A side, so that it can be fed to topSplit() to be multiplied by the count.

In the case that the `unique` side does not contain the column of an agg call, it seems that we should do `other_singleton()` on this side, and feed it to topSplit(). However, realize that the `other()` expression is necessarily a `count(*)`, or a scalar function of `count(*)`, because it does not depend on any column values. In the same way, the proposed `other_singleton()` necessarily returns 1, or some constant value. `topSplit()` would not have any need of that constant value.Therefore in this case, we don't need a split on this side, just leave its subtotal as null.

 

I'm working on a pull-request based on these understandings. BTW, is there a reference to the algorithm used in the code? I can only find some old papers that don't exactly cover the logic of the code. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)