You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Steven Talbot (Jira)" <ji...@apache.org> on 2019/12/11 23:26:00 UTC

[jira] [Created] (CALCITE-3593) RelToSqlConverter changes target of ambiguous having clause with a Project on top of an Aggregate

Steven Talbot created CALCITE-3593:
--------------------------------------

             Summary: RelToSqlConverter changes target of ambiguous having clause with a Project on top of an Aggregate
                 Key: CALCITE-3593
                 URL: https://issues.apache.org/jira/browse/CALCITE-3593
             Project: Calcite
          Issue Type: Bug
            Reporter: Steven Talbot


Best shown with the shell of a test in RelToSqlConverter test.

 

The following SQL on BigQuery 

 
{code:java}
select product_id - 1000 as product_id
from (
 select product_id, avg(gross_weight) as agw
 from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
 where net_weight < 100
 group by product_id having product_id > 0){code}
produces one result, because the having filter applies to the product id before subtraction, of course.

Running it through the machinery in that test translates it to:

 
{noformat}
SELECT product_id - 1000 AS product_id
from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
WHERE net_weight < 100
GROUP BY product_id
HAVING product_id > 0{noformat}
 

This changes the meaning of the query: now the HAVING is on the after-subtraction product_id and you get no results, rather than the one result.

 

Note that this is _not_ true in HyperSQL, as it has different semantics around the HAVING namespace.

 
{noformat}
select "product_id" - 1000 as "product_id"
from (
 select "product_id", avg("gross_weight") as agw
 from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM (VALUES(0))) as product
 where "net_weight" < 100
 group by "product_id" having "product_id" > 0){noformat}
becomes 

 
{noformat}
SELECT "product_id" - 1000 AS "product_id"
from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM (VALUES(0))) as product
WHERE "net_weight" < 100
GROUP BY "product_id"
HAVING "product_id" > 0{noformat}
 

But the meaning is preserved: both return a row. 

 

I'm not enough of a SQL standards expert to know which one is being more compliant, but presumably both would have to be supported via some sort of flag?

 

I think the fix would be to force the subselect on dialects such as BigQuery that have this behavior. Probably something that looks a lot like [https://github.com/apache/calcite/blob/3530daaa8cad43aad6845b6c79e4bc1ca0e72f5f/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1043-L1047]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)