You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by 徐仁和 <xu...@gmail.com> on 2022/03/10 10:13:42 UTC
Some different behaviors using `SqlConformance#isGroupByAlias=true`
Hi community
I found some different behaviors in the SQL's verification phase using
`SqlConformance#isGroupByAlias=true`.
This case is that the alias in the grouping is equal to the name of the
table's column.
For example:
CREATE TABLE test_tbl (
id integer,
col1 varchar
);
id | col1
----+------
1 | abc
2 | abb
3 | abcd
4 | a
5 | abc
-- origin sql
SELECT LENGTH(`col1`) AS `col1`, COUNT(*)
FROM `test_tbl`
GROUP BY `col1`
-- converted rel
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
LogicalProject(col1=[LENGTH($1)])
LogicalTableScan(table=[[default, test_tbl]])
According to rel, we will get results as below:
col1| EXPR$1
----+-------
3 | 3
1 | 1
4 | 1
But, It's different from mysql.
MySQL-Result(MySQL5.7 or MySQL8.0):
+------+----------+
| col1 | COUNT(*) |
+------+----------+
| 1 | 1 |
| 3 | 1 |
| 3 | 2 |
| 4 | 1 |
+------+----------+
The behavior of MySQL is grouping by table[test_tbl]'s column[col1]
firstly, not grouping by LENGTH(`col1`).
And it calc expression of LENGTH(`col1`) after executing aggregate.
I tried this case in PostgreSQL and SQLite, and found that the two
sql-engine's behavior is the same as MySQL.
PostgreSQL-Result(PostgreSQL 13):
col1 | count
------+-------
1 | 1
3 | 1
4 | 1
3 | 2
SQLite-Result:(SQLite 3)
col1 | count
------+-------
1 | 1
3 | 1
4 | 1
3 | 2
PS:
-- Plan of PostgreSQL:
EXPLAIN SELECT LENGTH("col1") AS "col1", COUNT(*)
FROM "test_tbl"
GROUP BY "col1";
QUERY PLAN
-------------------------------------------------------------------
HashAggregate (cost=12.10..13.85 rows=140 width=528)
Group Key: col1
-> Seq Scan on test_tbl (cost=0.00..11.40 rows=140 width=516)
(3 rows)
Best
Xurenhe
Re: Some different behaviors using `SqlConformance#isGroupByAlias=true`
Posted by 徐仁和 <xu...@gmail.com>.
Thanks for Yanjing Wang.
It's relevant to the CALCITE-4512:
https://issues.apache.org/jira/browse/CALCITE-4512
I do some investigation on some sql-engines, and I hope to help the
discussion of CALCITE-4512.
Yanjing Wang <zh...@gmail.com> 于2022年3月11日周五 10:30写道:
> Hi Xuren,
> This jira CALCITE-4512 <https://issues.apache.org/jira/browse/CALCITE-4512
> >
> may be similar to your problem, hope helpful for you.
>
> 徐仁和 <xu...@gmail.com> 于2022年3月10日周四 18:14写道:
>
> > Hi community
> >
> > I found some different behaviors in the SQL's verification phase using
> > `SqlConformance#isGroupByAlias=true`.
> > This case is that the alias in the grouping is equal to the name of the
> > table's column.
> >
> >
> > For example:
> > CREATE TABLE test_tbl (
> > id integer,
> > col1 varchar
> > );
> > id | col1
> > ----+------
> > 1 | abc
> > 2 | abb
> > 3 | abcd
> > 4 | a
> > 5 | abc
> >
> >
> > -- origin sql
> > SELECT LENGTH(`col1`) AS `col1`, COUNT(*)
> > FROM `test_tbl`
> > GROUP BY `col1`
> >
> > -- converted rel
> > LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
> > LogicalProject(col1=[LENGTH($1)])
> > LogicalTableScan(table=[[default, test_tbl]])
> >
> > According to rel, we will get results as below:
> > col1| EXPR$1
> > ----+-------
> > 3 | 3
> > 1 | 1
> > 4 | 1
> >
> >
> > But, It's different from mysql.
> >
> > MySQL-Result(MySQL5.7 or MySQL8.0):
> > +------+----------+
> > | col1 | COUNT(*) |
> > +------+----------+
> > | 1 | 1 |
> > | 3 | 1 |
> > | 3 | 2 |
> > | 4 | 1 |
> > +------+----------+
> >
> > The behavior of MySQL is grouping by table[test_tbl]'s column[col1]
> > firstly, not grouping by LENGTH(`col1`).
> > And it calc expression of LENGTH(`col1`) after executing aggregate.
> >
> >
> > I tried this case in PostgreSQL and SQLite, and found that the two
> > sql-engine's behavior is the same as MySQL.
> >
> > PostgreSQL-Result(PostgreSQL 13):
> > col1 | count
> > ------+-------
> > 1 | 1
> > 3 | 1
> > 4 | 1
> > 3 | 2
> >
> > SQLite-Result:(SQLite 3)
> > col1 | count
> > ------+-------
> > 1 | 1
> > 3 | 1
> > 4 | 1
> > 3 | 2
> >
> >
> > PS:
> > -- Plan of PostgreSQL:
> > EXPLAIN SELECT LENGTH("col1") AS "col1", COUNT(*)
> > FROM "test_tbl"
> > GROUP BY "col1";
> > QUERY PLAN
> > -------------------------------------------------------------------
> > HashAggregate (cost=12.10..13.85 rows=140 width=528)
> > Group Key: col1
> > -> Seq Scan on test_tbl (cost=0.00..11.40 rows=140 width=516)
> > (3 rows)
> >
> >
> > Best
> > Xurenhe
> >
>
Re: Some different behaviors using `SqlConformance#isGroupByAlias=true`
Posted by Yanjing Wang <zh...@gmail.com>.
Hi Xuren,
This jira CALCITE-4512 <https://issues.apache.org/jira/browse/CALCITE-4512>
may be similar to your problem, hope helpful for you.
徐仁和 <xu...@gmail.com> 于2022年3月10日周四 18:14写道:
> Hi community
>
> I found some different behaviors in the SQL's verification phase using
> `SqlConformance#isGroupByAlias=true`.
> This case is that the alias in the grouping is equal to the name of the
> table's column.
>
>
> For example:
> CREATE TABLE test_tbl (
> id integer,
> col1 varchar
> );
> id | col1
> ----+------
> 1 | abc
> 2 | abb
> 3 | abcd
> 4 | a
> 5 | abc
>
>
> -- origin sql
> SELECT LENGTH(`col1`) AS `col1`, COUNT(*)
> FROM `test_tbl`
> GROUP BY `col1`
>
> -- converted rel
> LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
> LogicalProject(col1=[LENGTH($1)])
> LogicalTableScan(table=[[default, test_tbl]])
>
> According to rel, we will get results as below:
> col1| EXPR$1
> ----+-------
> 3 | 3
> 1 | 1
> 4 | 1
>
>
> But, It's different from mysql.
>
> MySQL-Result(MySQL5.7 or MySQL8.0):
> +------+----------+
> | col1 | COUNT(*) |
> +------+----------+
> | 1 | 1 |
> | 3 | 1 |
> | 3 | 2 |
> | 4 | 1 |
> +------+----------+
>
> The behavior of MySQL is grouping by table[test_tbl]'s column[col1]
> firstly, not grouping by LENGTH(`col1`).
> And it calc expression of LENGTH(`col1`) after executing aggregate.
>
>
> I tried this case in PostgreSQL and SQLite, and found that the two
> sql-engine's behavior is the same as MySQL.
>
> PostgreSQL-Result(PostgreSQL 13):
> col1 | count
> ------+-------
> 1 | 1
> 3 | 1
> 4 | 1
> 3 | 2
>
> SQLite-Result:(SQLite 3)
> col1 | count
> ------+-------
> 1 | 1
> 3 | 1
> 4 | 1
> 3 | 2
>
>
> PS:
> -- Plan of PostgreSQL:
> EXPLAIN SELECT LENGTH("col1") AS "col1", COUNT(*)
> FROM "test_tbl"
> GROUP BY "col1";
> QUERY PLAN
> -------------------------------------------------------------------
> HashAggregate (cost=12.10..13.85 rows=140 width=528)
> Group Key: col1
> -> Seq Scan on test_tbl (cost=0.00..11.40 rows=140 width=516)
> (3 rows)
>
>
> Best
> Xurenhe
>