You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Yanjing Wang <zh...@gmail.com> on 2022/02/24 11:41:40 UTC

[Discuss] Converting SELECT Fields to STAR may cause duplicate column exception when executing ctas

Hi community,

I'm trying to convert a plan to a sql of SPARK dialect, but sometimes the
fields will be converted to star, see the following example

*create table users(id int);*
*create table depts(id int);*

*select a.id <http://a.id>, b.id <http://b.id> as id0 from users1 a, depts1
b*

converting to plan results
LogicalProject(id=[$0], id0=[$1])
  LogicalJoin(condition=[true], joinType=[inner])
    HiveTableScan(table=[[default, users1]])
    HiveTableScan(table=[[default, depts1]])

converting the plan to SPARK sql results


*SELECT *FROM `default`.`users1`CROSS JOIN `default`.`depts1`*

because PROJECT and JOIN row type are identical, the *PROJECT* has been
converted to *SELECT ** .

the problem is the SPARK sql can't be used to create table, such as


*CREATE TABLE tmp AS SELECT *FROM `default`.`users1`CROSS JOIN
`default`.`depts1`*

this ctas will throw  '*Found duplicate column(s) in the table definition'*

I wonder if we can add a config to indicate the *PROJECT *couldn't be
converted to star.

how do you think?

Re: [Discuss] Converting SELECT Fields to STAR may cause duplicate column exception when executing ctas

Posted by Yanjing Wang <zh...@gmail.com>.

It seems that *RelToSqlConverter#visit(Project e) *method's code piece *if
(!isStar(e.getProjects(), e.getInput().getRowType(), e.getRowType()))* has
caused this problem.

I have logged a jira CALCITE-5023
<https://issues.apache.org/jira/browse/CALCITE-5023> for this problem and a
pr-2731 <https://github.com/apache/calcite/pull/2731>, welcome review,
thanks.


Julian Hyde <jh...@apache.org> 于2022年2月25日周五 06:07写道：

> Which code is generating this "SELECT *" query? It seems wrong to
> generate "SELECT *". It should generate a list of columns with unique
> names instead.
>
> On Thu, Feb 24, 2022 at 3:41 AM Yanjing Wang <zh...@gmail.com>
> wrote:
> >
> > Hi community,
> >
> > I'm trying to convert a plan to a sql of SPARK dialect, but sometimes the
> > fields will be converted to star, see the following example
> >
> > *create table users(id int);*
> > *create table depts(id int);*
> >
> > *select a.id <http://a.id>, b.id <http://b.id> as id0 from users1 a,
> depts1
> > b*
> >
> > converting to plan results
> > LogicalProject(id=[$0], id0=[$1])
> >   LogicalJoin(condition=[true], joinType=[inner])
> >     HiveTableScan(table=[[default, users1]])
> >     HiveTableScan(table=[[default, depts1]])
> >
> > converting the plan to SPARK sql results
> >
> >
> > *SELECT *FROM `default`.`users1`CROSS JOIN `default`.`depts1`*
> >
> > because PROJECT and JOIN row type are identical, the *PROJECT* has been
> > converted to *SELECT ** .
> >
> > the problem is the SPARK sql can't be used to create table, such as
> >
> >
> > *CREATE TABLE tmp AS SELECT *FROM `default`.`users1`CROSS JOIN
> > `default`.`depts1`*
> >
> > this ctas will throw  '*Found duplicate column(s) in the table
> definition'*
> >
> > I wonder if we can add a config to indicate the *PROJECT *couldn't be
> > converted to star.
> >
> > how do you think?
>

Re: [Discuss] Converting SELECT Fields to STAR may cause duplicate column exception when executing ctas

Posted by Julian Hyde <jh...@apache.org>.

Which code is generating this "SELECT *" query? It seems wrong to
generate "SELECT *". It should generate a list of columns with unique
names instead.

On Thu, Feb 24, 2022 at 3:41 AM Yanjing Wang <zh...@gmail.com> wrote:
>
> Hi community,
>
> I'm trying to convert a plan to a sql of SPARK dialect, but sometimes the
> fields will be converted to star, see the following example
>
> *create table users(id int);*
> *create table depts(id int);*
>
> *select a.id <http://a.id>, b.id <http://b.id> as id0 from users1 a, depts1
> b*
>
> converting to plan results
> LogicalProject(id=[$0], id0=[$1])
>   LogicalJoin(condition=[true], joinType=[inner])
>     HiveTableScan(table=[[default, users1]])
>     HiveTableScan(table=[[default, depts1]])
>
> converting the plan to SPARK sql results
>
>
> *SELECT *FROM `default`.`users1`CROSS JOIN `default`.`depts1`*
>
> because PROJECT and JOIN row type are identical, the *PROJECT* has been
> converted to *SELECT ** .
>
> the problem is the SPARK sql can't be used to create table, such as
>
>
> *CREATE TABLE tmp AS SELECT *FROM `default`.`users1`CROSS JOIN
> `default`.`depts1`*
>
> this ctas will throw  '*Found duplicate column(s) in the table definition'*
>
> I wonder if we can add a config to indicate the *PROJECT *couldn't be
> converted to star.
>
> how do you think?