Posted to user@beam.apache.org by Tao Li <ta...@zillow.com> on 2021/03/02 17:32:49 UTC

A problem with ZetaSQL

Hi all,

I was following the instructions in this doc to try out ZetaSQL: https://beam.apache.org/documentation/dsls/sql/overview/

The query is really simple:

options.as(BeamSqlPipelineOptions.class).setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner");
input.apply(SqlTransform.query("SELECT * from PCOLLECTION"));

I am seeing this error with ZetaSQL:

Exception in thread "main" java.lang.UnsupportedOperationException: Unknown Calcite type: INTEGER
                at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlCalciteTranslationUtils.toZetaSqlType(ZetaSqlCalciteTranslationUtils.java:114)
                at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addFieldsToTable(SqlAnalyzer.java:359)
                at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:350)
                at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$1(SqlAnalyzer.java:225)
                at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:406)
                at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:225)
                at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:102)
                at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:180)
                at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:168)
                at org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:114)
                at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:140)
                at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:86)

This query works fine with Calcite (that is, when I just remove the setPlannerName call). Am I missing anything here? For example, I am specifying 'com.google.guava:guava:23.0' as a dependency.

Thanks!

Re: A problem with ZetaSQL

Posted by Tao Li <ta...@zillow.com>.

Robin/Brian,

I see. Thanks so much for your help!

From: Robin Qiu <ro...@google.com>
Date: Friday, March 5, 2021 at 12:31 AM
To: Brian Hulette <bh...@google.com>
Cc: Tao Li <ta...@zillow.com>, "user@beam.apache.org" <us...@beam.apache.org>
Subject: Re: A problem with ZetaSQL

Hi Tao,

In ZetaSQL, all integers are 64-bit, so if the integers in columns 1 and 2 are 32-bit, the query won't work. In Beam schema terms, a ZetaSQL integer corresponds to the INT64 type.

Best,
Robin


Re: A problem with ZetaSQL

Posted by Robin Qiu <ro...@google.com>.

Hi Tao,

In ZetaSQL, all integers are 64-bit, so if the integers in columns 1 and 2
are 32-bit, the query won't work. In Beam schema terms, a ZetaSQL integer
corresponds to the INT64 type.

Best,
Robin
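
The widening described above can be sketched without any Beam dependency. The example below is a minimal, library-independent illustration, assuming each row is modeled as a plain Map from column name to value. The class IntWidening and the method widenInts are hypothetical names chosen for this sketch, not Beam APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Beam): widens 32-bit integer values to 64-bit. */
final class IntWidening {
    private IntWidening() {}

    /** Returns a copy of the row in which every Integer value is replaced by the equal Long. */
    static Map<String, Object> widenInts(Map<String, Object> row) {
        Map<String, Object> widened = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            Object value = field.getValue();
            if (value instanceof Integer) {
                // int -> long is a lossless widening conversion.
                widened.put(field.getKey(), Long.valueOf(((Integer) value).longValue()));
            } else {
                // Nulls, strings, and values that are already 64-bit pass through unchanged.
                widened.put(field.getKey(), value);
            }
        }
        return widened;
    }
}
```

In an actual pipeline the same conversion would have to be applied to both the schema and the row values before the ZetaSQL planner sees the PCollection. How best to do that with Beam's own transforms is not covered in this thread.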

On Thu, Mar 4, 2021 at 6:07 PM Brian Hulette <bh...@google.com> wrote:

> Ah, I suspect this is because our ZetaSQL planner only supports 64 bit
> integers (see
> https://beam.apache.org/documentation/dsls/sql/zetasql/data-types/#integer-type
> ). +Robin Qiu <ro...@google.com> maybe we should have a better error
> message for this?

Re: A problem with ZetaSQL

Posted by Brian Hulette <bh...@google.com>.

Ah, I suspect this is because our ZetaSQL planner only supports 64 bit
integers (see
https://beam.apache.org/documentation/dsls/sql/zetasql/data-types/#integer-type
). +Robin Qiu <ro...@google.com> maybe we should have a better error
message for this?
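
As one possible shape for the clearer error message suggested here, the sketch below maps Calcite type names to ZetaSQL type names and gives a targeted hint when it meets INTEGER. It is an illustration under stated assumptions, not the code in ZetaSqlCalciteTranslationUtils: the class TypeErrorSketch, its method, and the deliberately tiny mapping are all hypothetical.

```java
/** Hypothetical sketch of a type translation with a more helpful error message. */
final class TypeErrorSketch {
    private TypeErrorSketch() {}

    /** Maps a Calcite type name to a ZetaSQL type name, or throws with an explanatory message. */
    static String toZetaSqlTypeName(String calciteType) {
        switch (calciteType) {
            case "BIGINT":
                return "INT64";
            case "VARCHAR":
                return "STRING";
            case "INTEGER":
                // Name the cause and the remedy instead of reporting an "unknown" type.
                throw new UnsupportedOperationException(
                    "Unsupported Calcite type: INTEGER. ZetaSQL integers are 64-bit; "
                        + "use an INT64 field in the Beam schema instead.");
            default:
                throw new UnsupportedOperationException("Unknown Calcite type: " + calciteType);
        }
    }
}
```

The design choice is simply to treat a known-but-unsupported type differently from a truly unknown one, so the message can point at the 64-bit requirement.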

On Thu, Mar 4, 2021 at 5:24 PM Tao Li <ta...@zillow.com> wrote:

> Brian, the schema is really simple. Just 3 primitive type columns:
>
> root
> |-- column_1: integer (nullable = true)
> |-- column_2: integer (nullable = true)
> |-- column_3: string (nullable = true)

Re: A problem with ZetaSQL

Posted by Tao Li <ta...@zillow.com>.

Brian, the schema is really simple. Just 3 primitive type columns:

root
|-- column_1: integer (nullable = true)
|-- column_2: integer (nullable = true)
|-- column_3: string (nullable = true)


From: Brian Hulette <bh...@google.com>
Date: Thursday, March 4, 2021 at 2:29 PM
To: Tao Li <ta...@zillow.com>
Cc: "user@beam.apache.org" <us...@beam.apache.org>
Subject: Re: A problem with ZetaSQL

Thanks. It would also be helpful to know what avroSchema is, or at least the types of its fields, so we can understand what the schema of the PCollection is.


Re: A problem with ZetaSQL

Posted by Brian Hulette <bh...@google.com>.

Thanks. It would also be helpful to know what avroSchema is, or at least
the types of its fields, so we can understand what the schema of the
PCollection is.

On Tue, Mar 2, 2021 at 11:00 AM Tao Li <ta...@zillow.com> wrote:

> Hi Brian,
>
> Here is my code to create the PCollection<Row>.
>
> PCollection<FileIO.ReadableFile> files = pipeline
>                 .apply(FileIO.match().filepattern(path))
>                 .apply(FileIO.readMatches());
>
> PCollection<Row> input = files
>                 .apply(ParquetIO.readFiles(avroSchema))
>                 .apply(MapElements
>                         .into(TypeDescriptors.rows())
>                         .via(AvroUtils.getGenericRecordToRowFunction(AvroUtils.toBeamSchema(avroSchema))))
>                 .setCoder(RowCoder.of(AvroUtils.toBeamSchema(avroSchema)));

Re: A problem with ZetaSQL

Posted by Tao Li <ta...@zillow.com>.

Hi Brian,

Here is my code to create the PCollection<Row>.

PCollection<FileIO.ReadableFile> files = pipeline
                .apply(FileIO.match().filepattern(path))
                .apply(FileIO.readMatches());

PCollection<Row> input =  files
                .apply(ParquetIO.readFiles(avroSchema))
                .apply(MapElements
                        .into(TypeDescriptors.rows())
                        .via(AvroUtils.getGenericRecordToRowFunction(AvroUtils.toBeamSchema(avroSchema))))
                .setCoder(RowCoder.of(AvroUtils.toBeamSchema(avroSchema)));


From: Brian Hulette <bh...@google.com>
Reply-To: "user@beam.apache.org" <us...@beam.apache.org>
Date: Tuesday, March 2, 2021 at 10:31 AM
To: user <us...@beam.apache.org>
Subject: Re: A problem with ZetaSQL

Thanks for reporting this Tao - could you share what the type of your input PCollection is?


Re: A problem with ZetaSQL

Posted by Brian Hulette <bh...@google.com>.

Thanks for reporting this Tao - could you share what the type of your input
PCollection is?

On Tue, Mar 2, 2021 at 9:33 AM Tao Li <ta...@zillow.com> wrote:

> Hi all,
>
> I was following the instructions from this doc to play with ZetaSQL
> https://beam.apache.org/documentation/dsls/sql/overview/
>
> The query is really simple:
>
> options.as(BeamSqlPipelineOptions.class).setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner")
> input.apply(SqlTransform.query("SELECT * from PCOLLECTION"))
>
> I am seeing this error with ZetaSQL:
>
> Exception in thread "main" java.lang.UnsupportedOperationException: Unknown Calcite type: INTEGER
>                 at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSqlCalciteTranslationUtils.toZetaSqlType(ZetaSqlCalciteTranslationUtils.java:114)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addFieldsToTable(SqlAnalyzer.java:359)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:350)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$1(SqlAnalyzer.java:225)
>                 at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:406)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:225)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:102)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRelInternal(ZetaSQLQueryPlanner.java:180)
>                 at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:168)
>                 at org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:114)
>                 at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:140)
>                 at org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:86)
>
> This query works fine when using Calcite (by just removing setPlannerName
> call). Am I missing anything here? For example I am specifying
> 'com.google.guava:guava:23.0' as the dependency.
>
> Thanks!