You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Kohki Nishio <ta...@gmail.com> on 2023/01/27 23:34:50 UTC
Spark SQL question
this SQL works
select 1 as *`data.group`* from tbl group by *data.group*
Since there's no such field as *data,* I thought the SQL has to look like
this
select 1 as *`data.group`* from tbl group by `*data.group`*
But that gives and error (cannot resolve '`data.group`') ... I'm no expert
in SQL, but feel like it's a strange behavior... does anybody have a good
explanation for it ?
Thanks
--
Kohki Nishio
SQL GROUP BY alias with dots, was: Spark SQL question
Posted by Enrico Minack <in...@enrico.minack.dev>.
Hi,
you are right, that is an interesting question.
Looks like GROUP BY is doing something funny / magic here (spark-shell
3.3.1 and 3.5.0-SNAPSHOT):
With an alias, it behaves as you have pointed out:
spark.range(3).createTempView("ids_without_dots")
spark.sql("SELECT * FROM ids_without_dots").show()
// works
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY
an.id").show()
// fails
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY
`an.id`").show()
Without an alias, it behaves as expected, which is the opposite of above
(a column with a dot exists, no alias used in SELECT):
spark.range(3).select($"id".as("an.id")).createTempView("ids_with_dots")
spark.sql("SELECT `an.id` FROM ids_with_dots").show()
// works
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY `an.id`").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY an.id").show()
With a struct column, it also behaves as expected:
spark.range(3).select(struct($"id").as("an")).createTempView("ids_with_struct")
spark.sql("SELECT an.id FROM ids_with_struct").show()
// works
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY an.id").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY an.id").show()
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY `an.id`").show()
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY `an.id`").show()
This does not feel very consistent.
Enrico
Am 28.01.23 um 00:34 schrieb Kohki Nishio:
> this SQL works
>
> select 1 as *`data.group`* from tbl group by *data.group*
>
>
> Since there's no such field as *data,* I thought the SQL has to look
> like this
>
> select 1 as *`data.group`* from tbl group by `*data.group`*
>
>
> But that gives and error (cannot resolve '`data.group`') ... I'm no
> expert in SQL, but feel like it's a strange behavior... does anybody
> have a good explanation for it ?
>
> Thanks
>
> --
> Kohki Nishio
Re: Spark SQL question
Posted by Bjørn Jørgensen <bj...@gmail.com>.
Hi Mich.
This is a Spark user group mailing list where people can ask *any*
questions about spark.
You know SQL and streaming, but I don't think it's necessary to start a
replay with "*LOL*" to the question that's being asked.
No questions are to stupid to be asked.
lør. 28. jan. 2023 kl. 09:22 skrev Mich Talebzadeh <
mich.talebzadeh@gmail.com>:
> LOL
>
> First one
>
> spark-sql> select 1 as `data.group` from abc group by data.group;
> 1
> Time taken: 0.198 seconds, Fetched 1 row(s)
>
> means that are assigning alias data.group to select and you are using that
> alias -> data.group in your group by statement
>
>
> This is equivalent to
>
>
> spark-sql> select 1 as `data.group` from abc group by 1;
>
> 1
>
> With regard to your second sql
>
>
> select 1 as *`data.group`* from tbl group by `*data.group`;*
>
>
> *will throw an error *
>
>
> *spark-sql> select 1 as `data.group` from abc group by `data.group`;*
>
> *Error in query: cannot resolve '`data.group`' given input columns:
> [spark_catalog.elayer.abc.keyword, spark_catalog.elayer.abc.occurence];
> line 1 pos 43;*
>
> *'Aggregate ['`data.group`], [1 AS data.group#225]*
>
> *+- SubqueryAlias spark_catalog.elayer.abc*
>
> * +- HiveTableRelation [`elayer`.`abc`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
> [keyword#226, occurence#227L], Partition Cols: []]*
>
> `data.group` with quotes is neither the name of the column or its alias
>
>
> *HTH*
>
>
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 27 Jan 2023 at 23:36, Kohki Nishio <ta...@gmail.com> wrote:
>
>> this SQL works
>>
>> select 1 as *`data.group`* from tbl group by *data.group*
>>
>>
>> Since there's no such field as *data,* I thought the SQL has to look
>> like this
>>
>> select 1 as *`data.group`* from tbl group by `*data.group`*
>>
>>
>> But that gives and error (cannot resolve '`data.group`') ... I'm no
>> expert in SQL, but feel like it's a strange behavior... does anybody have a
>> good explanation for it ?
>>
>> Thanks
>>
>> --
>> Kohki Nishio
>>
>
--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297
Re: Spark SQL question
Posted by Mich Talebzadeh <mi...@gmail.com>.
LOL
First one
spark-sql> select 1 as `data.group` from abc group by data.group;
1
Time taken: 0.198 seconds, Fetched 1 row(s)
means that are assigning alias data.group to select and you are using that
alias -> data.group in your group by statement
This is equivalent to
spark-sql> select 1 as `data.group` from abc group by 1;
1
With regard to your second sql
select 1 as *`data.group`* from tbl group by `*data.group`;*
*will throw an error *
*spark-sql> select 1 as `data.group` from abc group by `data.group`;*
*Error in query: cannot resolve '`data.group`' given input columns:
[spark_catalog.elayer.abc.keyword, spark_catalog.elayer.abc.occurence];
line 1 pos 43;*
*'Aggregate ['`data.group`], [1 AS data.group#225]*
*+- SubqueryAlias spark_catalog.elayer.abc*
* +- HiveTableRelation [`elayer`.`abc`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
[keyword#226, occurence#227L], Partition Cols: []]*
`data.group` with quotes is neither the name of the column or its alias
*HTH*
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Fri, 27 Jan 2023 at 23:36, Kohki Nishio <ta...@gmail.com> wrote:
> this SQL works
>
> select 1 as *`data.group`* from tbl group by *data.group*
>
>
> Since there's no such field as *data,* I thought the SQL has to look like
> this
>
> select 1 as *`data.group`* from tbl group by `*data.group`*
>
>
> But that gives and error (cannot resolve '`data.group`') ... I'm no
> expert in SQL, but feel like it's a strange behavior... does anybody have a
> good explanation for it ?
>
> Thanks
>
> --
> Kohki Nishio
>