Posted to issues@spark.apache.org by "Pablo Langa Blanco (Jira)" <ji...@apache.org> on 2021/08/18 23:15:00 UTC
[jira] [Commented] (SPARK-36488) "Invalid usage of '*' in expression" error due to the feature of 'quotedRegexColumnNames' in some scenarios.
[ https://issues.apache.org/jira/browse/SPARK-36488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401401#comment-17401401 ]
Pablo Langa Blanco commented on SPARK-36488:
--------------------------------------------
Hi [~merrily01],
I don't think these are bugs. As described in the documentation ([https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]), when spark.sql.parser.quotedRegexColumnNames is set to true, “quoted identifiers (using backticks) in SELECT statement are interpreted as regular expressions and SELECT statement can take regex-based column specification”.
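To make the quoted documentation concrete, here is a small Python model of how a SELECT-list identifier could be resolved with and without the flag. It is a hypothetical sketch, not Spark's implementation: the function name `resolve_identifier` is made up, case sensitivity is ignored, and it assumes the regex must match the whole column name (as Java's `String.matches` does). Under that assumption `col_.*` matches col_a and col_b, whereas `col_*` only matches names such as col, col_ or col__.

```python
import re


def resolve_identifier(identifier: str, columns: list[str], quoted_regex: bool) -> list[str]:
    """Hypothetical model of resolving one SELECT-list identifier.

    With quoted_regex=True, a backtick-quoted identifier is treated as a
    regular expression that must match the whole column name. Otherwise the
    identifier names exactly one column (backticks are just quoting).
    """
    is_quoted = len(identifier) >= 2 and identifier[0] == identifier[-1] == "`"
    if quoted_regex and is_quoted:
        pattern = identifier[1:-1]
        # Keep every column whose full name matches the pattern.
        return [c for c in columns if re.fullmatch(pattern, c)]
    name = identifier.strip("`")
    return [name] if name in columns else []


columns = ["col_a", "col_b"]
print(resolve_identifier("`col_.*`", columns, quoted_regex=True))   # regex over both columns
print(resolve_identifier("`col_*`", columns, quoted_regex=True))    # "col" + zero or more "_"
print(resolve_identifier("`col_.*`", columns, quoted_regex=False))  # plain name, no such column
```

With the flag off, the same backticked text is just a quoted column name, which is why the queries in both cases work before the `set` statement.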
In case 1, this means that in the expression `tb_test`.`col_a`, tb_test is treated as a regular expression that represents a column rather than as a table name. The “column.field” syntax is then interpreted as access to a field of a StructType column, and regular expressions are not allowed in that position.
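The case 1 reasoning can be sketched as a tiny parse-then-check model. This is a simplified, hypothetical illustration, not Spark's parser: the node classes are named after the expression names that appear in the error messages of this issue, and `parse_dereference` and `check` are made-up helpers. The point it shows is that once the quoted base becomes a regex, `.col_a` can only mean struct field extraction, and a star-like node nested inside that extraction is rejected.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    parts: tuple  # e.g. ("tb_test", "col_a"): a table-qualified column


@dataclass
class Regex:
    pattern: str  # star-like: may expand to any number of columns


@dataclass
class UnresolvedExtractValue:
    child: object  # expected to be a single struct-typed column
    field: str     # name of the struct field to extract


def is_quoted(s: str) -> bool:
    return len(s) >= 2 and s[0] == s[-1] == "`"


def parse_dereference(base: str, field: str, quoted_regex: bool):
    """Hypothetical model of parsing base.field in a SELECT list."""
    if quoted_regex and is_quoted(base):
        # The base is now a regex over column names, not a table name,
        # so ".field" can only mean "extract a field from a struct column".
        return UnresolvedExtractValue(Regex(base[1:-1]), field.strip("`"))
    return Attribute((base.strip("`"), field.strip("`")))


def check(expr) -> None:
    """Reject a Regex nested inside a field extraction."""
    if isinstance(expr, UnresolvedExtractValue) and isinstance(expr.child, Regex):
        raise ValueError(
            f"Invalid usage of '*' in expression '{type(expr).__name__.lower()}'"
        )


check(parse_dereference("`tb_test`", "`col_a`", quoted_regex=False))  # passes
try:
    check(parse_dereference("`tb_test`", "`col_a`", quoted_regex=True))
except ValueError as err:
    print(err)
```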
In case 2, a regex can match more than one column (for example, `col_.*` resolves to col_a and col_b), and it does not make sense for the operands of a division to be a list of columns, so this is not allowed, even when the regex happens to match a single column, as `col_a` and `col_b` do here.
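The rule for case 2 can be modelled as a structural check: a regex is expanded only as a top-level SELECT item, and inside an operator it is rejected before any matching happens, because the check depends on the kind of expression rather than on how many columns it would match. This is a hypothetical, simplified sketch; `Column`, `Regex`, `Divide`, `expand` and `analyze` are illustrative names (with `Divide` named after the 'divide' in the error message), and full-name matching is assumed.

```python
import re
from dataclasses import dataclass


@dataclass
class Column:
    name: str  # a plain, single column reference


@dataclass
class Regex:
    pattern: str  # star-like: may match zero, one, or many columns


@dataclass
class Divide:
    left: object
    right: object


def expand(regex: Regex, columns: list[str]) -> list[str]:
    """Return the column names that fully match the regex."""
    return [c for c in columns if re.fullmatch(regex.pattern, c)]


def analyze(expr, columns: list[str]) -> list[str]:
    """Hypothetical analyzer: returns the output expressions as strings."""
    if isinstance(expr, Column):
        return [expr.name]
    if isinstance(expr, Regex):
        # Allowed as a top-level SELECT item: expands to any number of columns.
        return expand(expr, columns)
    if isinstance(expr, Divide):
        # Rejected structurally, without expanding: an operand must be one value.
        if any(isinstance(op, Regex) for op in (expr.left, expr.right)):
            raise ValueError(
                f"Invalid usage of '*' in expression '{type(expr).__name__.lower()}'"
            )
        left = analyze(expr.left, columns)[0]
        right = analyze(expr.right, columns)[0]
        return [f"({left} / {right})"]
    raise TypeError(f"unsupported expression: {expr!r}")
```

In this model, `analyze(Divide(Column("col_a"), Column("col_b")), ["col_a", "col_b"])` corresponds to case 2 with the flag off, while `Divide(Regex("col_a"), Regex("col_b"))` corresponds to the flag on: each regex would match exactly one column, yet the division is still rejected.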
I’m going to open a PR to improve the error message and avoid this confusion.
> "Invalid usage of '*' in expression" error due to the feature of 'quotedRegexColumnNames' in some scenarios.
> ------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-36488
> URL: https://issues.apache.org/jira/browse/SPARK-36488
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.4.8, 3.1.2
> Reporter: merrily01
> Priority: Major
>
> In some cases, the error happens when the following property is set.
> {code:java}
> spark.sql("set spark.sql.parser.quotedRegexColumnNames=true")
> {code}
> *case 1:*
> {code:java}
> spark-sql> create table tb_test as select 1 as col_a, 2 as col_b;
> spark-sql> select `tb_test`.`col_a` from tb_test;
> 1
> spark-sql> set spark.sql.parser.quotedRegexColumnNames=true;
> spark-sql> select `tb_test`.`col_a` from tb_test;
> Error in query: Invalid usage of '*' in expression 'unresolvedextractvalue'
> {code}
>
> *case 2:*
> {code:java}
> spark-sql> select `col_a`/`col_b` as `col_c` from (select 3 as `col_a` , 3.14 as `col_b`);
> 0.955414
> spark-sql> set spark.sql.parser.quotedRegexColumnNames=true;
> spark-sql> select `col_a`/`col_b` as `col_c` from (select 3 as `col_a` , 3.14 as `col_b`);
> Error in query: Invalid usage of '*' in expression 'divide'
> {code}
>
> This problem exists in 3.X, 2.4.X and master versions.
>
> Related issue:
> https://issues.apache.org/jira/browse/SPARK-12139
> (As can be seen in the latest comments, some people have encountered the same problem)
>
> Similar problems:
> https://issues.apache.org/jira/browse/SPARK-28897
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)