Posted to issues@spark.apache.org by "Pablo Langa Blanco (Jira)" <ji...@apache.org> on 2021/08/18 23:15:00 UTC

[jira] [Commented] (SPARK-36488) "Invalid usage of '*' in expression" error due to the feature of 'quotedRegexColumnNames' in some scenarios.

    [ https://issues.apache.org/jira/browse/SPARK-36488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401401#comment-17401401 ] 

Pablo Langa Blanco commented on SPARK-36488:
--------------------------------------------

Hi [~merrily01] ,

I don't think these are bugs. As the documentation explains ([https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]), when the parameter spark.sql.parser.quotedRegexColumnNames=true is set, “quoted identifiers (using backticks) in SELECT statement are interpreted as regular expressions and SELECT statement can take regex-based column specification”.

In case 1, this means that in the expression `tb_test`.`col_a`, tb_test is treated as a regular expression that stands for a column. The “column.field” syntax is used to access fields of StructType columns, and regular expressions are not allowed in that position.

In case 2, a regex can match more than one column; for example, `col_.*` resolves to col_a and col_b. It makes no sense for the operands of a division to be lists of columns, so this is not allowed.
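
To illustrate the point about a regex standing for a list of columns, here is a minimal Python sketch. It is not Spark code: the helper expand_quoted_identifier is hypothetical, and it assumes the pattern has to match the whole column name.

```python
import re


def expand_quoted_identifier(pattern, columns):
    """Hypothetical helper: return every column whose whole name matches.

    Assumption: the pattern must match the entire column name.
    """
    regex = re.compile(pattern)
    return [name for name in columns if regex.fullmatch(name)]


# Column names taken from the examples in this issue, plus an unrelated one.
columns = ["col_a", "col_b", "id"]

# A pattern such as col_.* expands to a list of columns ...
many = expand_quoted_identifier("col_.*", columns)

# ... while a plain name still expands to a list, here with one element.
one = expand_quoted_identifier("col_a", columns)

print(many)
print(one)
```

Either way the result is a list, which is why an operator such as division, which expects a single expression on each side, has nothing sensible to do with it.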

I’m going to open a PR to improve the error message and avoid confusion.

> "Invalid usage of '*' in expression" error due to the feature of 'quotedRegexColumnNames' in some scenarios.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-36488
>                 URL: https://issues.apache.org/jira/browse/SPARK-36488
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: merrily01
>            Priority: Major
>
>  In some cases, the error happens when the following property is set.
> {code:java}
> spark.sql("set spark.sql.parser.quotedRegexColumnNames=true")
> {code}
> *case 1:* 
> {code:java}
> spark-sql> create table tb_test as select 1 as col_a, 2 as col_b;
> spark-sql> select `tb_test`.`col_a`  from tb_test;
> 1
> spark-sql> set spark.sql.parser.quotedRegexColumnNames=true;
> spark-sql> select `tb_test`.`col_a`  from tb_test;
> Error in query: Invalid usage of '*' in expression 'unresolvedextractvalue'
> {code}
>  
> *case 2:*
> {code:java}
> spark-sql> select `col_a`/`col_b` as `col_c` from (select 3 as `col_a` ,  3.14 as `col_b`);
> 0.955414
> spark-sql> set spark.sql.parser.quotedRegexColumnNames=true;
> spark-sql> select `col_a`/`col_b` as `col_c` from (select 3 as `col_a` ,  3.14 as `col_b`);
> Error in query: Invalid usage of '*' in expression 'divide'
> {code}
>  
> This problem exists in 3.X, 2.4.X and master versions. 
>  
> Related issue : 
> https://issues.apache.org/jira/browse/SPARK-12139
> (As can be seen in the latest comments, some people have encountered the same problem)
>  
> Similar problems:
> https://issues.apache.org/jira/browse/SPARK-28897
>  


