Posted to issues@spark.apache.org by "Bogdan Raducanu (JIRA)" <ji...@apache.org> on 2017/06/01 09:20:04 UTC

[jira] [Comment Edited] (SPARK-20744) Predicates with multiple columns do not work

    [ https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032679#comment-16032679 ] 

Bogdan Raducanu edited comment on SPARK-20744 at 6/1/17 9:19 AM:
-----------------------------------------------------------------

An array generally requires all of its elements to have the same type. Casts are added automatically where a common type exists, but that is not always possible:

{code}
sql("select array(now(), 1)").show
{code}

{code}
org.apache.spark.sql.AnalysisException: cannot resolve 'array(current_timestamp(), 1)' due to data type mismatch: input to function array should all be the same type, but it's [timestamp, int]; line 1 pos 7;
{code}
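
For contrast, here is a minimal sketch of a case where the automatic cast does apply, next to the failing case above. It is not part of the original comment: it assumes a local SparkSession, and the names `widened`, `elementType` and `mismatchRejected` are illustrative only. The behavior may differ between Spark versions.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Assumption: a local session; inside spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("array-coercion-sketch")
  .getOrCreate()

// int and bigint share a wider common type, so the analyzer can insert a cast.
val widened = spark.sql("select array(1, cast(2 as bigint)) as arr")
val elementType = widened.schema("arr").dataType.simpleString

// timestamp and int have no common type, so analysis is expected to fail
// with the AnalysisException quoted above.
val mismatchRejected =
  try { spark.sql("select array(now(), 1)").collect(); false }
  catch { case _: AnalysisException => true }

println(s"array(1, 2L) resolved as: $elementType")
println(s"array(now(), 1) rejected: $mismatchRejected")
```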


was (Author: bograd):
Array generally needs all components to be same type. Casts are added automatically but it's not always possible:

```sql("select array(now(), 1)").show```

```org.apache.spark.sql.AnalysisException: cannot resolve 'array(current_timestamp(), 1)' due to data type mismatch: input to function array should all be the same type, but it's [timestamp, int]; line 1 pos 7;```

> Predicates with multiple columns do not work
> --------------------------------------------
>
>                 Key: SPARK-20744
>                 URL: https://issues.apache.org/jira/browse/SPARK-20744
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>    +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it does not work from SQL either, although other SQL databases support this syntax:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 1),named_struct(col1, 2, col2, 2))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Another example:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint> and struct<col1:int,col2:int>).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
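> Expressions such as (1,1) are apparently parsed as structs, and then the types do not match: the last error above reports struct<a:bigint,b:bigint> for the columns and struct<col1:int,col2:int> for the literal. Perhaps such expressions should be parsed as arrays instead.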
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 'b', cast(1 as bigint)))").show
> {code}
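
Until the tuple syntax is supported, the same predicate can also be spelled out column by column through the DataFrame API. The following is only a sketch and not part of the original report: `inPairs` is a hypothetical helper, the session setup is an assumption, and the list of pairs must be non-empty because `reduce` fails on an empty collection.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session; inside spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("multi-column-predicate-sketch")
  .getOrCreate()

// Same data as tab1 in the report.
val tab1 = spark.range(10).selectExpr("id as a", "id as b")

// Hypothetical helper: true when (a, b) equals one of the given pairs.
// Each pair becomes (a = x AND b = y); the pairs are combined with OR.
def inPairs(pairs: Seq[(Long, Long)]): Column =
  pairs.map { case (x, y) => col("a") === x && col("b") === y }.reduce(_ || _)

// Equivalent in intent to: where (a,b) in ((1,1), (2,2))
val matched = tab1.where(inPairs(Seq((1L, 1L), (2L, 2L))))
  .collect()
  .map(r => (r.getLong(0), r.getLong(1)))
  .toSeq
  .sorted

matched.foreach(println)
```

Because the literals are compared with each column separately, no struct is built and the bigint/int struct mismatch from the report does not arise.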


