Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/05 19:50:00 UTC

[jira] [Commented] (SPARK-31563) Failure of InSet.sql for UTF8String collection

    [ https://issues.apache.org/jira/browse/SPARK-31563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127051#comment-17127051 ] 

Apache Spark commented on SPARK-31563:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28399

> Failure of InSet.sql for UTF8String collection
> ----------------------------------------------
>
>                 Key: SPARK-31563
>                 URL: https://issues.apache.org/jira/browse/SPARK-31563
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0, 3.1.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.6, 3.0.0
>
>
> The InSet expression works on collections of Catalyst's internal types. This is visible in the optimizer rule that replaces In with InSet and evaluates In's list to Catalyst's internal values: [https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L254]
> {code:scala}
>         if (newList.length > SQLConf.get.optimizerInSetConversionThreshold) {
>           val hSet = newList.map(e => e.eval(EmptyRow))
>           InSet(v, HashSet() ++ hSet)
>         }
> {code}
> This code predates the optimization in https://github.com/apache/spark/pull/25754, which made another wrong assumption about the types of the collection's elements.
> If InSet accepts only Catalyst's internal types, the following code shouldn't fail:
> {code:scala}
> InSet(Literal("a"), Set("a", "b").map(UTF8String.fromString)).sql
> {code}
> but it fails with the exception:
> {code}
> Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
> java.lang.RuntimeException: Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
> 	at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:88)
> 	at org.apache.spark.sql.catalyst.expressions.InSet.$anonfun$sql$2(predicates.scala:522)
> {code}
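
The stack trace points at the single-argument Literal.apply, which infers the data type from external Scala/Java values and rejects the internal UTF8String. One way to render internal values without that inference is to pass the data type explicitly through the two-argument Literal constructor. The sketch below assumes spark-catalyst is on the classpath; the object InSetSqlSketch and the helper inListSql are hypothetical names used only for illustration, and this is not necessarily how the linked pull request changes InSet.sql.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper, not part of Spark: renders internal Catalyst values
// as a comma-separated list of SQL literals.
object InSetSqlSketch {
  def inListSql(values: Seq[Any], dataType: DataType): String = {
    // Literal(value, dataType) stores the internal value as-is, so no
    // type inference from the runtime class of the value takes place.
    values.map(v => Literal(v, dataType).sql).mkString(", ")
  }

  def main(args: Array[String]): Unit = {
    // Internal representation of strings, as the optimizer produces
    // when it evaluates In's list.
    val internal: Seq[Any] = Seq("a", "b").map(UTF8String.fromString)
    println(inListSql(internal, StringType)) // prints 'a', 'b'
  }
}
```

In this sketch the data type is supplied by the caller; inside an expression such as InSet, the child expression's data type would be the natural source for it, assuming the set's elements all share that type.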
>  


