You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/04/05 19:51:00 UTC

[jira] [Comment Edited] (SPARK-38790) Unexpected behaviour for "IN" operator in spark when null is involved in array

    [ https://issues.apache.org/jira/browse/SPARK-38790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517674#comment-17517674 ] 

Bruce Robbins edited comment on SPARK-38790 at 4/5/22 7:50 PM:
---------------------------------------------------------------

I think that might be correct. At least, it matches the documentation.

See [here|https://spark.apache.org/docs/3.2.1/sql-ref-null-semantics.html#innot-in-subquery-].
{quote}TRUE is returned when the non-NULL value in question is found in the list

FALSE is returned when the non-NULL value is not found in the list and the list does not contain NULL values

UNKNOWN is returned when the value is NULL, or the non-NULL value is not found in the list and the list contains at least one NULL value
{quote}
For the second query ({{{}SELECT 1 in (null, 2){}}}), you hit the UNKNOWN case, which is represented by NULL.


was (Author: bersprockets):
I think that might be correct. At least, it matches the documentation.

See [here|https://spark.apache.org/docs/3.2.1/sql-ref-null-semantics.html#innot-in-subquery-].

{quote}
TRUE is returned when the non-NULL value in question is found in the list

FALSE is returned when the non-NULL value is not found in the list and the list does not contain NULL values

UNKNOWN is returned when the value is NULL, or the non-NULL value is not found in the list and the list contains at least one NULL value
{quote}
For the second case ({{SELECT 1 in (null, 2)}}), you hit the UNKNOWN case, which is represented by NULL.


> Unexpected behaviour for "IN" operator in spark when null is involved in array
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-38790
>                 URL: https://issues.apache.org/jira/browse/SPARK-38790
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.2.1
>         Environment: Tested on pyspark 3.2.1
>            Reporter: Vishnu K Suman
>            Priority: Major
>         Attachments: image-2022-04-05-22-21-27-695.png
>
>
>  
> `IN`operator in spark.sql is giving unexpected results
> 1 in (null, 1) => true
> 1 in (null, 2) = null
> I would have expected the second piece of code to throw false.
>  
>  
> {code:java}
> >>> spark.sql('SELECT 1 in (null, 1)').show()
> +----------------+
> |(1 IN (NULL, 1))|
> +----------------+
> |            true|
> +----------------+
> >>> spark.sql('SELECT 1 in (null, 2)').show()
> +----------------+
> |(1 IN (NULL, 2))|
> +----------------+
> |            null|
> +----------------+
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org