Posted to issues@spark.apache.org by "Shawn Lavelle (JIRA)" <ji...@apache.org> on 2017/02/24 20:26:44 UTC

[jira] [Created] (SPARK-19731) IN Operator should support arrays

Shawn Lavelle created SPARK-19731:
-------------------------------------

             Summary: IN Operator should support arrays
                 Key: SPARK-19731
                 URL: https://issues.apache.org/jira/browse/SPARK-19731
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0, 2.0.0, 1.6.2
            Reporter: Shawn Lavelle
            Priority: Minor


When the column type matches the array's element type, the IN operator should accept the array and test whether the value is one of its elements. This would be useful for UDFs and predicate subqueries that return arrays.

(This isn't necessarily extensible to all collections, but certainly applies to arrays.)
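
To make "operate on the array" concrete, here is a minimal Python sketch of the requested semantics. The issue does not spell out NULL handling, so this sketch assumes that `value IN <array>` would follow the same SQL three-valued logic as an ordinary IN list. Under that logic the result is true on a match, NULL (None) when a NULL operand or NULL element makes the answer unknown, and false otherwise. `in_array` is a hypothetical name. It models the behaviour only and is not Spark code.

```python
from typing import Optional, Sequence


def in_array(value: Optional[int],
             arr: Optional[Sequence[Optional[int]]]) -> Optional[bool]:
    """Hypothetical model of `value IN <array>`.

    Assumes the same three-valued logic as `value IN (e1, e2, ...)`:
    True on a match, None (SQL NULL) if the answer is unknown, else False.
    """
    if value is None or arr is None:
        return None  # a NULL operand makes the whole predicate NULL
    saw_null = False
    for element in arr:
        if element is None:
            saw_null = True  # a NULL element might have matched, so remember it
        elif element == value:
            return True  # definite match
    # No match: the answer is unknown if a NULL element was seen, otherwise false.
    return None if saw_null else False


print(in_array(5, [1, 2, 3]))     # the example below expects false here
print(in_array(1, [1, 2, 3]))     # match found
print(in_array(5, [1, None, 3]))  # unknown because of the NULL element
```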

Example:
select 5 in array(1,2,3) should return false instead of throwing a ParseException, since the array's element type matches the type of the value being tested.

create table test (val int);
insert into test values (1);
select * from test;
+------+--+
| val  |
+------+--+
| 1    |
+------+--+
*select val from test where array_contains(array(1,2,3), val);*
+------+--+
| val  |
+------+--+
| 1    |
+------+--+

{panel}
*select val from test where val in (array(1,2,3));*
Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` IN (array(1, 2, 3)))' due to data type mismatch: Arguments must be same type; line 1 pos 31;
'Project ['val]
+- 'Filter val#433 IN (array(1, 2, 3))
   +- MetastoreRelation test (state=,code=0)
{panel}

{panel}
*select val from test where val in (select array(1,2,3));*
Error: org.apache.spark.sql.AnalysisException: cannot resolve '(test.`val` = `array(1, 2, 3)`)' due to data type mismatch: differing types in '(test.`val` = `array(1, 2, 3)`)' (int and array<int>).;;
'Project ['val]
+- 'Filter predicate-subquery#434 [(val#435 = array(1, 2, 3)#436)]
   :  +- Project [array(1, 2, 3) AS array(1, 2, 3)#436]
   :     +- OneRowRelation$
   +- MetastoreRelation test (state=,code=0)
{panel}

{panel}
*select val from test where val in (select explode(array(1,2,3)));*
+------+--+
| val  |
+------+--+
| 1    |
+------+--+

Note: See [SPARK-19730|https://issues.apache.org/jira/browse/SPARK-19730] for how a predicate subquery breaks when used with the Data Source API.
{panel}
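
The outputs above suggest one possible implementation of this request: an analysis-time rewrite of `val IN (arr)` into the already-working `array_contains(arr, val)`, applied only when the array's element type equals the value's type. Every other IN list would still need to pass the existing "Arguments must be same type" check. The Python sketch below models that idea. All of its classes (`Column`, `Literal`, `ArrayLiteral`, `In`, `ArrayContains`) and the function `rewrite_in_over_array` are hypothetical toy stand-ins, not Spark Catalyst's actual API.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical toy expression tree; NOT Spark Catalyst's classes.


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "int"


@dataclass(frozen=True)
class Literal:
    value: object
    data_type: str


@dataclass(frozen=True)
class ArrayLiteral:
    elements: tuple
    element_type: str  # e.g. "int"

    @property
    def data_type(self) -> str:
        return f"array<{self.element_type}>"


Expr = Union[Column, Literal, ArrayLiteral]


@dataclass(frozen=True)
class In:
    value: Expr
    options: Tuple[Expr, ...]  # the parenthesised IN list


@dataclass(frozen=True)
class ArrayContains:
    array: ArrayLiteral
    value: Expr


class AnalysisError(Exception):
    pass


def rewrite_in_over_array(expr: In):
    """Rewrite `value IN (array)` to `array_contains(array, value)` when the
    array's element type matches the value's type; otherwise apply the
    usual rule that every IN option must have the value's type."""
    if len(expr.options) == 1 and isinstance(expr.options[0], ArrayLiteral):
        arr = expr.options[0]
        if arr.element_type == expr.value.data_type:
            return ArrayContains(arr, expr.value)
    for option in expr.options:
        if option.data_type != expr.value.data_type:
            raise AnalysisError("Arguments must be same type")
    return expr


val = Column("val", "int")
# `val in (array(1,2,3))` becomes an ArrayContains node instead of an error.
print(rewrite_in_over_array(In(val, (ArrayLiteral((1, 2, 3), "int"),))))
```

In this model, an ordinary list such as `val in (1, 2)` passes through unchanged. An array whose element type differs from the value's type still fails with the same kind of type-mismatch error shown in the first panel.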



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)