Posted to issues@spark.apache.org by "Bogdan Raducanu (JIRA)" <ji...@apache.org> on 2017/06/27 13:51:00 UTC

[jira] [Created] (SPARK-21228) InSet.doCodeGen incorrect handling of structs

Bogdan Raducanu created SPARK-21228:
---------------------------------------

             Summary: InSet.doCodeGen incorrect handling of structs
                 Key: SPARK-21228
                 URL: https://issues.apache.org/jira/browse/SPARK-21228
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Bogdan Raducanu


In InSet, hset may contain GenericInternalRows while child returns UnsafeRows (or vice versa). The code generated by InSet.doGenCode calls hset.contains, which always returns false in this case, because neither row class compares equal to an instance of the other.
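
The mismatch can be seen in isolation with a short spark-shell sketch. It assumes Spark 2.2 and uses Catalyst-internal classes directly (GenericInternalRow, UnsafeProjection), which are not a stable public API; the variable names are illustrative only:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, UnsafeProjection, UnsafeRow}
import org.apache.spark.sql.types.{DataType, LongType}

// A struct<a: bigint, b: bigint> value in the "safe" representation.
val genericRow: InternalRow = new GenericInternalRow(Array[Any](1L, 1L))

// The same logical value in the "unsafe" (binary) representation.
// copy() detaches the result from the projection's reused buffer.
val toUnsafe = UnsafeProjection.create(Array[DataType](LongType, LongType))
val unsafeRow: UnsafeRow = toUnsafe(genericRow).copy()

// Field by field the two rows agree ...
val sameFields = genericRow.getLong(0) == unsafeRow.getLong(0) &&
  genericRow.getLong(1) == unsafeRow.getLong(1)

// ... but equals() on either class rejects instances of the other,
// so a set built from one representation never matches the other.
val hset: Set[Any] = Set(genericRow)
val found = hset.contains(unsafeRow)
```

Here sameFields should be true while found should be false, which is the same lookup failure the generated InSet code runs into.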

The following code reproduces the problem:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2") // the default is 10, which would need a longer IN list to reproduce

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 2L),named_struct('a', 3L, 'b', 3L))").show
+----+
|minA|
+----+
+----+
```
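In.doGenCode appears to be correct. Re-running the same query with the threshold raised, so that the optimizer keeps In, returns the expected row:
```
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3") // now it will not use InSet

sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L),named_struct('a', 2L, 'b', 2L),named_struct('a', 3L, 'b', 3L))").show
+-----+
| minA|
+-----+
|[1,1]|
+-----+
```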

A possible solution is either to do a safe<->unsafe conversion in InSet.doGenCode, or to not trigger the InSet optimization at all in this case.
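
As a rough illustration of the first option, the spark-shell sketch below brings both the set elements and the probed value into the UnsafeRow representation before the lookup. It is not the actual patch: normalize is a hypothetical helper, the schema is hard-coded to two bigint fields, and it assumes that byte-wise UnsafeRow equality is adequate for the struct's field types:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{UnsafeProjection, UnsafeRow}
import org.apache.spark.sql.types.{DataType, LongType}

// Projection for struct<a: bigint, b: bigint>; the schema is an assumption of this sketch.
val toUnsafe = UnsafeProjection.create(Array[DataType](LongType, LongType))

// Hypothetical helper: bring any row into the UnsafeRow representation.
// copy() is needed because the projection reuses its output buffer.
def normalize(row: InternalRow): UnsafeRow = row match {
  case u: UnsafeRow => u.copy()
  case other        => toUnsafe(other).copy()
}

// The IN-list literals, initially held as GenericInternalRows.
val literals: Seq[InternalRow] =
  Seq(InternalRow(1L, 1L), InternalRow(2L, 2L), InternalRow(3L, 3L))
val normalizedSet: Set[UnsafeRow] = literals.map(normalize).toSet

// Probes in either representation now hit the same set entry.
val probeGeneric: InternalRow = InternalRow(1L, 1L)
val probeUnsafe: InternalRow = toUnsafe(InternalRow(1L, 1L)).copy()
val hitGeneric = normalizedSet.contains(normalize(probeGeneric))
val hitUnsafe = normalizedSet.contains(normalize(probeUnsafe))
val miss = normalizedSet.contains(normalize(InternalRow(4L, 4L)))
```

Normalizing the probe on every lookup adds a per-row conversion cost, which is one reason the second option (skipping the InSet optimization for such types) may be preferable.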


