Posted to issues@spark.apache.org by "Yanzhe Xu (Jira)" <ji...@apache.org> on 2022/06/13 03:31:00 UTC

[jira] [Created] (SPARK-39454) Failed to convert LogicalPlan to SparkPlan when a subquery exists after an "IN" predicate

Yanzhe Xu created SPARK-39454:
---------------------------------

             Summary: Failed to convert LogicalPlan to SparkPlan when a subquery exists after an "IN" predicate
                 Key: SPARK-39454
                 URL: https://issues.apache.org/jira/browse/SPARK-39454
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1
         Environment: Spark 3.2.1, Standalone mode.

 

The Spark shell was started as follows:

```Bash
SPARK_HOME=/spark-3.2.1-bin-hadoop3.2

$SPARK_HOME/bin/pyspark --master local[*] \
        --conf spark.executor.cores=12 \
        --driver-memory 40G \
        --executor-memory 10G \
        --conf spark.driver.maxResultSize=8G \
        --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
        --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
        --conf spark.sql.catalog.spark_catalog.type=hadoop \
        --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.local.type=hadoop \
        --conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
        --conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse
```
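
For anyone reproducing this from a notebook or script rather than the shell, the same configuration can be set programmatically. The sketch below is my own addition (not part of the original report) and assumes the Iceberg runtime is resolvable via spark.jars.packages:

```Python
# Rough programmatic equivalent of the pyspark launch above (my sketch, untested
# against this dataset). Driver/executor memory generally must be set before the
# JVM starts, so the pyspark command-line flags remain the more reliable route.
import os
from pyspark.sql import SparkSession

warehouse = os.getcwd()

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.executor.cores", "12")
    .config("spark.driver.maxResultSize", "8g")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hadoop")
    .config("spark.sql.catalog.spark_catalog.warehouse", f"{warehouse}/spark-warehouse")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", f"{warehouse}/local-warehouse")
    .getOrCreate()
)
```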
            Reporter: Yanzhe Xu


When running the following query on Iceberg tables:
```Python
spark.sql("drop table if exists catalog_returns")
spark.sql("drop table if exists catalog_sales")
spark.sql("drop table if exists date_dim")

spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")

spark.sql("create table if not exists catalog_returns using iceberg partitioned by (cr_returned_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_returns")

spark.sql("create table if not exists catalog_sales using iceberg partitioned by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_sales")

spark.sql("create table if not exists date_dim using iceberg tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_date_dim")

spark.sql("delete from catalog_returns where cr_order_number in (select cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk and d_date between '2000-05-20' and '2000-05-21');").explain(True)
```
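
To help narrow the failure down, the inner subquery can be planned and executed on its own; the standalone check below is my addition rather than part of the original report. If it succeeds, the cast failure is specific to planning the DELETE with an IN subquery:

```Python
# My narrowing step (not in the original report): plan and run only the inner
# subquery. If this works, the problem is confined to the DELETE planning path.
spark.sql("""
    select cs_order_number
    from catalog_sales, date_dim
    where cs_sold_date_sk = d_date_sk
      and d_date between '2000-05-20' and '2000-05-21'
""").explain(True)
```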

Spark gives the following error:

```Bash
java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
	at scala.collection.immutable.List.map(List.scala:293)
	at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
	at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
```

This error occurs on Spark 3.2.1, but not on Spark 3.1.2.
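
As a possible mitigation to try while the planner issue stands (my suggestion, not verified against this bug or dataset), the matching keys can be collected first so the DELETE no longer carries a subquery; this is only practical when the key set is small:

```Python
# Hypothetical workaround sketch (mine, unverified): pre-compute the matching
# order numbers, then delete with a literal IN list instead of a subquery.
rows = spark.sql("""
    select distinct cs_order_number
    from catalog_sales, date_dim
    where cs_sold_date_sk = d_date_sk
      and d_date between '2000-05-20' and '2000-05-21'
""").collect()

if rows:
    in_list = ",".join(str(r.cs_order_number) for r in rows)
    spark.sql(f"delete from catalog_returns where cr_order_number in ({in_list})")
```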

The data files have been uploaded to Google Drive: https://drive.google.com/drive/folders/1kgZ3MjO5tSyUxzQl4oN2I0rH7SvBYbJd
