Posted to issues@spark.apache.org by "Yanzhe Xu (Jira)" <ji...@apache.org> on 2022/06/13 03:31:00 UTC
[jira] [Created] (SPARK-39454) failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate
Yanzhe Xu created SPARK-39454:
---------------------------------
Summary: failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate
Key: SPARK-39454
URL: https://issues.apache.org/jira/browse/SPARK-39454
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.1
Environment: Spark 3.2.1, Standalone mode.
Spark shell start:
```
SPARK_HOME=/spark-3.2.1-bin-hadoop3.2
$SPARK_HOME/bin/pyspark --master local[*] \
--conf spark.executor.cores=12 \
--driver-memory 40G \
--executor-memory 10G \
--conf spark.driver.maxResultSize=8G \
--packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hadoop \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.local.type=hadoop \
--conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
--conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse
```
Reporter: Yanzhe Xu
When running the following query against Iceberg tables:
```Python
spark.sql("drop table if exists catalog_returns")
spark.sql("drop table if exists catalog_sales")
spark.sql("drop table if exists date_dim")
spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")
spark.sql("create table if not exists catalog_returns using iceberg partitioned by (cr_returned_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_returns")
spark.sql("create table if not exists catalog_sales using iceberg partitioned by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_sales")
spark.sql("create table if not exists date_dim using iceberg tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_date_dim")
spark.sql("delete from catalog_returns where cr_order_number in (select cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk and d_date between '2000-05-20' and '2000-05-21');").explain(True)
```
Spark gives the following error:
```
java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
	at scala.collection.immutable.List.map(List.scala:293)
	at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
	at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
```
This error occurs on Spark 3.2.1, but not on Spark 3.1.2.
The data files for reproducing the issue are uploaded to Google Drive: https://drive.google.com/drive/folders/1kgZ3MjO5tSyUxzQl4oN2I0rH7SvBYbJd
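Until this is resolved, one possible workaround (untested, and only an assumption based on the stack trace above suggesting the inline subquery in the DELETE condition is not being planned correctly) is to materialize the subquery into a temporary view first, so the IN predicate references a resolved view. The helper name `delete_returns_workaround` and the view name `matching_orders` below are hypothetical, not part of the original report:

```python
def delete_returns_workaround(spark):
    """Hypothetical workaround sketch (untested): pre-materialize the IN
    subquery as a temporary view before issuing the DELETE, so the
    predicate refers to a named view rather than an inline subquery."""
    # Step 1: capture the qualifying order numbers in a temp view.
    spark.sql("""
        create or replace temporary view matching_orders as
        select cs_order_number
        from catalog_sales, date_dim
        where cs_sold_date_sk = d_date_sk
          and d_date between '2000-05-20' and '2000-05-21'
    """)
    # Step 2: reference the view from the DELETE's IN predicate.
    spark.sql("""
        delete from catalog_returns
        where cr_order_number in (select cs_order_number from matching_orders)
    """)
```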
--
This message was sent by Atlassian Jira
(v8.20.7#820007)