Posted to issues@spark.apache.org by "Yanzhe Xu (Jira)" <ji...@apache.org> on 2022/06/13 03:32:00 UTC
[jira] [Updated] (SPARK-39454) failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate
[ https://issues.apache.org/jira/browse/SPARK-39454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yanzhe Xu updated SPARK-39454:
------------------------------
Description:
When running a query with Iceberg:
{code:python}
spark.sql("drop table if exists catalog_returns")
spark.sql("drop table if exists catalog_sales")
spark.sql("drop table if exists date_dim")
spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")
spark.sql("create table if not exists catalog_returns using iceberg partitioned by (cr_returned_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_returns")
spark.sql("create table if not exists catalog_sales using iceberg partitioned by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_sales")
spark.sql("create table if not exists date_dim using iceberg tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_date_dim")
spark.sql("delete from catalog_returns where cr_order_number in (select cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk and d_date between '2000-05-20' and '2000-05-21');").explain(True) {code}
Spark gives the following error:
{code}
java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74) {code}
This error occurs on Spark 3.2.1, but not on Spark 3.1.2.
The data files needed to reproduce the issue are uploaded to Google Drive: [https://drive.google.com/drive/folders/1kgZ3MjO5tSyUxzQl4oN2I0rH7SvBYbJd]
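For anyone triaging this: the trace suggests that SparkPlanInfo.fromSparkPlan walks the children of a supposedly physical plan and fails when a logical node (here, the IN-subquery's Project) is still in the tree after planning. The sketch below is NOT Spark's actual code; it is a minimal pure-Python analogy of that failure mode, with illustrative stand-in classes:

```python
class LogicalPlan:            # stand-in for catalyst.plans.logical.LogicalPlan
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

class SparkPlan:              # stand-in for execution.SparkPlan
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def from_spark_plan(plan):
    """Analogy of fromSparkPlan: recurse over children, treating each as a SparkPlan."""
    if not isinstance(plan, SparkPlan):
        # Python analogue of the ClassCastException in the report
        raise TypeError(f"{type(plan).__name__} cannot be cast to SparkPlan")
    return {"node": plan.name,
            "children": [from_spark_plan(c) for c in plan.children]}

# A fully planned tree converts fine...
ok = SparkPlan("DeleteFromTable", [SparkPlan("Scan catalog_returns")])
print(from_spark_plan(ok)["node"])   # prints: DeleteFromTable

# ...but a logical Project left in the tree (as with the IN-subquery here) fails.
bad = SparkPlan("DeleteFromTable", [LogicalPlan("Project [cs_order_number]")])
try:
    from_spark_plan(bad)
except TypeError as e:
    print(e)                         # prints: LogicalPlan cannot be cast to SparkPlan
```

Again, this only models the shape of the error; where exactly the logical subquery plan survives in Spark 3.2.1's DELETE planning for Iceberg tables is the open question.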
was:
When running a query with Iceberg:
```Python
spark.sql("drop table if exists catalog_returns")
spark.sql("drop table if exists catalog_sales")
spark.sql("drop table if exists date_dim")
spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")
spark.sql("create table if not exists catalog_returns using iceberg partitioned by (cr_returned_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_returns")
spark.sql("create table if not exists catalog_sales using iceberg partitioned by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_sales")
spark.sql("create table if not exists date_dim using iceberg tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_date_dim")
spark.sql("delete from catalog_returns where cr_order_number in (select cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk and d_date between '2000-05-20' and '2000-05-21');").explain(True)
```
Spark gives the following error:
```Bash
{{ java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)}}
```
This error occurs on Spark 3.2.1, but not on Spark 3.1.2.
The data files are uploaded to Google drive: https://drive.google.com/drive/folders/1kgZ3MjO5tSyUxzQl4oN2I0rH7SvBYbJd
> failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate
> ------------------------------------------------------------------------------------
>
> Key: SPARK-39454
> URL: https://issues.apache.org/jira/browse/SPARK-39454
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: Spark 3.2.1, Standalone mode.
>
> Spark shell start:
> ```
> SPARK_HOME=/spark-3.2.1-bin-hadoop3.2
>
> $SPARK_HOME/bin/pyspark --master local[*] \
> --conf spark.executor.cores=12 \
> --driver-memory 40G \
> --executor-memory 10G \
> --conf spark.driver.maxResultSize=8G \
> --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
> --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
> --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
> --conf spark.sql.catalog.spark_catalog.type=hadoop \
> --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
> --conf spark.sql.catalog.local.type=hadoop \
> --conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
> --conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse
> ```
> Reporter: Yanzhe Xu
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org