Posted to issues@spark.apache.org by "Yanzhe Xu (Jira)" <ji...@apache.org> on 2022/06/13 03:32:00 UTC

[jira] [Updated] (SPARK-39454) failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate

     [ https://issues.apache.org/jira/browse/SPARK-39454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanzhe Xu updated SPARK-39454:
------------------------------
    Description: 
When running the following PySpark script against Iceberg tables:

{code:python}
# Drop any previous copies of the tables.
spark.sql("drop table if exists catalog_returns")
spark.sql("drop table if exists catalog_sales")
spark.sql("drop table if exists date_dim")

# Load the repro Parquet data as temp views.
spark.read.parquet("catalog_returns_repro").createOrReplaceTempView("temp_catalog_returns")
spark.read.parquet("catalog_sales_repro").createOrReplaceTempView("temp_catalog_sales")
spark.read.parquet("date_dim_repro").createOrReplaceTempView("temp_date_dim")

# Recreate the tables as Iceberg tables from the temp views.
spark.sql("create table if not exists catalog_returns using iceberg partitioned by (cr_returned_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_returns")
spark.sql("create table if not exists catalog_sales using iceberg partitioned by (cs_sold_date_sk) tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_catalog_sales")
spark.sql("create table if not exists date_dim using iceberg tblproperties('write.parquet.compression-codec' = 'snappy') as select * from temp_date_dim")

# DELETE with an IN subquery; explaining the plan triggers the failure.
spark.sql("delete from catalog_returns where cr_order_number in (select cs_order_number from catalog_sales, date_dim where cs_sold_date_sk=d_date_sk and d_date between '2000-05-20' and '2000-05-21');").explain(True)
{code}
Spark gives the following error:
{code:java}
java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.execution.SparkPlan
	at scala.collection.immutable.List.map(List.scala:293)
	at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:75)
	at org.apache.spark.sql.execution.SparkPlanInfo$.$anonfun$fromSparkPlan$3(SparkPlanInfo.scala:75)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
{code}
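My reading of the trace (an assumption, not a confirmed root-cause analysis): {{SparkPlanInfo.fromSparkPlan}} builds the plan tree behind {{explain()}} and the UI by mapping itself over a node's children and subqueries, and Catalyst gathers the subquery plans with an {{asInstanceOf}} cast to an erased type parameter. A logical plan node that leaks into that list therefore fails only later, inside the {{.map}} at SparkPlanInfo.scala:75, which matches the {{List.map}} frame above. The following self-contained Scala toy (all names are illustrative stand-ins, not Spark's real classes) reproduces that mechanism:
{code:scala}
// Toy model of the failure mode; LogicalPlan/PhysicalPlan/Project are
// illustrative stand-ins, not the real Catalyst classes.
sealed trait LogicalPlan { def name: String }
sealed trait PhysicalPlan { def name: String }
case class Project(name: String) extends LogicalPlan

object SubqueryCastToy {
  // Mirrors the erased cast used when collecting subquery plans:
  // asInstanceOf[T] on a type parameter is a no-op at runtime,
  // so nothing fails at this point.
  def subqueries[T](plans: Seq[Any]): Seq[T] = plans.map(_.asInstanceOf[T])

  // Stand-in for SparkPlanInfo.fromSparkPlan.
  def fromPlan(plan: PhysicalPlan): String = plan.name

  def main(args: Array[String]): Unit = {
    // A logical Project sneaks into what is typed as Seq[PhysicalPlan].
    val subs: Seq[PhysicalPlan] = subqueries[PhysicalPlan](Seq(Project("p")))
    // ClassCastException ("Project cannot be cast to PhysicalPlan")
    // surfaces only here, inside map, as in the stack trace above.
    subs.map(fromPlan).foreach(println)
  }
}
{code}
If that reading is correct, the DELETE rewrite for the Iceberg table leaves a logical Project behind as a subquery of the physical plan on 3.2.1, which would explain why the same script passes on 3.1.2.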

This error occurs on Spark 3.2.1 but not on Spark 3.1.2, which points to a regression introduced between those releases.

The data files for reproducing the issue are uploaded to Google Drive: [https://drive.google.com/drive/folders/1kgZ3MjO5tSyUxzQl4oN2I0rH7SvBYbJd]

> failed to convert LogicalPlan to SparkPlan when subquery exists after "IN" predicate
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-39454
>                 URL: https://issues.apache.org/jira/browse/SPARK-39454
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>         Environment: Spark 3.2.1, Standalone mode.
>  
> Spark shell start:
> {code:bash}
> SPARK_HOME=/spark-3.2.1-bin-hadoop3.2
>  
> $SPARK_HOME/bin/pyspark --master local[*] \
>         --conf spark.executor.cores=12 \
>         --driver-memory 40G  \
>         --executor-memory 10G  \
>         --conf spark.driver.maxResultSize=8G \
>         --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1 \
>         --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>         --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>         --conf spark.sql.catalog.spark_catalog.type=hadoop \
>         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>         --conf spark.sql.catalog.local.type=hadoop \
>         --conf spark.sql.catalog.local.warehouse=$PWD/local-warehouse \
>         --conf spark.sql.catalog.spark_catalog.warehouse=$PWD/spark-warehouse
> {code}
>            Reporter: Yanzhe Xu
>            Priority: Major
>