Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/19 17:48:14 UTC
[GitHub] [iceberg] maximethebault opened a new issue, #6224: Spark: regression / query failure with Iceberg 1.0.0 and UNION
maximethebault opened a new issue, #6224:
URL: https://github.com/apache/iceberg/issues/6224
### Apache Iceberg version
1.0.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
After upgrading to Iceberg 1.0.0 & Spark 3.3.1 (from 0.13.x & 3.2.x), some of our SQL queries stopped working.
We suspect it may be an Iceberg-related issue, as we couldn't reproduce it with Hive tables.
### Stripped-down reproducer
Set up tables & views:
```
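// Assumes a spark-shell session, where `spark` and `spark.implicits._` (needed for toDF) are already in scope.
// table1: a plain temp view.
val table1 = Seq(("204")).toDF("id")
table1.createOrReplaceTempView("table1")
// table2_1: an Iceberg table.
val table2_1 = Seq(("204")).toDF("id")
table2_1.writeTo("dev.table2_1").using("iceberg").createOrReplace()
// table2_2: another plain temp view.
val table2_2 = Seq(("204")).toDF("id")
table2_2.createOrReplaceTempView("table2_2")
// table2: a temp view over the union of the Iceberg table and the temp view.
val table2 = spark.table("dev.table2_1").union(spark.table("table2_2"))
table2.createOrReplaceTempView("table2")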
```
Run the query:
```
SELECT
u.*
FROM
table1
LEFT JOIN
(
SELECT
id
FROM
table1
LEFT JOIN
table2
USING(id)
) u
USING(id)
```
Results in an exception:
```
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at org.apache.spark.sql.catalyst.plans.logical.View.<init>(basicLogicalOperators.scala:569)
at org.apache.spark.sql.catalyst.plans.logical.View.copy(basicLogicalOperators.scala:568)
at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildInternal(basicLogicalOperators.scala:604)
at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildInternal(basicLogicalOperators.scala:565)
at org.apache.spark.sql.catalyst.trees.UnaryLike.withNewChildrenInternal(TreeNode.scala:1242)
at org.apache.spark.sql.catalyst.trees.UnaryLike.withNewChildrenInternal$(TreeNode.scala:1240)
at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildrenInternal(basicLogicalOperators.scala:565)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$withNewChildren$2(TreeNode.scala:462)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.org$apache$spark$sql$catalyst$analysis$Analyzer$AddMetadataColumns$$addMetadataCol(Analyzer.scala:975)
at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$addMetadataCol$1(Analyzer.scala:975)
```
### Further investigation
If I replace "USING" with classical "ON" clauses, the exception is not thrown.
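For reference, an `ON`-based rewrite of the reproducer query might look like the following. This is a sketch rather than the exact query that was run: the aliases `t`, `a`, and `b` are introduced here for illustration, and the inner query selects `a.id` because a `LEFT JOIN ... USING(id)` exposes the left side's `id`.

```sql
-- Same shape as the failing query, with each USING(id) spelled out as an ON condition.
-- The aliases t, a and b are hypothetical; they only disambiguate the id columns.
SELECT
  u.*
FROM
  table1 t
LEFT JOIN
  (
    SELECT
      a.id
    FROM
      table1 a
    LEFT JOIN
      table2 b
    ON a.id = b.id
  ) u
ON t.id = u.id
```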
I think this issue is caused by the fact that I'm mixing Iceberg & non-Iceberg tables in the UNION.
If I inline table2 in the query, I get a different exception:
```
SELECT
u.*
FROM
table1
LEFT JOIN
(
SELECT
id
FROM
table1
LEFT JOIN
((SELECT id id FROM dev.table2_1 limit 1) UNION (SELECT id FROM table2_2))
USING(id)
) u
USING(id)
```
results in:
```
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the same number of columns, but the first table has 6 columns and the second table has 1 columns;
'Project [id#1302]
+- 'Project [id#1302, id#1302]
+- 'Project [id#1302, id#998]
+- 'Join LeftOuter, (id#998 = id#1302)
:- SubqueryAlias table1
: +- View (`table1`, [id#998])
: +- Project [value#995 AS id#998]
: +- LocalRelation [value#995]
+- 'SubqueryAlias u
+- 'Project [id#1294, id#1302]
+- 'Project [id#1294, id#1302]
+- 'Join LeftOuter, (id#1302 = id#1294)
:- SubqueryAlias table1
: +- View (`table1`, [id#1302])
: +- Project [value#1296 AS id#1302]
: +- LocalRelation [value#1296]
+- 'SubqueryAlias __auto_generated_subquery_name
+- 'Distinct
+- 'Union false, false
:- GlobalLimit 1
: +- LocalLimit 1
: +- Project [_spec_id#1297, _partition#1298, _file#1299, _pos#1300L, _deleted#1301, id#1295 AS id#1294]
: +- SubqueryAlias spark_catalog.dev.table2_1
: +- RelationV2[id#1295, _spec_id#1297, _partition#1298, _file#1299, _pos#1300L, _deleted#1301] spark_catalog.dev.table2_1
+- Project [id#1011]
+- SubqueryAlias table2_2
+- View (`table2_2`, [id#1011])
+- Project [value#1008 AS id#1011]
+- LocalRelation [value#1008]
```
It looks like some Iceberg metadata columns are visible to Spark during query analysis, and I'm not sure they are supposed to be.
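For context, Iceberg exposes metadata columns such as `_file` and `_pos` on its Spark tables. They are normally returned only when named explicitly and are not part of `SELECT *`. A minimal sketch against the table from the reproducer follows; the behavior described in the comments is an assumption based on how metadata columns usually behave, not output captured in this report.

```sql
-- Expected to return only the data column `id`.
SELECT * FROM dev.table2_1;

-- Metadata columns have to be requested by name.
SELECT id, _file, _pos FROM dev.table2_1;
```

In the plan above, by contrast, `_spec_id`, `_partition`, `_file`, `_pos` and `_deleted` all appear in the `Project` over `dev.table2_1` even though the query only selects `id`, which is why that side of the union reports 6 columns.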
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] maximethebault commented on issue #6224: Spark: regression / query failure with Iceberg 1.0.0 and UNION
Posted by GitBox <gi...@apache.org>.
maximethebault commented on issue #6224:
URL: https://github.com/apache/iceberg/issues/6224#issuecomment-1356353314
Thanks for investigating this issue further!
I'll go ahead and close this issue since it isn't Iceberg-related. I'll make sure to keep an eye on the Spark issue you created.
[GitHub] [iceberg] shardulm94 commented on issue #6224: Spark: regression / query failure with Iceberg 1.0.0 and UNION
Posted by GitBox <gi...@apache.org>.
shardulm94 commented on issue #6224:
URL: https://github.com/apache/iceberg/issues/6224#issuecomment-1356083642
Hey @maximethebault!
Thanks for the report. I investigated this and found that it is actually a bug in Spark 3.3.1+. I have created [SPARK-41557](https://issues.apache.org/jira/browse/SPARK-41557) against the Spark project to track this.
[GitHub] [iceberg] maximethebault closed issue #6224: Spark: regression / query failure with Iceberg 1.0.0 and UNION
Posted by GitBox <gi...@apache.org>.
maximethebault closed issue #6224: Spark: regression / query failure with Iceberg 1.0.0 and UNION
URL: https://github.com/apache/iceberg/issues/6224