Posted to issues@spark.apache.org by "Maxime Thébault (Jira)" <ji...@apache.org> on 2022/12/22 00:32:00 UTC
[jira] [Comment Edited] (SPARK-41557) Union of tables with and without metadata column fails when used in join

    [ https://issues.apache.org/jira/browse/SPARK-41557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651026#comment-17651026 ] 

Maxime Thébault edited comment on SPARK-41557 at 12/22/22 12:31 AM:
--------------------------------------------------------------------

Might be related to (and fixed by) SPARK-41660?

SPARK-41498 is also related to metadata columns + union



> Union of tables with and without metadata column fails when used in join
> ------------------------------------------------------------------------
>
>                 Key: SPARK-41557
>                 URL: https://issues.apache.org/jira/browse/SPARK-41557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.2, 3.4.0
>            Reporter: Shardul Mahadik
>            Priority: Major
>
> Here is a test case that can be added to {{MetadataColumnSuite}} to demonstrate the issue:
> {code:scala}
>   test("SPARK-41557: Union of tables with and without metadata column should work") {
>     withTable(tbl) {
>       sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
>       checkAnswer(
>         spark.sql(
>           s"""
>             SELECT b.*
>             FROM RANGE(1)
>               LEFT JOIN (
>                 SELECT id FROM $tbl
>                 UNION ALL
>                 SELECT id FROM RANGE(10)
>               ) b USING(id)
>           """),
>         Seq(Row(0))
>       )
>     }
>   }
> {code}
> Here, a table with metadata columns ({{$tbl}}) is unioned with a table without metadata columns ({{RANGE(10)}}). If the result is later used in a join, query analysis fails with an error reporting a mismatch in the number of columns of the union inputs, caused by the metadata columns. However, each side of the union explicitly projects only one column, so the union should be valid.
> {code}
> org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same number of columns, but the first input has 3 columns and the second input has 1 columns.; line 5 pos 16;
> 'Project [id#26L]
> +- 'Project [id#26L, id#26L]
>    +- 'Project [id#28L, id#26L]
>       +- 'Join LeftOuter, (id#28L = id#26L)
>          :- Range (0, 1, step=1, splits=None)
>          +- 'SubqueryAlias b
>             +- 'Union false, false
>                :- Project [id#26L, index#30, _partition#31]
>                :  +- SubqueryAlias testcat.t
>                :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
>                +- Project [id#29L]
>                   +- Range (0, 10, step=1, splits=None)
> {code}


