Posted to issues@spark.apache.org by "Shardul Mahadik (Jira)" <ji...@apache.org> on 2022/12/17 07:20:00 UTC

[jira] [Created] (SPARK-41557) Union of tables with and without metadata column fails when used in join

Shardul Mahadik created SPARK-41557:
---------------------------------------

             Summary: Union of tables with and without metadata column fails when used in join
                 Key: SPARK-41557
                 URL: https://issues.apache.org/jira/browse/SPARK-41557
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.2, 3.4.0
            Reporter: Shardul Mahadik


Here is a test case that can be added to {{MetadataColumnSuite}} to demonstrate the issue:
{code:scala}
  test("SPARK-XXXXX: Union of tables with and without metadata column should work") {
    withTable(tbl) {
      sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
      checkAnswer(
        spark.sql(
          s"""
            SELECT b.*
            FROM RANGE(1)
              LEFT JOIN (
                SELECT id FROM $tbl
                UNION ALL
                SELECT id FROM RANGE(10)
              ) b USING(id)
          """),
        Seq(Row(0))
      )
    }
  }
{code}

Here, a table with metadata columns ({{$tbl}}) is unioned with a relation without metadata columns ({{RANGE(10)}}). If the result is later used in a join, query analysis fails and reports a mismatch in the number of columns of the union inputs. The mismatch is caused by the metadata columns: in the plan below, the first union input projects {{id}}, {{index}} and {{_partition}}, while the second projects only {{id}}. However, the query explicitly projects only one column on each side of the union, so the union should be valid.

{code}
org.apache.spark.sql.AnalysisException: [NUM_COLUMNS_MISMATCH] UNION can only be performed on inputs with the same number of columns, but the first input has 3 columns and the second input has 1 columns.; line 5 pos 16;
'Project [id#26L]
+- 'Project [id#26L, id#26L]
   +- 'Project [id#28L, id#26L]
      +- 'Join LeftOuter, (id#28L = id#26L)
         :- Range (0, 1, step=1, splits=None)
         +- 'SubqueryAlias b
            +- 'Union false, false
               :- Project [id#26L, index#30, _partition#31]
               :  +- SubqueryAlias testcat.t
               :     +- RelationV2[id#26L, data#27, index#30, _partition#31] testcat.t testcat.t
               +- Project [id#29L]
                  +- Range (0, 10, step=1, splits=None)
{code}
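
To make the arithmetic behind the error message concrete, here is a toy model of the column-count check, using the two union inputs exactly as they appear in the plan above. This is only an illustrative sketch: {{Child}}, {{checkUnion}} and {{UnionWidthSketch}} are hypothetical names, not Spark classes, and the code does not reproduce how the analyzer actually adds the metadata columns.
{code:scala}
object UnionWidthSketch {
  // Hypothetical stand-in for a union input: only its output column names matter here.
  final case class Child(output: Seq[String])

  // Toy version of the rule that all UNION inputs must have the same number of columns.
  def checkUnion(children: Seq[Child]): Either[String, Int] = {
    val widths = children.map(_.output.size)
    widths.distinct match {
      case Seq(n) => Right(n)
      case _ =>
        Left(s"UNION inputs have different column counts: ${widths.mkString(", ")}")
    }
  }

  def main(args: Array[String]): Unit = {
    // What the query text asks for: one column from each side.
    val asWritten = Seq(Child(Seq("id")), Child(Seq("id")))

    // What the failing plan contains: metadata columns on the first side only.
    val asAnalyzed = Seq(Child(Seq("id", "index", "_partition")), Child(Seq("id")))

    println(checkUnion(asWritten))  // Right(1)
    println(checkUnion(asAnalyzed)) // Left(UNION inputs have different column counts: 3, 1)
  }
}
{code}
The second case matches the "3 columns" versus "1 columns" reported in the exception, even though the SQL text selects only {{id}} on both sides.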


