Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/09 04:56:53 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #37419: [SPARK-39833][SQL] Remove partition columns from data schema in the case of overlapping columns to fix Parquet DSv1 incorrect count issue

cloud-fan commented on code in PR #37419:
URL: https://github.com/apache/spark/pull/37419#discussion_r940896030


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -2777,18 +2777,24 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     }
   }
 
-  test("SPARK-22356: overlapped columns between data and partition schema in data source tables") {
+  test("SPARK-39833: overlapped columns between data and partition schema in data source tables") {
+    // SPARK-39833 changed behaviour of the column order in the case of overlapping columns between
+    // data and partition schemas: data schema does not include partition columns anymore and the
+    // overlapping columns would appear at the end of the schema together with other partition
+    // columns.
     withTempPath { path =>
       Seq((1, 1, 1), (1, 2, 1)).toDF("i", "p", "j")
         .write.mode("overwrite").parquet(new File(path, "p=1").getCanonicalPath)
       withTable("t") {
         sql(s"create table t using parquet options(path='${path.getCanonicalPath}')")
-        // We should respect the column order in data schema.
-        assert(spark.table("t").columns === Array("i", "p", "j"))
+        // MSCK command is required now to update partitions in the catalog.
+        sql(s"msck repair table t")
+
+        assert(spark.table("t").columns === Array("i", "j", "p"))
         checkAnswer(spark.table("t"), Row(1, 1, 1) :: Row(1, 1, 1) :: Nil)
         // The DESC TABLE should report same schema as table scan.
         assert(sql("desc t").select("col_name")
-          .as[String].collect().mkString(",").contains("i,p,j"))
+          .as[String].collect().mkString(",").contains("i,j,p"))

Review Comment:
   this is a behavior change (query schema change) that is hard to accept.
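
For context, the ordering change under debate can be sketched as pure logic. This is an illustration only, in Python rather than Spark internals; `merged_schema` is a hypothetical helper, not Spark's actual API. It models how a table's final column order is assembled from the data schema and the partition schema when the two overlap, before and after the proposed SPARK-39833 change:

```python
def merged_schema(data_cols, partition_cols, drop_overlap_from_data):
    """Return the final column order for a table scan.

    drop_overlap_from_data=False mimics the pre-SPARK-39833 behavior:
    an overlapping column keeps its position in the data schema.
    drop_overlap_from_data=True mimics the proposed behavior: partition
    columns are removed from the data schema and appended at the end.
    """
    if drop_overlap_from_data:
        data_part = [c for c in data_cols if c not in partition_cols]
    else:
        data_part = list(data_cols)
    # Partition columns not already present are appended after the data columns.
    return data_part + [c for c in partition_cols if c not in data_part]

# The test above writes data files with columns i, p, j into a directory
# partitioned by p, so "p" appears in both schemas.
print(merged_schema(["i", "p", "j"], ["p"], drop_overlap_from_data=False))
# → ['i', 'p', 'j']  (old behavior asserted by the removed test line)
print(merged_schema(["i", "p", "j"], ["p"], drop_overlap_from_data=True))
# → ['i', 'j', 'p']  (new behavior asserted by the added test line)
```

This makes the reviewer's objection concrete: any query relying on positional column order (for example `SELECT *` consumers) observes a different schema after the change.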



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

