Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/25 01:02:53 UTC

[GitHub] [spark] c21 commented on a change in pull request #34103: [SPARK-32712][SQL] Support to write Hive bucketed table (Hive file formats with Hive hash)

c21 commented on a change in pull request #34103:
URL: https://github.com/apache/spark/pull/34103#discussion_r715967575



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala
##########
@@ -48,29 +49,37 @@ class BucketedWriteWithHiveSupportSuite extends BucketedWriteSuite with TestHive
     val table = "hive_bucketed_table"
 
     fileFormatsToTest.foreach { format =>
-      withTable(table) {
-        sql(
-          s"""
-             |CREATE TABLE IF NOT EXISTS $table (i int, j string)
-             |PARTITIONED BY(k string)
-             |CLUSTERED BY (i, j) SORTED BY (i) INTO 8 BUCKETS
-             |STORED AS $format
-           """.stripMargin)
+      Seq("true", "false").foreach { enableConvertMetastore =>
+        withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> enableConvertMetastore,
+          HiveUtils.CONVERT_METASTORE_ORC.key -> enableConvertMetastore) {
+          withTable(table) {
+            sql(
+              s"""
+                 |CREATE TABLE IF NOT EXISTS $table (i int, j string)
+                 |PARTITIONED BY(k string)
+                 |CLUSTERED BY (i, j) SORTED BY (i) INTO 8 BUCKETS
+                 |STORED AS $format
+               """.stripMargin)
 
-        val df =
-          (0 until 50).map(i => (i % 13, i.toString, i % 5)).toDF("i", "j", "k")
-        df.write.mode(SaveMode.Overwrite).insertInto(table)
+            val df =
+              (0 until 50).map(i => (i % 13, i.toString, i % 5)).toDF("i", "j", "k")
 
-        for (k <- 0 until 5) {
-          testBucketing(
-            new File(tableDir(table), s"k=$k"),
-            format,
-            8,
-            Seq("i", "j"),
-            Seq("i"),
-            df,
-            bucketIdExpression,
-            getBucketIdFromFileName)
+            withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {

Review comment:
       This is added because the Hive write code path enforces it: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala#L161
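
       For context, here is a minimal standalone sketch of the behavior this config addresses (not part of the PR; the session name `spark` and table name `t` are hypothetical). When no static partition values are given, every partition is resolved from the data, making it a fully dynamic insert, which the Hive write path rejects under the default "strict" mode:

       import org.apache.spark.sql.{SaveMode, SparkSession}

       val spark = SparkSession.builder()
         .appName("dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate()
       import spark.implicits._

       spark.sql(
         """CREATE TABLE IF NOT EXISTS t (i INT, j STRING)
           |PARTITIONED BY (k STRING)
           |STORED AS PARQUET
         """.stripMargin)

       // Relax the mode so a fully dynamic insert is allowed; with the
       // default "strict" value the insert below throws at write time.
       spark.conf.set("hive.exec.dynamic.partition", "true")
       spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

       // The partition column k comes entirely from the data (no static value).
       val df = (0 until 50).map(i => (i % 13, i.toString, (i % 5).toString)).toDF("i", "j", "k")
       df.write.mode(SaveMode.Overwrite).insertInto("t")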




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


