Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/15 09:51:34 UTC

[GitHub] [spark] Yikf opened a new pull request #35527: Fail early if all the columns are partitioned columns when creating a Hive table

Yikf opened a new pull request #35527:
URL: https://github.com/apache/spark/pull/35527


   ### What changes were proposed in this pull request?
   In Hive, the schema and the partition columns must be disjoint sets. If all the columns of a Hive table are partition columns, the set of remaining columns is empty, and table creation fails inside Hive with the following error:
   
   `throw new HiveException("at least one column must be specified for the table")`
   
   That's because `toHiveTable` splits the columns into partition columns and the remaining columns.
   
   So when creating a Hive table, fail early if all the columns are partition columns.
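
A minimal Scala sketch of why the remaining column list ends up empty, assuming the split works like a plain `partition` over the schema. `Column`, `SplitSketch`, and `splitColumns` are hypothetical names used only for illustration; they are not Spark's actual `toHiveTable` code.

```scala
// Hypothetical stand-in for a table column; not Spark's StructField.
final case class Column(name: String, dataType: String)

object SplitSketch {
  // Split a schema into (partition columns, remaining data columns),
  // which is the disjoint split described above.
  def splitColumns(
      schema: Seq[Column],
      partitionColumnNames: Seq[String]): (Seq[Column], Seq[Column]) =
    schema.partition(c => partitionColumnNames.contains(c.name))

  def main(args: Array[String]): Unit = {
    // Corresponds to: CREATE TABLE tab (c1 int) PARTITIONED BY (c1)
    val (partCols, dataCols) =
      splitColumns(Seq(Column("c1", "int")), Seq("c1"))
    // dataCols is empty here, which is the state Hive rejects.
    println(s"partition=${partCols.map(_.name)} data=${dataCols.map(_.name)}")
  }
}
```

With a single column that is also the only partition column, every column lands on the partition side of the split and nothing is left for the table's regular columns.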
   
   ### Why are the changes needed?
   Unify the analysis error message when creating a table in which all the columns are partition columns.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, but only the error message changes.
   
   ### How was this patch tested?
   Added a unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r808011859



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -3043,4 +3043,10 @@ class HiveDDLSuite
       assert(df1.schema.names.toSeq == Seq("A", "B"))
     }
   }
+
+  test("SPARK-38216: Fail early if all the columns are partitioned columns") {
+    assertAnalysisError(
+      "CREATE TABLE tab (c1 int) PARTITIONED BY (c1) STORED AS PARQUET",
+      "Cannot use all columns for partition columns")

Review comment:
       what was the result of this query before this PR?






[GitHub] [spark] Yikf commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
Yikf commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r807625071



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -319,15 +319,7 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       conf.resolver)
 
     if (schema.nonEmpty && normalizedPartitionCols.length == schema.length) {
-      if (DDLUtils.isHiveTable(table)) {

Review comment:
       The commit history message doesn't seem to contain any relevant information. Judging from the code comment, it seems we did this on purpose.
   
   But in [HiveClientImpl.toHiveTable](https://github.com/apache/spark/blob/1ef5638177dcf06ebca4e9b0bc88401e0fce2ae8/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L1069-L1072), we split the columns into partition columns and other columns. If all columns are partition columns, `hivetabl.getFields` returns an empty result, so Hive throws an exception saying that at least one column must be specified.
   
   If Hive allows cols to inherit partitioned columns, we should not do `partition` in `toHiveTable`, if not, we should fail early, I'm sorry I'm not sure about that
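
The "fail early" option can be sketched in Scala as below, based on the condition visible in the diff hunk above. `AnalysisError`, `FailEarlySketch`, and `checkPartitionColumns` are hypothetical names for illustration, not Spark's API, and the sketch assumes the partition column names have already been normalized against the schema.

```scala
// Hypothetical exception type standing in for Spark's AnalysisException.
final class AnalysisError(message: String) extends Exception(message)

object FailEarlySketch {
  // Reject a table definition whose columns are all partition columns.
  // Assumption: partitionCols is already normalized, so comparing lengths
  // is enough to detect that every schema column is a partition column.
  def checkPartitionColumns(
      schema: Seq[String],
      partitionCols: Seq[String]): Unit =
    if (schema.nonEmpty && partitionCols.length == schema.length) {
      throw new AnalysisError("Cannot use all columns for partition columns")
    }
}
```

Applying the check regardless of the table provider would make the statement fail during analysis, before the table definition ever reaches Hive.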
   






[GitHub] [spark] Yikf commented on pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
Yikf commented on pull request #35527:
URL: https://github.com/apache/spark/pull/35527#issuecomment-1040070240


   Could you please take a look when you have time? Thanks in advance, @cloud-fan.




[GitHub] [spark] cloud-fan commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r807534608



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -319,15 +319,7 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       conf.resolver)
 
     if (schema.nonEmpty && normalizedPartitionCols.length == schema.length) {
-      if (DDLUtils.isHiveTable(table)) {

Review comment:
       Can we check the commit history here? It seems we intentionally excluded Hive tables here.






[GitHub] [spark] Yikf commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
Yikf commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r808619702



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -3043,4 +3043,10 @@ class HiveDDLSuite
       assert(df1.schema.names.toSeq == Seq("A", "B"))
     }
   }
+
+  test("SPARK-38216: Fail early if all the columns are partitioned columns") {
+    assertAnalysisError(
+      "CREATE TABLE tab (c1 int) PARTITIONED BY (c1) STORED AS PARQUET",
+      "Cannot use all columns for partition columns")

Review comment:
       Before this PR, `Hive` threw `new HiveException("at least one column must be specified for the table")`.






[GitHub] [spark] cloud-fan closed pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #35527:
URL: https://github.com/apache/spark/pull/35527


   






[GitHub] [spark] cloud-fan commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r808011271



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -319,15 +319,7 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       conf.resolver)
 
     if (schema.nonEmpty && normalizedPartitionCols.length == schema.length) {
-      if (DDLUtils.isHiveTable(table)) {

Review comment:
       @somani can you take a look?






[GitHub] [spark] somani commented on a change in pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
somani commented on a change in pull request #35527:
URL: https://github.com/apache/spark/pull/35527#discussion_r808909380



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -319,15 +319,7 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       conf.resolver)
 
     if (schema.nonEmpty && normalizedPartitionCols.length == schema.length) {
-      if (DDLUtils.isHiveTable(table)) {

Review comment:
       Sorry just got to this.
   > If Hive allows cols to inherit partitioned columns, we should not do partition in toHiveTable, if not, we should fail early, I'm sorry I'm not sure about that
   ... what do we mean by "cols to inherit partitioned columns"?







[GitHub] [spark] cloud-fan commented on pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35527:
URL: https://github.com/apache/spark/pull/35527#issuecomment-1042754517


   thanks, merging to master!




[GitHub] [spark] AmplabJenkins commented on pull request #35527: [SPARK-38216][SQL] Fail early if all the columns are partitioned columns when creating a Hive table

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35527:
URL: https://github.com/apache/spark/pull/35527#issuecomment-1041132753


   Can one of the admins verify this patch?

