
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/15 08:50:03 UTC

[GitHub] [spark] LantaoJin opened a new pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

LantaoJin opened a new pull request #28833:
URL: https://github.com/apache/spark/pull/28833


   ## What changes were proposed in this pull request?
   
   Spark SQL does not support the void column data type in views.
   
   Create a Hive table with an untyped NULL column:
   >hive> create table bad as select 1 x, null z from dual;
   
   Because there's no type, Hive gives it the VOID type:
   >hive> describe bad;
   OK
   x	int	
   z	void
   
   In Spark 2.0.x, describing it works as expected:
   >spark-sql> describe bad;
   x       int     NULL
   z       void    NULL
   Time taken: 4.431 seconds, Fetched 2 row(s)
   
   But in Spark 2.1.x, it fails with `SparkException: Cannot recognize hive type string: void`:
   
   >spark-sql> describe bad;
   17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
   17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
   17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
   17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
   org.apache.spark.SparkException: Cannot recognize hive type string: void
           at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
           at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)  
           at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)  
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.Iterator$class.foreach(Iterator.scala:893)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
           at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
           at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
           at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
           at scala.collection.AbstractTraversable.map(Traversable.scala:104)
           at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
           at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
   Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
   DataType void() is not supported.(line 1, pos 0)
   == SQL ==  
   void       
   ^^^
           ... 61 more
   org.apache.spark.SparkException: Cannot recognize hive type string: void
   
   
   
   ## How was this patch tested?
   
   Added tests.
   
   It can also be tested manually:
   >spark-sql> describe bad;
   x int NULL
   z null NULL
   Time taken: 0.486 seconds, Fetched 2 row(s)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] ulysses-you commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-660464702


   I have created [SPARK-32356/#29152](https://github.com/apache/spark/pull/29152) to forbid this.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644277309


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644091950


   **[Test build #124046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124046/testReport)** for PR 28833 at commit [`3b8ddec`](https://github.com/apache/spark/commit/3b8ddecc7a1498e1e430be7de1ae76123b269454).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644460954


   **[Test build #124080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124080/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654359272


   Merged build finished. Test FAILed.



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451248585



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2310,126 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for unknown column datatype") {
+    withTable("t") {
+      withView("tabUnknownType") {
+        hiveClient.runSqlHive("CREATE TABLE t (t1 int)")
+        hiveClient.runSqlHive("INSERT INTO t VALUES (3)")
+        hiveClient.runSqlHive("CREATE VIEW tabUnknownType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabUnknownType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabUnknownType").collect().toSeq
+        assert(desc.contains(Row("col", NullType.simpleString, null)))
+      }
+    }
+
+    // Forbid CTAS with unknown type
+    withTable("t1", "t2", "t3") {
+      val e1 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t1 USING PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e1.contains("Cannot create tables with unknown type"))
+
+      val e2 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t2 AS SELECT null as null_col")

Review comment:
       Can we use `STORED AS` to create the Hive table explicitly?





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654548138







[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-657383898


   Hi, @ulysses-you. We have already chosen the plan. This is a step to forbid that gracefully.
   For `create view v1 as select null as col`, we can add an `AnalysisException` if you want.
   Could you file a JIRA for that?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654274016







[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-648582548


   Can we have another PR to forbid creating tables with void type?



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451107365



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2310,126 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        hiveClient.runSqlHive("CREATE TABLE t (t1 int)")
+        hiveClient.runSqlHive("INSERT INTO t VALUES (3)")
+        hiveClient.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", NullType.simpleString, null)))
+      }
+    }
+
+    // Forbid CTAS with null type
+    withTable("t1", "t2", "t3") {
+      val e1 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t1 USING PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e1.contains("Cannot create tables with unknown type"))
+
+      val e2 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t2 AS SELECT null as null_col")
+      }.getMessage
+      assert(e2.contains("Cannot create tables with unknown type"))
+
+      val e3 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t3 STORED AS PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e3.contains("Cannot create tables with unknown type"))
+    }
+
+    // Forbid Replace table AS SELECT with null type
+    withTable("t") {
+      val v2Source = classOf[FakeV2Provider].getName
+      val e = intercept[AnalysisException] {
+        spark.sql(s"CREATE OR REPLACE TABLE t USING $v2Source AS SELECT null as null_col")
+      }.getMessage
+      assert(e.contains("Cannot create tables with unknown type"))
+    }
+
+    // Forbid creating table with VOID type in Spark

Review comment:
       For this line and the following, it looks correct because we parse `VOID` in AstBuilder.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654267118


   Merged build finished. Test FAILed.



[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655234096


   Merged to master. Thank you for your patience, @LantaoJin .



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653933700


   **[Test build #124961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124961/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448153666



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2211,6 +2211,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         DecimalType(precision.getText.toInt, 0)
       case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType
+      case ("null", Nil) => NullType

Review comment:
       Em, you are right. I will remove this.





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-643996171







[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644460954


   **[Test build #124080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124080/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448205714



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))
+      }
+    }
+
+    // Forbid CTAS with null type
+    withTable("t1", "t2", "t3") {
+      val e1 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t1 USING PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e1.contains("Cannot create tables with VOID type"))
+
+      val e2 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t2 AS SELECT null as null_col")
+      }.getMessage
+      assert(e2.contains("Cannot create tables with VOID type"))
+
+      val e3 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t3 STORED AS PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e3.contains("Cannot create tables with VOID type"))
+    }
+
+    // Forbid creating table with void/null type in Spark
+    Seq("void", "null").foreach { colType =>

Review comment:
       The symbol "null" is not a data type, so `Seq("void", "null").foreach { colType =>` is incorrect. I have removed it.





[GitHub] [spark] HyukjinKwon commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655277714


   Re: https://github.com/apache/spark/pull/28833#discussion_r448165084
   
   Sorry, I only read the comments just now. So the decision here is that we allow `void` to be parsed as `NullType` but don't allow it in some commands such as `CREATE TABLE`.
   
   What about other cases where we directly use a DDL-formatted string as the type? These simple type strings can be used in many places such as `from_csv`, `from_json`, `createDataFrame`, etc. However, `StructType.simpleString` still cannot be parsed as a valid type.
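
The round-trip concern above can be illustrated with a small, self-contained sketch. It does not use Spark: `DType`, `IntT`, `NullT`, `simpleString`, and `parse` are hypothetical stand-ins, and the only assumptions taken from this thread are that the null type renders as "null" (as in the `DESC` output in the PR description) while the parser accepts only "void" for it.

```scala
// Toy model only; none of these names are Spark classes or methods.
object RoundTripSketch {
  sealed trait DType
  case object IntT extends DType
  case object NullT extends DType // stand-in for the null type

  // Rendering: the null type prints as "null", as in the DESC output quoted in this thread.
  def simpleString(dt: DType): String = dt match {
    case IntT  => "int"
    case NullT => "null"
  }

  // Parsing: "void" is accepted for the null type; "null" is not a type name.
  def parse(s: String): Option[DType] = s match {
    case "int"  => Some(IntT)
    case "void" => Some(NullT)
    case _      => None
  }

  // A type round-trips if parsing its rendered string yields the same type.
  def roundTrips(dt: DType): Boolean = parse(simpleString(dt)).contains(dt)
}
```

Under these assumptions `IntT` round-trips and `NullT` does not, which is the asymmetry a caller would hit when feeding a rendered schema string back into an API that parses type strings.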



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644866064







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652328201







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653933791







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r449937054



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Yes. We throw an exception in commands such as creating a table.
   
   > One question, if we don't allow void/unknown type, is there good reason to accept in parser?
   
   VOID is a legacy Hive type that Spark may need to read. So in #28935, I mapped `case ("void", Nil) => HiveVoidType` in the parser, but https://github.com/apache/spark/pull/28935/files#r446771337 suggested that I reuse `NullType`. That is why the syntax makes it look as if we accept void/unknown in the parser. Actually, first, we accept the legacy Hive type VOID in the parser, and second, we forbid creating tables with void/null.
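
The two steps described above (accept `void` when parsing, reject it when creating a table) can be sketched without Spark. Everything here is a hypothetical stand-in: `DType`, `parsePrimitive`, and `checkCreateTable` are illustrative names, and returning an `Either` replaces the `AnalysisException` thrown in the actual diff.

```scala
// Toy model only; none of these names are Spark classes or methods.
object VoidTypeSketch {
  sealed trait DType
  case object IntT extends DType
  case object NullT extends DType // stand-in for NullType
  final case class ArrayT(element: DType) extends DType
  final case class StructT(fields: List[(String, DType)]) extends DType

  // Step 1: the parser accepts the legacy Hive name "void" and maps it to the null type.
  def parsePrimitive(name: String): Option[DType] = name.toLowerCase match {
    case "int"  => Some(IntT)
    case "void" => Some(NullT)
    case _      => None
  }

  // Recursively look for the null type inside nested types.
  def containsNull(dt: DType): Boolean = dt match {
    case ArrayT(e)   => containsNull(e)
    case StructT(fs) => fs.exists { case (_, t) => containsNull(t) }
    case other       => other == NullT
  }

  // Step 2: table-creating commands reject any schema that contains the null type.
  def checkCreateTable(schema: StructT): Either[String, StructT] =
    if (containsNull(schema)) Left("Cannot create tables with VOID type.")
    else Right(schema)
}
```

Separating the two steps keeps existing Hive objects with VOID columns readable while still refusing to create new tables with such columns.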





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654274016







[GitHub] [spark] ulysses-you commented on a change in pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440601141



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
##########
@@ -61,6 +61,7 @@ class DataTypeParserSuite extends SparkFunSuite {
   checkDataType("varchAr(20)", StringType)
   checkDataType("cHaR(27)", StringType)
   checkDataType("BINARY", BinaryType)
+  checkDataType("void", NullType)

Review comment:
       I don't know what is needed; in other words, I don't know what is required to be compatible with Hive. That may be why this issue was left open.
   
   There are 2 differences between Spark and Hive.
   1. Hive supports both tables and views with the `Void` type, e.g. `create table t as select null as c` and `create view v as select null as c`. Spark supports neither.
   2. For the type of a null value, Hive uses `Void` and Spark uses `NullType`.
   
   As @cloud-fan said, we shouldn't support `NullType` for tables. But maybe we should be compatible with Hive in some DDLs like `desc table` and `show create table`?
   
   I think we should first discuss what needs to be compatible with Hive and then move forward.





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644574858


   **[Test build #124098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124098/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-659086163


   I guess we can forbid that too, consistently, as a continuation of this approach.
   BTW, until now, it has been beyond the scope because this PR was designed to prevent the Hive void type.
   Since Apache Spark doesn't talk to Apache Hive Metastore in the case of the `in-memory` catalog, other PMC members may have a different opinion.



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652114014


   **[Test build #124706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124706/testReport)** for PR 28833 at commit [`5aa4c1a`](https://github.com/apache/spark/commit/5aa4c1ae4ab9f67d947dcc33f7e1ff07a1d22858).



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652400790







[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r613037609



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       @LantaoJin do you have time to fix it? I think we can simply remove the null type check and add a few tests with both in-memory and hive catalog.





[GitHub] [spark] bart-samwel commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

bart-samwel commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r613026568



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       > @bart-samwel this makes sense, shall we also support `CREATE TABLE t(c VOID)`? Your case seems like CTAS only.
   
   I think the `CREATE TABLE` case with explicit types is not very useful, but it could be useful if there were tools that get a table's schema and then try to recreate it, e.g. for mocking purposes. Probably best to be orthogonal here.





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654287916







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654272985


   **[Test build #125091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125091/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] HyukjinKwon commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655298519


   Thanks guys, sure. I will make a followup.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654188351


   **[Test build #125074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125074/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652421476







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652150842


   **[Test build #124727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124727/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652211295







[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652160085


   **[Test build #124734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124734/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448156640



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -102,6 +105,7 @@ class ResolveSessionCatalog(
          nameParts @ SessionCatalogAndTable(catalog, tbl), _, _, _, _, _) =>
       loadTable(catalog, tbl.asIdentifier).collect {
         case v1Table: V1Table =>
+          a.dataType.foreach(failNullType)

Review comment:
       done

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/NullType.scala
##########
@@ -38,4 +38,12 @@ class NullType private() extends DataType {
  * @since 1.3.0
  */
 @Stable
-case object NullType extends NullType
+case object NullType extends NullType {
+
+  def containsNullType(dt: DataType): Boolean = dt match {

Review comment:
       done

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -270,6 +275,7 @@ class ResolveSessionCatalog(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
       val provider = c.provider.getOrElse(conf.defaultDataSourceName)
       if (!isV2Provider(provider)) {
+        assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       done





[GitHub] [spark] ulysses-you commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440055230



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2211,6 +2211,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         DecimalType(precision.getText.toInt, 0)
       case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType

Review comment:
       If we change this here and in `DataType`, Spark will also support the `void` type for tables. Is that needed?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2211,6 +2211,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         DecimalType(precision.getText.toInt, 0)
       case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType

Review comment:
       If we change this here and in `NullType`, Spark will also support the `void` type for tables. Is that needed?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451106635



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2310,126 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {

Review comment:
       Could you adjust this test case according to the last commit? Specifically, for the following?
   - `void column datatype` -> `unknown column datatype`
   - `tabVoidType` -> `tabUnknownType`
   - `Forbid CTAS with null type` -> `Forbid CTAS with unknown type`
   - `Forbid Replace table AS SELECT with null type` -> `Forbid Replace table AS SELECT with unknown type`





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652150894







[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448111898



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))

Review comment:
       Shall we change `NullType.toString` to use `void`, to match the parser side?
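
To illustrate why matching the printed name to the parser matters, here is a small round-trip sketch. It is an assumption-laden toy, not Spark's implementation: `render` and `parse` are hypothetical helpers over a made-up three-type model. The property of interest is that whatever text a `DESC`-style command prints for a type can be fed back to the parser and yield the same type.

```scala
// Toy model (hypothetical); Spark's real types and parser are far richer.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType) extends DataType

object TypeText {
  // Printed form; NullType is rendered as "void" so it matches what parse accepts.
  def render(dt: DataType): String = dt match {
    case NullType => "void"
    case IntegerType => "int"
    case ArrayType(et) => s"array<${render(et)}>"
  }

  // Minimal parser for the same grammar; returns None for unknown text.
  def parse(text: String): Option[DataType] = {
    val s = text.trim.toLowerCase(java.util.Locale.ROOT)
    if (s == "void") Some(NullType)
    else if (s == "int") Some(IntegerType)
    else if (s.startsWith("array<") && s.endsWith(">"))
      parse(s.substring(6, s.length - 1)).map(ArrayType(_))
    else None
  }
}
```

If `render` printed `null` instead of `void`, the round trip would break in this sketch, because `parse` does not treat `null` as a type name.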





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448115666



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))
+      }
+    }
+
+    // Forbid CTAS with null type
+    withTable("t1", "t2", "t3") {
+      val e1 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t1 USING PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e1.contains("Cannot create tables with VOID type"))
+
+      val e2 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t2 AS SELECT null as null_col")
+      }.getMessage
+      assert(e2.contains("Cannot create tables with VOID type"))
+
+      val e3 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t3 STORED AS PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e3.contains("Cannot create tables with VOID type"))
+    }
+
+    // Forbid creating table with void/null type in Spark
+    Seq("void", "null").foreach { colType =>
+      withTable("t1", "t2", "t3") {
+        val e1 = intercept[AnalysisException] {
+          spark.sql(s"CREATE TABLE t1 (v $colType) USING parquet")
+        }.getMessage
+        assert(e1.contains("Cannot create tables with VOID type"))
+        val e2 = intercept[AnalysisException] {
+          spark.sql(s"CREATE TABLE t2 (v $colType) USING hive")

Review comment:
       can we follow the CTAS test and use `STORED AS PARQUET`?





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448110252



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2211,6 +2211,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         DecimalType(precision.getText.toInt, 0)
       case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType
+      case ("null", Nil) => NullType

Review comment:
       I'm not sure about this. `null` is also a literal syntax, and this may introduce ambiguity if `null` is also a type name.
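
One way to picture the concern is a type-name lookup that accepts `void` for Hive compatibility but deliberately leaves `null` out, so that `null` remains only a literal. The sketch below is a hypothetical stand-alone model (the `TypeNames` object and its tiny type set are assumptions for illustration, not the `AstBuilder` code).

```scala
// Toy model (hypothetical); only a handful of primitive types are included.
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case object StringType extends DataType

object TypeNames {
  // Case-insensitive lookup of primitive type names. "void" is accepted for
  // Hive compatibility; "null" is intentionally absent so it stays a literal.
  def parse(name: String): Option[DataType] =
    name.toLowerCase(java.util.Locale.ROOT) match {
      case "int" | "integer" => Some(IntegerType)
      case "string" => Some(StringType)
      case "void" => Some(NullType)
      case _ => None
    }
}
```

With this shape, `TypeNames.parse("VOID")` yields `Some(NullType)` while `TypeNames.parse("null")` yields `None`, so no type name collides with the literal.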





[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653933187


   Retest this please.



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652139718


   @cloud-fan could you help review this PR again? I refactored it to address all comments in #28935, so I think #28935 is no longer needed.



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448110841



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -102,6 +105,7 @@ class ResolveSessionCatalog(
          nameParts @ SessionCatalogAndTable(catalog, tbl), _, _, _, _, _) =>
       loadTable(catalog, tbl.asIdentifier).collect {
         case v1Table: V1Table =>
+          a.dataType.foreach(failNullType)

Review comment:
       this can be done before the `loadTable` call.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-651994040


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124668/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644537057







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440161409



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/NullType.scala
##########
@@ -32,6 +32,11 @@ class NullType private() extends DataType {
   override def defaultSize: Int = 1
 
   private[spark] override def asNullable: NullType = this
+
+  /**
+   * Readable string representation for NULL type.
+   */
+  override def simpleString: String = "void"

Review comment:
       added, thanks





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655242039


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125270/
   Test FAILed.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448173447



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       I will check how some other database systems behave; one moment.





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448150865



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
##########
@@ -106,7 +107,7 @@ class ResolveHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] {
       } else {
         withStorage
       }
-
+      assertNoNullTypeInSchema(withSchema.schema)

Review comment:
       This can be removed.





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448170235



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       @cloud-fan asked me to use VOID instead of NULL. Personally I agree with using VOID: `NullType` is a type, but "null" also has a literal syntax, so every time we see "null" we have to figure out whether it is a data type or a literal, which is somewhat ambiguous. That is why @cloud-fan also suggested changing the simpleString of `NullType` to "void", so that the symbol "null" is only ever a literal.
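
A minimal, self-contained sketch of the recursive check in `failNullType` from the diff above, together with the proposed `simpleString` of "void". The type hierarchy here is a hypothetical stand-in rather than Spark's real `DataType` classes (no nullability flags, only three leaf types), and `IllegalArgumentException` is assumed in place of `AnalysisException`:

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy; not the real classes.
sealed trait DataType { def simpleString: String }
case object NullType extends DataType { val simpleString = "void" }
case object IntegerType extends DataType { val simpleString = "int" }
case object StringType extends DataType { val simpleString = "string" }
final case class ArrayType(elementType: DataType) extends DataType {
  def simpleString: String = s"array<${elementType.simpleString}>"
}
final case class MapType(keyType: DataType, valueType: DataType) extends DataType {
  def simpleString: String =
    s"map<${keyType.simpleString},${valueType.simpleString}>"
}
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType {
  def simpleString: String =
    fields.map(f => s"${f.name}:${f.dataType.simpleString}").mkString("struct<", ",", ">")
}

object NullTypeCheck {
  // True if NullType appears anywhere in the (possibly nested) type.
  def containsNullType(dt: DataType): Boolean = dt match {
    case ArrayType(et) => containsNullType(et)
    case MapType(kt, vt) => containsNullType(kt) || containsNullType(vt)
    case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
    case other => other == NullType
  }

  // Stand-in for the PR's failNullType; the real code throws AnalysisException.
  def failNullType(dt: DataType): Unit =
    if (containsNullType(dt)) {
      throw new IllegalArgumentException("Cannot create tables with VOID type.")
    }
}
```

With this naming, a nested type prints as, for example, `struct<z:void>`, so the word "null" never shows up as a type name in the rendered schema.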





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654564158







[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-659183541


   I don't think it's a good idea to diverge the behavior between in-memory and hive catalogs.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654000747


   **[Test build #124961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124961/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-648610855


   I think I can reuse this PR to do that, as this patch is no longer needed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644864904


   **[Test build #124121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124121/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652400021


   **[Test build #124749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124749/testReport)** for PR 28833 at commit [`de08967`](https://github.com/apache/spark/commit/de08967458e9e1f12c84d9e95265684a47aa4789).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654267127


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125087/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653933700


   **[Test build #124961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124961/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-648515582


   Success in Hive:
   ```sql
   create table t (col1 struct<name:STRING, id: BIGINT>);
   create table t (col1 array<STRING>);
   ```
   Fail with `NoViableAltException` in Hive:
   ```sql
   create table t (col1 struct<name:VOID, id: BIGINT>);
   create table t (col1 array<VOID>);
   ```



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652328201







[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-657383898


   Hi, @ulysses-you. We have already chosen the plan. This is a step to forbid that gracefully.
   For `create view v1 as select null as col`, we can add an `AnalysisException` if you want. Could you try it?



[GitHub] [spark] maropu commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r449779092



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       `unknown` looks okay to me. Since `null` is mainly used to represent a literal in Spark, I think it's better to avoid using it for data types. Also, I think `void` is a word that is rarely used in relational databases.





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652138498







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448138143



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,17 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    if (NullType.containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")
+    }
+  }
+
+  def assertNoNullTypeInSchema(schema: StructType): Unit = {
+    schema.foreach { f =>
+      failNullType(CatalystSqlParser.parseDataType(schema.catalogString))

Review comment:
       Ah, yes. If we remove `CatalystSqlParser.parseDataType(schema.catalogString)` first, we can then also remove `case ("null", Nil) => NullType`.





[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652159352


   retest this please



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-651992192


   **[Test build #124668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124668/testReport)** for PR 28833 at commit [`5aa4c1a`](https://github.com/apache/spark/commit/5aa4c1ae4ab9f67d947dcc33f7e1ff07a1d22858).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654252285







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654287916


   Merged build finished. Test FAILed.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448171612



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       But I am not sure how other database management systems behave. Maybe `NULL` is a data type and `null` is a literal?





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654359272







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448157276



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
##########
@@ -106,7 +107,7 @@ class ResolveHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] {
       } else {
         withStorage
       }
-
+      assertNoNullTypeInSchema(withSchema.schema)

Review comment:
       done





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448110440



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,17 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    if (NullType.containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")
+    }
+  }
+
+  def assertNoNullTypeInSchema(schema: StructType): Unit = {
+    schema.foreach { f =>
+      failNullType(CatalystSqlParser.parseDataType(schema.catalogString))

Review comment:
       shouldn't this be `failNullType(f.dataType)`?
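
The point behind the question above: the loop binds each field to `f`, but the body in the diff re-parses the catalog string of the whole schema on every iteration, so `f` is never used. A minimal sketch of the per-field form follows. It uses toy stand-ins for Spark's type classes so that it runs on its own; the object name, the simplified case classes, and the helper `fieldsWithNullType` are all hypothetical and are not part of the PR.

```scala
// Toy stand-ins for Spark's DataType hierarchy (hypothetical, not the real classes).
object PerFieldCheckSketch {
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType
  final case class MapType(keyType: DataType, valueType: DataType) extends DataType
  final case class StructField(name: String, dataType: DataType)
  final case class StructType(fields: Seq[StructField]) extends DataType

  // Recursive search with the same shape as containsNullType in the diff.
  def containsNullType(dt: DataType): Boolean = dt match {
    case ArrayType(et) => containsNullType(et)
    case MapType(kt, vt) => containsNullType(kt) || containsNullType(vt)
    case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
    case other => other == NullType
  }

  // Inspect each top-level field's own data type exactly once and
  // return the names of the fields that contain a null type.
  def fieldsWithNullType(schema: StructType): Seq[String] =
    schema.fields.filter(f => containsNullType(f.dataType)).map(_.name)
}
```

Returning the offending field names instead of throwing is only a choice made for this sketch; it makes the per-field behaviour easy to observe.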





[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654684533


   retest this please



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654644077


   **[Test build #125164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125164/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448170235



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       @cloud-fan asked me to use VOID instead of NULL. Personally I agree with using VOID: even though `NullType` is a data type class in Spark, the symbol "null" also has a literal syntax. Every time we see "null", we need to figure out whether it is a data type or a literal, which is a little ambiguous. So @cloud-fan also suggested changing the simpleString of `NullType` to "void", so that the symbol "null" is only ever a literal.





[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655292960


   `NullType` is a stable public class, I don't think we can drop it.
   
   The intention is to only allow parsing `NullType` for the type string of legacy hive tables. But @HyukjinKwon is right that it also affects places like `from_csv`. Let's revert this part and think of a better solution.
   
   We don't document `NullType` in the SQL reference. I think it's better to hide `NullType` from end users. It's usually type-coerced to other official types, and this PR forbids `NullType` if it leaks to the end (top columns). It is still OK for `df.show` to have `NullType`, though. I agree that the `NullType.simpleString` update can be put in a separate PR and discussed separately.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644123442







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448149646



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Without this, "CREATE TABLE t1 USING PARQUET AS SELECT null as null_col" will throw "Parquet data source does not support null data type." instead of "Cannot create tables with VOID type"





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654363187







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654685313







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653159448







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652138164


   **[Test build #124720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124720/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448175519



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Thank you for checking, @LantaoJin. This is also confusing to me. On the Hive side, `VOID` is a type name. But on the Apache Spark side:
   - We have `null` literal: https://spark.apache.org/docs/latest/sql-ref-literals.html#null-literal
   - We have only `nullable` types: https://spark.apache.org/docs/latest/sql-ref-datatypes.html





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448149646



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Without this, "CREATE TABLE t1 USING PARQUET AS SELECT null as null_col" will throw "Parquet data source does not support null data type." instead of "Cannot create tables with VOID type"





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654924727







[GitHub] [spark] ulysses-you commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440055230



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2211,6 +2211,7 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         DecimalType(precision.getText.toInt, 0)
       case ("decimal" | "dec" | "numeric", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType

Review comment:
       If we change this here, Spark will also support the `void` type for tables. Is that needed?
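
One way to read the concern above is that accepting `void` in the type parser and allowing `void` columns in new tables are two separate decisions. The sketch below illustrates keeping them apart: the name lookup accepts `void` (so metadata of legacy Hive tables can still be read), while a separate create-time check refuses it. It is a self-contained toy model, not Spark's `AstBuilder`; `parseTypeName`, `validateForCreate`, and the reduced type set are assumptions made for illustration, and only the error text is taken from the diff in this thread.

```scala
// Toy model (hypothetical names): parsing "void" succeeds, table creation does not.
object VoidParsingSketch {
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Hypothetical type-name lookup that also understands Hive's "void".
  def parseTypeName(name: String): Either[String, DataType] =
    name.toLowerCase match {
      case "int" | "integer" => Right(IntegerType)
      case "string" => Right(StringType)
      case "void" => Right(NullType)
      case other => Left(s"Unsupported data type: $other")
    }

  // Hypothetical create-time check: every column must parse, and none may be VOID.
  def validateForCreate(
      columns: Seq[(String, String)]): Either[String, Seq[(String, DataType)]] = {
    val parsed = columns.map { case (col, tpe) => parseTypeName(tpe).map(col -> _) }
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None =>
        val cols = parsed.collect { case Right(c) => c }
        if (cols.exists(_._2 == NullType)) Left("Cannot create tables with VOID type.")
        else Right(cols)
    }
  }
}
```

With this split, reading an existing column declared as `void` only needs `parseTypeName`, whereas a statement such as `create table t (v void)` would go through `validateForCreate` and be rejected.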





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654685313







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652518646


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124786/
   Test FAILed.



[GitHub] [spark] maropu commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644680233


   retest this please



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652146290







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652381500


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652939092


   **[Test build #124894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124894/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).



[GitHub] [spark] dongjoon-hyun closed pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun closed pull request #28833:
URL: https://github.com/apache/spark/pull/28833


   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644574980


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124098/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652518335


   **[Test build #124786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124786/testReport)** for PR 28833 at commit [`d6f1a4b`](https://github.com/apache/spark/commit/d6f1a4b59316db4f678554e90c795730dce3989c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448206284



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))
+      }
+    }
+
+    // Forbid CTAS with null type
+    withTable("t1", "t2", "t3") {
+      val e1 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t1 USING PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e1.contains("Cannot create tables with VOID type"))
+
+      val e2 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t2 AS SELECT null as null_col")
+      }.getMessage
+      assert(e2.contains("Cannot create tables with VOID type"))
+
+      val e3 = intercept[AnalysisException] {
+        spark.sql("CREATE TABLE t3 STORED AS PARQUET AS SELECT null as null_col")
+      }.getMessage
+      assert(e3.contains("Cannot create tables with VOID type"))
+    }
+
+    // Forbid creating table with void/null type in Spark
+    Seq("void", "null").foreach { colType =>

Review comment:
       Please refresh the code.





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448252184



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       @cloud-fan 
   > Without this, "CREATE TABLE t1 USING PARQUET AS SELECT null as null_col" in Spark will throw `Parquet data source does not support null data type.` instead of `Cannot create tables with VOID type`
   
   Sorry, the above description is incorrect. Without this, CTAS for a Hive table, `CREATE TABLE t2 AS SELECT null as null_col`, will pass; no exception is thrown.
   
   It seems that Hive tables (non-parquet/orc format) don't go through `ResolveSessionCatalog`.





[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-647900812


   @wangyum that said, only legacy Hive tables can have VOID column type?
   
   It's also good to list the current Spark behaviors. I think it makes sense to forbid creating tables with a VOID column type; maybe we can do that with an analyzer rule.



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-647957153


   Emmm, thanks @wangyum. I think we should keep the same behavior as Hive 2.x and throw more readable exceptions for the SQL statements below.
   ```sql
   create table t as select 1 x, null z from dual;
   create table t as select null as null_col
   create table t (v void);
   ```
   @cloud-fan 



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652160085


   **[Test build #124734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124734/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448154450



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))

Review comment:
       I mean `DataType.simpleString`.
   
   I think it looks better if DESC TABLE returns `Row("col", "void", null)` here.
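
The point about `DataType.simpleString` can be illustrated with a small standalone sketch. It assumes, as the comment implies, that the type column of a DESC row is taken from the type's simple string. The classes and the `describe` helper below are hypothetical stand-ins, not Spark or PySpark classes, and returning `"void"` reflects the reviewer's suggestion rather than confirmed final behavior.

```python
from typing import List, Optional, Tuple


class NullType:
    """Hypothetical stand-in for a null type; not the PySpark class."""

    def simple_string(self) -> str:
        # Assumption: the name suggested in the review comment.
        return "void"


class IntegerType:
    """Hypothetical stand-in for an integer type."""

    def simple_string(self) -> str:
        return "int"


def describe(columns) -> List[Tuple[str, str, Optional[str]]]:
    """Build (col_name, data_type, comment) rows from (name, type) pairs."""
    return [(name, dtype.simple_string(), None) for name, dtype in columns]


rows = describe([("x", IntegerType()), ("col", NullType())])
```

Under these assumptions, changing only `simple_string` on the null type changes what the describe output shows for that column; nothing else in the row-building logic needs to know about the type.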





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652114014


   **[Test build #124706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124706/testReport)** for PR 28833 at commit [`5aa4c1a`](https://github.com/apache/spark/commit/5aa4c1ae4ab9f67d947dcc33f7e1ff07a1d22858).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448149646



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Without this, `CREATE TABLE t1 USING PARQUET AS SELECT null as null_col` will throw `Parquet data source does not support null data type.` instead of `Cannot create tables with VOID type`.





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r449937054



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Yes. We throw an exception in commands such as creating a table.
   
   > One question, if we don't allow void/unknown type, is there good reason to accept in parser?
   
   VOID is a legacy Hive table type which may be read by Spark. So in #28935, I mapped `case ("void", Nil) => HiveVoidType` in the parser, but @cloud-fan suggested that I re-use NullType. So from the syntax it looks as if we accept void/unknown in the parser. Actually, first, we accept the legacy Hive type VOID in the parser, and second, we forbid creating tables with void/null.
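
The recursive check in the diff above can be sketched outside Spark as follows. This is a minimal model, assuming the same traversal as `containsNullType` (arrays, maps, structs, then the leaf test). The dataclasses are hypothetical stand-ins for Spark's types, and `ValueError` stands in for `AnalysisException`.

```python
from dataclasses import dataclass
from typing import Tuple


class DataType:
    """Base class for the hypothetical stand-in types below."""


@dataclass(frozen=True)
class NullType(DataType):
    pass


@dataclass(frozen=True)
class IntegerType(DataType):
    pass


@dataclass(frozen=True)
class ArrayType(DataType):
    element_type: DataType


@dataclass(frozen=True)
class MapType(DataType):
    key_type: DataType
    value_type: DataType


@dataclass(frozen=True)
class StructType(DataType):
    # Each field is a (name, type) pair.
    fields: Tuple[Tuple[str, DataType], ...]


def contains_null_type(dt: DataType) -> bool:
    """Return True if dt is a NullType or nests one at any depth."""
    if isinstance(dt, ArrayType):
        return contains_null_type(dt.element_type)
    if isinstance(dt, MapType):
        return contains_null_type(dt.key_type) or contains_null_type(dt.value_type)
    if isinstance(dt, StructType):
        return any(contains_null_type(t) for _, t in dt.fields)
    return isinstance(dt, NullType)


def fail_null_type(dt: DataType) -> None:
    """Reject a type for table creation if it contains a null/void type."""
    if contains_null_type(dt):
        # ValueError is an assumption standing in for AnalysisException.
        raise ValueError("Cannot create tables with VOID type.")
```

With this shape, a schema like `StructType((("x", IntegerType()), ("z", NullType())))` is rejected at creation time, while reading an existing legacy type stays a separate concern handled by the parser.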





[GitHub] [spark] ulysses-you commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-657382943


   @cloud-fan that doesn't work. We should pick one plan: either forbid it or support it.



[GitHub] [spark] HyukjinKwon commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644071006


   cc @maropu, @cloud-fan, @dongjoon-hyun, @hvanhovell, @gatorsmile FYI



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-651994032







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652381511


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124766/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652142129


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124720/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652518640







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654563887


   **[Test build #125164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125164/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451238732



##########
File path: python/pyspark/sql/types.py
##########
@@ -116,6 +116,9 @@ class NullType(DataType):
 
     __metaclass__ = DataTypeSingleton
 
+    def simpleString(self):
+        return 'unknown'

Review comment:
       wait wait.. why is it known?





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644683524


   **[Test build #124121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124121/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652235381


   **[Test build #124749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124749/testReport)** for PR 28833 at commit [`de08967`](https://github.com/apache/spark/commit/de08967458e9e1f12c84d9e95265684a47aa4789).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448226280



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -319,6 +323,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ ReplaceTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       Yes. `case CreateTableAsSelectStatement` is not covered either.





[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-648085998


   Does hive support inner void like `struct<v: void>`, `array<void>`, etc.?



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644683524


   **[Test build #124121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124121/testReport)** for PR 28833 at commit [`479901d`](https://github.com/apache/spark/commit/479901db17d2e79d4a3001ae86f81faa545a5f4a).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440539782



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,22 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype of view") {
+    withTable("t") {
+      withView("tabNullType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabNullType AS SELECT NULL AS col FROM t")

Review comment:
       Yes. Table `t` is needed; otherwise an `InvalidTableException: Table not found _dummy_table` exception is thrown.





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440543326



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
##########
@@ -61,6 +61,7 @@ class DataTypeParserSuite extends SparkFunSuite {
   checkDataType("varchAr(20)", StringType)
   checkDataType("cHaR(27)", StringType)
   checkDataType("BINARY", BinaryType)
+  checkDataType("void", NullType)

Review comment:
       The result output above is `Row(null) :: Row(null) :: Row(null) :: Nil`.
   But I remember we shouldn't allow `create table t (v void)` (https://github.com/apache/spark/pull/17953#issuecomment-306056712), so it needs an additional PR to throw an exception. #25198 is the PR that does that.
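
The parser-side behavior exercised by the `checkDataType` lines in the diff can be modeled with a small case-insensitive lookup. This is only a sketch: the function name `parse_data_type`, the string results standing in for Spark's type objects, the `int` entry, and the error message are all assumptions for illustration.

```python
import re

# Hypothetical lookup table; the values are plain strings standing in
# for Spark's DataType objects.
_PRIMITIVES = {
    "int": "IntegerType",
    "binary": "BinaryType",
    # Legacy Hive type, accepted so that existing tables and views can be read.
    "void": "NullType",
}

# varchar(n) and char(n) both map to StringType in the quoted test lines.
_LENGTH_TYPES = re.compile(r"^(varchar|char)\(\d+\)$")


def parse_data_type(text: str) -> str:
    """Map a type name to a stand-in type, ignoring case."""
    name = text.strip().lower()
    if _LENGTH_TYPES.match(name):
        return "StringType"
    try:
        return _PRIMITIVES[name]
    except KeyError:
        # The message text is an assumption, not Spark's actual error.
        raise ValueError(f"DataType {text} is not supported.") from None
```

Accepting `void` here says nothing about whether a table may be created with it; in the approach discussed in this thread, that rejection happens in a separate check at table-creation time.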





[GitHub] [spark] HyukjinKwon commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451239237



##########
File path: python/pyspark/sql/types.py
##########
@@ -116,6 +116,9 @@ class NullType(DataType):
 
     __metaclass__ = DataTypeSingleton
 
+    def simpleString(self):
+        return 'unknown'

Review comment:
       As we discussed earlier above, I will take a separate look at the PySpark side.





[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654562622


   retest this please



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644092814


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124046/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654564158







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655242033







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652150894


   Merged build finished. Test FAILed.



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652112555


   retest this please



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654002065


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124961/
   Test FAILed.



[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655263937


   Thank you all for your kind reviews.



[GitHub] [spark] bart-samwel commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

bart-samwel commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r613003850



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       > I don't know any database that supports creating tables with null/void type column, so this change is not for hive compatibility but for reasonable SQL semantic.
   > 
   > I agree this is a breaking change that should be at least put in the migration guide. A legacy config can also be added but I can't find a reasonable use case for a null type column.
   
   I think the main reason why you would want to support it is when people are using tables / views / temp tables to structure existing workloads. We support NullType in CTEs, but in the case where people want to reuse the same CTE in multiple queries (i.e., multi-output workloads), they have no choice but to use views or temporary tables. (With DataFrames they'd still be able to reuse the same DataFrame for multiple outputs, but in SQL that doesn't work.)
   
   One typical use case where you use CTEs to structure your code is if you have multiple sources with different structures that you then UNION ALL together into a single dataset. It is not uncommon for each of the sources to have certain columns that don't apply, and then you write explicit NULLs there. It would be pretty annoying if you had to write explicit casts of those NULLs to the right type in all of those cases.
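
A minimal sketch of why untyped NULLs are convenient in that UNION ALL pattern. This is a toy model, not Spark's actual coercion rules: `ColType`, `unify`, and `unionSchema` are hypothetical names introduced only for illustration. The idea is that a null-typed column acts as "no type yet" and takes whatever type the other branch supplies, so no explicit cast is needed.

```scala
// Toy column types (hypothetical stand-ins, not Spark's DataType classes).
sealed trait ColType
case object NullT extends ColType
case object IntT extends ColType
case object StringT extends ColType

object UnionTypes {
  // A null-typed column yields to the type coming from the other branch.
  def unify(a: ColType, b: ColType): Option[ColType] = (a, b) match {
    case (NullT, t) => Some(t)
    case (t, NullT) => Some(t)
    case (x, y) if x == y => Some(x)
    case _ => None // incompatible types; this toy model does no further widening
  }

  // Column-wise unification of the two branches of a UNION ALL.
  def unionSchema(left: Seq[ColType], right: Seq[ColType]): Option[Seq[ColType]] =
    if (left.length != right.length) None
    else {
      val cols = left.zip(right).map { case (l, r) => unify(l, r) }
      if (cols.forall(_.isDefined)) Some(cols.flatten) else None
    }
}
```

In this model, a source that writes a bare NULL for a column it does not have still unifies with a source that provides a typed value for it. If the null-typed column had to be cast up front, every branch would need to know the final type.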
   





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448255880



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -319,6 +323,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ ReplaceTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       My testing shows that `ReplaceTableAsSelectStatement` should also run this assertion. I will add it.





[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652239258


   Does this close https://github.com/apache/spark/pull/28935 too?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654287926


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125091/
   Test FAILed.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448159476



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))

Review comment:
       Adding `def simpleString = "void"` to NullType changes a lot of code, including the Python side. I reverted this in commits/5aa4c1a, but now I think the change is necessary. We should make it clear that "null" is just a literal, not a data type, in Spark.





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448111176



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Is this needed? I think the changes in `ResolveCatalogs` and `ResolveSessionCatalog` should cover all the commands.

##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
##########
@@ -106,7 +107,7 @@ class ResolveHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] {
       } else {
         withStorage
       }
-
+      assertNoNullTypeInSchema(withSchema.schema)

Review comment:
       ditto





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652138498







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644574966


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644092796







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655211012







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654002060







[GitHub] [spark] maropu commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644536578


   retest this please



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448176200



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       If we use `VOID` here in an Apache Spark exception, it looks to me like we are declaring that `VOID` is one of the Apache Spark types. But please follow @cloud-fan's and @gatorsmile's decision.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654644652


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125164/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644277309







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644537057







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440544556



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
##########
@@ -61,6 +61,7 @@ class DataTypeParserSuite extends SparkFunSuite {
   checkDataType("varchAr(20)", StringType)
   checkDataType("cHaR(27)", StringType)
   checkDataType("BINARY", BinaryType)
+  checkDataType("void", NullType)

Review comment:
       Sure, I will complete it.





[GitHub] [spark] maropu commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440300111



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,22 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype of view") {
+    withTable("t") {
+      withView("tabNullType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabNullType AS SELECT NULL AS col FROM t")

Review comment:
       Do we need the `t` table for this test? Can't we just write `CREATE VIEW tabNullType AS SELECT NULL AS col`?





[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652939092


   **[Test build #124894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124894/testReport)** for PR 28833 at commit [`98d12fe`](https://github.com/apache/spark/commit/98d12fefd7957fc6880fc81f075f4cb77b7f5b62).



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652145932


   **[Test build #124727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124727/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654922361


   **[Test build #125200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125200/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-643995551


   **[Test build #124046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124046/testReport)** for PR 28833 at commit [`3b8ddec`](https://github.com/apache/spark/commit/3b8ddecc7a1498e1e430be7de1ae76123b269454).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448224429



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       So there are two points to be decided.
   1. What is the exception message?
   `"Cannot create tables with VOID type."` or `"Cannot create tables with NULL type."` or `"Cannot create tables with UNKNOWN type."`
   2. What is the `simpleString` of class `NullType` ?
   `null` (current) or `void`(hive) or `unknown`
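
For readers who want to try the recursive check from the diff outside of Spark, here is a minimal, self-contained sketch. It is an assumption-laden model rather than the PR's code: the type classes are toy stand-ins for Spark's `DataType` hierarchy, `IllegalArgumentException` stands in for `AnalysisException`, and the message text is a placeholder, since the wording is exactly what is being debated here.

```scala
// Toy stand-ins for Spark's DataType hierarchy (hypothetical, not the real classes).
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
case object NullType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object NullTypeCheck {
  // True if NullType appears anywhere in the (possibly nested) type.
  def containsNullType(dt: DataType): Boolean = dt match {
    case ArrayType(et) => containsNullType(et)
    case MapType(kt, vt) => containsNullType(kt) || containsNullType(vt)
    case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
    case other => other == NullType
  }

  // Same shape as failNullType in the diff: reject any schema carrying NullType.
  // The message below is a placeholder, not the wording chosen in the PR.
  def failNullType(dt: DataType): Unit =
    if (containsNullType(dt)) {
      throw new IllegalArgumentException("Cannot create tables with null type.")
    }
}
```

The point of the recursion is that a null type nested inside an array, map, or struct is rejected just like a top-level one. Whichever name is chosen for the message and for `simpleString`, only the string changes, not the check.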





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644276586


   **[Test build #124056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124056/testReport)** for PR 28833 at commit [`cf0db98`](https://github.com/apache/spark/commit/cf0db989206e2d79fe747439284c181c2575551b).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448836271



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       I'm in favor of UNKNOWN/unknown, as it indicates it's not a real data type. But I'm open to other options. cc @gatorsmile @HyukjinKwon @maropu @viirya @bart-samwel 
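The `failNullType` helper in the hunk above rejects a type if a `NullType` appears anywhere inside it, including inside arrays, maps, and structs. A minimal Python sketch of that traversal follows. The classes here are hypothetical stand-ins, not the Spark or PySpark types, and the error text is copied from the hunk rather than from whatever wording was finally merged.

```python
from dataclasses import dataclass
from typing import Tuple


class DataType:
    """Hypothetical stand-in for Spark's DataType hierarchy."""


@dataclass(frozen=True)
class NullType(DataType):
    pass


@dataclass(frozen=True)
class IntegerType(DataType):
    pass


@dataclass(frozen=True)
class ArrayType(DataType):
    element_type: DataType


@dataclass(frozen=True)
class MapType(DataType):
    key_type: DataType
    value_type: DataType


@dataclass(frozen=True)
class StructType(DataType):
    # Each field is a (name, data type) pair.
    fields: Tuple[Tuple[str, DataType], ...]


class AnalysisException(Exception):
    """Stand-in for Spark's AnalysisException (not the real class)."""


def contains_null_type(dt: DataType) -> bool:
    """Return True if dt is NullType or nests a NullType at any depth."""
    if isinstance(dt, ArrayType):
        return contains_null_type(dt.element_type)
    if isinstance(dt, MapType):
        return contains_null_type(dt.key_type) or contains_null_type(dt.value_type)
    if isinstance(dt, StructType):
        return any(contains_null_type(field_type) for _, field_type in dt.fields)
    return isinstance(dt, NullType)


def fail_null_type(dt: DataType) -> None:
    """Raise if the type contains NullType; message taken from the hunk."""
    if contains_null_type(dt):
        raise AnalysisException("Cannot create tables with VOID type.")
```

With these stand-ins, `fail_null_type(ArrayType(IntegerType()))` returns normally, while a struct holding `ArrayType(MapType(IntegerType(), NullType()))` raises, because the check recurses through every container level.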





[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654688912


   **[Test build #125200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125200/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r449939043



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Looks like `unknown` is accepted here. I am going to patch it. Any more comments, @dongjoon-hyun @cloud-fan @gatorsmile?





[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448220808



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       I only tested in PostgreSQL and MySQL.
   Neither can run
   `CREATE TABLE t (null_col NULL);`
   For
   `CREATE TABLE t2 AS SELECT NULL as null_col`
   PostgreSQL shows
   ```
                   Table "public.t2"
     Column  | Type | Collation | Nullable | Default
   ----------+------+-----------+----------+---------
    null_col | text |           |          |
   ```
   And MySQL shows
   ```
     Column  |    Type   | Nullable  | Default
   ----------+-----------+-----------+----------
    null_col | binary(0) |    YES    |  NULL
   ```





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654688912


   **[Test build #125200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125200/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-643995551


   **[Test build #124046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124046/testReport)** for PR 28833 at commit [`3b8ddec`](https://github.com/apache/spark/commit/3b8ddecc7a1498e1e430be7de1ae76123b269454).



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652217228







[GitHub] [spark] HyukjinKwon commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655278572


   If we're going to treat `void` as Hive legacy, let's not support it at all and move toward deprecating and removing `NullType`.
   
   If we still care about `NullType` and keep it, let's make it a proper type in Spark.
   
   If we're not sure, let's not change `simpleString` to something else for now.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-643996171







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654644641


   Merged build finished. Test FAILed.



[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655291242


   First of all, it's not a good idea to officially add `NullType` as a new Spark data type. If it were an official type, what could we do with it in the Spark SQL world (https://spark.apache.org/docs/latest/sql-ref-datatypes.html)?
   > If we still care about NullType and keep it, let's make it a proper type in Spark.
   
   Previously, this was supported until Apache Spark 2.0.0. After that, Apache Spark didn't support void. This PR also tries to forbid `VOID`. `AstBuilder` provides a way to warn gracefully. Currently, we are very careful even in the error message: we don't mention `void type`; we call it `unknown type`. I believe this PR is one way to implement your idea, too. Of course, we can add more messages, too.
   > If we're going to treat void as Hive legacy, let's not support it at all and move toward deprecating and removing NullType.
   
   In any case, since this is a legitimate suggestion from @HyukjinKwon, cc @gatorsmile, too.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654362726


   **[Test build #125107 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125107/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448165084



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Ur, are we going to expose `VOID` here? Maybe `NULL` is better in this error message? Technically, this function is recognizing `NullType`, not a `Void` type, in the Apache Spark Catalyst module.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654002060


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652217235


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124734/
   Test FAILed.



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r612917435



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       I don't know of any database that supports creating tables with a null/void type column, so this change is not for Hive compatibility but for reasonable SQL semantics.
   
   I agree this is a breaking change that should at least be put in the migration guide. A legacy config can also be added, but I can't find a reasonable use case for a null type column.
   
   





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644461275







[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-650675131


   Before that, I think we still need to fix the problem described in the PR description. https://github.com/apache/spark/pull/28833#pullrequestreview-435416974 is a good idea for handling it. I filed #28935 as a new fix. @maropu @cloud-fan @wangyum 



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652150905


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124727/
   Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654188670







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652146290







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448165084



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Ur, are we going to expose `VOID` here? Maybe `NULL` is better in this error message? Technically, this function is preventing `NullType`, not a `Void` type, in the Apache Spark Catalyst module.





[GitHub] [spark] maropu commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440519426



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
##########
@@ -61,6 +61,7 @@ class DataTypeParserSuite extends SparkFunSuite {
   checkDataType("varchAr(20)", StringType)
   checkDataType("cHaR(27)", StringType)
   checkDataType("BINARY", BinaryType)
+  checkDataType("void", NullType)

Review comment:
       Could you add end-to-end tests, too, just like this?
   ```
   scala> sql("create table t (v void)")
   scala> sql("insert into t values (null), (null), (null)")
   scala> sql("select * from t").show()
   // Checks result output
   ```





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654548138


   Merged build finished. Test FAILed.



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655234096


   Merged to master. Thank you for your patience, @LantaoJin .
   (The last commit is only about HiveDDLSuite. I tested it locally.)



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652381356


   **[Test build #124766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124766/testReport)** for PR 28833 at commit [`d6f1a4b`](https://github.com/apache/spark/commit/d6f1a4b59316db4f678554e90c795730dce3989c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654267118







[GitHub] [spark] HyukjinKwon commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r451238732



##########
File path: python/pyspark/sql/types.py
##########
@@ -116,6 +116,9 @@ class NullType(DataType):
 
     __metaclass__ = DataTypeSingleton
 
+    def simpleString(self):
+        return 'unknown'

Review comment:
       wait wait.. why is it `unknown`?





[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655295006


   Could you make a follow-up (full revert or partial revert) as you suggested, @HyukjinKwon?



[GitHub] [spark] HyukjinKwon edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655277714


   Re: https://github.com/apache/spark/pull/28833#discussion_r448165084
   
   Sorry, I read the comments just now. So the decision here is that we allow parsing `void` as `NullType` but don't allow it in some commands such as `CREATE TABLE`.
   
   How about other cases where we directly use a DDL-formatted string as the type? These simple type strings can be used in many places, such as `from_csv` (schema as a DDL-formatted string), `from_json` (schema as a DDL-formatted string), `createDataFrame` (Python), etc. However, the output of `StructType.simpleString` still cannot be parsed as a valid type.
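
The round-trip concern raised here can be illustrated with a small, self-contained sketch. It does not use Spark: `parse_ddl_type`, `simple_string`, and the two lookup tables are hypothetical stand-ins for a DDL type parser and `simpleString`, under the assumption that the parser accepts `void` while the null type prints as `unknown`.

```python
# Toy model (not Spark's API): a parser that knows a fixed set of type names.
# Assumption: "void" parses to the null type, but the null type prints as "unknown".
PARSEABLE = {"int": "IntegerType", "string": "StringType", "void": "NullType"}
PRINTED = {"IntegerType": "int", "StringType": "string", "NullType": "unknown"}


def parse_ddl_type(text):
    """Return the type name for a DDL type string, or raise ValueError."""
    key = text.strip().lower()
    if key not in PARSEABLE:
        raise ValueError(f"cannot parse type string: {text!r}")
    return PARSEABLE[key]


def simple_string(type_name):
    """Return the printed (simpleString-like) form of a type."""
    return PRINTED[type_name]


def round_trips(type_name):
    """True if printing a type and parsing it back yields the same type."""
    try:
        return parse_ddl_type(simple_string(type_name)) == type_name
    except ValueError:
        return False
```

In this model `round_trips("NullType")` is false because the printed name is not one the parser accepts, which is the kind of mismatch that would surface when such a string is passed as a schema to `from_csv`, `from_json`, or `createDataFrame`.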



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652327470


   **[Test build #124766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124766/testReport)** for PR 28833 at commit [`d6f1a4b`](https://github.com/apache/spark/commit/d6f1a4b59316db4f678554e90c795730dce3989c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-651994032


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654272985


   **[Test build #125091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125091/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-653159448







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652518640


   Merged build finished. Test FAILed.



[GitHub] [spark] maropu commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644461527


   Btw, could you brush up the PR description for better commit logs? What's the proposal of this PR, what's the behavior change before/after this PR, blah blah blah... I feel the current one looks a bit ambiguous...



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652217228


   Merged build finished. Test FAILed.



[GitHub] [spark] maropu commented on a change in pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440544007



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
##########
@@ -61,6 +61,7 @@ class DataTypeParserSuite extends SparkFunSuite {
   checkDataType("varchAr(20)", StringType)
   checkDataType("cHaR(27)", StringType)
   checkDataType("BINARY", BinaryType)
+  checkDataType("void", NullType)

Review comment:
       If so, could we fix the issue, too, in this PR?





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644574966







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654188670







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655210553


   **[Test build #125270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125270/testReport)** for PR 28833 at commit [`9ad57d1`](https://github.com/apache/spark/commit/9ad57d17bac47ea0f801004ec0aba9197e631bc7).



[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644713571


   To confirm: In Hive, people can't create tables with the void type (including void type inside struct/array/map). The only way is CTAS. Is this true?
   
   And how about Spark?
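
The "void type inside struct/array/map" case in the question above amounts to a recursive walk over a type tree. The following is a minimal sketch of such a check; the `Atomic`, `Array`, `Map`, and `Struct` classes and the `contains_void` helper are hypothetical illustrations, not Hive or Spark classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "int", "string", "void"


@dataclass(frozen=True)
class Array:
    element: object


@dataclass(frozen=True)
class Map:
    key: object
    value: object


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, object], ...]  # (field name, field type) pairs


def contains_void(dtype):
    """Return True if dtype is void or has void nested anywhere inside it."""
    if isinstance(dtype, Atomic):
        return dtype.name == "void"
    if isinstance(dtype, Array):
        return contains_void(dtype.element)
    if isinstance(dtype, Map):
        return contains_void(dtype.key) or contains_void(dtype.value)
    if isinstance(dtype, Struct):
        return any(contains_void(t) for _, t in dtype.fields)
    raise TypeError(f"unknown type node: {dtype!r}")
```

A `CREATE TABLE` path that wants to reject void columns could, under these assumptions, call a check like this on every column type before accepting the schema.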



[GitHub] [spark] ulysses-you commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

ulysses-you commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448210574



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client

Review comment:
       can just use `hiveClient`.





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644126758


   **[Test build #124056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124056/testReport)** for PR 28833 at commit [`cf0db98`](https://github.com/apache/spark/commit/cf0db989206e2d79fe747439284c181c2575551b).



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448145159



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")
+        checkAnswer(spark.table("tabVoidType"), Row(null))
+        // No exception shows
+        val desc = spark.sql("DESC tabVoidType").collect().toSeq
+        assert(desc.contains(Row("col", "null", null)))

Review comment:
       `NullType.toString` returns "NullType". What does this comment mean?





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654252285







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644536775







[GitHub] [spark] HyukjinKwon commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r440111404



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/NullType.scala
##########
@@ -32,6 +32,11 @@ class NullType private() extends DataType {
   override def defaultSize: Int = 1
 
   private[spark] override def asNullable: NullType = this
+
+  /**
+   * Readable string representation for NULL type.
+   */
+  override def simpleString: String = "void"

Review comment:
       We should also override `simpleString` in `NullType` on the PySpark side manually: https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L111
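
As a rough illustration of why a manual override is needed on the Python side, the sketch below uses a simplified stand-in for a data type base class whose default `simpleString` is derived from the class name. The classes are hypothetical models written for this example and are not the real PySpark classes.

```python
class DataType:
    """Simplified stand-in for a base data type class (not the real PySpark class)."""

    @classmethod
    def typeName(cls):
        # Derive the name from the class: "NullType" -> "null".
        return cls.__name__[:-4].lower()

    def simpleString(self):
        # Default: print the derived type name.
        return self.typeName()


class StringType(DataType):
    pass


class NullType(DataType):
    def simpleString(self):
        # Manual override so the Python side prints the same name as the
        # Scala-side override shown in the diff above.
        return "void"
```

Without the override, `NullType().simpleString()` in this model would fall back to the derived name `null`, so the Python and Scala sides would print different strings for the same type.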





[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652142088


   **[Test build #124720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124720/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652160470







[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654287584


   **[Test build #125091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125091/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652939449







[GitHub] [spark] LantaoJin commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652141485


   The description has been updated.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654266944


   **[Test build #125087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125087/testReport)** for PR 28833 at commit [`31ba0bf`](https://github.com/apache/spark/commit/31ba0bf71806b6cdedf840ec79d50e4199faeb02).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652142118


   Merged build finished. Test FAILed.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448143565



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
+        client.runSqlHive("CREATE TABLE t (t1 int)")
+        client.runSqlHive("INSERT INTO t VALUES (3)")
+        client.runSqlHive("CREATE VIEW tabVoidType AS SELECT NULL AS col FROM t")

Review comment:
       `client.runSqlHive("CREATE TABLE tabVoidType AS SELECT NULL AS col FROM t")` will throw
   FAILED: SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  col





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644536775


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652142118







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654548145


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125107/
   Test FAILed.



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655291242


   First of all, it's not a good idea to officially add `NullType` as a new Spark data type. Not only does the exposure cause more complexity, but what could we do with it in the Spark SQL world (https://spark.apache.org/docs/latest/sql-ref-datatypes.html) if it were an official type?
   > If we'll still care and have NullType, let's make it a proper type in Spark.
   
   Previously, this was supported until Apache Spark 2.0.0. After that, Apache Spark didn't support void. This PR also tries to forbid `VOID`. `AstBuilder` provides a way to give a graceful warning. Currently, we are very careful even in the error message: we don't mention `void type`; we call it `unknown type`. I believe this PR is one way to implement your idea, too. Of course, we can add more messages.
   > If we're going to treat void as Hive legacy, let's don't support it at all and make the direction to deprecate and remove NullType away.
   
   Anyway, since this is a legitimate suggestion from @HyukjinKwon, cc @gatorsmile, too.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655242033


   Merged build finished. Test FAILed.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448257790



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -319,6 +323,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ ReplaceTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       done





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r450158300



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       Let's go with `unknown` then





[GitHub] [spark] maropu commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

maropu commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654187102


   retest this please



[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448110946



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -270,6 +275,7 @@ class ResolveSessionCatalog(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
       val provider = c.provider.getOrElse(conf.defaultDataSourceName)
       if (!isV2Provider(provider)) {
+        assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       ditto, this check can be done at the beginning.





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652381500







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652400790


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652235853


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/29361/
   Test PASSed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652217007


   **[Test build #124734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124734/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652138164


   **[Test build #124720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124720/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] SparkQA removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652145932


   **[Test build #124727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124727/testReport)** for PR 28833 at commit [`fdf57bf`](https://github.com/apache/spark/commit/fdf57bf2b6e4606c357965bc82126c82e1675ac5).



[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644680800







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652235840







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448176200



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       If we use `VOID` here in an Apache Spark exception message, it looks to me like we are declaring that `VOID` is one of the Apache Spark types. But please follow @cloud-fan's and @gatorsmile's decision.





[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655211012







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448149646



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Without this, `CREATE TABLE t1 USING PARQUET AS SELECT null as null_col` will throw `Parquet data source does not support null data type.` instead of `Cannot create tables with VOID type`.
   
   Compared with the error message from Hive, `SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: col`, that is confusing. So it's better to keep it.
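
The ordering argument in the comment above can be illustrated with a small, self-contained sketch. It is not Spark code: `Column`, `assertNoNullType`, `parquetSupports` and `createTable` are hypothetical stand-ins, and only the two error strings are taken from the thread. The idea is that a generic schema check placed before the format-specific validation decides which message the user sees.

```scala
object CheckOrder {
  // Hypothetical, simplified column model; real Spark schemas use StructType.
  final case class Column(name: String, typeName: String)

  // Generic check, standing in for assertNoNullTypeInSchema in the PR.
  def assertNoNullType(schema: Seq[Column]): Either[String, Unit] =
    if (schema.exists(_.typeName == "null")) Left("Cannot create tables with VOID type.")
    else Right(())

  // Format-specific check, standing in for the Parquet data source's own validation.
  def parquetSupports(schema: Seq[Column]): Either[String, Unit] =
    if (schema.exists(_.typeName == "null")) Left("Parquet data source does not support null data type.")
    else Right(())

  // With earlyCheck = true the generic check runs first and its message wins;
  // otherwise only the format-specific message is reported.
  def createTable(schema: Seq[Column], earlyCheck: Boolean): Either[String, Unit] = {
    val generic: Either[String, Unit] =
      if (earlyCheck) assertNoNullType(schema) else Right(())
    generic.flatMap(_ => parquetSupports(schema))
  }
}
```

Under these assumptions, a schema with one untyped `null_col` column fails with the generic message when `earlyCheck` is true and with the Parquet-specific one when it is false.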





[GitHub] [spark] dongjoon-hyun commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-659086163


   I guess we can consistently forbid that too, as a continuation of this approach.
   BTW, until now it has been beyond the scope, because this PR was designed to prevent the Hive void type.
   Since Apache Spark doesn't talk to the Apache Hive Metastore in the case of the `in-memory` catalog, other PMC members may have a different idea.



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448224429



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
##########
@@ -346,4 +346,23 @@ private[sql] object CatalogV2Util {
       }
     }
   }
+
+  def failNullType(dt: DataType): Unit = {
+    def containsNullType(dt: DataType): Boolean = dt match {
+      case ArrayType(et, _) => containsNullType(et)
+      case MapType(kt, vt, _) => containsNullType(kt) || containsNullType(vt)
+      case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
+      case _ => dt.isInstanceOf[NullType]
+    }
+    if (containsNullType(dt)) {
+      throw new AnalysisException(
+        "Cannot create tables with VOID type.")

Review comment:
       So there are two points to be determined.
   1. What is the exception message?
   `"Cannot create tables with VOID type."` or `"Cannot create tables with NULL type."` or `"Cannot create tables with UNKNOWN type."`
   2. What is the `simpleString` of class `NullType`?
   `null` (current) or `void` (Hive) or `unknown`
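
To make the two open questions concrete, here is a minimal sketch of the recursive check from the diff, written against a small stand-in type hierarchy so that it runs without Spark. The `DataType` case classes below are simplified, hypothetical models (Spark's real `ArrayType` and `MapType` carry extra nullability flags), `IllegalArgumentException` replaces `AnalysisException`, and the `typeName` parameter is an assumption added only to show where the `VOID` / `NULL` / `UNKNOWN` wording would plug in.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (simplified).
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object NullTypeCheck {
  // Recursively look for NullType at any nesting depth.
  def containsNullType(dt: DataType): Boolean = dt match {
    case ArrayType(et) => containsNullType(et)
    case MapType(kt, vt) => containsNullType(kt) || containsNullType(vt)
    case StructType(fields) => fields.exists(f => containsNullType(f.dataType))
    case other => other == NullType
  }

  // typeName is the word under discussion: "VOID", "NULL" or "UNKNOWN".
  def failNullType(dt: DataType, typeName: String): Unit =
    if (containsNullType(dt)) {
      throw new IllegalArgumentException(s"Cannot create tables with $typeName type.")
    }
}
```

Whichever word is chosen, the traversal itself does not change. Only the message and, separately, the `simpleString` that `NullType` reports would differ.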





[GitHub] [spark] shaneknapp commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

shaneknapp commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-654360013


   test this please



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644277313


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124056/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-655241849


   **[Test build #125270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125270/testReport)** for PR 28833 at commit [`9ad57d1`](https://github.com/apache/spark/commit/9ad57d17bac47ea0f801004ec0aba9197e631bc7).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652420727


   **[Test build #124786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124786/testReport)** for PR 28833 at commit [`d6f1a4b`](https://github.com/apache/spark/commit/d6f1a4b59316db4f678554e90c795730dce3989c).



[GitHub] [spark] cloud-fan edited a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan edited a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-648582548


   Can we have another PR to forbid creating tables with void type, via an analyzer rule?



[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448149646



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##########
@@ -292,6 +293,8 @@ case class PreprocessTableCreation(sparkSession: SparkSession) extends Rule[Logi
       "in the table definition of " + table.identifier,
       sparkSession.sessionState.conf.caseSensitiveAnalysis)
 
+    assertNoNullTypeInSchema(schema)

Review comment:
       Without this, `CREATE TABLE t1 USING PARQUET AS SELECT null as null_col` in Spark will throw `Parquet data source does not support null data type.` instead of `Cannot create tables with VOID type`.
   
   Compared with the error message from Hive, `SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: col`, that is confusing. So it's better to keep it.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652114420







[GitHub] [spark] LantaoJin commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

LantaoJin commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r448257498



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
##########
@@ -2309,6 +2309,108 @@ class HiveDDLSuite
     }
   }
 
+  test("SPARK-20680: Spark-sql do not support for void column datatype") {
+    withTable("t") {
+      withView("tabVoidType") {
+        val client =
+          spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client

Review comment:
       updated





[GitHub] [spark] cloud-fan commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r613011264



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       @bart-samwel this makes sense, shall we also support `CREATE TABLE t(c VOID)`? Your case seems like CTAS only.
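
The point about `CREATE TABLE t(c VOID)` versus CTAS can be illustrated with a minimal sketch. It assumes that both statements end up producing a schema containing a null-typed (Hive `VOID`) column, so a single schema-level check would cover both. The types and the `checkNoNullType` helper below are hypothetical stand-ins written for illustration; they are not Spark's `org.apache.spark.sql.types` API and not the actual `assertNoNullTypeInSchema` from this PR.

```scala
// Hypothetical stand-ins for Spark's data types (NOT the real Spark API).
sealed trait DataType
case object IntType extends DataType
case object NullType extends DataType // Hive reports this type as VOID
final case class ArrayType(element: DataType) extends DataType
final case class StructType(fields: List[(String, DataType)]) extends DataType

object NullTypeCheck {
  // True if the type is NullType or nests a NullType anywhere inside it.
  def containsNullType(dt: DataType): Boolean = dt match {
    case NullType       => true
    case ArrayType(e)   => containsNullType(e)
    case StructType(fs) => fs.exists { case (_, t) => containsNullType(t) }
    case _              => false
  }

  // Returns Left with a message naming the first offending column, else Right.
  // The message text is made up for this sketch.
  def checkNoNullType(schema: StructType): Either[String, Unit] =
    schema.fields.collectFirst {
      case (name, t) if containsNullType(t) => name
    } match {
      case Some(name) => Left(s"Column '$name' has null (void) type")
      case None       => Right(())
    }
}

object NullTypeCheckDemo {
  def main(args: Array[String]): Unit = {
    // Schema as it might look for `CREATE TABLE t(c VOID)`.
    val explicitVoid = StructType(List("c" -> NullType))
    // Schema as it might look for CTAS `select 1 x, null z`.
    val ctas = StructType(List("x" -> IntType, "z" -> NullType))
    println(NullTypeCheck.checkNoNullType(explicitVoid))
    println(NullTypeCheck.checkNoNullType(ctas))
  }
}
```

Because the check works on the resolved schema rather than on the statement, whether to allow or reject `VOID` can be decided in one place for both forms.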





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652939449







[GitHub] [spark] AmplabJenkins commented on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644866064







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644092796


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-652421476







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28833: [SPARK-20680][SQL] Make null type in Spark sql to be compatible with Hive void datatype

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-644536779


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124080/

