You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/14 07:29:34 UTC

[GitHub] [spark] bart-samwel commented on a change in pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

bart-samwel commented on a change in pull request #28833:
URL: https://github.com/apache/spark/pull/28833#discussion_r613003850



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -268,6 +271,7 @@ class ResolveSessionCatalog(
     // session catalog and the table provider is not v2.
     case c @ CreateTableStatement(
          SessionCatalogAndTable(catalog, tbl), _, _, _, _, _, _, _, _, _) =>
+      assertNoNullTypeInSchema(c.tableSchema)

Review comment:
       > I don't know any database that supports creating tables with null/void type column, so this change is not for hive compatibility but for reasonable SQL semantic.
   > 
   > I agree this is a breaking change that should be at least put in the migration guide. A legacy config can also be added but I can't find a reasonable use case for a null type column.
   
   I think the main reason why you would want to support it is when people are using tables / views / temp tables to structure existing workloads. We support NullType type in CTEs, but in the case where people want to reuse the same CTE in multiple queries (i.e., multi-output workloads), they have no choice but to use views or temporary tables. (With DataFrames they'd still be able to reuse the same dataframe for multiple outputs, but in SQL that doesn't work.)
   
   One typical use case where you use CTEs to structure your code is if you have multiple sources with different structures that you then UNION ALL together into a single dataset. It is not uncommon for each of the sources to have certain columns that don't apply, and then you write explicit NULLs there. It would be pretty annoying if you had to write explicit casts of those NULLs to the right type in all of those cases.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org