Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/13 16:19:05 UTC

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27902: [SPARK-31147][SQL] forbid CHAR/VARCHAR type in non-Hive tables

URL: https://github.com/apache/spark/pull/27902#discussion_r392330746
 
 

 ##########
 File path: docs/sql-migration-guide.md
 ##########
 @@ -334,14 +334,16 @@ license: |
   - Since Spark 3.0, `SHOW CREATE TABLE` will always return Spark DDL, even when the given table is a Hive serde table. For generating Hive DDL, please use `SHOW CREATE TABLE AS SERDE` command instead.
 
  - Since Spark 3.0, we upgraded the built-in Hive from 1.2 to 2.3, which brings the following impacts:
-  
+
     - You may need to set `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` according to the version of the Hive metastore you want to connect to.
   For example: set `spark.sql.hive.metastore.version` to `1.2.1` and `spark.sql.hive.metastore.jars` to `maven` if your Hive metastore version is 1.2.1.
-  
+
     - You need to migrate your custom SerDes to Hive 2.3 or build your own Spark with `hive-1.2` profile. See HIVE-15167 for more details.
 
     - The decimal string representation can be different between Hive 1.2 and Hive 2.3 when using `TRANSFORM` operator in SQL for script transformation, which depends on hive's behavior. In Hive 1.2, the string representation omits trailing zeroes. But in Hive 2.3, it is always padded to 18 digits with trailing zeroes if necessary.
 
+  - Since Spark 3.0, columns of CHAR/VARCHAR type are not allowed in non-Hive tables, and CREATE/ALTER TABLE commands will fail if the CHAR/VARCHAR type is detected. In Spark version 2.4 and earlier, CHAR/VARCHAR types are treated as STRING type and the length parameter is simply ignored.
 
 Review comment:
  BTW, `VARCHAR` is a little different and has more official documents. Could you check them together?
   ```
   $ git grep 'CHAR'
   sql-data-sources-jdbc.md:     The database column data types to use instead of the defaults, when creating the table. Data type information should be specified in the same format as CREATE TABLE columns syntax (e.g: <code>"name CHAR(64), comments VARCHAR(1024)")</code>. The specified types should be valid spark sql data types. This option applies only to writing.
   sql-ref-syntax-aux-describe-table.md:    state VARCHAR(20),
   sql-ref-syntax-aux-show-columns.md:  name VARCHAR(100),
   sql-ref-syntax-aux-show-tblproperties.md:CREATE TABLE customer(cust_code INT, name VARCHAR(100), cust_addr STRING)
   sql-ref-syntax-dml-insert-into.md: CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
   sql-ref-syntax-dml-load.md: CREATE TABLE test_load (name VARCHAR(64), address VARCHAR(64), student_id INT);
   ```
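  As an illustrative sketch of why those docs are affected (table and column names below are taken from the examples found by the grep above; the exact error behavior is an assumption based on this PR's description, not a quote from it), a data source table definition like the first one would start failing after this change, while a Hive serde table would still accept the types:
  ```sql
  -- Spark 2.4 and earlier: VARCHAR(64) is silently read as STRING and the
  -- length parameter is ignored. After SPARK-31147, CREATE TABLE is expected
  -- to fail analysis for non-Hive (data source) tables like this one:
  CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
  USING parquet;

  -- Hive serde tables should still accept CHAR/VARCHAR, since the change
  -- only targets non-Hive tables:
  CREATE TABLE students_hive (name VARCHAR(64), student_id INT) STORED AS parquet;
  ```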

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org