Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/05/04 02:23:32 UTC

[GitHub] spark pull request #21186: [SPARK-22279][SPARK-24112] Enable `convertMetasto...

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21186#discussion_r185980350
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1812,6 +1812,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see
       - Since Spark 2.4, creating a managed table with a nonempty location is not allowed. An exception is thrown when attempting to create a managed table with a nonempty location. Setting `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` to `true` restores the previous behavior. This option will be removed in Spark 3.0.
       - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, regardless of the order of the input arguments. In prior Spark versions, the promotion could fail for some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception.
       - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these two functions can return unexpected results. In version 2.4 and later, this problem has been fixed. `to_utc_timestamp` and `from_utc_timestamp` will return null if the input timestamp string contains a timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` will return `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, will return `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. If you do not care about this problem and want to retain the previous behavior to keep your queries unchanged, you can set `spark.sql.function.rejectTimezoneInString` to `false`. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
    +  - Since Spark 2.4, Spark uses its own ORC support by default instead of Hive SerDe for better performance during Hive metastore table access. Setting `spark.sql.hive.convertMetastoreOrc` to `false` restores the previous behavior.
    +  - Since Spark 2.4, Spark respects table properties when converting Parquet/ORC Hive tables. Setting `spark.sql.hive.convertMetastoreTableProperty` to `false` restores the previous behavior.
    --- End diff --
    
    please polish the migration guide w.r.t. https://issues.apache.org/jira/browse/SPARK-24175
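
    For reference, here is a minimal sketch of how a user could restore the pre-2.4 behaviors described in this diff (assuming a `SparkSession` with Hive support; note that `spark.sql.hive.convertMetastoreTableProperty` is the config name proposed in this PR, not an already-released option):

    ```scala
    import org.apache.spark.sql.SparkSession

    // Illustrative only; in spark-shell a `spark` session is already available.
    val spark = SparkSession.builder()
      .appName("restore-pre-2.4-behavior")
      .enableHiveSupport()
      .getOrCreate()

    // Go back to reading/writing ORC Hive metastore tables through Hive SerDe
    // instead of Spark's built-in ORC support (the pre-2.4 default).
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

    // Ignore table properties when converting Parquet/ORC Hive tables
    // (config name as proposed in this PR).
    spark.conf.set("spark.sql.hive.convertMetastoreTableProperty", "false")

    // The same settings can also be applied in SQL:
    spark.sql("SET spark.sql.hive.convertMetastoreOrc=false")
    ```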


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org