Posted to commits@spark.apache.org by we...@apache.org on 2020/08/25 13:22:54 UTC

[spark] branch branch-3.0 updated: [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 21ac7e2  [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F
21ac7e2 is described below

commit 21ac7e2b696a6fc6958fdb84ae2b960892c7349e
Author: Kent Yao <ya...@hotmail.com>
AuthorDate: Tue Aug 25 13:17:03 2020 +0000

    [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F
    
    ### What changes were proposed in this pull request?
    
    This PR fixes the doc error and adds a migration guide entry for datetime pattern `F`.
    
    ### Why are the changes needed?
    This is a doc bug that we inherited from the JDK: https://bugs.openjdk.java.net/browse/JDK-8169482
    
    The `SimpleDateFormat` we used in 2.x (documented as **F Day of week in month**) and the `DateTimeFormatter` we use now (documented as **F week-of-month**) each behave with the opposite meaning to what their Javadoc declares. Unfortunately, this also leads to a silent data change in Spark.
    
    The actual `week-of-month` is pattern `W` in `DateTimeFormatter`, which is not allowed in Spark 3.x.
    
    To keep pattern `F`, we need to accept the behavior change, document it in the migration guide, and fix the doc in Spark.
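    
    The small Java sketch below illustrates the difference for `2020-07-30`. It assumes that Spark 3.0's `F` resolves to `ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH`, as the corrected doc entry and the linked JDK bug suggest; the class and method names are hypothetical and exist only for illustration. `SimpleDateFormat`'s `F` counts which 7-day block of the month a date falls in, while the aligned day of week in month is the position of the date inside that block.
    
    ```java
    import java.text.SimpleDateFormat;
    import java.time.LocalDate;
    import java.time.temporal.ChronoField;
    import java.util.Calendar;
    import java.util.GregorianCalendar;
    import java.util.Locale;
    import java.util.TimeZone;
    
    // Hypothetical demo class contrasting the two meanings of pattern letter F.
    public class PatternFDemo {
    
        // Spark 2.4-style: java.text.SimpleDateFormat with pattern "F".
        // SimpleDateFormat maps F to Calendar.DAY_OF_WEEK_IN_MONTH, where days
        // 1-7 give 1, days 8-14 give 2, and so on: effectively a week count.
        public static int legacyF(int year, int month, int day) {
            TimeZone utc = TimeZone.getTimeZone("UTC");
            Calendar cal = new GregorianCalendar(utc);
            cal.clear();
            cal.set(year, month - 1, day); // Calendar months are 0-based
            SimpleDateFormat fmt = new SimpleDateFormat("F", Locale.ROOT);
            fmt.setTimeZone(utc); // format in the same zone the date was built in
            return Integer.parseInt(fmt.format(cal.getTime()));
        }
    
        // Assumed Spark 3.0 meaning: aligned day of week in month,
        // i.e. ((dayOfMonth - 1) % 7) + 1, read directly from java.time.
        public static int alignedDayOfWeekInMonth(int year, int month, int day) {
            return LocalDate.of(year, month, day).get(ChronoField.ALIGNED_DAY_OF_WEEK_IN_MONTH);
        }
    
        // For comparison: the field that counts aligned weeks in the month,
        // i.e. ((dayOfMonth - 1) / 7) + 1.
        public static int alignedWeekOfMonth(int year, int month, int day) {
            return LocalDate.of(year, month, day).get(ChronoField.ALIGNED_WEEK_OF_MONTH);
        }
    
        public static void main(String[] args) {
            System.out.println("SimpleDateFormat F:           " + legacyF(2020, 7, 30));                 // 5
            System.out.println("ALIGNED_DAY_OF_WEEK_IN_MONTH: " + alignedDayOfWeekInMonth(2020, 7, 30)); // 2
            System.out.println("ALIGNED_WEEK_OF_MONTH:        " + alignedWeekOfMonth(2020, 7, 30));      // 5
        }
    }
    ```
    
    Reading the `ChronoField` values directly, rather than formatting with `DateTimeFormatter.ofPattern("F")`, keeps the sketch independent of how a particular JDK maps the letter.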
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, the documentation changes.
    
    ### How was this patch tested?
    
    Passed the CI doc-generation job.
    
    Closes #29538 from yaooqinn/SPARK-32683.
    
    Authored-by: Kent Yao <ya...@hotmail.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
    (cherry picked from commit 1f3bb5175749816be1f0bc793ed5239abf986000)
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 docs/sql-migration-guide.md      | 2 ++
 docs/sql-ref-datetime-pattern.md | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index ce5655a..532ac70 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -173,6 +173,8 @@ license: |
 
   - Since Spark 3.0, when using `EXTRACT` expression to extract the second field from date/timestamp values, the result will be a `DecimalType(8, 6)` value with 2 digits for second part, and 6 digits for the fractional part with microsecond precision. e.g. `extract(second from to_timestamp('2019-09-20 10:10:10.1'))` results `10.100000`.  In Spark version 2.4 and earlier, it returns an `IntegerType` value and the result for the former example is `10`.
 
+  - In Spark 3.0, datetime pattern letter `F` is **aligned day of week in month** that represents the concept of the count of days within the period of a week where the weeks are aligned to the start of the month. In Spark version 2.4 and earlier, it is **week of month** that represents the concept of the count of weeks within the month where weeks start on a fixed day-of-week, e.g. `2020-07-30` is 30 days (4 weeks and 2 days) after the first day of the month, so `date_format(date '2020- [...]
+
 ### Data Sources
 
   - In Spark version 2.4 and below, when reading a Hive SerDe table with Spark native data sources(parquet/orc), Spark infers the actual file schema and update the table schema in metastore. In Spark 3.0, Spark doesn't infer the schema anymore. This should not cause any problems to end users, but if it does, set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
diff --git a/docs/sql-ref-datetime-pattern.md b/docs/sql-ref-datetime-pattern.md
index d0299e5..4b02cda 100644
--- a/docs/sql-ref-datetime-pattern.md
+++ b/docs/sql-ref-datetime-pattern.md
@@ -37,7 +37,7 @@ Spark uses pattern letters in the following table for date and timestamp parsing
 |**d**|day-of-month|number(3)|28|
 |**Q/q**|quarter-of-year|number/text|3; 03; Q3; 3rd quarter|
 |**E**|day-of-week|text|Tue; Tuesday|
-|**F**|week-of-month|number(1)|3|
+|**F**|aligned day of week in month|number(1)|3|
 |**a**|am-pm-of-day|am-pm|PM|
 |**h**|clock-hour-of-am-pm (1-12)|number(2)|12|
 |**K**|hour-of-am-pm (0-11)|number(2)|0|