Posted to issues@spark.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2020/08/24 03:50:00 UTC
[jira] [Comment Edited] (SPARK-32683) Datetime Pattern F not working as expected
[ https://issues.apache.org/jira/browse/SPARK-32683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182932#comment-17182932 ]
Kent Yao edited comment on SPARK-32683 at 8/24/20, 3:49 AM:
------------------------------------------------------------
This is a documentation bug we inherited from the JDK: https://bugs.openjdk.java.net/browse/JDK-8169482
The SimpleDateFormat (*F: Day of week in month*) we used in 2.x and the DateTimeFormatter (*F: week-of-month*) we use now both have meanings opposite to what their Java docs declare. Unfortunately, this also leads to a silent data change in Spark.
The actual *`week-of-month`* field is pattern `W` in DateTimeFormatter, which is banned in Spark 3.x.
If we want to keep pattern `F`, we need to accept the behavior change and fix the doc in Spark. cc [~cloud_fan]
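A minimal sketch (plain java.time, no Spark) reproducing the mismatch on a stock JDK: despite the javadoc historically calling `F` "week-of-month", the formatter maps it to ALIGNED_DAY_OF_WEEK_IN_MONTH, which simply cycles 1-7 over the days of the month.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PatternFDemo {
    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("F");
        // 'F' resolves to ALIGNED_DAY_OF_WEEK_IN_MONTH, not week-of-month:
        // days 1-7 of the month print 1-7, day 8 wraps back to 1, and so on.
        System.out.println(f.format(LocalDate.of(2020, 8, 7)));  // 7
        System.out.println(f.format(LocalDate.of(2020, 8, 8)));  // 1
        System.out.println(f.format(LocalDate.of(2020, 8, 10))); // 3
    }
}
```

This matches the values the reporter observed from Spark 3.0.0 below.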
> Datetime Pattern F not working as expected
> ------------------------------------------
>
> Key: SPARK-32683
> URL: https://issues.apache.org/jira/browse/SPARK-32683
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Environment: Windows 10 Pro
> * with Jupyter Lab - Docker Image
> ** jupyter/all-spark-notebook:f1811928b3dd
> *** spark 3.0.0
> *** python 3.8.5
> *** openjdk 11.0.8
> Reporter: Daeho Ro
> Priority: Major
> Attachments: comment.png
>
>
> h3. Background
> From the [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html], the pattern F should give a week of the month.
> |*Symbol*|*Meaning*|*Presentation*|*Example*|
> |F|week-of-month|number(1)|3|
> h3. Test Data
> Here is my test data, that is a csv file.
> {code:java}
> date
> 2020-08-01
> 2020-08-02
> 2020-08-03
> 2020-08-04
> 2020-08-05
> 2020-08-06
> 2020-08-07
> 2020-08-08
> 2020-08-09
> 2020-08-10 {code}
> h3. Steps to the bug
> I have tested with Scala Spark 3.0.0 and PySpark 3.0.0:
> {code:java}
> // Spark
> df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
> .withColumn("week", date_format('date, "F")).show
> +-------------------+----+
> | date|week|
> +-------------------+----+
> |2020-08-01 00:00:00| 1|
> |2020-08-02 00:00:00| 2|
> |2020-08-03 00:00:00| 3|
> |2020-08-04 00:00:00| 4|
> |2020-08-05 00:00:00| 5|
> |2020-08-06 00:00:00| 6|
> |2020-08-07 00:00:00| 7|
> |2020-08-08 00:00:00| 1|
> |2020-08-09 00:00:00| 2|
> |2020-08-10 00:00:00| 3|
> +-------------------+----+
> # pyspark
> df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
> .withColumn('week', date_format('date', 'F')) \
> .show(10, False)
> +-------------------+----+
> |date |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1 |
> |2020-08-02 00:00:00|2 |
> |2020-08-03 00:00:00|3 |
> |2020-08-04 00:00:00|4 |
> |2020-08-05 00:00:00|5 |
> |2020-08-06 00:00:00|6 |
> |2020-08-07 00:00:00|7 |
> |2020-08-08 00:00:00|1 |
> |2020-08-09 00:00:00|2 |
> |2020-08-10 00:00:00|3 |
> +-------------------+----+{code}
> h3. Expected result
> The `week` column is not the week of the month; it is a number that cycles 1 through 7, like a day of the week.
> !comment.png!
> From my calendar, the first day of August should have week-of-month 1, the 2nd through the 8th should have 2, and so on.
> {code:java}
> +-------------------+----+
> |date |week|
> +-------------------+----+
> |2020-08-01 00:00:00|1 |
> |2020-08-02 00:00:00|2 |
> |2020-08-03 00:00:00|2 |
> |2020-08-04 00:00:00|2 |
> |2020-08-05 00:00:00|2 |
> |2020-08-06 00:00:00|2 |
> |2020-08-07 00:00:00|2 |
> |2020-08-08 00:00:00|2 |
> |2020-08-09 00:00:00|3 |
> |2020-08-10 00:00:00|3 |
> +-------------------+----+{code}
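For reference, the numbers the reporter expects correspond to java.time's `W` pattern (localized week-of-month), which Spark 3.x rejects. A minimal sketch with plain java.time, assuming US week rules (weeks start on Sunday, week 1 begins on the 1st of the month):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class WeekOfMonthDemo {
    public static void main(String[] args) {
        // 'W' is locale-sensitive; pin the locale so the result is
        // deterministic. With US rules, Aug 1 2020 (a Saturday) is week 1,
        // Aug 2-8 is week 2, Aug 9-15 is week 3.
        DateTimeFormatter w = DateTimeFormatter.ofPattern("W").withLocale(Locale.US);
        System.out.println(w.format(LocalDate.of(2020, 8, 1)));  // 1
        System.out.println(w.format(LocalDate.of(2020, 8, 2)));  // 2
        System.out.println(w.format(LocalDate.of(2020, 8, 9)));  // 3
    }
}
```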
--
This message was sent by Atlassian Jira
(v8.3.4#803005)