Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/06/20 18:54:00 UTC
[jira] [Commented] (SPARK-39536) to_date function is returning incorrect value
[ https://issues.apache.org/jira/browse/SPARK-39536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556519#comment-17556519 ]
Bruce Robbins commented on SPARK-39536:
---------------------------------------
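Seems like your date format string ('mm/dd/yyyy') is not correct. In Spark datetime patterns, lowercase 'mm' means minute-of-hour; month-of-year is uppercase 'MM'. With 'mm', the first field is parsed as minutes, no month is ever set, and the month falls back to January.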
With your format:
{noformat}
>>> newDf = (df.withColumn('new_date',to_date(col('date_str'),'mm/dd/yyyy')))
>>> newDf.show(truncate=False)
+----------+----------+
|date_str  |new_date  |
+----------+----------+
|11/25/1991|1991-01-25|
|1/2/1991  |1991-01-02|
|11/30/1991|1991-01-30|
+----------+----------+
{noformat}
With corrected format:
{noformat}
>>> newDf = (df.withColumn('new_date',to_date(col('date_str'),'MM/dd/yyyy')))
>>> newDf.show(truncate=False)
+----------+----------+
|date_str  |new_date  |
+----------+----------+
|11/25/1991|1991-11-25|
|1/2/1991  |1991-01-02|
|11/30/1991|1991-11-30|
+----------+----------+
{noformat}
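The same effect can be reproduced without Spark. The sketch below is only an analogy using Python's standard library, not Spark's parser: Python's strptime spells the directives differently (%M is minute, %m is month, the reverse of Spark's letter case), and the helper names parse_as_minutes and parse_as_month are made up for illustration. It shows that when the first field is read as minutes, the month is never set and defaults to January, which matches the incorrect output above.

```python
from datetime import datetime

# Same sample strings as in the report.
samples = ["11/25/1991", "1/2/1991", "11/30/1991"]


def parse_as_minutes(s):
    # %M is minute-of-hour in Python (the role 'mm' plays in a Spark pattern).
    # No month directive is present, so the month defaults to January.
    return datetime.strptime(s, "%M/%d/%Y")


def parse_as_month(s):
    # %m is month in Python (the role 'MM' plays in a Spark pattern).
    return datetime.strptime(s, "%m/%d/%Y")


for s in samples:
    wrong = parse_as_minutes(s)
    right = parse_as_month(s)
    # The leading "11" ends up in the minute field of the wrong parse.
    print(s, "| as minutes:", wrong.date(), "minute =", wrong.minute,
          "| as month:", right.date())
```

The second row ("1/2/1991") looks correct either way only because the intended month happens to be January, which is also the default.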
> to_date function is returning incorrect value
> ---------------------------------------------
>
> Key: SPARK-39536
> URL: https://issues.apache.org/jira/browse/SPARK-39536
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.1
> Environment: I'm facing this issue in databricks community edition. I'm using DBR 10.4 LTS.
> Reporter: Sridhar Varanasi
> Priority: Major
> Attachments: to_date_issue.PNG
>
>
> Hi,
>
> I have a DataFrame with a column containing dates in string format. When I convert this column to date type using to_date, it returns incorrect date values. The example code follows.
>
>
> df = spark.createDataFrame(
>     [("11/25/1991",), ("1/2/1991",), ("11/30/1991",)],
>     ['date_str']
> )
>
> spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
>
> df = (df
>     .withColumn('new_date',
>                 to_date(col('date_str'), 'mm/dd/yyyy')))
> display(df)
>
>
> In the resulting DataFrame, the date is converted correctly for the 2nd row, but the 1st and 3rd rows contain incorrect dates after conversion.
>
>
> Could you please look into this issue?
>
> Thanks,
> Sridhar