Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/06/20 18:54:00 UTC
[jira] [Commented] (SPARK-39536) to_date function is returning incorrect value

    [ https://issues.apache.org/jira/browse/SPARK-39536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556519#comment-17556519 ] 

Bruce Robbins commented on SPARK-39536:
---------------------------------------

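Your date format string ('mm/dd/yyyy') looks incorrect. In Spark datetime patterns, lowercase 'mm' is minute-of-hour and uppercase 'MM' is month-of-year, so with 'mm' the first number is read as minutes and every date ends up in January.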

With your format:
{noformat}
>>> newDf = (df.withColumn('new_date',to_date(col('date_str'),'mm/dd/yyyy')))
>>> newDf.show(truncate=False)
+----------+----------+
|date_str  |new_date  |
+----------+----------+
|11/25/1991|1991-01-25|
|1/2/1991  |1991-01-02|
|11/30/1991|1991-01-30|
+----------+----------+
{noformat}
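The January fallback can be reproduced without Spark. The sketch below is an analogy only: it uses Python's standard datetime.strptime, not Spark, whose %M directive means minute and %m means month, the same trap as Spark's 'mm' versus 'MM'. Spark's own parser is a different implementation, so treat this as an illustration of the idea rather than of Spark internals.

```python
from datetime import datetime

# Analogy with Python's strptime (not Spark): %M is minute, %m is month.
# With "%M/%d/%Y", "11" is consumed as minutes, so the month is left at its
# default value of January.
wrong = datetime.strptime("11/25/1991", "%M/%d/%Y")
right = datetime.strptime("11/25/1991", "%m/%d/%Y")

print(wrong)  # 1991-01-25 00:11:00
print(right)  # 1991-11-25 00:00:00
```

The "wrong" result matches the 1991-01-25 shown above for the first row, with the month digits silently turned into 00:11.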
With corrected format:
{noformat}
>>> newDf = (df.withColumn('new_date',to_date(col('date_str'),'MM/dd/yyyy')))
>>> newDf.show(truncate=False)
+----------+----------+
|date_str  |new_date  |
+----------+----------+
|11/25/1991|1991-11-25|
|1/2/1991  |1991-01-02|
|11/30/1991|1991-11-30|
+----------+----------+
{noformat}
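Because this mistake produces plausible-looking dates instead of an error, a cheap pre-flight check on format strings can catch it. The helper below, looks_like_minute_month_mixup, is hypothetical and not part of Spark or PySpark: it flags a pattern that uses lowercase 'm' (minute) while containing no month field ('M' or 'L') and no hour field ('H', 'h', 'K', 'k'), ignoring text inside single quotes. It is a heuristic sketch, so it can miss or misflag unusual patterns.

```python
import re


def looks_like_minute_month_mixup(pattern: str) -> bool:
    """Hypothetical heuristic (not a Spark API).

    Returns True when a Spark-style datetime pattern uses lowercase 'm'
    (minute-of-hour) but has neither a month field ('M'/'L') nor an hour
    field ('H', 'h', 'K', 'k'), which usually means 'MM' was intended.
    """
    # Remove quoted literal text such as 'T' so letters inside quotes are ignored.
    unquoted = re.sub(r"'[^']*'", "", pattern)
    has_minute = "m" in unquoted
    has_month = "M" in unquoted or "L" in unquoted
    has_hour = any(c in unquoted for c in "HhKk")
    return has_minute and not has_month and not has_hour


if __name__ == "__main__":
    for fmt in ["mm/dd/yyyy", "MM/dd/yyyy", "yyyy-MM-dd HH:mm"]:
        print(fmt, looks_like_minute_month_mixup(fmt))
    # mm/dd/yyyy True
    # MM/dd/yyyy False
    # yyyy-MM-dd HH:mm False
```

A minute field only makes sense alongside an hour field, which is why the check requires both the month and hour fields to be absent before flagging.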

> to_date function is returning incorrect value
> ---------------------------------------------
>
>                 Key: SPARK-39536
>                 URL: https://issues.apache.org/jira/browse/SPARK-39536
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>         Environment: I'm facing this issue in Databricks Community Edition, using DBR 10.4 LTS.
>            Reporter: Sridhar Varanasi
>            Priority: Major
>         Attachments: to_date_issue.PNG
>
>
> Hi,
>  
> I have a DataFrame with a column containing dates as strings. When I convert this column to date type using to_date, it returns incorrect date values. Here is example code:
>  
> from pyspark.sql.functions import col, to_date
>  
> df = spark.createDataFrame(
>     [("11/25/1991",), ("1/2/1991",), ("11/30/1991",)],
>     ['date_str']
> )
>  
> spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
>  
> df = (df
>       .withColumn('new_date', to_date(col('date_str'), 'mm/dd/yyyy')))
> display(df)  # display() is Databricks-specific; use df.show() elsewhere
>  
> In the resulting DataFrame, the 2nd row is converted correctly, but the 1st and 3rd rows come out with incorrect dates.
>  
> Could you please look into this issue?
>  
> Thanks,
> Sridhar



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
