Posted to issues@spark.apache.org by "Luis (Jira)" <ji...@apache.org> on 2019/08/26 08:36:00 UTC

[jira] [Updated] (SPARK-28874) Pyspark bug in date_format

     [ https://issues.apache.org/jira/browse/SPARK-28874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis updated SPARK-28874:
-------------------------
    Description: 
PySpark's date_format adds one year to dates in the last days of the year.

Example:
{code:python}
from datetime import datetime

import pandas as pd
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import StructType, StructField, DateType

start_date = datetime(2010, 1, 1)
end_date = datetime(2055, 1, 1)

# One row per day between start_date and end_date
indx_ts = pd.date_range(start_date.strftime('%m/%d/%Y'),
                        end_date.strftime('%m/%d/%Y'), freq='D')
data_date = [{"d": datetime.utcfromtimestamp(x.tolist() / 1e9)} for x in indx_ts.values]

df_p = spark.createDataFrame(data_date, StructType([StructField('d', DateType(), True)]))

# Format each date with the pattern "YYYY-MM-dd"
df_string = df_p.withColumn("date_string", date_format(col("d"), "YYYY-MM-dd"))

# Show the rows where the formatted string does not match the date
df_string.filter("d != date_string").show(1000)
{code}
 
{noformat}
+----------+-----------+
|         d|date_string|
+----------+-----------+
|2010-12-26| 2011-12-26|
|2010-12-27| 2011-12-27|
|2010-12-28| 2011-12-28|
|2010-12-29| 2011-12-29|
|2010-12-30| 2011-12-30|
|2010-12-31| 2011-12-31|
|2012-12-30| 2013-12-30|
|2012-12-31| 2013-12-31|
+----------+-----------+
{noformat}
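The mismatch above is consistent with the Java SimpleDateFormat pattern semantics that date_format follows: uppercase "Y" is the week-based year, while lowercase "y" is the calendar year. Late-December dates that fall into the first week of the following week-year are therefore formatted with the next year's number, so using the pattern "yyyy-MM-dd" avoids the mismatch. A minimal sketch of the same distinction using only the Python standard library (%G is the ISO week-based year, roughly analogous to Java's "Y"; the exact boundary dates differ because Java's default week rules are locale-dependent):

{code:python}
from datetime import date

# 2012-12-31 is a Monday that belongs to ISO week 1 of 2013,
# so its week-based year differs from its calendar year.
d = date(2012, 12, 31)

print(d.strftime("%Y-%m-%d"))   # calendar year:   2012-12-31
print(d.strftime("%G-%m-%d"))   # week-based year: 2013-12-31
print(tuple(d.isocalendar()))   # (2013, 1, 1) -> ISO year, week, weekday
{code}

With the repro above, changing the pattern to "yyyy-MM-dd" should leave the filter with no mismatching rows.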



> Pyspark bug in date_format
> --------------------------
>
>                 Key: SPARK-28874
>                 URL: https://issues.apache.org/jira/browse/SPARK-28874
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0, 2.3.0
>            Reporter: Luis
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org