Posted to issues@spark.apache.org by "Luis (Jira)" <ji...@apache.org> on 2019/08/26 08:36:00 UTC
[jira] [Updated] (SPARK-28874) Pyspark bug in date_format
[ https://issues.apache.org/jira/browse/SPARK-28874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luis updated SPARK-28874:
-------------------------
Description:
PySpark's date_format adds one year to dates falling in the last days of the year. Example:
{code:python}
import pandas as pd
from datetime import datetime
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import StructType, StructField, DateType

start_date = datetime(2010, 1, 1)
end_date = datetime(2055, 1, 1)

# One row per day between start_date and end_date
indx_ts = pd.date_range(start_date.strftime('%m/%d/%Y'), end_date.strftime('%m/%d/%Y'), freq='D')
data_date = [{"d": datetime.utcfromtimestamp(x.tolist() / 1e9)} for x in indx_ts.values]

df_p = spark.createDataFrame(data_date, StructType([StructField('d', DateType(), True)]))
df_string = df_p.withColumn("date_string", date_format(col("d"), "YYYY-MM-dd"))
df_string.filter("d != date_string").show(1000)
{code}
||d||date_string||
|2010-12-26|2011-12-26|
|2010-12-27|2011-12-27|
|2010-12-28|2011-12-28|
|2010-12-29|2011-12-29|
|2010-12-30|2011-12-30|
|2010-12-31|2011-12-31|
|2012-12-30|2013-12-30|
|2012-12-31|2013-12-31|
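Not part of the original report, but the likely cause: in Java's SimpleDateFormat (which backs date_format in Spark 2.x), the pattern letter 'Y' is the week-based year while 'y' is the calendar year, so "YYYY-MM-dd" shifts dates whose week already belongs to the next year. A minimal pure-Python sketch of the same distinction (Python uses ISO week rules, whose boundaries differ slightly from Java's locale-dependent ones, but the effect near year end is the same):

```python
from datetime import date

# 2013-12-30 is a Monday in ISO week 1 of 2014, so its week-based
# year disagrees with its calendar year -- exactly the mismatch the
# rows above show.
d = date(2013, 12, 30)
iso_year, iso_week, iso_weekday = d.isocalendar()
print(d.year)     # 2013 -> calendar year, like pattern 'yyyy'
print(iso_year)   # 2014 -> week-based year, like pattern 'YYYY'
```

Using the lowercase calendar-year pattern, date_format(col("d"), "yyyy-MM-dd"), avoids the shift.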
was:
PySpark's date_format adds one year to dates falling in the last days of the year. Example:
{code:python}
import pandas as pd
from datetime import datetime
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import StructType, StructField, DateType

start_date = datetime(2010, 1, 1)
end_date = datetime(2055, 1, 1)

# One row per day between start_date and end_date
indx_ts = pd.date_range(start_date.strftime('%m/%d/%Y'), end_date.strftime('%m/%d/%Y'), freq='D')
data_date = [{"d": datetime.utcfromtimestamp(x.tolist() / 1e9)} for x in indx_ts.values]

df_p = spark.createDataFrame(data_date, StructType([StructField('d', DateType(), True)]))
df_string = df_p.withColumn("date_string", date_format(col("d"), "YYYY-MM-dd"))
df_string.filter("d != date_string").show(1000)
{code}
||d||date_string||
|2010-12-26|2011-12-26|
|2010-12-27|2011-12-27|
|2010-12-28|2011-12-28|
|2010-12-29|2011-12-29|
|2010-12-30|2011-12-30|
|2010-12-31|2011-12-31|
|2012-12-30|2013-12-30|
|2012-12-31|2013-12-31|
|2013-12-29|2014-12-29|
|2013-12-30|2014-12-30|
|2013-12-31|2014-12-31|
> Pyspark bug in date_format
> --------------------------
>
> Key: SPARK-28874
> URL: https://issues.apache.org/jira/browse/SPARK-28874
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.0, 2.3.0
> Reporter: Luis
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org