Posted to issues@spark.apache.org by "Stefaan Lippens (Jira)" <ji...@apache.org> on 2022/10/27 10:46:00 UTC
[jira] [Created] (SPARK-40934) pyspark.pandas.read_csv parses dates, but docs state otherwise
Stefaan Lippens created SPARK-40934:
---------------------------------------
Summary: pyspark.pandas.read_csv parses dates, but docs state otherwise
Key: SPARK-40934
URL: https://issues.apache.org/jira/browse/SPARK-40934
Project: Spark
Issue Type: Bug
Components: Pandas API on Spark
Affects Versions: 3.3.1
Reporter: Stefaan Lippens
From the documentation at [https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.read_csv.html]:
{quote}parse_dates:
boolean or list of ints or names or list of lists or dict, default False.
Currently only False is allowed.
{quote}
This documentation suggests that dates are never parsed, but in practice they appear to always be parsed, and this cannot be disabled:
{code:python}
import pyspark.pandas
df = pyspark.pandas.read_csv("data.csv", parse_dates=False)
print(df)
print(df.dtypes)
{code}
with this data in {{data.csv}}:
{code:java}
date,feature_index,band_0,band_1,band_2
2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
{code}
gives this output:
{code:java}
                 date  feature_index  band_0  band_1  band_2
0 2021-01-05 01:00:00              2     5.0     4.5    3.75
1 2021-01-05 01:00:00              0     5.0     1.0    2.25
2 2021-01-05 01:00:00              1     5.0     3.5    4.00
3 2021-01-15 01:00:00              2    15.0     4.5    3.75
4 2021-01-15 01:00:00              0    15.0     1.0    2.25

date             datetime64[ns]
feature_index             int32
band_0                  float64
band_1                  float64
band_2                  float64
dtype: object
{code}
Note that the dates are parsed despite {{parse_dates=False}}: the {{date}} column has dtype {{datetime64[ns]}}.
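To make the reproduction self-contained, the sample file can be generated and inspected with the Python standard library alone. The sketch below is only an illustration: the helper names {{write_sample_csv}} and {{read_raw_dates}} are hypothetical and not part of pyspark. It writes the rows from the report to a file and reads them back with the {{csv}} module, which performs no type inference, so it shows the raw ISO 8601 strings (with the {{+01:00}} offset) that one would presumably expect to survive when date parsing is disabled.

```python
import csv
import os
import tempfile

# Sample rows copied verbatim from the report above.
SAMPLE = """\
date,feature_index,band_0,band_1,band_2
2021-01-05T01:00:00.000+01:00,2,5.0,4.5,3.75
2021-01-05T01:00:00.000+01:00,0,5.0,1.0,2.25
2021-01-05T01:00:00.000+01:00,1,5.0,3.5,4.0
2021-01-15T01:00:00.000+01:00,2,15.0,4.5,3.75
2021-01-15T01:00:00.000+01:00,0,15.0,1.0,2.25
"""


def write_sample_csv(path):
    """Write the sample data to `path` (hypothetical helper, not a pyspark API)."""
    with open(path, "w", newline="") as f:
        f.write(SAMPLE)


def read_raw_dates(path):
    """Return the `date` column exactly as stored in the file, as strings."""
    with open(path, newline="") as f:
        return [row["date"] for row in csv.DictReader(f)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.csv")
    write_sample_csv(path)
    # Each value is a plain string; nothing has been converted to a datetime.
    print(read_raw_dates(path))
```

Pointing {{pyspark.pandas.read_csv}} at a file written this way should reproduce the dtypes shown above on the affected version.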