Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/07/20 06:37:00 UTC
[jira] [Assigned] (SPARK-39821) DatetimeIndex error during pyspark session
[ https://issues.apache.org/jira/browse/SPARK-39821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39821:
------------------------------------
Assignee: Apache Spark
> DatetimeIndex error during pyspark session
> ------------------------------------------
>
> Key: SPARK-39821
> URL: https://issues.apache.org/jira/browse/SPARK-39821
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.2
> Environment: OS: ubuntu
> Python version: 3.8.13
> Reporter: bo zhao
> Assignee: Apache Spark
> Priority: Minor
>
> {code:java}
> Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
> Spark context Web UI available at http://172.25.179.45:4042
> Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
> SparkSession available as 'spark'.
> >>> from pyspark import pandas as ps
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
> >>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> fields = [
> /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for column, series in pdf.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
> pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
> File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
> self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
> File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
> return self._internal.to_pandas_frame
> File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
> setattr(self, attr_name, fn(self))
> File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
> pdf = sdf.toPandas()
> File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
> series = series.astype(t, copy=False)
> File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
> new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
> return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
> applied = getattr(b, f)(**kwargs)
> File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
> new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
> new_values = astype_array(values, dtype, copy=copy)
> File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
> values = values.astype(dtype, copy=copy)
> File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
> return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
> File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
> raise TypeError(msg)
> TypeError: Cannot cast DatetimeArray to dtype datetime64
> {code}
> I run pyspark and enter ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session.
> The assignment below does not raise any error:
> {code:java}
> a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) {code}
> The error is only raised when I evaluate a in the session:
> {code:java}
> >>> a
> {code}
> So the problem appears to be in the __repr__ function.
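The traceback ends in pandas itself (`Cannot cast DatetimeArray to dtype datetime64`), so the failure can likely be reproduced without Spark. A minimal sketch, assuming the trigger is `toPandas()` casting a datetime column to the unit-less `np.dtype("datetime64")`, which recent pandas versions reject while a unit-qualified dtype such as `datetime64[ns]` still works:

```python
import numpy as np
import pandas as pd

# Same values as in the report, as a plain pandas Series of datetimes.
series = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-01", "1970-01-01"]))

# Casting to the unit-less datetime64 dtype is what recent pandas rejects
# (assumption: this mirrors the astype() call failing inside toPandas()).
try:
    series.astype(np.dtype("datetime64"))
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Casting to a dtype with an explicit unit is still accepted.
ok = series.astype(np.dtype("datetime64[ns]"))
print(ok.dtype)
```

If this holds, the fix on the PySpark side would be to pass a unit-qualified dtype (e.g. `datetime64[ns]`) in `conversion.py` rather than the bare `datetime64`.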
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org