Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/07/20 06:37:00 UTC
[jira] [Assigned] (SPARK-39821) DatetimeIndex error during pyspark session
[ https://issues.apache.org/jira/browse/SPARK-39821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39821:
------------------------------------
Assignee: Apache Spark
> DatetimeIndex error during pyspark session
> ------------------------------------------
>
> Key: SPARK-39821
> URL: https://issues.apache.org/jira/browse/SPARK-39821
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.2
> Environment: OS: ubuntu
> Python version: 3.8.13
> Reporter: bo zhao
> Assignee: Apache Spark
> Priority: Minor
>
> {code:java}
> Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
> Spark context Web UI available at http://172.25.179.45:4042
> Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
> SparkSession available as 'spark'.
> >>> from pyspark import pandas as ps
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
> >>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> fields = [
> /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for column, series in pdf.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
> for item in s.iteritems():
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
> pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
> File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
> self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
> File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
> return self._internal.to_pandas_frame
> File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
> setattr(self, attr_name, fn(self))
> File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
> pdf = sdf.toPandas()
> File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
> series = series.astype(t, copy=False)
> File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
> new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
> return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
> applied = getattr(b, f)(**kwargs)
> File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
> new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
> File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
> new_values = astype_array(values, dtype, copy=copy)
> File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
> values = values.astype(dtype, copy=copy)
> File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
> return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
> File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
> raise TypeError(msg)
> TypeError: Cannot cast DatetimeArray to dtype datetime64
> {code}
> I run pyspark and enter ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session.
> The assignment below does not raise any error:
> {code:java}
> a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) {code}
> The error is only raised when I evaluate a in the session:
> {code:java}
> >>> a
> {code}
> So the problem appears to be in the __repr__ function.
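The traceback ends in pandas itself (`Cannot cast DatetimeArray to dtype datetime64`), so the failure can likely be reproduced without Spark. A minimal sketch, assuming the trigger is `toPandas()` casting a datetime column to the unit-less `np.dtype("datetime64")`, which recent pandas versions reject while a unit-qualified dtype such as `datetime64[ns]` still works:

```python
import numpy as np
import pandas as pd

# Same values as in the report, as a plain pandas Series of datetimes.
series = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-01", "1970-01-01"]))

# Casting to the unit-less datetime64 dtype is what recent pandas rejects
# (assumption: this mirrors the astype() call failing inside toPandas()).
try:
    series.astype(np.dtype("datetime64"))
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Casting to a dtype with an explicit unit is still accepted.
ok = series.astype(np.dtype("datetime64[ns]"))
print(ok.dtype)
```

If this holds, the fix on the PySpark side would be to pass a unit-qualified dtype (e.g. `datetime64[ns]`) in `conversion.py` rather than the bare `datetime64`.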
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org