Posted to reviews@spark.apache.org by "hengoren (via GitHub)" <gi...@apache.org> on 2023/05/22 12:52:45 UTC
[GitHub] [spark] hengoren commented on pull request #37232: [SPARK-39821][PYTHON][PS] Fix error during using DatetimeIndex
hengoren commented on PR #37232:
URL: https://github.com/apache/spark/pull/37232#issuecomment-1557166208
With the release of pandas 2.0, I think this PR should be re-opened, right?
I can recreate the issue originally described with:
```python
Python 3.9.16 (main, May 3 2023, 09:54:39)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> pyspark.__version__
'3.4.0'
>>> import pandas
>>> pandas.__version__
'2.0.1'
>>> import pyspark.pandas as ps
>>> ps.DatetimeIndex(["1970-01-01", "1970-01-02", "1970-01-03"])
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/05/18 21:07:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/18 21:07:31 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/indexes/base.py", line 2705, in __repr__
    pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13347, in _get_or_create_repr_pandas_cache
    self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/frame.py", line 13342, in _to_internal_pandas
    return self._internal.to_pandas_frame
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/utils.py", line 588, in wrapped_lazy_property
    setattr(self, attr_name, fn(self))
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
    pdf = sdf.toPandas()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pyspark/sql/pandas/conversion.py", line 251, in toPandas
    if (t is not None and not all([is_timedelta64_dtype(t),is_datetime64_dtype(t)])) or should_check_timedelta:
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 6324, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 451, in astype
    return self.apply(
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 511, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 242, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/dtypes/astype.py", line 184, in astype_array
    values = values.astype(dtype, copy=copy)
  File "/home/ubuntu/.local/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 694, in astype
    raise TypeError(
TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
```
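For what it's worth, the root cause can be reproduced with pandas alone, without Spark (a minimal sketch, assuming the cast inside pyspark's `toPandas()` path is equivalent to a plain `Series.astype`): pandas 2.0 no longer accepts the unit-less `"datetime64"` dtype string, while an explicit unit like `"datetime64[ns]"` still works on both 1.x and 2.0.

```python
import pandas as pd

s = pd.Series(["1970-01-01", "1970-01-02", "1970-01-03"])

# Unit-less dtype: raises TypeError on pandas >= 2.0, was accepted on 1.x.
try:
    s.astype("datetime64")
    print("unit-less cast accepted (pandas 1.x behavior)")
except TypeError as exc:
    print(f"unit-less cast rejected: {exc}")

# Explicit unit works on both pandas 1.x and 2.0, as the error message suggests.
converted = s.astype("datetime64[ns]")
print(converted.dtype)
```

So a fix along the lines suggested by the error message (passing `"datetime64[ns]"` instead of `"datetime64"`) would presumably resolve this.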
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org