Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/20 15:21:34 UTC
[GitHub] [arrow] kszucs commented on pull request #7805: ARROW-9528: [Python] Honor tzinfo when converting from datetime
kszucs commented on pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#issuecomment-661106422
My main concern with this solution is that, while it resolves the pandas roundtrip, the intermediate array values are different.
People may "rely" on the previous buggy behavior, and I'm afraid it will cause more post-release trouble than we expect.
Running the following snippet on three different revisions:
```py
import pytz
from datetime import datetime
import pyarrow as pa
now_at_budapest = datetime.now(pytz.timezone('Europe/Budapest'))
arr = pa.array([now_at_budapest], type=pa.timestamp('s', tz='Europe/Budapest'))
try:
    # show_versions() is not available on older releases such as 0.17.1
    pa.show_versions()
except AttributeError:
    print("Arrow version: {}".format(pa.__version__))
print(arr)
print(arr.to_pandas())
```
### 0.17.1
```py
Arrow version: 0.17.1
[
2020-07-20 17:01:11
]
0 2020-07-20 19:01:11+02:00
dtype: datetime64[ns, Europe/Budapest]
```
### Master
```py
pyarrow version info
--------------------
Package kind: not indicated
Arrow C++ library version: 1.0.0-SNAPSHOT
Arrow C++ compiler: AppleClang 11.0.3.11030032
Arrow C++ compiler flags: -Qunused-arguments -fcolor-diagnostics -ggdb -O0
Arrow C++ git revision: 210d3609f027ef9ed83911c2d1132cb9cbb2dc06
Arrow C++ git description: apache-arrow-0.17.0-756-g210d3609f
[
2020-07-20 17:10:11
]
0 2020-07-20 19:10:11+02:00
dtype: datetime64[ns, Europe/Budapest]
```
### This patch
```py
pyarrow version info
--------------------
Package kind: not indicated
Arrow C++ library version: 1.0.0-SNAPSHOT
Arrow C++ compiler: AppleClang 11.0.3.11030032
Arrow C++ compiler flags: -Qunused-arguments -fcolor-diagnostics -ggdb -O0
Arrow C++ git revision: a5b2a51665ab1383fb371ecd76bb3c20c4bf8726
Arrow C++ git description: apache-arrow-0.17.0-761-ga5b2a5166
[
2020-07-20 15:01:12
]
0 2020-07-20 17:01:12+02:00
dtype: datetime64[ns, Europe/Budapest]
```
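The two-hour gap between the stored values above can be reproduced with the standard library alone. This is only a sketch of the two interpretations, not pyarrow's actual conversion code; the fixed `+02:00` offset is an assumed stand-in for `Europe/Budapest` in July, and the names `ignoring_tz` and `honoring_tz` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +02:00 offset stands in for Europe/Budapest (CEST in July).
cest = timezone(timedelta(hours=2))
local = datetime(2020, 7, 20, 17, 1, 11, tzinfo=cest)

# tzinfo ignored: the wall-clock fields are treated as if they were already UTC.
ignoring_tz = int(local.replace(tzinfo=timezone.utc).timestamp())

# tzinfo honored: the instant is converted to UTC before taking epoch seconds.
honoring_tz = int(local.timestamp())

print(datetime.fromtimestamp(ignoring_tz, timezone.utc))  # 2020-07-20 17:01:11+00:00
print(datetime.fromtimestamp(honoring_tz, timezone.utc))  # 2020-07-20 15:01:11+00:00
print(ignoring_tz - honoring_tz)                          # 7200
```

Anyone who stored arrays built the first way and reads them back expecting the same raw values would see them shift by the UTC offset once tzinfo is honored.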
While the current master works for this example and the [Spark patch](https://github.com/apache/arrow/pull/7804) fixes the Spark integration test, it breaks the nested roundtrip example discussed in the mailing list thread.
@emkornfield @BryanCutler thoughts?