Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/20 15:22:23 UTC

[GitHub] [arrow] kszucs edited a comment on pull request #7805: ARROW-9528: [Python] Honor tzinfo when converting from datetime

kszucs edited a comment on pull request #7805:
URL: https://github.com/apache/arrow/pull/7805#issuecomment-661106422


   My main concern with this solution is that, while it resolves the pandas roundtrip, the intermediate array values differ from what previous releases produce.
   People may "rely" on the previous buggy behavior, and I'm afraid that it will cause more post-release trouble than we expect.
   
   Running the following snippet on three different revisions:
   
   ```py
   import pytz
   from datetime import datetime
   
   import pyarrow as pa
   
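   # Timezone-aware "now" in Budapest local time
   now_at_budapest = datetime.now(pytz.timezone('Europe/Budapest'))
   arr = pa.array([now_at_budapest], type=pa.timestamp('s', tz='Europe/Budapest'))
   
   # pa.show_versions() is missing on older releases (e.g. 0.17.1), so fall back
   try:
       pa.show_versions()
   except AttributeError:
       print("Arrow version: {}".format(pa.__version__))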
   
   print(arr)
   print(arr.to_pandas())
   ```
   
   ### 0.17.1
   
   ```py
   Arrow version: 0.17.1
   [
       2020-07-20 17:01:11
   ]
   0   2020-07-20 19:01:11+02:00
   dtype: datetime64[ns, Europe/Budapest]
   ```
   
   ### Master
   
   ```py
   pyarrow version info
   --------------------
   Package kind: not indicated
   Arrow C++ library version: 1.0.0-SNAPSHOT
   Arrow C++ compiler: AppleClang 11.0.3.11030032
   Arrow C++ compiler flags:  -Qunused-arguments -fcolor-diagnostics -ggdb -O0
   Arrow C++ git revision: 210d3609f027ef9ed83911c2d1132cb9cbb2dc06
   Arrow C++ git description: apache-arrow-0.17.0-756-g210d3609f
   [
       2020-07-20 17:10:11
   ]
   0   2020-07-20 19:10:11+02:00
   dtype: datetime64[ns, Europe/Budapest]
   ```
   
   ### This patch
   
   ```py
   pyarrow version info
   --------------------
   Package kind: not indicated
   Arrow C++ library version: 1.0.0-SNAPSHOT
   Arrow C++ compiler: AppleClang 11.0.3.11030032
   Arrow C++ compiler flags:  -Qunused-arguments -fcolor-diagnostics -ggdb -O0
   Arrow C++ git revision: a5b2a51665ab1383fb371ecd76bb3c20c4bf8726
   Arrow C++ git description: apache-arrow-0.17.0-761-ga5b2a5166
   [
     2020-07-20 15:01:12
   ]
   0   2020-07-20 17:01:12+02:00
   dtype: datetime64[ns, Europe/Budapest]
   ```
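
To make the difference between the three outputs concrete, here is a minimal standard-library sketch of the two ways an aware datetime can be turned into the stored epoch value. It is not pyarrow's conversion code: the helper names are hypothetical, and the fixed +02:00 offset is an assumption standing in for Europe/Budapest in July 2020.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +02:00 offset stands in for Europe/Budapest (CEST).
CEST = timezone(timedelta(hours=2))


def epoch_seconds_ignoring_tz(dt):
    """Store the wall-clock fields as if they were already UTC."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def epoch_seconds_honoring_tz(dt):
    """Normalize the aware datetime to UTC before storing it."""
    return int(dt.timestamp())


def render(seconds, tz):
    """Format a stored epoch value in the given timezone."""
    return datetime.fromtimestamp(seconds, tz).isoformat(sep=" ")


local = datetime(2020, 7, 20, 17, 1, 12, tzinfo=CEST)

ignored = epoch_seconds_ignoring_tz(local)
honored = epoch_seconds_honoring_tz(local)

print(render(ignored, timezone.utc))  # 2020-07-20 17:01:12+00:00
print(render(ignored, CEST))          # 2020-07-20 19:01:12+02:00
print(render(honored, timezone.utc))  # 2020-07-20 15:01:12+00:00
print(render(honored, CEST))          # 2020-07-20 17:01:12+02:00
print(ignored - honored)              # 7200
```

The first pair has the same shape as the 0.17.1 and master outputs (the local wall-clock time is stored, then shifted by two more hours on display), while the second pair has the shape of the output with this patch. Anyone reading the raw array values would see them move by the UTC offset.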
   
   While the current master works for this example and the [Spark patch](https://github.com/apache/arrow/pull/7804) fixes the Spark integration test, it breaks the nested roundtrip [example](https://gist.github.com/kszucs/26c58e794d30b7d783bc8484b67d860a) discussed in the ML thread.
   
   @emkornfield @BryanCutler thoughts?

