Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/06/08 12:59:27 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #35656: GH-33321: [Python] Support converting to non-nano datetime64 for pandas >= 2.0

jorisvandenbossche commented on PR #35656:
URL: https://github.com/apache/arrow/pull/35656#issuecomment-1582537127

   >  For (2), I just need to add support, but it's going to grow this PR even larger unfortunately..
   
   If PR size is a concern, this could also be done as a precursor. It's actually an existing issue that also shows up in conversion to numpy:
   
   ```python
   # no timezone -> this preserves the unit
   >>> pa.array([1, 2, 3], pa.timestamp('us')).to_numpy()
   array(['1970-01-01T00:00:00.000001', '1970-01-01T00:00:00.000002',
          '1970-01-01T00:00:00.000003'], dtype='datetime64[us]')
   
   # with timezone -> always converts to nanoseconds
   >>> pa.array([1, 2, 3], pa.timestamp('us', tz="Europe/Brussels")).to_numpy()
   ...
   ArrowInvalid: Needed to copy 1 chunks with 0 nulls, but zero_copy_only was True
   
   >>> pa.array([1, 2, 3], pa.timestamp('us', tz="Europe/Brussels")).to_numpy(zero_copy_only=False)
   array(['1970-01-01T00:00:00.000001000', '1970-01-01T00:00:00.000002000',
          '1970-01-01T00:00:00.000003000'], dtype='datetime64[ns]')
   ```
   
   Meanwhile, the timezone case could also be perfectly zero-copy to microseconds, since we just return the underlying UTC values anyway.
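   
   As an illustration of that last point (a workaround sketch, not the fix this PR proposes): since the tz-aware array stores plain UTC integers, casting away the timezone is a metadata-only change, after which `to_numpy()` preserves the microsecond unit:
   
   ```python
   import pyarrow as pa
   
   arr = pa.array([1, 2, 3], pa.timestamp('us', tz="Europe/Brussels"))
   # Dropping the timezone does not touch the buffer (values are already UTC),
   # so the subsequent to_numpy() keeps the original microsecond resolution.
   result = arr.cast(pa.timestamp('us')).to_numpy()
   # result.dtype is datetime64[us], not datetime64[ns]
   ```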


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org