Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/06/15 13:31:55 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #36096: [Python] Array vs ChunkedArray to_pandas discrepancy for pandas extension dtypes implementing __from_arrow__

jorisvandenbossche opened a new issue, #36096:
URL: https://github.com/apache/arrow/issues/36096

   When you have an Arrow type that maps to a pandas extension dtype implementing `__from_arrow__`, `ChunkedArray.to_pandas` calls `pd_dtype.__from_arrow__`, but `Array.to_pandas` does not.
   
   Typically, you only encounter such dtypes if you have a pyarrow ExtensionArray as well, and in `ExtensionArray.to_pandas` we also check whether the dtype has `__from_arrow__`:
   
   https://github.com/apache/arrow/blob/475b5b9463b64bec4e03a47e3277076db246bd35/python/pyarrow/array.pxi#L3094-L3104
   
   But the base class `Array.to_pandas` doesn't do this. Recently, one of the pandas dtypes that maps to a non-extension array on the pyarrow side (DatetimeTZDtype, which maps to timestamp with tz) added `__from_arrow__`. As a result, for this dtype the conversion takes a different code path depending on whether you call `to_pandas` on an Array or on a ChunkedArray.
   
   ```
   from datetime import datetime
   import pyarrow as pa
   
   arr = pa.array([datetime(1, 1, 1)], pa.timestamp("s", tz="America/New_York"))
   table = pa.table({'a': arr})
   # doesn't call DatetimeTZDtype.__from_arrow__
   arr.to_pandas()
   # ChunkedArray does call that
   table["a"].to_pandas()
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche closed issue #36096: [Python] Array vs ChunkedArray to_pandas discrepancy for pandas extension dtypes implementing __from_arrow__

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche closed issue #36096: [Python] Array vs ChunkedArray to_pandas discrepancy for pandas extension dtypes implementing __from_arrow__
URL: https://github.com/apache/arrow/issues/36096

