You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/06/26 22:27:33 UTC

[GitHub] [arrow] danepitkin opened a new pull request, #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

danepitkin opened a new pull request, #36314:
URL: https://github.com/apache/arrow/pull/36314

   ### Rationale for this change
   
   Array.to_pandas should mimic ChunkedArray.to_pandas implementation. Notably, there is a missing call to `__from_arrow__` if the attribute exists.
   
   ### Are these changes tested?
   
   Requires manually kicking off integration tests to test with pandas nightly (can only be done by a committer).
   
   ### Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1608410768

   Revision: 2ef5a7b5a6ab32477ad0269e10cc357475763de8
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-f4df154d40](https://github.com/ursacomputing/crossbow/branches/all?query=actions-f4df154d40)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5383683845/jobs/9770734296)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5383681879/jobs/9770730586)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5383682064/jobs/9770730755)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5383683600/jobs/9770733864)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5383682954/jobs/9770732732)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5383684585/jobs/9770735732)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5383684127/jobs/9770734776)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5383682680/jobs/9770732158)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5383683373/jobs/9770733482)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5383684431/jobs/9770735424)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5383682537/jobs/9770731761)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-f4df154d40-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5383682300/jobs/9770731272)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246734790


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,22 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif self.type.id == _Type_EXTENSION:
+            try:
+                pandas_dtype = self.type.to_pandas_dtype()
+            except NotImplementedError:
+                pass
+
+        # Only call __from_arrow__ for Arrow extension types or when explicitly
+        # overridden via types_mapper
+        if hasattr(pandas_dtype, '__from_arrow__'):
+            arr = pandas_dtype.__from_arrow__(self)
+            return pandas_api.series(arr, copy=False)

Review Comment:
   +1 on code deduplication, I'll give it a shot. The only differences currently are the parameters passed into `pandas_api.series(arr)`. Arrays want copy=False, and ChunkedArray has enough info to add name=self._name. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1611070229

   Hmm, so the failing tests point out an issue with this approach: we have some keywords to control the conversion, notably `timestamp_as_object` in this case. 
   And if the user passes this, and we just call `pandas_dtype.__from_arrow__` nonetheless, this keyword gets ignored.
   
   But, we also already have this problem, as this already happens for the ChunkedArray conversion:
   
   ```
   In [8]: from datetime import datetime
      ...: import pyarrow as pa
      ...: 
      ...: arr = pa.array([datetime(2001, 1, 1)], pa.timestamp("s", tz="America/New_York"))
      ...: table = pa.table({'a': arr})
   
   In [9]: arr.to_pandas(timestamp_as_object=True)
   Out[9]: 
   0    2000-12-31 19:00:00-05:00
   dtype: object
   
   In [10]: table["a"].to_pandas(timestamp_as_object=True)
   Out[10]: 
   0   2000-12-31 19:00:00-05:00
   Name: a, dtype: datetime64[ns, America/New_York]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1245832149


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   The only difference here now is calling `self.type.to_pandas_dtype()` vs `self.storage.type.to_pandas_dtype()` (storage gets passed to the base Array function). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1611085273

   One option could be to decide that we _only_ want to use this `pandas_dtype.__from_arrow__` if you actually have an extension type on the arrow side (and so don't do this for timestamp data, also not for the ChunkedArray case), or when you specified a pandas extension dtype with `types_mapper`. Then for all default cases, we (pyarrow) just own the conversion and can ensure the keywords are honored.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1613543571

   Revision: 42f1f278a9e1d9d2957f2335a35367e60303b9d2
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-8c684146d9](https://github.com/ursacomputing/crossbow/branches/all?query=actions-8c684146d9)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5414888031/jobs/9842584175)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5414887095/jobs/9842581702)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5414889282/jobs/9842586710)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5414886612/jobs/9842580349)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5414886352/jobs/9842579931)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5414887186/jobs/9842581995)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5414888676/jobs/9842585407)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5414888353/jobs/9842584734)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5414888916/jobs/9842585949)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5414886108/jobs/9842579449)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5414887843/jobs/9842583515)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-8c684146d9-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5414887658/jobs/9842583144)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1613540007

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1608408240

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1245837037


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   I did remove the issubclass check in a recent refactor!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1245832149


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   The only difference here now is calling `self.type.to_pandas_dtype()` vs `self.storage.type.to_pandas_dtype()` when checking if `__from_arrow__` attr exists(storage gets passed to the base Array function). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1623002914

   Conbench analyzed the 6 benchmark runs on commit `d6a1968c`.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-07-04 04:02:33Z](http://conbench.ursa.dev/compare/runs/4b18cf8015274569a7c1e2c6fba44715...16d4038f515e41b48c612c91438f8ae8/)
     - [params=DecodeArrow_Dense/32768, source=cpp-micro, suite=parquet-encoding-benchmark](http://conbench.ursa.dev/compare/benchmarks/064a3708e7a575178000bc4045e04c7e...064a39a42b467e7080009b81bec81389)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14815567584) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246417026


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   Currently we are indeed passing down `self.storage`, but I seem to remember that we added similar logic to the underlying C++ ConvertArrayToPandas to also (by default) use the storage array to do the actual conversion. So it _might_ actually work by just removing the overridden method completely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246232081


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,22 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif self.type.id == _Type_EXTENSION:
+            try:
+                pandas_dtype = self.type.to_pandas_dtype()
+            except NotImplementedError:
+                pass
+
+        # Only call __from_arrow__ for Arrow extension types or when explicitly
+        # overridden via types_mapper
+        if hasattr(pandas_dtype, '__from_arrow__'):
+            arr = pandas_dtype.__from_arrow__(self)
+            return pandas_api.series(arr, copy=False)

Review Comment:
   Given that this code is both here and in the ChunkedArray version, and both methods also end up calling `_array_like_to_pandas`, we can maybe move this logic there to avoid duplicating it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246872075


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,22 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif self.type.id == _Type_EXTENSION:
+            try:
+                pandas_dtype = self.type.to_pandas_dtype()
+            except NotImplementedError:
+                pass
+
+        # Only call __from_arrow__ for Arrow extension types or when explicitly
+        # overridden via types_mapper
+        if hasattr(pandas_dtype, '__from_arrow__'):
+            arr = pandas_dtype.__from_arrow__(self)
+            return pandas_api.series(arr, copy=False)

Review Comment:
   From my testing, `Array` does always have a `None` "column" name. Good to know `ChunkedArray` could benefit from `copy=False`! I thought maybe it was left out on purpose since the memory is not guaranteed to be contiguous.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612150293

   Revision: 3d0a2df0fdec0c38688465cd0ca23584a02f2b9f
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-565dfc9bdc](https://github.com/ursacomputing/crossbow/branches/all?query=actions-565dfc9bdc)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5406152632/jobs/9822678167)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5406153622/jobs/9822680441)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406153855/jobs/9822680977)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5406152827/jobs/9822678501)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5406153211/jobs/9822679506)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406152992/jobs/9822678948)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5406152389/jobs/9822677534)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5406151667/jobs/9822676066)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5406152266/jobs/9822677298)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5406151829/jobs/9822676272)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406153400/jobs/9822679957)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-565dfc9bdc-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5406152048/jobs/9822676719)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246903039


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   Looks like you are right! https://github.com/apache/arrow/blob/0cea12ffc87915e3808d153d4d8e5f115e48d487/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L2455-L2458
   
   Running pytests locally seems to agree as well. I'll run the integration tests with this change to verify.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612191177

   Revision: 946040dc198931273b914b12e8b5ef91bec16203
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-811fffe409](https://github.com/ursacomputing/crossbow/branches/all?query=actions-811fffe409)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5406373024/jobs/9823180476)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5406373442/jobs/9823181272)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406371395/jobs/9823176593)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5406371270/jobs/9823176287)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5406371850/jobs/9823177936)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406372003/jobs/9823178292)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5406372553/jobs/9823179577)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5406372399/jobs/9823179255)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5406372794/jobs/9823180040)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5406372194/jobs/9823178791)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5406373213/jobs/9823180853)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-811fffe409-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5406371662/jobs/9823177399)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1608406037

   :warning: GitHub issue #36096 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1247512173


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,22 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif self.type.id == _Type_EXTENSION:
+            try:
+                pandas_dtype = self.type.to_pandas_dtype()
+            except NotImplementedError:
+                pass
+
+        # Only call __from_arrow__ for Arrow extension types or when explicitly
+        # overridden via types_mapper
+        if hasattr(pandas_dtype, '__from_arrow__'):
+            arr = pandas_dtype.__from_arrow__(self)
+            return pandas_api.series(arr, copy=False)

Review Comment:
   > I thought maybe it was left out on purpose since the memory is not guaranteed to be contiguous.
   
   ChunkedArray itself can indeed consist of multiple chunks, but at the moment we reach the `pandas_api.series(arr, copy=False)` code, we already converted the ChunkedArray to a single contiguous numpy array (or pandas ExtensionArray). And anyway, this `copy=False` is mostly to indicate to pandas they don't need to do an extra copy (https://github.com/apache/arrow/pull/34593).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #36314:
URL: https://github.com/apache/arrow/pull/36314


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612147914

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246770844


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,22 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif self.type.id == _Type_EXTENSION:
+            try:
+                pandas_dtype = self.type.to_pandas_dtype()
+            except NotImplementedError:
+                pass
+
+        # Only call __from_arrow__ for Arrow extension types or when explicitly
+        # overridden via types_mapper
+        if hasattr(pandas_dtype, '__from_arrow__'):
+            arr = pandas_dtype.__from_arrow__(self)
+            return pandas_api.series(arr, copy=False)

Review Comment:
   `copy=False` should actually also be added to the ChunkedArray version, so that's fine (would be an improvement)
   
   For `self._name`, that's also already handled inside `_array_like_to_pandas`, so I think that should work fine for both Array and ChunkedArray (I assume that for Array, it's just always None)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612010190

   Revision: 07ea7c769ad72de878b5f284134382b52759e7e8
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-bb4b259bdd](https://github.com/ursacomputing/crossbow/branches/all?query=actions-bb4b259bdd)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.10-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions/runs/5405225495/jobs/9820475444)|
   |test-conda-python-3.10-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.10-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions/runs/5405225658/jobs/9820475858)|
   |test-conda-python-3.10-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.10-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5405227905/jobs/9820480840)|
   |test-conda-python-3.10-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.10-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions/runs/5405225364/jobs/9820475240)|
   |test-conda-python-3.10-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.10-spark-master)](https://github.com/ursacomputing/crossbow/actions/runs/5405227140/jobs/9820479289)|
   |test-conda-python-3.11-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.11-dask-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5405226893/jobs/9820478726)|
   |test-conda-python-3.11-dask-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.11-dask-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5405226667/jobs/9820478193)|
   |test-conda-python-3.11-pandas-upstream_devel|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.11-pandas-upstream_devel)](https://github.com/ursacomputing/crossbow/actions/runs/5405225828/jobs/9820476306)|
   |test-conda-python-3.8-pandas-1.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.8-pandas-1.0)](https://github.com/ursacomputing/crossbow/actions/runs/5405227660/jobs/9820480328)|
   |test-conda-python-3.8-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.8-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions/runs/5405227607/jobs/9820480275)|
   |test-conda-python-3.9-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.9-pandas-latest)](https://github.com/ursacomputing/crossbow/actions/runs/5405226035/jobs/9820476831)|
   |test-conda-python-3.9-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-bb4b259bdd-github-test-conda-python-3.9-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions/runs/5405226235/jobs/9820477271)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612007513

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #36314:
URL: https://github.com/apache/arrow/pull/36314#issuecomment-1612189265

   @github-actions crossbow submit -g integration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1244957996


##########
python/pyarrow/array.pxi:
##########
@@ -1475,6 +1475,17 @@ cdef class Array(_PandasConvertible):
         return self.take(indices)
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
+        pandas_dtype = None
+        try:
+            pandas_dtype = self.type.to_pandas_dtype()

Review Comment:
   Something that we should also do the the other places where this logic lives (eg ChunkedArray._to_pandas), but I think we should give precedence to `types_mapper`, if specified. Because as a user, if you specify that keyword, you want to be able to override the default conversion.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1245809241


##########
python/pyarrow/array.pxi:
##########
@@ -1476,12 +1476,17 @@ cdef class Array(_PandasConvertible):
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
 
-        # pandas ExtensionDtype that implements conversion from pyarrow
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   ```suggestion
           elif self.type.id == _Type_EXTENSION:
   ```



##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   This check isn't needed here, since this is in the ExtensionArray subclass. 
   
   But more generally, I am wondering if this method is actually still needed now we are adding similar logic to the base class' `to_pandas`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1245843262


##########
python/pyarrow/array.pxi:
##########
@@ -1476,12 +1476,17 @@ cdef class Array(_PandasConvertible):
 
     def _to_pandas(self, options, types_mapper=None, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
 
-        # pandas ExtensionDtype that implements conversion from pyarrow
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   Much cleaner! I'll add to ChunkedArray as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36314: GH-36096: [Python] Call __from_arrow__ in Array.to_pandas

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36314:
URL: https://github.com/apache/arrow/pull/36314#discussion_r1246903039


##########
python/pyarrow/array.pxi:
##########
@@ -3104,12 +3109,18 @@ cdef class ExtensionArray(Array):
 
     def _to_pandas(self, options, **kwargs):
         pandas_dtype = None
-        try:
-            pandas_dtype = self.type.to_pandas_dtype()
-        except NotImplementedError:
-            pass
+        types_mapper = kwargs.get('types_mapper', None)
+
+        if types_mapper:
+            pandas_dtype = types_mapper(self.type)
+        elif issubclass(type(self.type), ExtensionType):

Review Comment:
   Looks like you are right! https://github.com/apache/arrow/blob/0cea12ffc87915e3808d153d4d8e5f115e48d487/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L2457
   
   Running pytests locally seems to agree as well. I'll run the integration tests with this change to verify.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org