Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/01 12:54:29 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12178: ARROW-9664: [Python] Array/ChunkedArray.to_pandas do not support types_mapper keyword

jorisvandenbossche commented on a change in pull request #12178:
URL: https://github.com/apache/arrow/pull/12178#discussion_r796442124



##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -4082,6 +4082,66 @@ def test_array_to_pandas():
         # tm.assert_series_equal(result, expected)
 
 
+def test_to_pandas_types_mapper():

Review comment:
       ```suggestion
   def test_array_to_pandas_types_mapper():
   ```
   
   (to differentiate from table.to_pandas)

##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -4082,6 +4082,66 @@ def test_array_to_pandas():
         # tm.assert_series_equal(result, expected)
 
 
+def test_to_pandas_types_mapper():
+    # https://issues.apache.org/jira/browse/ARROW-9664
+    if Version(pd.__version__) < Version("1.0.0"):
+        pytest.skip("ExtensionDtype to_pandas method missing")
+
+    data = pa.array([1, 2, 3], pa.int64())
+
+    # Test with mapper function
+    types_mapper = {pa.int64(): pd.Int64Dtype()}.get
+    result = data.to_pandas(types_mapper=types_mapper)
+    assert result.dtype == types_mapper(data.type)
+
+    # Test mapper function returning None
+    types_mapper = {pa.int64(): None}.get
+    result = data.to_pandas(types_mapper=types_mapper)
+    assert result.dtype == data.type.to_pandas_dtype()
+
+    # Test mapper function not containing the dtype
+    types_mapper = {pa.float64(): pd.Float64Dtype()}.get
+    result = data.to_pandas(types_mapper=types_mapper)
+    assert result.dtype == data.type.to_pandas_dtype()
+
+    # Test for the interval extension dtype
+    # -> ignores mapping and uses default conversion
+    types_mapper = {pa.float64(): pd.IntervalDtype()}.get
+    result = data.to_pandas(types_mapper=types_mapper)

Review comment:
       For this test, I think it would be good to check whether we can actually roundtrip a pandas `IntervalDtype` (as written, this test is basically the same as the case above, since the mapper key `pa.float64()` never matches the int64 data). So if we start from
   
   ```
   import pandas as pd
   import pyarrow as pa
   interval = pd.Series(pd.interval_range(start=0, end=5, periods=5))
   data = pa.array(interval)
   ```
   
   Can we call `data.to_pandas(..)` in some way to get back the pandas `interval` series?
   
   This might not be straightforward, as you need to know the exact struct type that has been created. (That makes me wonder whether we should change the interface a bit: while `types_mapper` makes sense for Table conversion, where you can have many columns of a certain type, for an Array there is simply one result dtype you might want to enforce.)
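
A minimal sketch of the interface question raised above, under stated assumptions. It uses plain strings as stand-ins for Arrow types and pandas dtypes so that it runs without pyarrow or pandas. The helper `constant_types_mapper` is hypothetical and not part of pyarrow. The sketch only contrasts a mapper keyed on the exact type (the `dict.get` pattern from the test) with a mapper that always returns one dtype. It makes no claim about whether `Array.to_pandas` would accept the result for interval data.

```python
def constant_types_mapper(dtype):
    """Hypothetical helper: build a types_mapper-style callable that
    returns the same dtype for whatever type it is asked about."""
    def mapper(arrow_type):
        # The type argument is deliberately ignored: a single Array has
        # only one type, so there is nothing to look up.
        return dtype
    return mapper


# A dict-based mapper, as in the test, returns None for any key it does
# not contain, so it only helps when the exact key is known in advance.
# Strings stand in for real Arrow types and pandas dtypes here.
keyed_mapper = {"int64": "Int64"}.get
single_mapper = constant_types_mapper("interval")

print(keyed_mapper("int64"))             # Int64
print(keyed_mapper("some_struct_type"))  # None
print(single_mapper("some_struct_type")) # interval
```

The keyed mapper falls back to `None` (and hence the default conversion) whenever the caller does not know the exact type to use as a key, whereas the constant mapper expresses "one result dtype for this Array" directly.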



