Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/29 08:42:13 UTC

[GitHub] [arrow] milesgranger opened a new issue, #14759: [C++][Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger opened a new issue, #14759:
URL: https://github.com/apache/arrow/issues/14759

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The following code emits a `numpy.VisibleDeprecationWarning`:
   
   ```python
   import pyarrow as pa
   import pandas as pd
   
   s = (pa.array([[[1, 2, 3], [4]], None],
                 type=pa.large_list(pa.large_list(pa.int64())))
        .to_pandas())
   
   # The warning is emitted here. Perhaps it's due to how the NumPy array
   # is created inside `arrow_to_pandas.cc`?
   s.equals(pd.Series([[[1, 2, 3], [4]], None], dtype=object))
   ```
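A quick way to check which library actually emits a warning like this is to record it together with the file and line it is attributed to. The following is a minimal standard-library sketch; `record_warnings` is a hypothetical helper, not part of pyarrow or pandas:

```python
import warnings


def record_warnings(func, *args, **kwargs):
    """Call func and return (result, report).

    Each report entry is (category name, file, line, message), where file and
    line are the location the warning was attributed to.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default filter from swallowing warnings that were
        # already shown once from the same location.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    report = [
        (w.category.__name__, w.filename, w.lineno, str(w.message))
        for w in caught
    ]
    return result, report
```

For example, `record_warnings(s.equals, pd.Series([[[1, 2, 3], [4]], None], dtype=object))` on the reproducer above would list the `VisibleDeprecationWarning` along with the file it was attributed to; in the traceback posted elsewhere in this thread, that file is pandas' `core/dtypes/missing.py` rather than anything in pyarrow.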
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330474544

   Okay, that makes sense now. I suppose we can close this. Thank you :)




[GitHub] [arrow] jorisvandenbossche commented on issue #14759: [C++][Python] Construct variable list array with dtype=object in arrow_to_pandas

jorisvandenbossche commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330280164

   The warning only happens in the `Series.equals` method, right? In that case it is probably more of a pandas issue.




[GitHub] [arrow] jorisvandenbossche commented on issue #14759: [C++][Python] Construct variable list array with dtype=object in arrow_to_pandas

jorisvandenbossche commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330303622

   Ah, that seems to be because it's not a Series of lists, but of arrays:
   
   ```
   In [9]: s.values
   Out[9]: 
   array([array([array([1, 2, 3]), array([4])], dtype=object), None],
         dtype=object)
   ```
   
   So calling `equals` on two Series with list elements doesn't trigger the warning:
   
   ```
   In [6]: s1 = pd.Series([[[1, 2, 3], [4]], None], dtype=object)
   
   In [7]: s2 = pd.Series([[[1, 2, 3], [4]], None], dtype=object)
   
   In [8]: s1.equals(s2)
   Out[8]: True
   ```
   
   but if one of them uses arrays as elements, you see it:
   
   ```
   In [10]: s1 = pd.Series([np.array([[1, 2, 3], [4]], dtype=object), None], dtype=object)
   
   In [11]: s1.equals(s2)
   /home/joris/miniconda3/envs/arrow-dev/lib/python3.10/site-packages/pandas/core/dtypes/missing.py:575: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
     return lib.array_equivalent_object(
   Out[11]: False
   ```
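If a Series of plain lists is what a caller actually wants, the nested arrays can be converted back after `to_pandas()`. A minimal sketch, assuming NumPy and pandas only; `to_nested_lists` is a hypothetical helper, and the Series below is hand-built to mimic the converted result rather than produced by pyarrow:

```python
import numpy as np
import pandas as pd


def to_nested_lists(value):
    """Recursively turn ndarrays into lists, leaving other values untouched.

    Numeric arrays use ndarray.tolist(), which also turns NumPy scalars into
    Python ones; object arrays are walked element by element because their
    elements may themselves be arrays.
    """
    if isinstance(value, np.ndarray):
        if value.dtype == object:
            return [to_nested_lists(v) for v in value]
        return value.tolist()
    if isinstance(value, list):
        return [to_nested_lists(v) for v in value]
    return value


# Hand-built stand-in for the Arrow-converted Series: arrays inside arrays.
# Filling object arrays slot by slot keeps NumPy from inferring a ragged shape.
inner = np.empty(2, dtype=object)
inner[0] = np.array([1, 2, 3])
inner[1] = np.array([4])
values = np.empty(2, dtype=object)
values[0] = inner
values[1] = None
s = pd.Series(values)

normalized = s.map(to_nested_lists)
print(normalized.tolist())  # [[[1, 2, 3], [4]], None]
```

When the data is still in Arrow form, `pyarrow.Array.to_pylist()` returns nested Python lists directly, without going through NumPy.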
   
   So that's a pandas "bug" in the `equals` method. However, this warning was just turned into an error in the recently released numpy 1.24, so I don't know if it is still worth fixing in pandas.
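The NumPy behaviour change can be seen without pandas or pyarrow. A minimal sketch, assuming only NumPy (`implicit_ragged_outcome` is a hypothetical helper): explicitly requesting `dtype=object` is always accepted, while letting NumPy infer the dtype of a ragged list warns before 1.24 and raises `ValueError` from 1.24 on.

```python
import warnings

import numpy as np

ragged = [[1, 2, 3], [4]]

# Asking for object dtype explicitly is always allowed: NumPy stops at the
# first dimension where the lengths disagree and stores the inner lists as
# Python objects in a 1-D array.
explicit = np.array(ragged, dtype=object)
print(explicit.shape, explicit.dtype)  # (2,) object


def implicit_ragged_outcome(seq):
    """Report what NumPy does when it must infer a dtype for a ragged sequence."""
    with warnings.catch_warnings():
        # Escalate warnings so the deprecation (NumPy < 1.24) is observable.
        warnings.simplefilter("error")
        try:
            np.array(seq)
        except ValueError:
            return "error"    # NumPy >= 1.24: the deprecation has expired
        except Warning:
            return "warning"  # older NumPy: VisibleDeprecationWarning
    return "object array"     # very old NumPy: silently became object dtype


print(np.__version__, implicit_ragged_outcome(ragged))
```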
   
   (In any case, we can just ignore that warning in our test suite, since it comes strictly from pandas.)
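Ignoring just that one warning, rather than all warnings, can be done by matching on the start of its message. A minimal standard-library sketch; `call_ignoring_ragged_warning` is a hypothetical helper name:

```python
import warnings

# Leading text of NumPy's ragged-array deprecation message (from the log above).
RAGGED_MSG = "Creating an ndarray from ragged nested sequences"


def call_ignoring_ragged_warning(func, *args, **kwargs):
    """Call func with only the ragged-sequence warning suppressed."""
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text,
        # so unrelated warnings still get through.
        warnings.filterwarnings("ignore", message=RAGGED_MSG)
        return func(*args, **kwargs)
```

In a pytest suite such as pyarrow's, the same filter can be attached to an individual test with `@pytest.mark.filterwarnings("ignore:Creating an ndarray from ragged nested sequences")`.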
   




[GitHub] [arrow] jorisvandenbossche commented on issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

jorisvandenbossche commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330358651

   No idea; we should try with numpy 1.24 to see what it does (although I would assume some of our CI is already using it, and that didn't start failing).




[GitHub] [arrow] jorisvandenbossche commented on issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

jorisvandenbossche commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330436673

   > it seems the VisibleDeprecationWarning goes away altogether
   
   Yes, that's the intent. The deprecation was enforced in 1.24.0 (see the second bullet point in the "Expired deprecations" section of https://github.com/numpy/numpy/releases/tag/v1.24.0rc1). That means some inputs will now raise an error instead of automatically becoming object dtype. But I suppose the pandas equality-testing code handles that correctly, then, if the error doesn't bubble up and `equals` still returns the correct True/False.




[GitHub] [arrow] milesgranger commented on issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330368523

   I just tried with `1.24.0-rc1` and it seems to work fine.




[GitHub] [arrow] milesgranger commented on issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330356463

   Ah, sharp eye. :eagle:
   Since it's an error in numpy 1.24, is it still worth raising an issue with pandas for it, or will it work itself out?




[GitHub] [arrow] milesgranger closed issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger closed issue #14759: [Python] Construct variable list array with dtype=object in arrow_to_pandas
URL: https://github.com/apache/arrow/issues/14759




[GitHub] [arrow] milesgranger commented on issue #14759: [C++][Python] Construct variable list array with dtype=object in arrow_to_pandas

milesgranger commented on issue #14759:
URL: https://github.com/apache/arrow/issues/14759#issuecomment-1330283288

   That could be. I can't seem to reproduce it with 'normal' Series construction: if `s` in the example is created directly, the warning isn't produced. That's why I expected this was somehow due to the Arrow side.

