Posted to issues@arrow.apache.org by "Brandon B. Miller (Jira)" <ji...@apache.org> on 2020/08/17 16:48:00 UTC

[jira] [Created] (ARROW-9772) Optionally allow for to_pandas to return writeable pandas objects

Brandon B. Miller created ARROW-9772:
----------------------------------------

             Summary: Optionally allow for to_pandas to return writeable pandas objects
                 Key: ARROW-9772
                 URL: https://issues.apache.org/jira/browse/ARROW-9772
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.17.1
            Reporter: Brandon B. Miller


In cuDF, I'd like to leverage pyarrow to facilitate the conversion from cuDF series and dataframe objects into the equivalent pandas objects. Concretely, I'd like something like this to work:

`pandas_object = cudf_object.to_arrow().to_pandas()`. 

This allows us to stay consistent with the way the rest of the pyarrow ecosystem handles nulls, dtype conversions and the like without having to reinvent the wheel. However, I noticed that in some zero-copy scenarios pyarrow does not seem to fully release the underlying buffers when converting with `to_pandas()`. The resulting objects are immutable, and anyone who tries to mutate the data will encounter

`ValueError: assignment destination is read-only`
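The same error can be reproduced with NumPy alone. The snippet below is a minimal sketch that assumes, as the error message suggests, that the pandas object ends up backed by a NumPy array flagged non-writeable because it is a zero-copy view of an immutable Arrow buffer. No pyarrow call is involved; the writeable flag is cleared by hand to stand in for that view.

```python
import numpy as np

# Stand-in for a zero-copy view onto an immutable Arrow buffer.
# The flag is cleared by hand here; pyarrow is not used.
owner = np.arange(5, dtype=np.int64)
view = owner.view()
view.flags.writeable = False

try:
    view[0] = 42  # in-place mutation of a read-only array
except ValueError as exc:
    error_message = str(exc)  # "assignment destination is read-only"

# Copying yields an array that owns its memory and is writeable again.
writeable = view.copy()
writeable[0] = 42
```

The copy is what makes the result writeable again, at the price of giving up the zero-copy conversion.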

This creates a slightly strange situation where a user might encounter issues that subtly stem from the fact that Arrow was used to construct the offending pandas object. It would be nice to be able to toggle this behavior using a kwarg or something similar. I suspect this could come up in other situations where libraries want to convert back and forth between equivalent Python objects through Arrow and expect the final object they get to behave as if it were constructed via other means.
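Until such an option exists, a caller can deep-copy the result. The sketch below shows roughly what the requested toggle could look like, written as a wrapper outside pyarrow. `to_pandas_writeable` and its `writeable` parameter are hypothetical names, not part of the pyarrow API. The sketch assumes only that the input has a `to_pandas()` method, and a stub object replaces pyarrow so the example runs without it.

```python
import numpy as np
import pandas as pd


def to_pandas_writeable(arrow_obj, writeable=True, **kwargs):
    """Hypothetical wrapper, not a pyarrow API.

    Calls arrow_obj.to_pandas(**kwargs) and, when writeable is True,
    returns a deep copy so the result no longer shares read-only buffers.
    """
    result = arrow_obj.to_pandas(**kwargs)
    if writeable:
        result = result.copy(deep=True)
    return result


class _ReadOnlyStub:
    """Test double standing in for a pyarrow object (pyarrow not needed)."""

    def to_pandas(self):
        data = np.arange(3, dtype=np.int64)
        data.flags.writeable = False  # mimic a zero-copy, read-only buffer
        return pd.Series(data)


series = to_pandas_writeable(_ReadOnlyStub())
series.iloc[0] = 42  # succeeds because the wrapper returned a deep copy
```

The trade-off is one extra copy of the data on every conversion, which is why the request is for an opt-in switch rather than a change of default.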



--
This message was sent by Atlassian Jira
(v8.3.4#803005)