Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/06 07:32:56 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

jorisvandenbossche commented on issue #13316:
URL: https://github.com/apache/arrow/issues/13316#issuecomment-1147141293

   @zhangyingmath that is expected, although I see that this is not properly documented in the docstring (https://arrow.apache.org/docs/dev/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas). It is mentioned in the user guide at https://arrow.apache.org/docs/dev/python/pandas.html#memory-usage-and-zero-copy (emphasis mine):
   
   > `split_blocks=True`, when enabled `Table.to_pandas` produces one internal DataFrame “block” for each column, skipping the “consolidation” step. Note that many pandas operations will trigger consolidation anyway, but the peak memory use may be less than the worst case scenario of a full memory doubling. **As a result of this option, we are able to do zero copy conversions of columns in the same cases where we can do zero copy with Array and ChunkedArray.**
   
   Currently, using `split_blocks=True` also implies that pyarrow will try to do the conversion zero-copy. When the conversion is zero-copy, the resulting numpy arrays are read-only, because the data is immutable in pyarrow.
   If we want to mutate the data afterwards, I think the best option is to make a copy of the resulting dataframe (although that defeats the purpose of using `split_blocks=True` to avoid copies).

