Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/06 02:27:17 UTC

[GitHub] [arrow] zhangyingmath opened a new issue, #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

zhangyingmath opened a new issue, #13316:
URL: https://github.com/apache/arrow/issues/13316

   ```
   import pyarrow as pa

   tbl = pa.Table.from_arrays([pa.array([1,2]), pa.array([1.0, 2.0])], names=['f', 'g'])
   df1 = tbl.to_pandas(split_blocks=True)
   df1.loc[0, 'f'] = 100
   ---------------------------------------------------------------------------
   ValueError                                Traceback (most recent call last)
   <ipython-input-11-3de6d06fe21c> in <module>()
   ----> 1 df1.loc[0, 'f'] = 100
   
   7 frames
   /usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in set_inplace(self, locs, values)
       358         create a new array and always creates a new Block.
       359         """
   --> 360         self.values[locs] = values
       361 
       362     def delete(self, loc) -> None:
   
   ValueError: assignment destination is read-only
   ```
   I tried to modify column 'g' (float type) and got the same error as well. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] zhangyingmath commented on issue #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

Posted by GitBox <gi...@apache.org>.

zhangyingmath commented on issue #13316:
URL: https://github.com/apache/arrow/issues/13316#issuecomment-1148135630

   I see, thanks for the explanation! Yes, it would be good to mention the zero-copy effect in the docstring of Table.to_pandas, especially if this will be "permanent behavior". If it is still in flux, that's OK and nothing needs to be done. Thanks!



[GitHub] [arrow] jorisvandenbossche commented on issue #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

Posted by GitBox <gi...@apache.org>.

jorisvandenbossche commented on issue #13316:
URL: https://github.com/apache/arrow/issues/13316#issuecomment-1147141293

   @zhangyingmath that is expected, although I see that this is not properly documented in the docstring (https://arrow.apache.org/docs/dev/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas). It is mentioned in the user guide at https://arrow.apache.org/docs/dev/python/pandas.html#memory-usage-and-zero-copy (emphasis mine):
   
   > `split_blocks=True`, when enabled `Table.to_pandas` produces one internal DataFrame “block” for each column, skipping the “consolidation” step. Note that many pandas operations will trigger consolidation anyway, but the peak memory use may be less than the worst case scenario of a full memory doubling. **As a result of this option, we are able to do zero copy conversions of columns in the same cases where we can do zero copy with Array and ChunkedArray.**
   
   Currently, using `split_blocks=True` also implies that pyarrow will try to do the conversion zero-copy. And when the conversion is zero-copy, the resulting numpy arrays are read-only, because in pyarrow the data is immutable.
   If we want to mutate the data afterwards, I think the best option is to make a copy of the resulting dataframe (although that defeats the purpose of using `split_blocks=True` to avoid copies).



[GitHub] [arrow] zhangyingmath closed issue #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

Posted by GitBox <gi...@apache.org>.

zhangyingmath closed issue #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True
URL: https://github.com/apache/arrow/issues/13316

