Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/02/21 09:31:23 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #34239: [Python] Conversion of pyArrow table to pandas fails with error: ArrowException: Unknown error: Wrapping Q� failed

jorisvandenbossche commented on issue #34239:
URL: https://github.com/apache/arrow/issues/34239#issuecomment-1438146677

   @DorotaDR checking your data, the last element of the last column is the one that is failing:
   
   ```python
   >>> loaded_table["Embarked"]
   <pyarrow.lib.ChunkedArray object at 0x7f6af61c7a60>
   [
     [
       "S",
       "C",
       "S",
       "S",
       "S",
       ...
       "S",
       "S",
       "S",
       "C",
       "Q�"
     ]
   ]
   # get this last element
   >>> value = loaded_table["Embarked"][-1]
   >>> type(value)
   pyarrow.lib.StringScalar
   # converting to a python string fails
   >>> value.as_py()
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
   # inspecting just the bytes behind the value
   >>> bytes(value.as_buffer())
   b'Q\xff'
   ```
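
   The same failure can be reproduced without Arrow. The snippet below is a minimal sketch that uses only the two bytes shown above; the variable names are illustrative. It also suggests why the value is displayed as "Q�": a lenient decode replaces the invalid byte with U+FFFD, the Unicode replacement character.

   ```python
   # The raw bytes reported by bytes(value.as_buffer()) above.
   raw = b"Q\xff"

   # Strict decoding fails because 0xff can never start a UTF-8 sequence.
   try:
       raw.decode("utf-8")
       strict_error = None
   except UnicodeDecodeError as exc:
       strict_error = exc

   # Lenient decoding substitutes U+FFFD for each undecodable byte.
   lossy = raw.decode("utf-8", errors="replace")

   print(strict_error)
   print(repr(lossy))
   ```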
   
   Did you read the CSV file using Arrow as well? I suspect the CSV file is not valid UTF-8 (or something went wrong while reading it in).
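
   One way to check whether the file itself contains invalid UTF-8, independent of Arrow, is to scan its raw bytes line by line. This is only a sketch: `find_invalid_utf8_lines` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from your file.

   ```python
   def find_invalid_utf8_lines(lines):
       """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
       bad = []
       for number, raw in enumerate(lines, start=1):
           try:
               raw.decode("utf-8")
           except UnicodeDecodeError:
               bad.append((number, raw))
       return bad

   # Made-up sample data; the last row carries a stray 0xff byte.
   sample = [b"PassengerId,Embarked\n", b"1,S\n", b"2,Q\xff\n"]
   bad = find_invalid_utf8_lines(sample)
   print(bad)
   ```

   For a real file, the same function can be given a file object opened in binary mode, for example `open(path, "rb")`, since iterating over it yields one `bytes` line at a time. If that reports nothing, the file is valid UTF-8 and the problem was more likely introduced while reading it.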
   
   If I read the CSV file you linked above using pyarrow.csv, this last element is a plain "Q" without the invalid bytes.
   
   

