Posted to jira@arrow.apache.org by "Yunbo Deng (Jira)" <ji...@apache.org> on 2022/07/14 21:31:00 UTC

[jira] [Created] (ARROW-17077) Unicode character issue with pyarrow

Yunbo Deng created ARROW-17077:
----------------------------------

             Summary: Unicode character issue with pyarrow
                 Key: ARROW-17077
                 URL: https://issues.apache.org/jira/browse/ARROW-17077
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Yunbo Deng


When running code that uses the Databricks SQL Connector for Python, the code hits a Unicode character issue in the pyarrow library. The customer has to apply a workaround in the client code, something like
"SELECT decode(string(unbase64(value)), 'utf8')"
 
Exception in the main script No data fetched using SQL-statement: SELECT * FROM parquet.`abfss://XXXX@XXXX.xxx.net/structXXXXXXX`. Exception: Unknown error: Wrapping TP H�kan  Sweater failed
Traceback (most recent call last):
  File "/home/xxxx/yy/allo/yy/db/sql_reader.py", line 53, in query
    rows = cursor.fetchmany(self.MAX_ROWS)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 401, in fetchmany
    return self.active_result_set.fetchmany(size)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 630, in fetchmany
    return self._convert_arrow_table(self.fetchmany_arrow(size))
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/databricks/sql/client.py", line 563, in _convert_arrow_table
    df = table_renamed.to_pandas(
  File "pyarrow/array.pxi", line 822, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 3889, in pyarrow.lib.Table._to_pandas
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 803, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1155, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 763, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
  File "/home/xxxx/yy/.venv/lib/python3.10/site-packages/pandas/core/arrays/string_.py", line 217, in __from_arrow__
    str_arr = StringArray._from_sequence(np.array(arr))
  File "pyarrow/array.pxi", line 1395, in pyarrow.lib.Array.__array__
  File "pyarrow/array.pxi", line 1441, in pyarrow.lib.Array.to_numpy
  File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping TP H�kan  Sweater failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
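
The report does not identify the root cause. One plausible reading, offered only as an assumption, is that the column holds bytes that are not valid UTF-8 (for example a Latin-1 encoded "å"), which would explain why the error message itself shows a replacement character. The sketch below uses only the Python standard library and a hypothetical sample value to illustrate that kind of failure; it does not call pyarrow or the Databricks connector.

```python
# Hypothetical sample: the value from the error message, ASSUMING the
# garbled character was originally "å" stored as a single Latin-1 byte.
raw = "TP Håkan  Sweater".encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Strict decoding rejects the lone 0xE5 byte.
strict_ok = is_valid_utf8(raw)

# Lossy decoding substitutes U+FFFD, matching the look of the error message.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the assumed source encoding recovers the text.
recovered = raw.decode("latin-1")

print(strict_ok, lossy, recovered)
```

If this assumption holds, checking the raw column bytes this way before conversion would show which rows are affected.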


