Posted to jira@arrow.apache.org by "SHIMA Tatsuya (Jira)" <ji...@apache.org> on 2022/10/21 10:52:00 UTC
[jira] [Created] (ARROW-18123) [Python] Cannot use multi-byte characters in file names
SHIMA Tatsuya created ARROW-18123:
-------------------------------------
Summary: [Python] Cannot use multi-byte characters in file names
Key: ARROW-18123
URL: https://issues.apache.org/jira/browse/ARROW-18123
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 9.0.0
Reporter: SHIMA Tatsuya
{{pyarrow.parquet.write_table}} raises an error when the file path contains multi-byte characters.
For example, passing {{例.parquet}} as the file path fails with {{ArrowInvalid: Cannot parse URI}}:
{code:python}
Python 3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow as pa
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
...                    'two': ['foo', 'bar', 'baz'],
...                    'three': [True, False, True]},
...                   index=list('abc'))
>>> table = pa.Table.from_pandas(df)
>>> import pyarrow.parquet as pq
>>> pq.write_table(table, '例.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
    with ParquetWriter(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
    filesystem, path = _resolve_filesystem_and_path(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
  File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
{code}
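To make "multi-byte" concrete, the sketch below uses only the Python standard library (no pyarrow) to show that the file name from this report is not plain ASCII and takes more bytes than characters in UTF-8. The helper name {{describe_path}} is hypothetical and exists only for illustration. Whether non-ASCII input is what the URI parser rejects is an assumption drawn from the error message above, not something this sketch confirms.

```python
from urllib.parse import quote


def describe_path(path: str) -> dict:
    """Summarize how a path string is encoded (hypothetical helper)."""
    encoded = path.encode("utf-8")
    return {
        # False when any character falls outside the ASCII range
        "is_ascii": path.isascii(),
        # Number of Unicode code points in the string
        "char_count": len(path),
        # Number of bytes once encoded as UTF-8
        "utf8_byte_count": len(encoded),
        # Percent-encoded form, as used inside URIs
        "percent_encoded": quote(path),
    }


info = describe_path("例.parquet")
print(info)
```

For an ASCII-only name such as {{example.parquet}}, the same helper would report {{is_ascii}} as true and identical character and byte counts.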
--
This message was sent by Atlassian Jira
(v8.20.10#820010)