Posted to jira@arrow.apache.org by "SHIMA Tatsuya (Jira)" <ji...@apache.org> on 2022/10/21 10:52:00 UTC

[jira] [Created] (ARROW-18123) [Python] Cannot use multi-byte characters in file names

SHIMA Tatsuya created ARROW-18123:
-------------------------------------

             Summary: [Python] Cannot use multi-byte characters in file names
                 Key: ARROW-18123
                 URL: https://issues.apache.org/jira/browse/ARROW-18123
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 9.0.0
            Reporter: SHIMA Tatsuya


{{pyarrow.parquet.write_table}} raises an error when the file path contains multi-byte characters.

For example, passing {{例.parquet}} as the file path fails with {{ArrowInvalid: Cannot parse URI}}:

{code:python}
Python 3.10.7 (main, Oct  5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> import pyarrow as pa
>>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
...                    'two': ['foo', 'bar', 'baz'],
...                    'three': [True, False, True]},
...                    index=list('abc'))
>>> table = pa.Table.from_pandas(df)
>>> import pyarrow.parquet as pq
>>> pq.write_table(table, '例.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
    with ParquetWriter(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
    filesystem, path = _resolve_filesystem_and_path(
  File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
    filesystem, path = FileSystem.from_uri(path)
  File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
{code}
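
The traceback shows the string path being handed to {{_resolve_filesystem_and_path}} and then to {{FileSystem.from_uri}}, which is where parsing fails. A possible workaround, sketched below, is to open the file from Python and pass the open handle instead of the path string. This assumes that {{write_table}} accepts an already opened binary file object as its destination and that doing so skips the URI parsing step; it has not been verified against 9.0.0. The helper name {{write_via_file_handle}} is hypothetical and is not part of pyarrow.

```python
from pathlib import Path


def write_via_file_handle(write_fn, data, path):
    """Hypothetical helper (not part of pyarrow).

    Opens `path` in binary write mode and passes the open file object
    to `write_fn(data, sink)`, so the writer never receives the path
    string itself. Returns the path as a pathlib.Path.
    """
    path = Path(path)
    with path.open("wb") as sink:
        write_fn(data, sink)
    return path


# Intended usage with the objects from the report above (assumption:
# pq.write_table accepts a file object as its second argument):
#
#     import pyarrow.parquet as pq
#     write_via_file_handle(pq.write_table, table, "例.parquet")
```

If this works, it would only sidestep the problem for local files; the path handling shown in the traceback would still need a fix.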



--
This message was sent by Atlassian Jira
(v8.20.10#820010)