Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/12/07 13:42:00 UTC
[jira] [Resolved] (ARROW-18123) [Python] Cannot use multi-byte characters in file names in write_table
[ https://issues.apache.org/jira/browse/ARROW-18123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche resolved ARROW-18123.
-------------------------------------------
Resolution: Fixed
Issue resolved by pull request 14764
https://github.com/apache/arrow/pull/14764
> [Python] Cannot use multi-byte characters in file names in write_table
> ----------------------------------------------------------------------
>
> Key: ARROW-18123
> URL: https://issues.apache.org/jira/browse/ARROW-18123
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 9.0.0
> Reporter: SHIMA Tatsuya
> Assignee: Miles Granger
> Priority: Critical
> Labels: pull-request-available
> Fix For: 11.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> {{pyarrow.parquet.write_table}} raises an error when the file path contains multi-byte characters.
> For example, passing {{例.parquet}} as the file path fails:
> {code:python}
> Python 3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pandas as pd
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> df = pd.DataFrame({'one': [-1, np.nan, 2.5],
> ...                    'two': ['foo', 'bar', 'baz'],
> ...                    'three': [True, False, True]},
> ...                   index=list('abc'))
> >>> table = pa.Table.from_pandas(df)
> >>> import pyarrow.parquet as pq
> >>> pq.write_table(table, '例.parquet')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 2920, in write_table
>     with ParquetWriter(
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/parquet/__init__.py", line 911, in __init__
>     filesystem, path = _resolve_filesystem_and_path(
>   File "/home/vscode/.local/lib/python3.10/site-packages/pyarrow/fs.py", line 184, in _resolve_filesystem_and_path
>     filesystem, path = FileSystem.from_uri(path)
>   File "pyarrow/_fs.pyx", line 463, in pyarrow._fs.FileSystem.from_uri
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Cannot parse URI: '例.parquet'
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)