Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/04/05 13:19:00 UTC

[jira] [Assigned] (ARROW-15982) [Python] parquet.read_table fails to parse home directory path

     [ https://issues.apache.org/jira/browse/ARROW-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche reassigned ARROW-15982:
---------------------------------------------

    Assignee: Colin Jermain

> [Python] parquet.read_table fails to parse home directory path
> --------------------------------------------------------------
>
>                 Key: ARROW-15982
>                 URL: https://issues.apache.org/jira/browse/ARROW-15982
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>            Reporter: Colin Jermain
>            Assignee: Colin Jermain
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{pyarrow.parquet.read_table}} fails to parse a path that uses the home directory shortcut {{~}}. For example, {{"~/test.parquet"}} raises a {{FileNotFoundError}}, while {{"/home/user/test.parquet"}} reads the file correctly.
> {code:java}
> $ python -c "import pyarrow.parquet; pyarrow.parquet.read_table('~/test.parquet')"  
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File ".../lib/python3.8/site-packages/pyarrow/parquet.py", line 1960, in read_table
>     dataset = _ParquetDatasetV2(
>   File ".../lib/python3.8/site-packages/pyarrow/parquet.py", line 1781, in __init__
>     self._dataset = ds.dataset(path_or_paths, filesystem=filesystem,
>   File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 667, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 412, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File ".../lib/python3.8/site-packages/pyarrow/dataset.py", line 388, in _ensure_single_source
>     raise FileNotFoundError(path)
> FileNotFoundError: ~/test.parquet
> {code}
> The fix for this issue should be as simple as applying {{os.path.expanduser}} to local string paths before they are resolved against the filesystem.
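A minimal sketch of the proposed expansion, assuming it is applied to the {{path_or_paths}} argument before it reaches the dataset layer. The helper name {{expand_user_paths}} is hypothetical and not part of pyarrow; it only illustrates the {{os.path.expanduser}} approach described above. On an affected version (7.0.0) the same call can serve as a user-side workaround.

```python
import os


def expand_user_paths(path_or_paths):
    """Hypothetical helper: expand a leading '~' in one path or a list of paths.

    - str / os.PathLike inputs are converted with os.fspath and then expanded.
    - lists and tuples are processed element-wise (a list is returned).
    - anything else (e.g. an open file object) is returned unchanged.
    URIs such as "s3://bucket/key" are left alone because expanduser only
    acts on a leading '~'.
    """
    if isinstance(path_or_paths, (list, tuple)):
        return [expand_user_paths(p) for p in path_or_paths]
    if isinstance(path_or_paths, (str, os.PathLike)):
        return os.path.expanduser(os.fspath(path_or_paths))
    return path_or_paths


# Workaround on 7.0.0, assuming pyarrow is installed:
#   import pyarrow.parquet as pq
#   table = pq.read_table(expand_user_paths("~/test.parquet"))
```

Expanding only string and path-like inputs keeps file-like objects and remote URIs untouched, so the change should not affect non-local sources.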



--
This message was sent by Atlassian Jira
(v8.20.1#820001)