Posted to dev@arrow.apache.org by "Ilya Orson Sandoval (JIRA)" <ji...@apache.org> on 2019/04/04 23:29:00 UTC
[jira] [Created] (ARROW-5122) pyarrow.parquet.read_table raises
non-file path error when given a windows path to a directory
Ilya Orson Sandoval created ARROW-5122:
------------------------------------------
Summary: pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
Key: ARROW-5122
URL: https://issues.apache.org/jira/browse/ARROW-5122
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.12.1
Environment: Windows
Reporter: Ilya Orson Sandoval
I think this might be a small bug in the {{read_table}} interface when it is used to load a directory of Parquet files on Windows. Reading the same directory works fine if I use a {{ParquetDataset}} object directly, or if I call {{read_table}} from a Linux terminal.
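For reference, here is a minimal sketch of the workaround described above: reading the directory through {{ParquetDataset}} instead of {{read_table}}. The helper name {{read_parquet_directory}} is hypothetical, and converting the path to a plain string first is my own assumption about what is safest, not a confirmed requirement.

```python
import os


def read_parquet_directory(path):
    """Read a directory of Parquet files through ParquetDataset.

    Hypothetical helper illustrating the workaround from this report;
    it is not part of pyarrow.
    """
    # Imported lazily so the helper can be defined without pyarrow installed.
    import pyarrow.parquet as pq

    # os.fspath() turns a pathlib path into a plain string. Passing a string
    # is an assumption here, not something the report confirms is needed.
    dataset = pq.ParquetDataset(os.fspath(path))
    return dataset.read()


# The failing call from this report, for comparison (errors on Windows):
#     pq.read_table(os.fspath(path))
```

Both calls are expected to return the same table, which is the behaviour I see on Linux.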
The problem appears to come from the {{_make_manifest()}} method in {{parquet.py}}, around line 1045, I think. Either {{_is_path_like()}} or the FileSystem method {{isdir()}} fails to recognize the path as a valid directory (I tested with a raw Windows path and a {{pathlib.WindowsPath}} object).
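To help narrow this down, a small standard-library check like the one below can confirm that the operating system itself sees the path as a directory, independently of pyarrow. The function name {{describe_path}} is hypothetical, and the sketch deliberately does not call {{_is_path_like()}} or the FileSystem {{isdir()}} method, since I have not confirmed their exact signatures.

```python
import os
from pathlib import Path


def describe_path(path):
    """Report how the standard library classifies a path.

    Hypothetical diagnostic helper; accepts a str or a pathlib path.
    """
    # Normalize to a plain string, as a raw Windows path would be given.
    text = os.fspath(path)
    return {
        "as_string": text,
        "exists": os.path.exists(text),
        "is_dir": os.path.isdir(text),
        "is_file": os.path.isfile(text),
    }


if __name__ == "__main__":
    # Replace "." with the directory that holds the Parquet files.
    print(describe_path(Path(".")))
```

If {{is_dir}} is true here while {{read_table}} still raises the non-file path error, that would point at the path checks in {{parquet.py}} rather than at the path itself.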
I hope this helps a little.
P.S. Thank you for your effort developing this package!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)