Posted to dev@arrow.apache.org by "Ilya Orson Sandoval (JIRA)" <ji...@apache.org> on 2019/04/04 23:29:00 UTC

[jira] [Created] (ARROW-5122) pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory

Ilya Orson Sandoval created ARROW-5122:
------------------------------------------

             Summary: pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
                 Key: ARROW-5122
                 URL: https://issues.apache.org/jira/browse/ARROW-5122
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.12.1
         Environment: Windows
            Reporter: Ilya Orson Sandoval


I think this might be a small bug in the {{read_table}} interface when it is used to load a directory of Parquet files on Windows. It works fine if I use a {{ParquetDataset}} object directly to read the table represented by the directory, or if I use {{read_table}} on Linux.

The problem appears to come from the {{_make_manifest()}} method in {{parquet.py}}, around line 1045. Either {{_is_path_like()}} or the FileSystem method {{isdir()}} fails to recognize the path as a valid directory (I tested with both a raw Windows path and a {{pathlib.WindowsPath}} object).
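To make the suspected failure point concrete, here is a minimal stdlib-only sketch of the two checks the report names: a "is this path-like?" test followed by a "is this a directory?" test. The names {{is_path_like}} and {{classify_source}} are hypothetical. This is not pyarrow's actual {{_is_path_like()}} or {{_make_manifest()}} code. It only illustrates how a directory given as a plain string or as a pathlib object would be expected to pass through such checks. If either check rejected a valid Windows directory, the input would be treated as a single file, which would be consistent with the "non-file path" error in the summary.

```python
import os
import pathlib
import tempfile


def is_path_like(obj):
    """Hypothetical check: True for str, bytes, or any os.PathLike object.

    pathlib.Path, WindowsPath and PureWindowsPath all implement __fspath__,
    so they count as os.PathLike.
    """
    return isinstance(obj, (str, bytes, os.PathLike))


def classify_source(source):
    """Hypothetical dispatch: "directory" for an existing directory,
    "file" for any other path-like input, TypeError otherwise.
    """
    if not is_path_like(source):
        raise TypeError(f"expected a path-like object, got {type(source).__name__}")
    # os.fspath converts pathlib objects to plain str (or bytes) paths,
    # so the directory test sees the same value for both input styles.
    path = os.fspath(source)
    if os.path.isdir(path):
        return "directory"
    return "file"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A string path and a pathlib object for the same directory
        # should be classified identically.
        print(classify_source(d), classify_source(pathlib.Path(d)))
```

Converting with {{os.fspath}} before the directory test means both input styles the reporter tried reach the same code path. A bug that affects only one of them would point at the path-likeness check rather than at {{isdir()}}.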

I hope this helps a little.

P.S. Thank you for your efforts in developing this package!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)