Posted to issues@arrow.apache.org by "Antoine Pitrou (JIRA)" <ji...@apache.org> on 2019/04/17 15:12:00 UTC
[jira] [Commented] (ARROW-5122) [Python] pyarrow.parquet.read_table
raises non-file path error when given a windows path to a directory
[ https://issues.apache.org/jira/browse/ARROW-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820177#comment-16820177 ]
Antoine Pitrou commented on ARROW-5122:
---------------------------------------
[~IlyaOrson], is it possible for you to test again on 0.13.0? We have changed some of the filesystem handling code and this might have been fixed already.
> [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
> -------------------------------------------------------------------------------------------------------
>
> Key: ARROW-5122
> URL: https://issues.apache.org/jira/browse/ARROW-5122
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.1
> Environment: Windows
> Reporter: Ilya Orson Sandoval
> Priority: Minor
> Fix For: 0.14.0
>
>
> I think this might be a small bug in the read_table interface when it is used to load a directory full of Parquet files on Windows. It works fine if I use a ParquetDataset object directly to read the table represented by the directory, or if I use {{read_table}} in a Linux terminal.
> Apparently the problem comes from the {{_make_manifest()}} method in {{parquet.py}}, I think around line 1045. Either {{_is_path_like()}} or the FileSystem method {{isdir()}} fails to recognize the path as a valid directory (I tested with a raw Windows path and a {{pathlib.WindowsPath}} object).
> I hope this helps a little.
> P.S. Thank you for your effort developing this package!
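
For anyone retesting the report above, the comparison it describes could be scripted roughly as follows. This is only a sketch, not code from the issue: the helper names (path_variants, write_sample_dataset, compare_readers) and the sample data are hypothetical, and it assumes pyarrow is installed. It reads the same directory through read_table and through ParquetDataset, once with a plain string path and once with a pathlib object, and reports what each attempt returns or raises.

```python
import pathlib
import tempfile


def path_variants(directory):
    """Return the directory as a plain string and as a pathlib.Path object."""
    p = pathlib.Path(directory)
    return [str(p), p]


def write_sample_dataset(directory, n_files=2):
    """Write a few tiny Parquet files into the directory (hypothetical data)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    for i in range(n_files):
        table = pa.Table.from_arrays([pa.array([i, i + 1])], names=["x"])
        target = pathlib.Path(directory) / "part-{}.parquet".format(i)
        pq.write_table(table, str(target))


def compare_readers(directory):
    """Try both readers on each path variant.

    Returns a list of (reader_name, path_type_name, outcome) tuples.
    """
    # Imported lazily so path_variants() works without pyarrow installed.
    import pyarrow.parquet as pq

    readers = (
        ("read_table", lambda p: pq.read_table(p)),
        ("ParquetDataset", lambda p: pq.ParquetDataset(p).read()),
    )
    results = []
    for path in path_variants(directory):
        for name, reader in readers:
            try:
                table = reader(path)
                outcome = "ok ({} rows)".format(table.num_rows)
            except Exception as exc:  # record whatever error surfaces
                outcome = "{}: {}".format(type(exc).__name__, exc)
            results.append((name, type(path).__name__, outcome))
    return results


if __name__ == "__main__":
    try:
        import pyarrow  # noqa: F401
    except ImportError:
        print("pyarrow is not installed; nothing to compare")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            write_sample_dataset(tmp)
            for row in compare_readers(tmp):
                print(row)
```

Running the script on Windows and on Linux with the same pyarrow version should show whether read_table and ParquetDataset disagree for a directory path, which is the behavior reported here.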
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)