Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/04 23:43:00 UTC

[jira] [Assigned] (ARROW-5310) [Python] better error message on creating ParquetDataset from empty directory

     [ https://issues.apache.org/jira/browse/ARROW-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-5310:
-----------------------------------

    Assignee: Joris Van den Bossche

> [Python] better error message on creating ParquetDataset from empty directory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5310
>                 URL: https://issues.apache.org/jira/browse/ARROW-5310
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: dataset, dataset-parquet-read, parquet
>
> Currently, when {{path}} is an existing but empty directory, you get an {{IndexError}}:
> {code:python}
> >>> dataset = pq.ParquetDataset(path)
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-16-346f72ae525e> in <module>
> ----> 1 dataset = pq.ParquetDataset(path)
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
>     989 
>     990         if validate_schema:
> --> 991             self.validate_schemas()
>     992 
>     993         if filters is not None:
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
>    1025                 self.schema = self.common_metadata.schema
>    1026             else:
> -> 1027                 self.schema = self.pieces[0].get_metadata().schema
>    1028         elif self.schema is None:
>    1029             self.schema = self.metadata.schema
> IndexError: list index out of range
> {code}
> This case should produce a more informative error message.
> Unless we actually want to allow this? (Although I am not sure there are good use cases for supporting empty directories, because from an empty directory we cannot get any schema or metadata information.)
> The failure only occurs when validating the schemas, so with {{validate_schema=False}} the call actually returns a ParquetDataset object, just with an empty list for {{pieces}} and no schema. So it would also be easy to skip the error during schema validation for this empty-directory case.
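
Until the error message is improved, a caller could guard against the empty-directory case before constructing the dataset. The following is only a minimal sketch of that idea: {{check_dataset_directory}} is a hypothetical helper, not part of pyarrow, and the rule that files starting with "_" or "." are not data files is an assumption made for illustration.

```python
import os


def check_dataset_directory(path):
    """Raise a descriptive error if `path` is a directory with no data files.

    Hypothetical helper (not a pyarrow API). Files whose names start with
    "_" or "." (for example _metadata or _common_metadata) are treated as
    non-data files; that filter is an assumption of this sketch.
    """
    if not os.path.isdir(path):
        # Single files or missing paths are left for pyarrow to validate.
        return
    # Walk subdirectories too, since a dataset may be partitioned.
    for _root, _dirs, files in os.walk(path):
        if any(not name.startswith(("_", ".")) for name in files):
            return
    raise ValueError(
        f"Cannot create a ParquetDataset from {path!r}: "
        "the directory contains no data files, so no schema can be inferred"
    )
```

Calling {{check_dataset_directory(path)}} right before {{pq.ParquetDataset(path)}} would turn the bare {{IndexError}} into a {{ValueError}} that names the offending path. A similar emptiness check on {{pieces}} inside {{validate_schemas}} would be one possible way to fix this in the library itself.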



--
This message was sent by Atlassian Jira
(v8.3.4#803005)