Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/04 23:43:00 UTC
[jira] [Updated] (ARROW-5310) [Python] better error message on creating ParquetDataset from empty directory
[ https://issues.apache.org/jira/browse/ARROW-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-5310:
--------------------------------
Fix Version/s: 1.0.0
> [Python] better error message on creating ParquetDataset from empty directory
> -----------------------------------------------------------------------------
>
> Key: ARROW-5310
> URL: https://issues.apache.org/jira/browse/ARROW-5310
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Minor
> Labels: dataset, dataset-parquet-read, parquet
> Fix For: 1.0.0
>
>
> Currently, when {{path}} is an existing but empty directory, you get:
> {code:python}
> >>> dataset = pq.ParquetDataset(path)
> ---------------------------------------------------------------------------
> IndexError Traceback (most recent call last)
> <ipython-input-16-346f72ae525e> in <module>
> ----> 1 dataset = pq.ParquetDataset(path)
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
> 989
> 990 if validate_schema:
> --> 991 self.validate_schemas()
> 992
> 993 if filters is not None:
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
> 1025 self.schema = self.common_metadata.schema
> 1026 else:
> -> 1027 self.schema = self.pieces[0].get_metadata().schema
> 1028 elif self.schema is None:
> 1029 self.schema = self.metadata.schema
> IndexError: list index out of range
> {code}
> This should raise a more informative error than a bare {{IndexError}}.
> Or do we actually want to allow this? I am not sure there are good use cases for supporting empty directories, since an empty directory gives us no schema or metadata information.
> It is only failing when validating the schemas, so with {{validate_schema=False}} it actually returns a ParquetDataset object, just with an empty list for {{pieces}} and no schema. So it would be easy to not error when validating the schemas as well for this empty-directory case.
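
To make the "nicer error message" option concrete, here is a minimal, pyarrow-free sketch of the guard: check that the list of pieces is non-empty before indexing into it, and raise a descriptive error otherwise. The names EmptyDatasetError, list_data_files and first_piece are hypothetical and used only for illustration; they are not part of pyarrow, and the file-discovery rule (skip names starting with "_" or ".") is an assumption, not a statement of how ParquetDataset finds its pieces.

```python
import os
import tempfile


class EmptyDatasetError(ValueError):
    """Hypothetical error type for this sketch; not part of pyarrow."""


def list_data_files(path):
    """Collect data files under `path`.

    Assumption for this sketch: names starting with "_" or "." are
    treated as metadata or hidden files and are skipped.
    """
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.startswith(("_", ".")):
                continue
            found.append(os.path.join(root, name))
    return sorted(found)


def first_piece(pieces, path):
    """Return the first piece, or raise a descriptive error if there is none."""
    if not pieces:
        # Replaces the bare "IndexError: list index out of range".
        raise EmptyDatasetError(
            f"No data files found in directory {path}; "
            "cannot infer a schema from an empty dataset"
        )
    return pieces[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as empty_dir:
        try:
            first_piece(list_data_files(empty_dir), empty_dir)
        except EmptyDatasetError as exc:
            print(exc)
```

The alternative raised above, allowing empty directories, would instead return early from validation when the list is empty and leave the schema unset.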