Posted to issues@arrow.apache.org by "Brecht Machiels (JIRA)" <ji...@apache.org> on 2017/08/29 12:59:00 UTC
[jira] [Comment Edited] (ARROW-1429) Error loading parquet file
with _metadata from HDFS (pyarrow.lib.ArrowIOError: Failed to open local
file)
[ https://issues.apache.org/jira/browse/ARROW-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145237#comment-16145237 ]
Brecht Machiels edited comment on ARROW-1429 at 8/29/17 12:58 PM:
------------------------------------------------------------------
The IndexError for the last table occurs because the directory on HDFS is empty (so it contains no _metadata file either). PyArrow could raise an exception with a more helpful message in this case.
was (Author: brechtm):
For the last table (IndexError), this is because the directory on HDFS is empty. PyArrow could raise an exception with a more helpful message.
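To illustrate the kind of message suggested above, here is a minimal sketch of an up-front check. It is not PyArrow code: `EmptyDatasetError` and `first_piece` are hypothetical names, and `pieces` only stands in for the list that the traceback indexes with `self.pieces[0]`.

```python
class EmptyDatasetError(ValueError):
    """Hypothetical error for a dataset directory that holds no data files."""


def first_piece(pieces, dataset_path):
    """Return the first piece, or raise a descriptive error if there is none.

    `pieces` stands in for the list indexed as `self.pieces[0]` in the
    traceback; `dataset_path` is only used to build the error message.
    """
    if not pieces:
        raise EmptyDatasetError(
            "No parquet files found in dataset directory: {}".format(dataset_path)
        )
    return pieces[0]
```

With a check like this, an empty directory would be reported by name instead of surfacing as a bare "list index out of range".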
> Error loading parquet file with _metadata from HDFS (pyarrow.lib.ArrowIOError: Failed to open local file)
> ---------------------------------------------------------------------------------------------------------
>
> Key: ARROW-1429
> URL: https://issues.apache.org/jira/browse/ARROW-1429
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.6.0
> Environment: RHEL 6.8, Python 3.5.4 (Anaconda), Hadoop 2.6.0-cdh5.8.3
> Reporter: Brecht Machiels
>
> I can open tables stored on HDFS as long as there is no _metadata file alongside the parquet files.
> For two tables with a _metadata file I get the following traceback:
> {code}
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 199, in read_table
>     pq_table = read_hdfs_parquet(hdfs_path, columns)
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 251, in read_hdfs_parquet
>     return HDFS_CONNECTION.read_parquet(hdfs_path, columns)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/filesystem.py", line 168, in read_parquet
>     filesystem=self)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 535, in __init__
>     self.common_metadata = ParquetFile(self.metadata_path).metadata
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 54, in __init__
>     self.reader.open(source, metadata=metadata)
>   File "_parquet.pyx", line 398, in pyarrow._parquet.ParquetReader.open
>   File "io.pxi", line 705, in pyarrow.lib.get_reader
>   File "io.pxi", line 472, in pyarrow.lib.memory_map
>   File "io.pxi", line 451, in pyarrow.lib.MemoryMappedFile._open
>   File "error.pxi", line 72, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Failed to open local file: hdfs://nameservice1/path/to/table/_metadata
> {code}
> For another table with a _metadata file:
> {code}
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 199, in read_table
>     pq_table = read_hdfs_parquet(hdfs_path, columns)
>   File "/home/bmachie/Documents/ml_irissearch/python/util.py", line 251, in read_hdfs_parquet
>     return HDFS_CONNECTION.read_parquet(hdfs_path, columns)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/filesystem.py", line 168, in read_parquet
>     filesystem=self)
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 548, in __init__
>     self.validate_schemas()
>   File "/data/data01/dev/edl/infra/mstr/landing/condaenvs/ml_irissearch/lib/python3.5/site-packages/pyarrow/parquet.py", line 557, in validate_schemas
>     self.schema = self.pieces[0].get_metadata(open_file).schema
> IndexError: list index out of range
> {code}
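In the first traceback above, the hdfs:// path of the _metadata file reaches pyarrow.lib.memory_map, which tries to open it as a local file. The sketch below shows one way such a path could be routed through a filesystem object instead. It is an illustration under stated assumptions, not the actual PyArrow fix: `open_source` and `InMemoryFileSystem` are hypothetical names, and the filesystem is only assumed to expose an `open(path, mode)` method.

```python
import io


def open_source(path, filesystem=None):
    """Return a readable binary file object for `path`.

    Hypothetical helper: `filesystem` is assumed to expose `open(path, mode)`.
    """
    if filesystem is not None:
        # Remote path: let the filesystem object open it.
        return filesystem.open(path, "rb")
    # No filesystem given: treat the path as a local file.
    return open(path, "rb")


class InMemoryFileSystem:
    """Stand-in for a remote filesystem, for illustration only."""

    def __init__(self, files):
        self._files = files  # maps path -> bytes

    def open(self, path, mode="rb"):
        return io.BytesIO(self._files[path])
```

Passing the same filesystem object that located the _metadata file to the code that opens it means the path is never interpreted as a local one.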