Posted to dev@parquet.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2023/01/06 19:00:00 UTC

[jira] [Commented] (PARQUET-2219) ParquetFileReader throws a runtime exception when a file contains only headers and no row data

    [ https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655552#comment-17655552 ] 

Micah Kornfield commented on PARQUET-2219:
------------------------------------------

I'm not aware of anything in the specification that prevents zero-length row groups. We can try to prevent writing them out, but I think readers should be robust to them if they aren't disallowed by the specification. For the iterator case, it seems like the row group should simply be discarded and the next one checked?
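The "discard and check the next one" behavior could look something like the sketch below. This is a hypothetical, self-contained illustration, not parquet-mr code: `RowGroup` here is a stand-in for what would be `BlockMetaData` in parquet-mr, and the wrapper shows an iterator that skips zero-row groups instead of throwing.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a row-group descriptor; in parquet-mr the
// analogous type would be org.apache.parquet.hadoop.metadata.BlockMetaData.
final class RowGroup {
    private final long rowCount;
    RowGroup(long rowCount) { this.rowCount = rowCount; }
    long getRowCount() { return rowCount; }
}

// Iterator that silently discards zero-row row groups and advances to
// the next non-empty one, rather than raising an exception.
final class NonEmptyRowGroups implements Iterator<RowGroup> {
    private final Iterator<RowGroup> inner;
    private RowGroup next;

    NonEmptyRowGroups(List<RowGroup> groups) {
        this.inner = groups.iterator();
        advance();
    }

    // Scan forward until a row group with rows is found, or the input ends.
    private void advance() {
        next = null;
        while (inner.hasNext()) {
            RowGroup g = inner.next();
            if (g.getRowCount() > 0) {
                next = g;
                break;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public RowGroup next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        RowGroup result = next;
        advance();
        return result;
    }
}
```

With this approach, a file containing only empty row groups simply yields an empty iteration, which is how a header-only file (like the BigQuery exports described below) would read as zero rows rather than failing.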

> ParquetFileReader throws a runtime exception when a file contains only headers and no row data
> ----------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2219
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2219
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.1
>            Reporter: chris stockton
>            Priority: Minor
>
> Google BigQuery has an option to export table data to Parquet-formatted files, but some of these files are written with header data only.  When such a file is opened with ParquetFileReader, an exception is thrown:
> {{RuntimeException("Illegal row group of 0 rows");}}
> It seems like ParquetFileReader should not throw an exception when it encounters such a file.
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L949



--
This message was sent by Atlassian Jira
(v8.20.10#820010)