Posted to dev@impala.apache.org by yu feng <ol...@gmail.com> on 2017/07/17 06:36:01 UTC
Error of impala query empty parquet file
Hi all,
I always get a query error when I query a Parquet table that contains an
empty Parquet file, that is, a file that has only footer information and
no row groups.
I checked the code and found this:
if (file_metadata_.row_groups.empty()) {
  return Status(
      Substitute("Invalid file. This file: $0 has no row groups",
                 filename()));
}
I want to change this logic: when the scanner finds a file with no row
groups, it should skip the scan range and return no row batches from the
Parquet scanner. Is this the right approach, or do you have another
suggestion for this situation?
Thanks a lot
Re: Error of impala query empty parquet file
Posted by Tim Armstrong <ta...@cloudera.com>.
Hi Yu Feng,
It looks like we already changed Impala to accept valid files with no row
groups: https://issues.apache.org/jira/browse/IMPALA-3943
That error should only be hit if the file metadata reports that it has rows:
// IMPALA-3943: Do not throw an error for empty files for backwards
// compatibility.
if (file_metadata_.num_rows == 0) return Status::OK();
// Parse out the created by application version string
if (file_metadata_.__isset.created_by) {
  file_version_ = ParquetFileVersion(file_metadata_.created_by);
}
if (file_metadata_.row_groups.empty()) {
  return Status(
      Substitute("Invalid file. This file: $0 has no row groups",
                 filename()));
}
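To make the order of the two checks concrete, here is a minimal standalone sketch. It is not Impala code: FooterStub, FooterCheck, ClassifyFooter, and Describe are hypothetical names used only for illustration, and the stub carries just the two footer fields the snippet above inspects.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the Parquet footer metadata (assumption: only
// the two fields used by the checks above are modeled here).
struct FooterStub {
  int64_t num_rows;            // total row count reported by the footer
  std::size_t num_row_groups;  // number of row groups listed in the footer
};

// Possible outcomes of validating a footer, in the order they are checked.
enum class FooterCheck { kEmptyFileOk, kInvalidNoRowGroups, kScan };

// Mirrors the ordering in the snippet above: a file that reports zero rows
// is accepted first, so the "no row groups" error is only reachable when
// the footer claims to contain rows.
FooterCheck ClassifyFooter(const FooterStub& footer) {
  if (footer.num_rows == 0) return FooterCheck::kEmptyFileOk;
  if (footer.num_row_groups == 0) return FooterCheck::kInvalidNoRowGroups;
  return FooterCheck::kScan;
}

// Builds a human-readable description of the outcome for a given file.
std::string Describe(FooterCheck check, const std::string& filename) {
  switch (check) {
    case FooterCheck::kEmptyFileOk:
      return "Empty file, nothing to scan: " + filename;
    case FooterCheck::kInvalidNoRowGroups:
      return "Invalid file. This file: " + filename + " has no row groups";
    case FooterCheck::kScan:
      return "Scanning row groups of: " + filename;
  }
  return "Unknown";
}
```

With this ordering, a footer-only file with num_rows == 0 is treated as valid and yields no rows, while a footer that reports rows but lists no row groups is still rejected as inconsistent.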