Posted to dev@impala.apache.org by yu feng <ol...@gmail.com> on 2017/07/17 06:36:01 UTC

Error of impala query empty parquet file

Hi all,

   I always get a query error when I query a Parquet table that contains an
empty Parquet file, that is, a file that has only footer information and no
row groups.

I checked the code and found this check:

  if (file_metadata_.row_groups.empty()) {
    return Status(
        Substitute("Invalid file. This file: $0 has no row groups", filename()));
  }

I want to change this logic: when the scanner finds a file with no row
groups, it should skip the scan range and not return any row batches from
the Parquet scanner. Is this the right approach, or do you have another
suggestion for this situation?

Thanks a lot

Re: Error of impala query empty parquet file

Posted by Tim Armstrong <ta...@cloudera.com>.
Hi Yu Feng,
  It looks like we already changed Impala to accept valid files with no row
groups: https://issues.apache.org/jira/browse/IMPALA-3943

That error should only be hit if the file metadata reports that it has rows:

  // IMPALA-3943: Do not throw an error for empty files for backwards
  // compatibility.
  if (file_metadata_.num_rows == 0) return Status::OK();

  // Parse out the created by application version string
  if (file_metadata_.__isset.created_by) {
    file_version_ = ParquetFileVersion(file_metadata_.created_by);
  }
  if (file_metadata_.row_groups.empty()) {
    return Status(
        Substitute("Invalid file. This file: $0 has no row groups", filename()));
  }
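
To make the ordering of those two checks concrete, here is a minimal standalone sketch. It is not Impala's actual code: FileMetaData, RowGroup, CheckResult and ValidateFooter are hypothetical stand-ins that model only the two footer fields discussed in this thread (num_rows and row_groups), and the error text is copied from the snippet above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated footer metadata. Only the
// two fields discussed in this thread are modeled.
struct RowGroup {};
struct FileMetaData {
  int64_t num_rows = 0;
  std::vector<RowGroup> row_groups;
};

// Simplified result type for illustration; Impala's real Status class is
// richer than this.
struct CheckResult {
  bool ok;
  std::string msg;
};

// Applies the checks in the same order as the snippet quoted above.
CheckResult ValidateFooter(const FileMetaData& md, const std::string& filename) {
  // The footer reports zero rows: accept the file as valid and empty.
  if (md.num_rows == 0) return {true, ""};
  // The footer reports rows but lists no row groups: the metadata is
  // inconsistent, so report an error.
  if (md.row_groups.empty()) {
    return {false, "Invalid file. This file: " + filename + " has no row groups"};
  }
  return {true, ""};
}
```

With this ordering, a footer-only file whose num_rows is 0 never reaches the row-group check. The error is reserved for files whose footer claims rows but lists no row groups.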
