Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/05/14 19:15:00 UTC

[jira] [Resolved] (IMPALA-7612) Parquet file with no rows in it causing WARNING in explain

     [ https://issues.apache.org/jira/browse/IMPALA-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-7612.
-----------------------------------
    Resolution: Won't Fix

I think it's best not to add any special handling for this case; well-behaved ingest tools should not create empty files. Yes, the warning is not 100% accurate, but it does flag that something is off.
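One way an ingest tool can avoid leaving empty files behind is to create the real output only when the first record arrives. The sketch below is a minimal, hypothetical illustration of that idea in Java. `LazyRecordWriter` and its `IOSupplier` factory are invented names and are not part of Impala, Parquet or Hadoop. A real tool would pass a factory that creates the actual HDFS or local file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: defers creating the output until the first record,
// so an empty input leaves no (empty) output file behind.
public class LazyRecordWriter implements AutoCloseable {

    // Hypothetical factory that creates the real output file when called.
    @FunctionalInterface
    interface IOSupplier<T> {
        T get() throws IOException;
    }

    private final IOSupplier<Writer> opener;
    private Writer out; // stays null until the first record is written

    LazyRecordWriter(IOSupplier<Writer> opener) {
        this.opener = opener;
    }

    void write(String record) throws IOException {
        if (out == null) {
            out = opener.get(); // first record: create the output now
        }
        out.write(record);
        out.write('\n');
    }

    boolean createdOutput() {
        return out != null;
    }

    @Override
    public void close() throws IOException {
        // If nothing was ever written, nothing was opened, so no file exists.
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] opens = {0};
        try (LazyRecordWriter w = new LazyRecordWriter(() -> {
            opens[0]++;
            return new StringWriter(); // stand-in for a real file writer
        })) {
            // Empty input: write() is never called.
        }
        System.out.println("files opened for empty input: " + opens[0]); // 0
    }
}
```

Hadoop MapReduce offers a similar mechanism, `LazyOutputFormat`, which wraps another output format and creates the output file only when the first record is written. Whether it fits a particular Parquet output format is an assumption to verify.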

> Parquet file with no rows in it causing WARNING in explain
> ----------------------------------------------------------
>
>                 Key: IMPALA-7612
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7612
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Zsombor Fedor
>            Priority: Minor
>         Attachments: part-m-00000.parquet
>
>
> An empty Parquet file (one with no rows) causes a warning in EXPLAIN:
> {code:java}
> WARNING: The following tables have potentially corrupt table statistics. Drop and re-compute statistics to resolve this problem. {code}
> This warning still appears even after running
> {code:java}
> compute stats tp;{code}
> because EXPLAIN reports a non-empty data file:
> {code:java}
> partitions=1/1 files=1 size=220B{code}
> while the table statistics report numRows = 0.
> A simple reproduction:
> {code:java}
> create table tp (a int) stored as parquet;{code}
> Create an empty file named empty.csv.
> Convert the CSV into a Parquet file with a simple MapReduce job:
> [https://github.com/tomwhite/hadoop-book/blob/master/ch13-parquet/src/main/java/TextToParquetWithAvro.java]
> using the following Avro schema:
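> {code:java}
> // Avro schema for the MR job (replaces the SCHEMA definition in TextToParquetWithAvro.java)
> private static final Schema SCHEMA = new Schema.Parser().parse(
>     "{\n" +
>     "  \"type\": \"record\",\n" +
>     "  \"name\": \"tp\",\n" +
>     "  \"doc\": \"Avro schema for table tp\",\n" +
>     "  \"fields\":\n" +
>     "  [\n" +
>     "    {\"name\": \"a\", \"type\": \"int\"}\n" +
>     "  ]\n" +
>     "}\n");{code}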
> Put the output Parquet file (attached as part-m-00000.parquet) into the table's directory on HDFS, then run:
> {code:java}
> compute stats tp;
> explain select * from tp;
> {code}
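The report's explanation of the warning can be expressed as a simple check: the statistics report zero rows, yet the scan still finds non-empty data files. The Java sketch below is a hypothetical illustration inferred from the symptoms in the report (`files=1 size=220B`, `numRows = 0`). It is not Impala's actual frontend code, and `CorruptStatsCheck` and `TableSummary` are invented names. The sketch also shows why the warning is not exact. A Parquet file with no rows still contains its magic bytes and footer metadata, so its size is non-zero, and the check cannot distinguish it from stale statistics.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a "potentially corrupt statistics" heuristic, inferred
// from the symptoms in IMPALA-7612; NOT Impala's frontend implementation.
public class CorruptStatsCheck {

    // Hypothetical summary of what EXPLAIN sees for one table.
    static final class TableSummary {
        final String name;
        final long numRows;    // row count from table statistics
        final int numFiles;    // data files found by the scan
        final long totalBytes; // total size of those files

        TableSummary(String name, long numRows, int numFiles, long totalBytes) {
            this.name = name;
            this.numRows = numRows;
            this.numFiles = numFiles;
            this.totalBytes = totalBytes;
        }
    }

    // Stats claim zero rows while non-empty data files exist: suspicious.
    // A zero-row Parquet file (magic bytes + footer only) also matches,
    // which is the false positive described in this issue.
    static boolean looksCorrupt(TableSummary t) {
        return t.numRows == 0 && t.numFiles > 0 && t.totalBytes > 0;
    }

    // Names of all tables that would be listed in the warning.
    static List<String> flaggedTables(List<TableSummary> tables) {
        return tables.stream()
                .filter(CorruptStatsCheck::looksCorrupt)
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Values taken from the report: files=1 size=220B, numRows=0.
        TableSummary tp = new TableSummary("tp", 0, 1, 220);
        // A table with consistent stats, for contrast (hypothetical values).
        TableSummary other = new TableSummary("other", 10, 1, 4096);
        System.out.println(flaggedTables(List.of(tp, other))); // [tp]
    }
}
```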



--
This message was sent by Atlassian Jira
(v8.3.4#803005)