Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/05/03 00:19:00 UTC

[jira] [Created] (IMPALA-6964) Track stats about column and page sizes in Parquet reader

Tim Armstrong created IMPALA-6964:
-------------------------------------

             Summary: Track stats about column and page sizes in Parquet reader
                 Key: IMPALA-6964
                 URL: https://issues.apache.org/jira/browse/IMPALA-6964
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Tim Armstrong


It would be good to have stats about page sizes for scanned Parquet data. Currently the profile tells us little about the "shape" of the Parquet pages. Some interesting questions:

* How big is each column, i.e. the total compressed and decompressed size read?
* How big are pages on average, in either compressed or decompressed size?
* What is the compression ratio for pages? This could be inferred from the two numbers above.
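
To make the three questions concrete, here is a minimal Python sketch of the per-column bookkeeping they imply. It is an illustration only, not Impala code: the ColumnPageStats class and its method names are hypothetical, and it assumes the reader can see each page's compressed and uncompressed size (the Parquet page header carries both). The compression ratio is taken here as uncompressed bytes over compressed bytes, which is one possible convention.

```python
from dataclasses import dataclass


@dataclass
class ColumnPageStats:
    """Running totals for the pages read from one column (hypothetical helper)."""
    num_pages: int = 0
    compressed_bytes: int = 0
    uncompressed_bytes: int = 0

    def add_page(self, compressed_size: int, uncompressed_size: int) -> None:
        # Called once per page read; sizes are in bytes.
        self.num_pages += 1
        self.compressed_bytes += compressed_size
        self.uncompressed_bytes += uncompressed_size

    def avg_compressed_page_size(self) -> float:
        # Average compressed page size; 0.0 if no pages were read.
        return self.compressed_bytes / self.num_pages if self.num_pages else 0.0

    def avg_uncompressed_page_size(self) -> float:
        # Average decompressed page size; 0.0 if no pages were read.
        return self.uncompressed_bytes / self.num_pages if self.num_pages else 0.0

    def compression_ratio(self) -> float:
        # Assumed convention: uncompressed / compressed, so 2.0 means
        # the data shrank to half its size. 0.0 if nothing was read.
        if self.compressed_bytes == 0:
            return 0.0
        return self.uncompressed_bytes / self.compressed_bytes


# Made-up page sizes for a single column, as (compressed, uncompressed) bytes.
stats = ColumnPageStats()
for compressed, uncompressed in [(400, 1000), (600, 1000)]:
    stats.add_page(compressed, uncompressed)

print(stats.compressed_bytes, stats.uncompressed_bytes)
print(stats.avg_compressed_page_size(), stats.compression_ratio())
```

Only three counters per column are needed; the averages and the ratio are derived from them, which matches the observation that the third question can be inferred from the first two.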

I think storing all the stats in the profile per-column would be too much data, but we could probably 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)